Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386996296> ?p ?o ?g. }
- W4386996296 endingPage "4682" @default.
- W4386996296 startingPage "4682" @default.
- W4386996296 abstract "Remote-sensing visual question answering (RSVQA) aims to accurately answer questions about remote-sensing images by leveraging both visual and textual information during inference. However, most existing methods overlook the importance of the interaction between visual and language features: they typically adopt simple feature fusion strategies and fail to adequately model cross-modal attention, so they struggle to capture the complex semantic relationships between questions and images. In this study, we introduce a unified transformer with cross-modal mixture experts (TCMME) model to address the RSVQA problem. Specifically, we use the vision transformer (ViT) and BERT to extract visual and language features, respectively. Furthermore, we incorporate cross-modal mixture experts (CMMEs) to facilitate cross-modal representation learning. By leveraging the shared self-attention and cross-modal attention within CMMEs, as well as the modality experts, we effectively capture the intricate interactions between visual and language features and better focus on their complex semantic relationships. Finally, we conduct qualitative and quantitative experiments on two benchmark datasets: RSVQA-LR and RSVQA-HR. The results demonstrate that our proposed method surpasses the current state-of-the-art (SOTA) techniques. Additionally, we perform an extensive analysis to validate the effectiveness of the different components of our framework." @default.
- W4386996296 created "2023-09-25" @default.
- W4386996296 creator A5006400852 @default.
- W4386996296 creator A5009475507 @default.
- W4386996296 creator A5028441465 @default.
- W4386996296 creator A5076511143 @default.
- W4386996296 creator A5077209124 @default.
- W4386996296 creator A5083095344 @default.
- W4386996296 date "2023-09-24" @default.
- W4386996296 modified "2023-10-16" @default.
- W4386996296 title "Unified Transformer with Cross-Modal Mixture Experts for Remote-Sensing Visual Question Answering" @default.
- W4386996296 cites W1933349210 @default.
- W4386996296 cites W1994790229 @default.
- W4386996296 cites W2009873714 @default.
- W4386996296 cites W2024106491 @default.
- W4386996296 cites W2064675550 @default.
- W4386996296 cites W2097117768 @default.
- W4386996296 cites W2108598243 @default.
- W4386996296 cites W2194775991 @default.
- W4386996296 cites W2538244214 @default.
- W4386996296 cites W2552955500 @default.
- W4386996296 cites W2745461083 @default.
- W4386996296 cites W2783165089 @default.
- W4386996296 cites W2795490417 @default.
- W4386996296 cites W2962858109 @default.
- W4386996296 cites W2963150162 @default.
- W4386996296 cites W2963954913 @default.
- W4386996296 cites W2967079938 @default.
- W4386996296 cites W2998030133 @default.
- W4386996296 cites W3012111773 @default.
- W4386996296 cites W3034427230 @default.
- W4386996296 cites W3035682985 @default.
- W4386996296 cites W3097947628 @default.
- W4386996296 cites W3100711768 @default.
- W4386996296 cites W3121068881 @default.
- W4386996296 cites W3128592650 @default.
- W4386996296 cites W3131151895 @default.
- W4386996296 cites W3140792177 @default.
- W4386996296 cites W3165629794 @default.
- W4386996296 cites W3200870516 @default.
- W4386996296 cites W3201623325 @default.
- W4386996296 cites W3205865791 @default.
- W4386996296 cites W3205944634 @default.
- W4386996296 cites W3217153199 @default.
- W4386996296 cites W4200547174 @default.
- W4386996296 cites W4214910799 @default.
- W4386996296 cites W4224067506 @default.
- W4386996296 cites W4225991573 @default.
- W4386996296 cites W4229373348 @default.
- W4386996296 cites W4281951604 @default.
- W4386996296 cites W4285818301 @default.
- W4386996296 cites W4292828962 @default.
- W4386996296 cites W4312335509 @default.
- W4386996296 cites W4312593844 @default.
- W4386996296 cites W4386472879 @default.
- W4386996296 doi "https://doi.org/10.3390/rs15194682" @default.
- W4386996296 hasPublicationYear "2023" @default.
- W4386996296 type Work @default.
- W4386996296 citedByCount "0" @default.
- W4386996296 crossrefType "journal-article" @default.
- W4386996296 hasAuthorship W4386996296A5006400852 @default.
- W4386996296 hasAuthorship W4386996296A5009475507 @default.
- W4386996296 hasAuthorship W4386996296A5028441465 @default.
- W4386996296 hasAuthorship W4386996296A5076511143 @default.
- W4386996296 hasAuthorship W4386996296A5077209124 @default.
- W4386996296 hasAuthorship W4386996296A5083095344 @default.
- W4386996296 hasBestOaLocation W43869962961 @default.
- W4386996296 hasConcept C111919701 @default.
- W4386996296 hasConcept C119857082 @default.
- W4386996296 hasConcept C120665830 @default.
- W4386996296 hasConcept C121332964 @default.
- W4386996296 hasConcept C13280743 @default.
- W4386996296 hasConcept C154945302 @default.
- W4386996296 hasConcept C165801399 @default.
- W4386996296 hasConcept C185592680 @default.
- W4386996296 hasConcept C185798385 @default.
- W4386996296 hasConcept C188027245 @default.
- W4386996296 hasConcept C192209626 @default.
- W4386996296 hasConcept C204321447 @default.
- W4386996296 hasConcept C205649164 @default.
- W4386996296 hasConcept C2776214188 @default.
- W4386996296 hasConcept C2983448237 @default.
- W4386996296 hasConcept C41008148 @default.
- W4386996296 hasConcept C44291984 @default.
- W4386996296 hasConcept C59404180 @default.
- W4386996296 hasConcept C62520636 @default.
- W4386996296 hasConcept C66322947 @default.
- W4386996296 hasConcept C71139939 @default.
- W4386996296 hasConcept C98045186 @default.
- W4386996296 hasConceptScore W4386996296C111919701 @default.
- W4386996296 hasConceptScore W4386996296C119857082 @default.
- W4386996296 hasConceptScore W4386996296C120665830 @default.
- W4386996296 hasConceptScore W4386996296C121332964 @default.
- W4386996296 hasConceptScore W4386996296C13280743 @default.
- W4386996296 hasConceptScore W4386996296C154945302 @default.
- W4386996296 hasConceptScore W4386996296C165801399 @default.
- W4386996296 hasConceptScore W4386996296C185592680 @default.
- W4386996296 hasConceptScore W4386996296C185798385 @default.