Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387623721> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W4387623721 endingPage "1" @default.
- W4387623721 startingPage "1" @default.
- W4387623721 abstract "This paper tackles the challenging yet significant task of grounding a natural language query to the corresponding region of an image. The main challenge in visual grounding is to model the correspondence between the visual context and the semantic concept referred to by the language expression, that is, multi-modal fusion. Nevertheless, there is an inherent deficiency in current fusion module designs, which prevents visual and linguistic feature embeddings from being unified into the same semantic space. To address this issue, we present a novel and effective visual grounding framework based on joint multi-modal representation and interaction (JMRI). Specifically, we propose to perform image-text alignment in a multi-modal embedding space learned by a large-scale foundation model, so as to obtain semantically unified joint representations. Furthermore, a transformer-based deep interactor is designed to capture intra-modal and inter-modal correlations, enabling our model to highlight the localization-relevant cues for accurate reasoning. By freezing the pre-trained vision-language foundation model and updating the other modules, we achieve the best performance with the lowest training cost. Extensive experimental results on five benchmark datasets, with quantitative and qualitative analysis, show that the proposed method performs favorably against state-of-the-art methods." @default.
- W4387623721 created "2023-10-14" @default.
- W4387623721 creator A5002250162 @default.
- W4387623721 creator A5005612832 @default.
- W4387623721 creator A5058101262 @default.
- W4387623721 creator A5075202213 @default.
- W4387623721 creator A5082531343 @default.
- W4387623721 date "2023-01-01" @default.
- W4387623721 modified "2023-10-18" @default.
- W4387623721 title "Visual Grounding with Joint Multi-modal Representation and Interaction" @default.
- W4387623721 doi "https://doi.org/10.1109/tim.2023.3324362" @default.
- W4387623721 hasPublicationYear "2023" @default.
- W4387623721 type Work @default.
- W4387623721 citedByCount "0" @default.
- W4387623721 crossrefType "journal-article" @default.
- W4387623721 hasAuthorship W4387623721A5002250162 @default.
- W4387623721 hasAuthorship W4387623721A5005612832 @default.
- W4387623721 hasAuthorship W4387623721A5058101262 @default.
- W4387623721 hasAuthorship W4387623721A5075202213 @default.
- W4387623721 hasAuthorship W4387623721A5082531343 @default.
- W4387623721 hasConcept C119599485 @default.
- W4387623721 hasConcept C119857082 @default.
- W4387623721 hasConcept C127413603 @default.
- W4387623721 hasConcept C153180895 @default.
- W4387623721 hasConcept C154945302 @default.
- W4387623721 hasConcept C165801399 @default.
- W4387623721 hasConcept C168993435 @default.
- W4387623721 hasConcept C171018156 @default.
- W4387623721 hasConcept C185592680 @default.
- W4387623721 hasConcept C188027245 @default.
- W4387623721 hasConcept C204321447 @default.
- W4387623721 hasConcept C2524010 @default.
- W4387623721 hasConcept C2777508537 @default.
- W4387623721 hasConcept C33923547 @default.
- W4387623721 hasConcept C41008148 @default.
- W4387623721 hasConcept C41608201 @default.
- W4387623721 hasConcept C66322947 @default.
- W4387623721 hasConcept C71139939 @default.
- W4387623721 hasConceptScore W4387623721C119599485 @default.
- W4387623721 hasConceptScore W4387623721C119857082 @default.
- W4387623721 hasConceptScore W4387623721C127413603 @default.
- W4387623721 hasConceptScore W4387623721C153180895 @default.
- W4387623721 hasConceptScore W4387623721C154945302 @default.
- W4387623721 hasConceptScore W4387623721C165801399 @default.
- W4387623721 hasConceptScore W4387623721C168993435 @default.
- W4387623721 hasConceptScore W4387623721C171018156 @default.
- W4387623721 hasConceptScore W4387623721C185592680 @default.
- W4387623721 hasConceptScore W4387623721C188027245 @default.
- W4387623721 hasConceptScore W4387623721C204321447 @default.
- W4387623721 hasConceptScore W4387623721C2524010 @default.
- W4387623721 hasConceptScore W4387623721C2777508537 @default.
- W4387623721 hasConceptScore W4387623721C33923547 @default.
- W4387623721 hasConceptScore W4387623721C41008148 @default.
- W4387623721 hasConceptScore W4387623721C41608201 @default.
- W4387623721 hasConceptScore W4387623721C66322947 @default.
- W4387623721 hasConceptScore W4387623721C71139939 @default.
- W4387623721 hasFunder F4320334897 @default.
- W4387623721 hasLocation W43876237211 @default.
- W4387623721 hasOpenAccess W4387623721 @default.
- W4387623721 hasPrimaryLocation W43876237211 @default.
- W4387623721 hasRelatedWork W1034204177 @default.
- W4387623721 hasRelatedWork W1532035848 @default.
- W4387623721 hasRelatedWork W1539846681 @default.
- W4387623721 hasRelatedWork W1950785758 @default.
- W4387623721 hasRelatedWork W2023896637 @default.
- W4387623721 hasRelatedWork W2161689690 @default.
- W4387623721 hasRelatedWork W2185981755 @default.
- W4387623721 hasRelatedWork W3196191855 @default.
- W4387623721 hasRelatedWork W4235091896 @default.
- W4387623721 hasRelatedWork W57206970 @default.
- W4387623721 isParatext "false" @default.
- W4387623721 isRetracted "false" @default.
- W4387623721 workType "article" @default.