Matches in SemOpenAlex for { <https://semopenalex.org/work/W3203371361> ?p ?o ?g. }
Showing items 1 to 77 of 77 with 100 items per page.
- W3203371361 endingPage "1918" @default.
- W3203371361 startingPage "1908" @default.
- W3203371361 abstract "Vision-and-language (V&L) reasoning necessitates perception of visual concepts such as objects and actions, understanding semantics and language grounding, and reasoning about the interplay between the two modalities. One crucial aspect of visual reasoning is spatial understanding, which involves understanding relative locations of objects, i.e. implicitly learning the geometry of the scene. In this work, we evaluate the faithfulness of V&L models to such geometric understanding, by formulating the prediction of pair-wise relative locations of objects as a classification as well as a regression task. Our findings suggest that state-of-the-art transformer-based V&L models lack sufficient abilities to excel at this task. Motivated by this, we design two objectives as proxies for 3D spatial reasoning (SR) -- object centroid estimation, and relative position estimation, and train V&L with weak supervision from off-the-shelf depth estimators. This leads to considerable improvements in accuracy for the GQA visual question answering challenge (in fully supervised, few-shot, and O.O.D settings) as well as improvements in relative spatial reasoning. Code and data will be released here." @default.
- W3203371361 created "2021-10-11" @default.
- W3203371361 creator A5002278578 @default.
- W3203371361 creator A5017986865 @default.
- W3203371361 creator A5035343276 @default.
- W3203371361 creator A5083735830 @default.
- W3203371361 date "2021-01-01" @default.
- W3203371361 modified "2023-09-24" @default.
- W3203371361 title "Weakly Supervised Relative Spatial Reasoning for Visual Question Answering" @default.
- W3203371361 hasPublicationYear "2021" @default.
- W3203371361 type Work @default.
- W3203371361 sameAs 3203371361 @default.
- W3203371361 citedByCount "0" @default.
- W3203371361 crossrefType "proceedings-article" @default.
- W3203371361 hasAuthorship W3203371361A5002278578 @default.
- W3203371361 hasAuthorship W3203371361A5017986865 @default.
- W3203371361 hasAuthorship W3203371361A5035343276 @default.
- W3203371361 hasAuthorship W3203371361A5083735830 @default.
- W3203371361 hasConcept C105795698 @default.
- W3203371361 hasConcept C119857082 @default.
- W3203371361 hasConcept C154945302 @default.
- W3203371361 hasConcept C155911833 @default.
- W3203371361 hasConcept C162324750 @default.
- W3203371361 hasConcept C184337299 @default.
- W3203371361 hasConcept C185429906 @default.
- W3203371361 hasConcept C187736073 @default.
- W3203371361 hasConcept C199360897 @default.
- W3203371361 hasConcept C204321447 @default.
- W3203371361 hasConcept C2777508537 @default.
- W3203371361 hasConcept C2780451532 @default.
- W3203371361 hasConcept C33923547 @default.
- W3203371361 hasConcept C41008148 @default.
- W3203371361 hasConcept C44291984 @default.
- W3203371361 hasConceptScore W3203371361C105795698 @default.
- W3203371361 hasConceptScore W3203371361C119857082 @default.
- W3203371361 hasConceptScore W3203371361C154945302 @default.
- W3203371361 hasConceptScore W3203371361C155911833 @default.
- W3203371361 hasConceptScore W3203371361C162324750 @default.
- W3203371361 hasConceptScore W3203371361C184337299 @default.
- W3203371361 hasConceptScore W3203371361C185429906 @default.
- W3203371361 hasConceptScore W3203371361C187736073 @default.
- W3203371361 hasConceptScore W3203371361C199360897 @default.
- W3203371361 hasConceptScore W3203371361C204321447 @default.
- W3203371361 hasConceptScore W3203371361C2777508537 @default.
- W3203371361 hasConceptScore W3203371361C2780451532 @default.
- W3203371361 hasConceptScore W3203371361C33923547 @default.
- W3203371361 hasConceptScore W3203371361C41008148 @default.
- W3203371361 hasConceptScore W3203371361C44291984 @default.
- W3203371361 hasLocation W32033713611 @default.
- W3203371361 hasOpenAccess W3203371361 @default.
- W3203371361 hasPrimaryLocation W32033713611 @default.
- W3203371361 hasRelatedWork W1491463701 @default.
- W3203371361 hasRelatedWork W2916723116 @default.
- W3203371361 hasRelatedWork W2939593052 @default.
- W3203371361 hasRelatedWork W2963655897 @default.
- W3203371361 hasRelatedWork W2963760481 @default.
- W3203371361 hasRelatedWork W2966369713 @default.
- W3203371361 hasRelatedWork W2969541233 @default.
- W3203371361 hasRelatedWork W2969679616 @default.
- W3203371361 hasRelatedWork W2986755220 @default.
- W3203371361 hasRelatedWork W3005881764 @default.
- W3203371361 hasRelatedWork W3021149246 @default.
- W3203371361 hasRelatedWork W3034982658 @default.
- W3203371361 hasRelatedWork W3089373619 @default.
- W3203371361 hasRelatedWork W3124461501 @default.
- W3203371361 hasRelatedWork W3168417280 @default.
- W3203371361 hasRelatedWork W3170410460 @default.
- W3203371361 hasRelatedWork W3184053512 @default.
- W3203371361 hasRelatedWork W3197035208 @default.
- W3203371361 hasRelatedWork W34131768 @default.
- W3203371361 hasRelatedWork W3121327184 @default.
- W3203371361 isParatext "false" @default.
- W3203371361 isRetracted "false" @default.
- W3203371361 magId "3203371361" @default.
- W3203371361 workType "article" @default.