Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287586989> ?p ?o ?g. }
Showing items 1 to 81 of 81 with 100 items per page.
- W4287586989 abstract "Visual Question Answering (VQA) has become one of the key benchmarks of visual recognition progress. Multiple VQA extensions have been explored to better simulate real-world settings: different question formulations, changing training and test distributions, conversational consistency in dialogues, and explanation-based answering. In this work, we further expand this space by considering visual questions that include a spatial point of reference. Pointing is a nearly universal gesture among humans, and real-world VQA is likely to involve a gesture towards the target region. Concretely, we (1) introduce and motivate point-input questions as an extension of VQA, (2) define three novel classes of questions within this space, and (3) for each class, introduce both a benchmark dataset and a series of baseline models to handle its unique challenges. There are two key distinctions from prior work. First, we explicitly design the benchmarks to require the point input, i.e., we ensure that the visual question cannot be answered accurately without the spatial reference. Second, we explicitly explore the more realistic point spatial input rather than the standard but unnatural bounding box input. Through our exploration we uncover and address several visual recognition challenges, including the ability to infer human intent, reason both locally and globally about the image, and effectively combine visual, language and spatial inputs. Code is available at: https://github.com/princetonvisualai/pointingqa ." @default.
- W4287586989 created "2022-07-25" @default.
- W4287586989 creator A5017598506 @default.
- W4287586989 creator A5022811687 @default.
- W4287586989 creator A5030093954 @default.
- W4287586989 creator A5038654894 @default.
- W4287586989 date "2020-11-27" @default.
- W4287586989 modified "2023-09-27" @default.
- W4287586989 title "Point and Ask: Incorporating Pointing into Visual Question Answering" @default.
- W4287586989 doi "https://doi.org/10.48550/arxiv.2011.13681" @default.
- W4287586989 hasPublicationYear "2020" @default.
- W4287586989 type Work @default.
- W4287586989 citedByCount "0" @default.
- W4287586989 crossrefType "posted-content" @default.
- W4287586989 hasAuthorship W4287586989A5017598506 @default.
- W4287586989 hasAuthorship W4287586989A5022811687 @default.
- W4287586989 hasAuthorship W4287586989A5030093954 @default.
- W4287586989 hasAuthorship W4287586989A5038654894 @default.
- W4287586989 hasBestOaLocation W42875869891 @default.
- W4287586989 hasConcept C111919701 @default.
- W4287586989 hasConcept C13280743 @default.
- W4287586989 hasConcept C136264566 @default.
- W4287586989 hasConcept C154945302 @default.
- W4287586989 hasConcept C162324750 @default.
- W4287586989 hasConcept C177264268 @default.
- W4287586989 hasConcept C185798385 @default.
- W4287586989 hasConcept C199360897 @default.
- W4287586989 hasConcept C205649164 @default.
- W4287586989 hasConcept C207347870 @default.
- W4287586989 hasConcept C23123220 @default.
- W4287586989 hasConcept C2524010 @default.
- W4287586989 hasConcept C26517878 @default.
- W4287586989 hasConcept C2776436953 @default.
- W4287586989 hasConcept C2776760102 @default.
- W4287586989 hasConcept C2777212361 @default.
- W4287586989 hasConcept C2778572836 @default.
- W4287586989 hasConcept C28719098 @default.
- W4287586989 hasConcept C33923547 @default.
- W4287586989 hasConcept C38652104 @default.
- W4287586989 hasConcept C41008148 @default.
- W4287586989 hasConcept C44291984 @default.
- W4287586989 hasConcept C90329073 @default.
- W4287586989 hasConceptScore W4287586989C111919701 @default.
- W4287586989 hasConceptScore W4287586989C13280743 @default.
- W4287586989 hasConceptScore W4287586989C136264566 @default.
- W4287586989 hasConceptScore W4287586989C154945302 @default.
- W4287586989 hasConceptScore W4287586989C162324750 @default.
- W4287586989 hasConceptScore W4287586989C177264268 @default.
- W4287586989 hasConceptScore W4287586989C185798385 @default.
- W4287586989 hasConceptScore W4287586989C199360897 @default.
- W4287586989 hasConceptScore W4287586989C205649164 @default.
- W4287586989 hasConceptScore W4287586989C207347870 @default.
- W4287586989 hasConceptScore W4287586989C23123220 @default.
- W4287586989 hasConceptScore W4287586989C2524010 @default.
- W4287586989 hasConceptScore W4287586989C26517878 @default.
- W4287586989 hasConceptScore W4287586989C2776436953 @default.
- W4287586989 hasConceptScore W4287586989C2776760102 @default.
- W4287586989 hasConceptScore W4287586989C2777212361 @default.
- W4287586989 hasConceptScore W4287586989C2778572836 @default.
- W4287586989 hasConceptScore W4287586989C28719098 @default.
- W4287586989 hasConceptScore W4287586989C33923547 @default.
- W4287586989 hasConceptScore W4287586989C38652104 @default.
- W4287586989 hasConceptScore W4287586989C41008148 @default.
- W4287586989 hasConceptScore W4287586989C44291984 @default.
- W4287586989 hasConceptScore W4287586989C90329073 @default.
- W4287586989 hasLocation W42875869891 @default.
- W4287586989 hasOpenAccess W4287586989 @default.
- W4287586989 hasPrimaryLocation W42875869891 @default.
- W4287586989 hasRelatedWork W1556931475 @default.
- W4287586989 hasRelatedWork W1809817947 @default.
- W4287586989 hasRelatedWork W1830854354 @default.
- W4287586989 hasRelatedWork W2341207148 @default.
- W4287586989 hasRelatedWork W2350879319 @default.
- W4287586989 hasRelatedWork W2356380379 @default.
- W4287586989 hasRelatedWork W2385202360 @default.
- W4287586989 hasRelatedWork W2608030593 @default.
- W4287586989 hasRelatedWork W3109959312 @default.
- W4287586989 hasRelatedWork W4287586989 @default.
- W4287586989 isParatext "false" @default.
- W4287586989 isRetracted "false" @default.
- W4287586989 workType "article" @default.
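
Each entry above follows the same shape: a subject, a predicate, an object (either a bare identifier or a double-quoted literal), and a graph label such as `@default.`. As a minimal sketch, assuming that shape holds for every line and that literals contain no embedded line breaks, the listing could be parsed into tuples as shown below. The names `Quad`, `LINE_RE`, and `parse_line` are hypothetical helpers introduced for illustration; they are not part of SemOpenAlex or any library.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One listing entry (hypothetical container, for illustration only)."""
    subject: str
    predicate: str
    obj: str
    graph: str


# Assumed line shape: "- SUBJECT PREDICATE OBJECT @GRAPH."
# OBJECT is either a double-quoted literal (may contain spaces)
# or a single whitespace-free token.
LINE_RE = re.compile(
    r'^-\s+(?P<s>\S+)\s+(?P<p>\S+)\s+(?P<o>".*"|\S+)\s+@(?P<g>\S+?)\.$'
)


def parse_line(line: str) -> Optional[Quad]:
    """Return a Quad for a listing line, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    obj = m.group("o")
    # Drop the surrounding quotes from literal values.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return Quad(m.group("s"), m.group("p"), obj, m.group("g"))


if __name__ == "__main__":
    sample = [
        '- W4287586989 created "2022-07-25" @default.',
        "- W4287586989 type Work @default.",
        "- W4287586989 hasRelatedWork W3109959312 @default.",
    ]
    for entry in sample:
        print(parse_line(entry))
```

Lines that are not entries, such as the header or the paging summary, do not start with `- ` and therefore yield `None`, so they can be filtered out before further processing.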