Matches in SemOpenAlex for { <https://semopenalex.org/work/W4295751877> ?p ?o ?g. }
- W4295751877 endingPage "6061" @default.
- W4295751877 startingPage "6048" @default.
- W4295751877 abstract "In this paper, we investigate the problem of abductive visual reasoning (AVR), which requires vision systems to infer the most plausible explanation for visual observations. Unlike previous work, which performs visual reasoning on static images or synthesized scenes, we exploit long-term reasoning from instructional videos that contain a wealth of detailed information about the physical world. We conceptualize two tasks for this emerging and challenging topic. The primary task is AVR: given the initial configuration and desired goal from an instructional video, the model is expected to determine the most plausible sequence of steps to achieve the goal. To avoid trivial solutions based on appearance information rather than reasoning, we construct a second task, AVR++, which requires the model to answer why the unselected options are less plausible. We introduce a new dataset called VideoABC, which consists of 46,354 unique steps derived from 11,827 instructional videos, formulated as 13,526 abductive reasoning questions with an average reasoning duration of 51 seconds. Through an adversarial hard hypothesis mining algorithm, non-trivial and high-quality problems are generated efficiently and effectively. To achieve human-level reasoning, we propose a Hierarchical Dual Reasoning Network (HDRNet) to capture the long-term dependencies among steps and observations. We establish a benchmark for abductive visual reasoning, and our method sets the state of the art on AVR (~74%) and AVR++ (~45%), while humans can easily achieve over 90% accuracy on these two tasks. The large performance gap reveals the limitation of current video understanding models on temporal reasoning and leaves substantial room for future research on this challenging problem." @default.
- W4295751877 created "2022-09-15" @default.
- W4295751877 creator A5010357390 @default.
- W4295751877 creator A5060902476 @default.
- W4295751877 creator A5077076810 @default.
- W4295751877 creator A5082694832 @default.
- W4295751877 creator A5090079801 @default.
- W4295751877 date "2022-01-01" @default.
- W4295751877 modified "2023-10-13" @default.
- W4295751877 title "VideoABC: A Real-World Video Dataset for Abductive Visual Reasoning" @default.
- W4295751877 cites W1522734439 @default.
- W4295751877 cites W1536680647 @default.
- W4295751877 cites W1903029394 @default.
- W4295751877 cites W1927052826 @default.
- W4295751877 cites W1933349210 @default.
- W4295751877 cites W2016053056 @default.
- W4295751877 cites W2084092936 @default.
- W4295751877 cites W2113588862 @default.
- W4295751877 cites W2183341477 @default.
- W4295751877 cites W2194775991 @default.
- W4295751877 cites W2398250566 @default.
- W4295751877 cites W2507009361 @default.
- W4295751877 cites W2560730294 @default.
- W4295751877 cites W2561715562 @default.
- W4295751877 cites W2625366777 @default.
- W4295751877 cites W2798345491 @default.
- W4295751877 cites W2957775769 @default.
- W4295751877 cites W2962957359 @default.
- W4295751877 cites W2963115613 @default.
- W4295751877 cites W2963150697 @default.
- W4295751877 cites W2963155035 @default.
- W4295751877 cites W2963524571 @default.
- W4295751877 cites W2963570630 @default.
- W4295751877 cites W2963690694 @default.
- W4295751877 cites W2963890755 @default.
- W4295751877 cites W2964094654 @default.
- W4295751877 cites W2964220823 @default.
- W4295751877 cites W2981635073 @default.
- W4295751877 cites W2981851019 @default.
- W4295751877 cites W2984008963 @default.
- W4295751877 cites W2990152177 @default.
- W4295751877 cites W3023993913 @default.
- W4295751877 cites W3106768499 @default.
- W4295751877 cites W3116963012 @default.
- W4295751877 cites W3122622502 @default.
- W4295751877 cites W3159630763 @default.
- W4295751877 cites W343636949 @default.
- W4295751877 cites W4214612132 @default.
- W4295751877 doi "https://doi.org/10.1109/tip.2022.3205207" @default.
- W4295751877 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/36103440" @default.
- W4295751877 hasPublicationYear "2022" @default.
- W4295751877 type Work @default.
- W4295751877 citedByCount "1" @default.
- W4295751877 countsByYear W42957518772023 @default.
- W4295751877 crossrefType "journal-article" @default.
- W4295751877 hasAuthorship W4295751877A5010357390 @default.
- W4295751877 hasAuthorship W4295751877A5060902476 @default.
- W4295751877 hasAuthorship W4295751877A5077076810 @default.
- W4295751877 hasAuthorship W4295751877A5082694832 @default.
- W4295751877 hasAuthorship W4295751877A5090079801 @default.
- W4295751877 hasConcept C119857082 @default.
- W4295751877 hasConcept C13280743 @default.
- W4295751877 hasConcept C154945302 @default.
- W4295751877 hasConcept C155911833 @default.
- W4295751877 hasConcept C162324750 @default.
- W4295751877 hasConcept C165696696 @default.
- W4295751877 hasConcept C166088908 @default.
- W4295751877 hasConcept C177264268 @default.
- W4295751877 hasConcept C185798385 @default.
- W4295751877 hasConcept C187736073 @default.
- W4295751877 hasConcept C199360897 @default.
- W4295751877 hasConcept C20162079 @default.
- W4295751877 hasConcept C205649164 @default.
- W4295751877 hasConcept C2777508537 @default.
- W4295751877 hasConcept C2780451532 @default.
- W4295751877 hasConcept C37736160 @default.
- W4295751877 hasConcept C38652104 @default.
- W4295751877 hasConcept C41008148 @default.
- W4295751877 hasConcept C83725634 @default.
- W4295751877 hasConcept C89288958 @default.
- W4295751877 hasConceptScore W4295751877C119857082 @default.
- W4295751877 hasConceptScore W4295751877C13280743 @default.
- W4295751877 hasConceptScore W4295751877C154945302 @default.
- W4295751877 hasConceptScore W4295751877C155911833 @default.
- W4295751877 hasConceptScore W4295751877C162324750 @default.
- W4295751877 hasConceptScore W4295751877C165696696 @default.
- W4295751877 hasConceptScore W4295751877C166088908 @default.
- W4295751877 hasConceptScore W4295751877C177264268 @default.
- W4295751877 hasConceptScore W4295751877C185798385 @default.
- W4295751877 hasConceptScore W4295751877C187736073 @default.
- W4295751877 hasConceptScore W4295751877C199360897 @default.
- W4295751877 hasConceptScore W4295751877C20162079 @default.
- W4295751877 hasConceptScore W4295751877C205649164 @default.
- W4295751877 hasConceptScore W4295751877C2777508537 @default.
- W4295751877 hasConceptScore W4295751877C2780451532 @default.
- W4295751877 hasConceptScore W4295751877C37736160 @default.
- W4295751877 hasConceptScore W4295751877C38652104 @default.
- W4295751877 hasConceptScore W4295751877C41008148 @default.