Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385474074> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4385474074 abstract "In recent times there has been a surge of multi-modal architectures based on Large Language Models, which leverage the zero-shot generation capabilities of LLMs and project image embeddings into the text space and then use the auto-regressive capacity to solve tasks such as VQA, captioning, and image retrieval. We name these architectures bridge-architectures as they project from the image space to the text space. These models deviate from the traditional recipe of training transformer-based multi-modal models, which involves using large-scale pre-training and complex multi-modal interactions through co- or cross-attention. However, the capabilities of bridge-architectures have not been tested on complex visual reasoning tasks which require fine-grained analysis of the image. In this project, we investigate the performance of these bridge-architectures on the NLVR2 dataset, and compare it to state-of-the-art transformer-based architectures. We first extend the traditional bridge-architectures for the NLVR2 dataset by adding object-level features to facilitate fine-grained object reasoning. Our analysis shows that adding object-level features to bridge-architectures does not help, and that pre-training on multi-modal data is key for good performance on complex reasoning tasks such as NLVR2. We also demonstrate some initial results on a recent bridge-architecture, LLaVA, in the zero-shot setting and analyze its performance." @default.
- W4385474074 created "2023-08-02" @default.
- W4385474074 creator A5019304605 @default.
- W4385474074 creator A5051720557 @default.
- W4385474074 creator A5076943003 @default.
- W4385474074 creator A5086052900 @default.
- W4385474074 date "2023-07-30" @default.
- W4385474074 modified "2023-09-27" @default.
- W4385474074 title "Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks" @default.
- W4385474074 doi "https://doi.org/10.48550/arxiv.2307.16395" @default.
- W4385474074 hasPublicationYear "2023" @default.
- W4385474074 type Work @default.
- W4385474074 citedByCount "0" @default.
- W4385474074 crossrefType "posted-content" @default.
- W4385474074 hasAuthorship W4385474074A5019304605 @default.
- W4385474074 hasAuthorship W4385474074A5051720557 @default.
- W4385474074 hasAuthorship W4385474074A5076943003 @default.
- W4385474074 hasAuthorship W4385474074A5086052900 @default.
- W4385474074 hasBestOaLocation W43854740741 @default.
- W4385474074 hasConcept C100776233 @default.
- W4385474074 hasConcept C115961682 @default.
- W4385474074 hasConcept C119599485 @default.
- W4385474074 hasConcept C119857082 @default.
- W4385474074 hasConcept C123657996 @default.
- W4385474074 hasConcept C126322002 @default.
- W4385474074 hasConcept C127413603 @default.
- W4385474074 hasConcept C142362112 @default.
- W4385474074 hasConcept C153083717 @default.
- W4385474074 hasConcept C153349607 @default.
- W4385474074 hasConcept C154945302 @default.
- W4385474074 hasConcept C157657479 @default.
- W4385474074 hasConcept C165801399 @default.
- W4385474074 hasConcept C174348530 @default.
- W4385474074 hasConcept C185592680 @default.
- W4385474074 hasConcept C188027245 @default.
- W4385474074 hasConcept C2777508537 @default.
- W4385474074 hasConcept C31258907 @default.
- W4385474074 hasConcept C41008148 @default.
- W4385474074 hasConcept C66322947 @default.
- W4385474074 hasConcept C71139939 @default.
- W4385474074 hasConcept C71924100 @default.
- W4385474074 hasConceptScore W4385474074C100776233 @default.
- W4385474074 hasConceptScore W4385474074C115961682 @default.
- W4385474074 hasConceptScore W4385474074C119599485 @default.
- W4385474074 hasConceptScore W4385474074C119857082 @default.
- W4385474074 hasConceptScore W4385474074C123657996 @default.
- W4385474074 hasConceptScore W4385474074C126322002 @default.
- W4385474074 hasConceptScore W4385474074C127413603 @default.
- W4385474074 hasConceptScore W4385474074C142362112 @default.
- W4385474074 hasConceptScore W4385474074C153083717 @default.
- W4385474074 hasConceptScore W4385474074C153349607 @default.
- W4385474074 hasConceptScore W4385474074C154945302 @default.
- W4385474074 hasConceptScore W4385474074C157657479 @default.
- W4385474074 hasConceptScore W4385474074C165801399 @default.
- W4385474074 hasConceptScore W4385474074C174348530 @default.
- W4385474074 hasConceptScore W4385474074C185592680 @default.
- W4385474074 hasConceptScore W4385474074C188027245 @default.
- W4385474074 hasConceptScore W4385474074C2777508537 @default.
- W4385474074 hasConceptScore W4385474074C31258907 @default.
- W4385474074 hasConceptScore W4385474074C41008148 @default.
- W4385474074 hasConceptScore W4385474074C66322947 @default.
- W4385474074 hasConceptScore W4385474074C71139939 @default.
- W4385474074 hasConceptScore W4385474074C71924100 @default.
- W4385474074 hasLocation W43854740741 @default.
- W4385474074 hasOpenAccess W4385474074 @default.
- W4385474074 hasPrimaryLocation W43854740741 @default.
- W4385474074 hasRelatedWork W2547835662 @default.
- W4385474074 hasRelatedWork W2596543464 @default.
- W4385474074 hasRelatedWork W2891852518 @default.
- W4385474074 hasRelatedWork W2905654560 @default.
- W4385474074 hasRelatedWork W2923366293 @default.
- W4385474074 hasRelatedWork W3008515501 @default.
- W4385474074 hasRelatedWork W3183824823 @default.
- W4385474074 hasRelatedWork W4317939313 @default.
- W4385474074 hasRelatedWork W4320016117 @default.
- W4385474074 hasRelatedWork W2519434724 @default.
- W4385474074 isParatext "false" @default.
- W4385474074 isRetracted "false" @default.
- W4385474074 workType "article" @default.