Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387687816> ?p ?o ?g. }
Showing items 1 to 85 of 85 with 100 items per page.
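The header pattern `{ <https://semopenalex.org/work/W4387687816> ?p ?o ?g. }` asks for every predicate/object pair attached to one work. The following is a minimal sketch of reproducing the listing programmatically. It assumes that SemOpenAlex exposes a public SPARQL endpoint at `https://semopenalex.org/sparql` (an assumption, not stated in this listing) and that the `@default` suffix on each row means the triples live in the default graph, so a plain triple pattern without a `GRAPH` clause should match them. The helper names (`build_query`, `build_request_url`, `fetch_bindings`, `count_predicates`) are hypothetical.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: SemOpenAlex's public SPARQL endpoint lives at this URL.
ENDPOINT = "https://semopenalex.org/sparql"
WORK_IRI = "https://semopenalex.org/work/W4387687816"


def build_query(work_iri: str) -> str:
    """SPARQL query returning every (predicate, object) pair for one subject IRI.

    A plain triple pattern (no GRAPH clause) targets the default graph, which is
    presumably what the '@default' suffix on each row of the listing refers to.
    """
    return f"SELECT ?p ?o WHERE {{\n  <{work_iri}> ?p ?o .\n}}"


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request via the SPARQL 1.1 Protocol 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def fetch_bindings(url: str) -> list:
    """Run the query over HTTP and return the SPARQL JSON result bindings (needs network)."""
    req = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]


def count_predicates(bindings: list) -> Counter:
    """Tally how many objects each predicate IRI has in the result bindings."""
    return Counter(b["p"]["value"] for b in bindings)


query = build_query(WORK_IRI)
url = build_request_url(ENDPOINT, query)
# With network access: count_predicates(fetch_bindings(url))
```

If the endpoint serves the same data snapshot, the predicate tallies should mirror the listing below: 10 `creator` rows, 19 `hasConcept` rows and 10 `hasRelatedWork` rows, 85 rows in total.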
- W4387687816 abstract "The popularity of multimodal large language models (MLLMs) has triggered a recent surge in research efforts dedicated to evaluating these models. Nevertheless, existing evaluation studies of MLLMs primarily focus on the comprehension and reasoning of unimodal (vision) content, neglecting performance evaluations in the domain of multimodal (vision-language) content understanding. Beyond multimodal reasoning, tasks related to multimodal content comprehension necessitate a profound understanding of multimodal contexts, achieved through the multimodal interaction to obtain a final answer. In this paper, we introduce a comprehensive assessment framework called MM-BigBench, which incorporates a diverse range of metrics to offer an extensive evaluation of the performance of various models and instructions across a wide spectrum of diverse multimodal content comprehension tasks. Consequently, our work complements research on the performance of MLLMs in multimodal comprehension tasks, achieving a more comprehensive and holistic evaluation of MLLMs. To begin, we employ the Best Performance metric to ascertain each model's performance upper bound on different datasets. Subsequently, the Mean Relative Gain metric offers an assessment of the overall performance of various models and instructions, while the Stability metric measures their sensitivity. Furthermore, previous research centers on evaluating models independently or solely assessing instructions, neglecting the adaptability between models and instructions. We propose the Adaptability metric to quantify the adaptability between models and instructions. Our paper evaluates a total of 20 language models (14 MLLMs) on 14 multimodal datasets spanning 6 tasks, with 10 instructions for each task, and derives novel insights. Our code will be released at https://github.com/declare-lab/MM-BigBench." @default.
- W4387687816 created "2023-10-17" @default.
- W4387687816 creator A5018318168 @default.
- W4387687816 creator A5022020581 @default.
- W4387687816 creator A5023219588 @default.
- W4387687816 creator A5030202879 @default.
- W4387687816 creator A5033376109 @default.
- W4387687816 creator A5034964067 @default.
- W4387687816 creator A5035378456 @default.
- W4387687816 creator A5043569952 @default.
- W4387687816 creator A5064842058 @default.
- W4387687816 creator A5066141547 @default.
- W4387687816 date "2023-10-13" @default.
- W4387687816 modified "2023-10-18" @default.
- W4387687816 title "MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks" @default.
- W4387687816 doi "https://doi.org/10.48550/arxiv.2310.09036" @default.
- W4387687816 hasPublicationYear "2023" @default.
- W4387687816 type Work @default.
- W4387687816 citedByCount "0" @default.
- W4387687816 crossrefType "posted-content" @default.
- W4387687816 hasAuthorship W4387687816A5018318168 @default.
- W4387687816 hasAuthorship W4387687816A5022020581 @default.
- W4387687816 hasAuthorship W4387687816A5023219588 @default.
- W4387687816 hasAuthorship W4387687816A5030202879 @default.
- W4387687816 hasAuthorship W4387687816A5033376109 @default.
- W4387687816 hasAuthorship W4387687816A5034964067 @default.
- W4387687816 hasAuthorship W4387687816A5035378456 @default.
- W4387687816 hasAuthorship W4387687816A5043569952 @default.
- W4387687816 hasAuthorship W4387687816A5064842058 @default.
- W4387687816 hasAuthorship W4387687816A5066141547 @default.
- W4387687816 hasBestOaLocation W43876878161 @default.
- W4387687816 hasConcept C107457646 @default.
- W4387687816 hasConcept C119857082 @default.
- W4387687816 hasConcept C154945302 @default.
- W4387687816 hasConcept C15744967 @default.
- W4387687816 hasConcept C162324750 @default.
- W4387687816 hasConcept C176217482 @default.
- W4387687816 hasConcept C177606310 @default.
- W4387687816 hasConcept C187736073 @default.
- W4387687816 hasConcept C18903297 @default.
- W4387687816 hasConcept C199360897 @default.
- W4387687816 hasConcept C204321447 @default.
- W4387687816 hasConcept C21547014 @default.
- W4387687816 hasConcept C2780451532 @default.
- W4387687816 hasConcept C2780586970 @default.
- W4387687816 hasConcept C2780660688 @default.
- W4387687816 hasConcept C41008148 @default.
- W4387687816 hasConcept C511192102 @default.
- W4387687816 hasConcept C77805123 @default.
- W4387687816 hasConcept C86803240 @default.
- W4387687816 hasConceptScore W4387687816C107457646 @default.
- W4387687816 hasConceptScore W4387687816C119857082 @default.
- W4387687816 hasConceptScore W4387687816C154945302 @default.
- W4387687816 hasConceptScore W4387687816C15744967 @default.
- W4387687816 hasConceptScore W4387687816C162324750 @default.
- W4387687816 hasConceptScore W4387687816C176217482 @default.
- W4387687816 hasConceptScore W4387687816C177606310 @default.
- W4387687816 hasConceptScore W4387687816C187736073 @default.
- W4387687816 hasConceptScore W4387687816C18903297 @default.
- W4387687816 hasConceptScore W4387687816C199360897 @default.
- W4387687816 hasConceptScore W4387687816C204321447 @default.
- W4387687816 hasConceptScore W4387687816C21547014 @default.
- W4387687816 hasConceptScore W4387687816C2780451532 @default.
- W4387687816 hasConceptScore W4387687816C2780586970 @default.
- W4387687816 hasConceptScore W4387687816C2780660688 @default.
- W4387687816 hasConceptScore W4387687816C41008148 @default.
- W4387687816 hasConceptScore W4387687816C511192102 @default.
- W4387687816 hasConceptScore W4387687816C77805123 @default.
- W4387687816 hasConceptScore W4387687816C86803240 @default.
- W4387687816 hasLocation W43876878161 @default.
- W4387687816 hasOpenAccess W4387687816 @default.
- W4387687816 hasPrimaryLocation W43876878161 @default.
- W4387687816 hasRelatedWork W2047454415 @default.
- W4387687816 hasRelatedWork W2070040999 @default.
- W4387687816 hasRelatedWork W2348524959 @default.
- W4387687816 hasRelatedWork W2348924972 @default.
- W4387687816 hasRelatedWork W2357124094 @default.
- W4387687816 hasRelatedWork W2365736347 @default.
- W4387687816 hasRelatedWork W2368605798 @default.
- W4387687816 hasRelatedWork W2387399993 @default.
- W4387687816 hasRelatedWork W2389739210 @default.
- W4387687816 hasRelatedWork W2518037665 @default.
- W4387687816 isParatext "false" @default.
- W4387687816 isRetracted "false" @default.
- W4387687816 workType "article" @default.
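The abstract names the Best Performance, Mean Relative Gain, Stability and Adaptability metrics but does not define them. The sketch below illustrates the first three under assumed definitions that may differ from the paper's exact formulas. It assumes a hypothetical layout `scores[model][dataset]`, a list with one score per instruction. Best Performance is taken as the maximum over instructions. Mean Relative Gain is taken as each model's percentage gain over the cross-model mean on each dataset (using the mean over instructions as that model's score), averaged over datasets. Stability is taken as the population standard deviation across instructions, so a lower value means less sensitivity. Adaptability is not sketched.

```python
from statistics import mean, pstdev

# Hypothetical layout: scores[model][dataset] -> list of scores, one per instruction.


def best_performance(scores: dict) -> dict:
    """Assumed definition: the upper bound is the best score over all instructions."""
    return {m: {d: max(v) for d, v in per_ds.items()} for m, per_ds in scores.items()}


def mean_relative_gain(scores: dict) -> dict:
    """Assumed definition: percentage gain of each model over the cross-model mean,
    averaged over datasets. A model's score on a dataset is its mean over instructions."""
    x = {m: {d: mean(v) for d, v in per_ds.items()} for m, per_ds in scores.items()}
    datasets = list(next(iter(x.values())).keys())
    col_mean = {d: mean(x[m][d] for m in x) for d in datasets}
    return {
        m: mean((x[m][d] - col_mean[d]) / col_mean[d] * 100 for d in datasets)
        for m in x
    }


def stability(scores: dict) -> dict:
    """Assumed definition: spread of scores across instructions (lower = more stable)."""
    return {m: {d: pstdev(v) for d, v in per_ds.items()} for m, per_ds in scores.items()}


# Toy numbers for illustration only; these are not results from the paper.
scores = {
    "model_a": {"ds1": [0.6, 0.8], "ds2": [0.5, 0.5]},
    "model_b": {"ds1": [0.4, 0.4], "ds2": [0.3, 0.7]},
}
best = best_performance(scores)
mrg = mean_relative_gain(scores)
stab = stability(scores)
```

Because every gain is measured against the per-dataset mean over models, the gains on each dataset sum to zero across models. A positive value marks a model that is above average overall, not an absolute accuracy.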