Matches in SemOpenAlex for { <https://semopenalex.org/work/W4383604278> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W4383604278 abstract "Document understanding refers to automatically extracting, analyzing and comprehending information from various types of digital documents, such as a web page. Existing Multimodal Large Language Models (MLLMs), including mPLUG-Owl, have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition, indicating their potential for OCR-free document understanding. Nevertheless, without in-domain training, these models tend to ignore fine-grained OCR features, such as sophisticated tables or large blocks of text, which are essential for OCR-free document understanding. In this paper, we propose mPLUG-DocOwl, based on mPLUG-Owl, for OCR-free document understanding. Specifically, we first construct an instruction tuning dataset featuring a wide range of visual-text understanding tasks. Then, we strengthen the OCR-free document understanding ability by jointly training the model on language-only, general vision-and-language, and document instruction tuning datasets with our unified instruction tuning strategy. We also build an OCR-free document instruction understanding evaluation set, LLMDoc, to better compare models' capabilities in instruction compliance and document understanding. Experimental results show that our model outperforms existing multimodal models, demonstrating its strong document understanding ability. Besides, without specific fine-tuning, mPLUG-DocOwl generalizes well to various downstream tasks. Our code, models, training data and evaluation set are available at https://github.com/X-PLUG/mPLUG-DocOwl." @default.
- W4383604278 created "2023-07-08" @default.
- W4383604278 creator A5005426696 @default.
- W4383604278 creator A5005965903 @default.
- W4383604278 creator A5007613197 @default.
- W4383604278 creator A5010446607 @default.
- W4383604278 creator A5013145898 @default.
- W4383604278 creator A5019498452 @default.
- W4383604278 creator A5028401090 @default.
- W4383604278 creator A5041067869 @default.
- W4383604278 creator A5047337082 @default.
- W4383604278 creator A5065789784 @default.
- W4383604278 creator A5084189341 @default.
- W4383604278 creator A5084741576 @default.
- W4383604278 creator A5091465907 @default.
- W4383604278 date "2023-07-04" @default.
- W4383604278 modified "2023-09-27" @default.
- W4383604278 title "mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding" @default.
- W4383604278 doi "https://doi.org/10.48550/arxiv.2307.02499" @default.
- W4383604278 hasPublicationYear "2023" @default.
- W4383604278 type Work @default.
- W4383604278 citedByCount "0" @default.
- W4383604278 crossrefType "posted-content" @default.
- W4383604278 hasAuthorship W4383604278A5005426696 @default.
- W4383604278 hasAuthorship W4383604278A5005965903 @default.
- W4383604278 hasAuthorship W4383604278A5007613197 @default.
- W4383604278 hasAuthorship W4383604278A5010446607 @default.
- W4383604278 hasAuthorship W4383604278A5013145898 @default.
- W4383604278 hasAuthorship W4383604278A5019498452 @default.
- W4383604278 hasAuthorship W4383604278A5028401090 @default.
- W4383604278 hasAuthorship W4383604278A5041067869 @default.
- W4383604278 hasAuthorship W4383604278A5047337082 @default.
- W4383604278 hasAuthorship W4383604278A5065789784 @default.
- W4383604278 hasAuthorship W4383604278A5084189341 @default.
- W4383604278 hasAuthorship W4383604278A5084741576 @default.
- W4383604278 hasAuthorship W4383604278A5091465907 @default.
- W4383604278 hasBestOaLocation W43836042781 @default.
- W4383604278 hasConcept C115961682 @default.
- W4383604278 hasConcept C137293760 @default.
- W4383604278 hasConcept C154945302 @default.
- W4383604278 hasConcept C177264268 @default.
- W4383604278 hasConcept C199360897 @default.
- W4383604278 hasConcept C204321447 @default.
- W4383604278 hasConcept C23123220 @default.
- W4383604278 hasConcept C2776760102 @default.
- W4383604278 hasConcept C2780801425 @default.
- W4383604278 hasConcept C41008148 @default.
- W4383604278 hasConcept C546480517 @default.
- W4383604278 hasConceptScore W4383604278C115961682 @default.
- W4383604278 hasConceptScore W4383604278C137293760 @default.
- W4383604278 hasConceptScore W4383604278C154945302 @default.
- W4383604278 hasConceptScore W4383604278C177264268 @default.
- W4383604278 hasConceptScore W4383604278C199360897 @default.
- W4383604278 hasConceptScore W4383604278C204321447 @default.
- W4383604278 hasConceptScore W4383604278C23123220 @default.
- W4383604278 hasConceptScore W4383604278C2776760102 @default.
- W4383604278 hasConceptScore W4383604278C2780801425 @default.
- W4383604278 hasConceptScore W4383604278C41008148 @default.
- W4383604278 hasConceptScore W4383604278C546480517 @default.
- W4383604278 hasLocation W43836042781 @default.
- W4383604278 hasOpenAccess W4383604278 @default.
- W4383604278 hasPrimaryLocation W43836042781 @default.
- W4383604278 hasRelatedWork W142374489 @default.
- W4383604278 hasRelatedWork W1569841287 @default.
- W4383604278 hasRelatedWork W1803932089 @default.
- W4383604278 hasRelatedWork W1985007624 @default.
- W4383604278 hasRelatedWork W2176369193 @default.
- W4383604278 hasRelatedWork W2351428524 @default.
- W4383604278 hasRelatedWork W2359001871 @default.
- W4383604278 hasRelatedWork W2802443881 @default.
- W4383604278 hasRelatedWork W3107474891 @default.
- W4383604278 hasRelatedWork W2584532118 @default.
- W4383604278 isParatext "false" @default.
- W4383604278 isRetracted "false" @default.
- W4383604278 workType "article" @default.