Matches in SemOpenAlex for { <https://semopenalex.org/work/W4378468513> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W4378468513 abstract "Visual document understanding is a complex task that involves analyzing both the text and the visual elements in document images. Existing models often rely on manual feature engineering or domain-specific pipelines, which limit their generalization ability across different document types and languages. In this paper, we propose DUBLIN, which is pretrained on web pages using three novel objectives: Masked Document Text Generation Task, Bounding Box Task, and Rendered Question Answering Task, that leverage both the spatial and semantic information in the document images. Our model achieves competitive or state-of-the-art results on several benchmarks, such as Web-Based Structural Reading Comprehension, Document Visual Question Answering, Key Information Extraction, Diagram Understanding, and Table Question Answering. In particular, we show that DUBLIN is the first pixel-based model to achieve an EM of 77.75 and F1 of 84.25 on the WebSRC dataset. We also show that our model outperforms the current pixel-based SOTA models on DocVQA, InfographicsVQA, OCR-VQA and AI2D datasets by 4.6%, 6.5%, 2.6% and 21%, respectively. We also achieve competitive performance on RVL-CDIP document classification. Moreover, we create new baselines for text-based datasets by rendering them as document images to promote research in this direction." @default.
- W4378468513 created "2023-05-27" @default.
- W4378468513 creator A5008944385 @default.
- W4378468513 creator A5011528143 @default.
- W4378468513 creator A5020732660 @default.
- W4378468513 creator A5020845551 @default.
- W4378468513 creator A5030018356 @default.
- W4378468513 creator A5036014115 @default.
- W4378468513 creator A5036907475 @default.
- W4378468513 creator A5053028004 @default.
- W4378468513 creator A5057055806 @default.
- W4378468513 creator A5064967046 @default.
- W4378468513 date "2023-05-23" @default.
- W4378468513 modified "2023-10-14" @default.
- W4378468513 title "DUBLIN -- Document Understanding By Language-Image Network" @default.
- W4378468513 doi "https://doi.org/10.48550/arxiv.2305.14218" @default.
- W4378468513 hasPublicationYear "2023" @default.
- W4378468513 type Work @default.
- W4378468513 citedByCount "0" @default.
- W4378468513 crossrefType "posted-content" @default.
- W4378468513 hasAuthorship W4378468513A5008944385 @default.
- W4378468513 hasAuthorship W4378468513A5011528143 @default.
- W4378468513 hasAuthorship W4378468513A5020732660 @default.
- W4378468513 hasAuthorship W4378468513A5020845551 @default.
- W4378468513 hasAuthorship W4378468513A5030018356 @default.
- W4378468513 hasAuthorship W4378468513A5036014115 @default.
- W4378468513 hasAuthorship W4378468513A5036907475 @default.
- W4378468513 hasAuthorship W4378468513A5053028004 @default.
- W4378468513 hasAuthorship W4378468513A5057055806 @default.
- W4378468513 hasAuthorship W4378468513A5064967046 @default.
- W4378468513 hasBestOaLocation W43784685131 @default.
- W4378468513 hasConcept C115961682 @default.
- W4378468513 hasConcept C137293760 @default.
- W4378468513 hasConcept C147037132 @default.
- W4378468513 hasConcept C153083717 @default.
- W4378468513 hasConcept C154945302 @default.
- W4378468513 hasConcept C177937566 @default.
- W4378468513 hasConcept C204321447 @default.
- W4378468513 hasConcept C23123220 @default.
- W4378468513 hasConcept C26517878 @default.
- W4378468513 hasConcept C38652104 @default.
- W4378468513 hasConcept C41008148 @default.
- W4378468513 hasConcept C44291984 @default.
- W4378468513 hasConcept C72773152 @default.
- W4378468513 hasConcept C73555534 @default.
- W4378468513 hasConceptScore W4378468513C115961682 @default.
- W4378468513 hasConceptScore W4378468513C137293760 @default.
- W4378468513 hasConceptScore W4378468513C147037132 @default.
- W4378468513 hasConceptScore W4378468513C153083717 @default.
- W4378468513 hasConceptScore W4378468513C154945302 @default.
- W4378468513 hasConceptScore W4378468513C177937566 @default.
- W4378468513 hasConceptScore W4378468513C204321447 @default.
- W4378468513 hasConceptScore W4378468513C23123220 @default.
- W4378468513 hasConceptScore W4378468513C26517878 @default.
- W4378468513 hasConceptScore W4378468513C38652104 @default.
- W4378468513 hasConceptScore W4378468513C41008148 @default.
- W4378468513 hasConceptScore W4378468513C44291984 @default.
- W4378468513 hasConceptScore W4378468513C72773152 @default.
- W4378468513 hasConceptScore W4378468513C73555534 @default.
- W4378468513 hasLocation W43784685131 @default.
- W4378468513 hasOpenAccess W4378468513 @default.
- W4378468513 hasPrimaryLocation W43784685131 @default.
- W4378468513 hasRelatedWork W1602056621 @default.
- W4378468513 hasRelatedWork W207304934 @default.
- W4378468513 hasRelatedWork W2135033253 @default.
- W4378468513 hasRelatedWork W2165473894 @default.
- W4378468513 hasRelatedWork W2356380379 @default.
- W4378468513 hasRelatedWork W2364562957 @default.
- W4378468513 hasRelatedWork W2747680751 @default.
- W4378468513 hasRelatedWork W3142774842 @default.
- W4378468513 hasRelatedWork W4287236245 @default.
- W4378468513 hasRelatedWork W4377703168 @default.
- W4378468513 isParatext "false" @default.
- W4378468513 isRetracted "false" @default.
- W4378468513 workType "article" @default.
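
The rows above all follow the shape "- subject predicate object @graph.", where the object is either a double-quoted literal or a bare identifier. A minimal sketch of reading such rows into tuples is given below. It assumes exactly the layout shown in this listing; the names Quad and parse_row are hypothetical helpers for illustration and are not part of any SemOpenAlex tooling.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One listing row: subject, predicate, object, and named graph."""
    subject: str
    predicate: str
    obj: str
    graph: str


# Assumed row layout: "- <subject> <predicate> <object> @<graph>."
# The object is either a double-quoted literal or a bare token.
_ROW = re.compile(
    r'^-\s+(?P<s>\S+)\s+(?P<p>\S+)\s+(?P<o>".*"|\S+)\s+@(?P<g>\w+)\.$',
    re.DOTALL,
)


def parse_row(line: str) -> Optional[Quad]:
    """Parse one listing row; return None for lines that are not rows."""
    match = _ROW.match(line.strip())
    if match is None:
        return None
    obj = match.group("o")
    # Strip the surrounding quotes from literal values.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return Quad(match.group("s"), match.group("p"), obj, match.group("g"))


if __name__ == "__main__":
    for row in (
        '- W4378468513 created "2023-05-27" @default.',
        "- W4378468513 type Work @default.",
        "Showing items 1 to 75 of 75 with 100 items per page.",
    ):
        print(parse_row(row))
```

Header and pagination lines do not match the row pattern, so parse_row returns None for them and they can be skipped when collecting the quads.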