Matches in SemOpenAlex for { <https://semopenalex.org/work/W4304192731> ?p ?o ?g. }
Showing items 1 to 81 of 81, with 100 items per page.
- W4304192731 abstract "Visually-situated language is ubiquitous -- sources range from textbooks with diagrams to web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to this diversity, previous work has typically relied on domain-specific recipes with limited sharing of the underlying data, model architectures, and objectives. We present Pix2Struct, a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct is pretrained by learning to parse masked screenshots of web pages into simplified HTML. The web, with its richness of visual elements cleanly reflected in the HTML structure, provides a large source of pretraining data well suited to the diversity of downstream tasks. Intuitively, this objective subsumes common pretraining signals such as OCR, language modeling, image captioning. In addition to the novel pretraining strategy, we introduce a variable-resolution input representation and a more flexible integration of language and vision inputs, where language prompts such as questions are rendered directly on top of the input image. For the first time, we show that a single pretrained model can achieve state-of-the-art results in six out of nine tasks across four domains: documents, illustrations, user interfaces, and natural images." @default.
- W4304192731 created "2022-10-11" @default.
- W4304192731 creator A5000738730 @default.
- W4304192731 creator A5026154387 @default.
- W4304192731 creator A5053947885 @default.
- W4304192731 creator A5061827062 @default.
- W4304192731 creator A5065708799 @default.
- W4304192731 creator A5076904467 @default.
- W4304192731 creator A5081690204 @default.
- W4304192731 creator A5081862885 @default.
- W4304192731 creator A5086294822 @default.
- W4304192731 creator A5088072227 @default.
- W4304192731 date "2022-10-07" @default.
- W4304192731 modified "2023-09-26" @default.
- W4304192731 title "Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding" @default.
- W4304192731 doi "https://doi.org/10.48550/arxiv.2210.03347" @default.
- W4304192731 hasPublicationYear "2022" @default.
- W4304192731 type Work @default.
- W4304192731 citedByCount "0" @default.
- W4304192731 crossrefType "posted-content" @default.
- W4304192731 hasAuthorship W4304192731A5000738730 @default.
- W4304192731 hasAuthorship W4304192731A5026154387 @default.
- W4304192731 hasAuthorship W4304192731A5053947885 @default.
- W4304192731 hasAuthorship W4304192731A5061827062 @default.
- W4304192731 hasAuthorship W4304192731A5065708799 @default.
- W4304192731 hasAuthorship W4304192731A5076904467 @default.
- W4304192731 hasAuthorship W4304192731A5081690204 @default.
- W4304192731 hasAuthorship W4304192731A5081862885 @default.
- W4304192731 hasAuthorship W4304192731A5086294822 @default.
- W4304192731 hasAuthorship W4304192731A5088072227 @default.
- W4304192731 hasBestOaLocation W43041927311 @default.
- W4304192731 hasConcept C107457646 @default.
- W4304192731 hasConcept C115961682 @default.
- W4304192731 hasConcept C132829578 @default.
- W4304192731 hasConcept C134306372 @default.
- W4304192731 hasConcept C137293760 @default.
- W4304192731 hasConcept C154945302 @default.
- W4304192731 hasConcept C157657479 @default.
- W4304192731 hasConcept C17744445 @default.
- W4304192731 hasConcept C186644900 @default.
- W4304192731 hasConcept C195324797 @default.
- W4304192731 hasConcept C199539241 @default.
- W4304192731 hasConcept C204321447 @default.
- W4304192731 hasConcept C2776359362 @default.
- W4304192731 hasConcept C33923547 @default.
- W4304192731 hasConcept C36503486 @default.
- W4304192731 hasConcept C41008148 @default.
- W4304192731 hasConcept C94625758 @default.
- W4304192731 hasConceptScore W4304192731C107457646 @default.
- W4304192731 hasConceptScore W4304192731C115961682 @default.
- W4304192731 hasConceptScore W4304192731C132829578 @default.
- W4304192731 hasConceptScore W4304192731C134306372 @default.
- W4304192731 hasConceptScore W4304192731C137293760 @default.
- W4304192731 hasConceptScore W4304192731C154945302 @default.
- W4304192731 hasConceptScore W4304192731C157657479 @default.
- W4304192731 hasConceptScore W4304192731C17744445 @default.
- W4304192731 hasConceptScore W4304192731C186644900 @default.
- W4304192731 hasConceptScore W4304192731C195324797 @default.
- W4304192731 hasConceptScore W4304192731C199539241 @default.
- W4304192731 hasConceptScore W4304192731C204321447 @default.
- W4304192731 hasConceptScore W4304192731C2776359362 @default.
- W4304192731 hasConceptScore W4304192731C33923547 @default.
- W4304192731 hasConceptScore W4304192731C36503486 @default.
- W4304192731 hasConceptScore W4304192731C41008148 @default.
- W4304192731 hasConceptScore W4304192731C94625758 @default.
- W4304192731 hasLocation W43041927311 @default.
- W4304192731 hasOpenAccess W4304192731 @default.
- W4304192731 hasPrimaryLocation W43041927311 @default.
- W4304192731 hasRelatedWork W159132833 @default.
- W4304192731 hasRelatedWork W180507639 @default.
- W4304192731 hasRelatedWork W1806995473 @default.
- W4304192731 hasRelatedWork W1986021162 @default.
- W4304192731 hasRelatedWork W1989705153 @default.
- W4304192731 hasRelatedWork W2293457016 @default.
- W4304192731 hasRelatedWork W2502722637 @default.
- W4304192731 hasRelatedWork W2977842567 @default.
- W4304192731 hasRelatedWork W3198474835 @default.
- W4304192731 hasRelatedWork W4312304159 @default.
- W4304192731 isParatext "false" @default.
- W4304192731 isRetracted "false" @default.
- W4304192731 workType "article" @default.
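Each line of the listing above has the same shape: `- <subject> <predicate> <object> @<graph>.`, with literal values wrapped in double quotes. When working with a saved copy of such a page, it can help to turn those lines back into tuples and group them by predicate (for example, to collect all `creator` or `hasConcept` values at once). The following is a minimal sketch that assumes exactly that line shape. `parse_listing` and `group_by_predicate` are hypothetical helpers, not part of any SemOpenAlex API, and real RDF processing would normally use the SPARQL endpoint or an RDF library instead of a regular expression.

```python
import re
from collections import defaultdict

# Hypothetical pattern, assuming lines shaped like:
#   - W4304192731 creator A5000738730 @default.
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$')


def parse_listing(lines):
    """Return (subject, predicate, object, graph) tuples; skip non-triple lines."""
    triples = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. the "Showing items ..." pagination line
        s, p, o, g = m.groups()
        # Strip the surrounding quotes from literal values such as dates and titles.
        if len(o) >= 2 and o.startswith('"') and o.endswith('"'):
            o = o[1:-1]
        triples.append((s, p, o, g))
    return triples


def group_by_predicate(triples):
    """Map each predicate to the list of its objects, in listing order."""
    grouped = defaultdict(list)
    for _s, p, o, _g in triples:
        grouped[p].append(o)
    return dict(grouped)


sample = [
    '- W4304192731 created "2022-10-11" @default.',
    '- W4304192731 creator A5000738730 @default.',
    '- W4304192731 creator A5026154387 @default.',
    '- W4304192731 title "Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding" @default.',
    'Showing items 1 to 81 of 81, with 100 items per page.',  # not a triple, ignored
]
triples = parse_listing(sample)
grouped = group_by_predicate(triples)
print(len(triples))         # 4
print(grouped["creator"])   # ['A5000738730', 'A5026154387']
```

Grouping by predicate rather than keeping a flat list mirrors how the listing itself is ordered: repeated predicates such as `creator`, `hasAuthorship`, `hasConcept`, and `hasRelatedWork` each appear as a contiguous run of lines.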