Matches in SemOpenAlex for { <https://semopenalex.org/work/W2020111636> ?p ?o ?g. }
Showing items 1 to 84 of 84 with 100 items per page.
- W2020111636 abstract "Document images accompanied by OCR output text and ground truth transcriptions are useful for developing and evaluating document recognition and processing methods, especially for historical document images. Additionally, research into improving the performance of such methods often requires further annotation of training and test data (e.g., topical document labels). However, transcribing and labeling historical documents is expensive. As a result, existing real-world document image datasets with such accompanying resources are rare and often relatively small. We introduce synthetic document image datasets of varying levels of noise that have been created from standard (English) text corpora using an existing document degradation model applied in a novel way. Included in the datasets is the OCR output from real OCR engines including the commercial ABBYY FineReader and the open-source Tesseract engines. These synthetic datasets are designed to exhibit some of the characteristics of an example real-world document image dataset, the Eisenhower Communiqués. The new datasets also benefit from additional metadata that exist due to the nature of their collection and prior labeling efforts. We demonstrate the usefulness of the synthetic datasets by training an existing multi-engine OCR correction method on the synthetic data and then applying the model to reduce word error rates on the historical document dataset. The synthetic datasets will be made available for use by other researchers." @default.
- W2020111636 created "2016-06-24" @default.
- W2020111636 creator A5013219521 @default.
- W2020111636 creator A5077722320 @default.
- W2020111636 creator A5090791372 @default.
- W2020111636 date "2012-01-22" @default.
- W2020111636 modified "2023-09-23" @default.
- W2020111636 title "A synthetic document image dataset for developing and evaluating historical document processing methods" @default.
- W2020111636 cites W1493526108 @default.
- W2020111636 cites W1592735339 @default.
- W2020111636 cites W1601566332 @default.
- W2020111636 cites W1812398186 @default.
- W2020111636 cites W1972016971 @default.
- W2020111636 cites W1977053705 @default.
- W2020111636 cites W202303397 @default.
- W2020111636 cites W2024348326 @default.
- W2020111636 cites W2069172670 @default.
- W2020111636 doi "https://doi.org/10.1117/12.912203" @default.
- W2020111636 hasPublicationYear "2012" @default.
- W2020111636 type Work @default.
- W2020111636 sameAs 2020111636 @default.
- W2020111636 citedByCount "3" @default.
- W2020111636 countsByYear W20201116362012 @default.
- W2020111636 countsByYear W20201116362013 @default.
- W2020111636 countsByYear W20201116362019 @default.
- W2020111636 crossrefType "proceedings-article" @default.
- W2020111636 hasAuthorship W2020111636A5013219521 @default.
- W2020111636 hasAuthorship W2020111636A5077722320 @default.
- W2020111636 hasAuthorship W2020111636A5090791372 @default.
- W2020111636 hasBestOaLocation W20201116362 @default.
- W2020111636 hasConcept C115961682 @default.
- W2020111636 hasConcept C124504099 @default.
- W2020111636 hasConcept C136764020 @default.
- W2020111636 hasConcept C138885662 @default.
- W2020111636 hasConcept C146849305 @default.
- W2020111636 hasConcept C154945302 @default.
- W2020111636 hasConcept C160920958 @default.
- W2020111636 hasConcept C204321447 @default.
- W2020111636 hasConcept C23123220 @default.
- W2020111636 hasConcept C2776321320 @default.
- W2020111636 hasConcept C2778371909 @default.
- W2020111636 hasConcept C2988504005 @default.
- W2020111636 hasConcept C41008148 @default.
- W2020111636 hasConcept C41895202 @default.
- W2020111636 hasConcept C546480517 @default.
- W2020111636 hasConcept C90805587 @default.
- W2020111636 hasConcept C93518851 @default.
- W2020111636 hasConcept C99498987 @default.
- W2020111636 hasConceptScore W2020111636C115961682 @default.
- W2020111636 hasConceptScore W2020111636C124504099 @default.
- W2020111636 hasConceptScore W2020111636C136764020 @default.
- W2020111636 hasConceptScore W2020111636C138885662 @default.
- W2020111636 hasConceptScore W2020111636C146849305 @default.
- W2020111636 hasConceptScore W2020111636C154945302 @default.
- W2020111636 hasConceptScore W2020111636C160920958 @default.
- W2020111636 hasConceptScore W2020111636C204321447 @default.
- W2020111636 hasConceptScore W2020111636C23123220 @default.
- W2020111636 hasConceptScore W2020111636C2776321320 @default.
- W2020111636 hasConceptScore W2020111636C2778371909 @default.
- W2020111636 hasConceptScore W2020111636C2988504005 @default.
- W2020111636 hasConceptScore W2020111636C41008148 @default.
- W2020111636 hasConceptScore W2020111636C41895202 @default.
- W2020111636 hasConceptScore W2020111636C546480517 @default.
- W2020111636 hasConceptScore W2020111636C90805587 @default.
- W2020111636 hasConceptScore W2020111636C93518851 @default.
- W2020111636 hasConceptScore W2020111636C99498987 @default.
- W2020111636 hasLocation W20201116361 @default.
- W2020111636 hasLocation W20201116362 @default.
- W2020111636 hasOpenAccess W2020111636 @default.
- W2020111636 hasPrimaryLocation W20201116361 @default.
- W2020111636 hasRelatedWork W1556178152 @default.
- W2020111636 hasRelatedWork W2000350111 @default.
- W2020111636 hasRelatedWork W2020111636 @default.
- W2020111636 hasRelatedWork W2076264610 @default.
- W2020111636 hasRelatedWork W2146043838 @default.
- W2020111636 hasRelatedWork W2149128132 @default.
- W2020111636 hasRelatedWork W2356293009 @default.
- W2020111636 hasRelatedWork W2361349944 @default.
- W2020111636 hasRelatedWork W2381351160 @default.
- W2020111636 hasRelatedWork W3107474891 @default.
- W2020111636 isParatext "false" @default.
- W2020111636 isRetracted "false" @default.
- W2020111636 magId "2020111636" @default.
- W2020111636 workType "article" @default.
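
The entries above all follow the shape "- SUBJECT PREDICATE OBJECT @GRAPH." with the object either a bare identifier or a quoted literal. As a rough illustration of how such a listing could be grouped by predicate, here is a minimal Python sketch. The regular expression, the function name parse_listing, and the sample list are assumptions made for this example; they are not part of SemOpenAlex or any of its tooling.

```python
import re
from collections import defaultdict

# Assumed line shape: - SUBJECT PREDICATE OBJECT @GRAPH.
# OBJECT is either a quoted literal (which may contain spaces) or a bare token.
LINE_RE = re.compile(
    r'^-\s+(?P<subject>\S+)\s+(?P<predicate>\S+)\s+'
    r'(?P<object>".*"|\S+)\s+@(?P<graph>\w+)\.$'
)


def parse_listing(lines):
    """Group object values by predicate; lines that do not match are skipped."""
    props = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # e.g. the header or pagination lines
        # Drop the surrounding quotes from literal values.
        props[match.group("predicate")].append(match.group("object").strip('"'))
    return dict(props)


# A few lines copied from the listing above.
sample = [
    '- W2020111636 created "2016-06-24" @default.',
    '- W2020111636 cites W1493526108 @default.',
    '- W2020111636 cites W1592735339 @default.',
    '- W2020111636 citedByCount "3" @default.',
]

record = parse_listing(sample)
print(record["cites"])
```

Every value is kept as a list because predicates such as cites, hasConcept, and hasRelatedWork repeat; single-valued predicates such as created simply yield a one-element list.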