Matches in SemOpenAlex for { <https://semopenalex.org/work/W2149551320> ?p ?o ?g. }
Showing items 1 to 81 of 81 with 100 items per page.
- W2149551320 abstract "A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo and Microsoft. Content-based online book retrieval usually requires first converting printed text into machine-readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then doing full-text search on the results. Many of these books are old, and a variety of processing steps are required to create an end-to-end system. Changing any step (including the scanning process) can affect OCR performance, and hence a good automatic statistical evaluation of OCR performance on book-length material is needed. Evaluating OCR performance on an entire book is non-trivial. The only easily obtainable ground truth (the Gutenberg e-texts) must be automatically aligned with the OCR output over the entire length of a book. This may be viewed as equivalent to the problem of aligning two large (easily a million long) sequences. The problem is further complicated by OCR errors as well as the possibility of large chunks of missing material in one of the sequences. We propose a Hidden Markov Model (HMM) based hierarchical alignment algorithm to align OCR output and the ground truth for books. We believe this is the first work to automatically align a whole book without using any book structure information. The alignment process works by breaking up the problem of aligning two long sequences into the problem of aligning many smaller subsequences. This can be done rapidly and effectively. Experimental results show that our hierarchical alignment approach works very well even if the OCR output has a high recognition error rate. Finally, we evaluate the performance of a commercial OCR engine over a large dataset of books based on the alignment results." @default.
- W2149551320 created "2016-06-24" @default.
- W2149551320 creator A5055459215 @default.
- W2149551320 creator A5071639289 @default.
- W2149551320 date "2006-06-11" @default.
- W2149551320 modified "2023-09-26" @default.
- W2149551320 title "A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books" @default.
- W2149551320 cites W1491083719 @default.
- W2149551320 cites W1970026646 @default.
- W2149551320 cites W1991133427 @default.
- W2149551320 cites W2062526508 @default.
- W2149551320 cites W2074231493 @default.
- W2149551320 cites W2087064593 @default.
- W2149551320 cites W2098875891 @default.
- W2149551320 cites W2102122585 @default.
- W2149551320 cites W2105594594 @default.
- W2149551320 cites W2121839820 @default.
- W2149551320 cites W2147986355 @default.
- W2149551320 cites W2158052399 @default.
- W2149551320 cites W2997195443 @default.
- W2149551320 doi "https://doi.org/10.1145/1141753.1141776" @default.
- W2149551320 hasPublicationYear "2006" @default.
- W2149551320 type Work @default.
- W2149551320 sameAs 2149551320 @default.
- W2149551320 citedByCount "43" @default.
- W2149551320 countsByYear W21495513202012 @default.
- W2149551320 countsByYear W21495513202013 @default.
- W2149551320 countsByYear W21495513202014 @default.
- W2149551320 countsByYear W21495513202015 @default.
- W2149551320 countsByYear W21495513202016 @default.
- W2149551320 countsByYear W21495513202017 @default.
- W2149551320 countsByYear W21495513202018 @default.
- W2149551320 countsByYear W21495513202020 @default.
- W2149551320 countsByYear W21495513202021 @default.
- W2149551320 countsByYear W21495513202023 @default.
- W2149551320 crossrefType "proceedings-article" @default.
- W2149551320 hasAuthorship W2149551320A5055459215 @default.
- W2149551320 hasAuthorship W2149551320A5071639289 @default.
- W2149551320 hasBestOaLocation W21495513202 @default.
- W2149551320 hasConcept C115961682 @default.
- W2149551320 hasConcept C138885662 @default.
- W2149551320 hasConcept C153180895 @default.
- W2149551320 hasConcept C154945302 @default.
- W2149551320 hasConcept C164913051 @default.
- W2149551320 hasConcept C23224414 @default.
- W2149551320 hasConcept C28490314 @default.
- W2149551320 hasConcept C41008148 @default.
- W2149551320 hasConcept C41895202 @default.
- W2149551320 hasConcept C513874922 @default.
- W2149551320 hasConcept C546480517 @default.
- W2149551320 hasConceptScore W2149551320C115961682 @default.
- W2149551320 hasConceptScore W2149551320C138885662 @default.
- W2149551320 hasConceptScore W2149551320C153180895 @default.
- W2149551320 hasConceptScore W2149551320C154945302 @default.
- W2149551320 hasConceptScore W2149551320C164913051 @default.
- W2149551320 hasConceptScore W2149551320C23224414 @default.
- W2149551320 hasConceptScore W2149551320C28490314 @default.
- W2149551320 hasConceptScore W2149551320C41008148 @default.
- W2149551320 hasConceptScore W2149551320C41895202 @default.
- W2149551320 hasConceptScore W2149551320C513874922 @default.
- W2149551320 hasConceptScore W2149551320C546480517 @default.
- W2149551320 hasLocation W21495513201 @default.
- W2149551320 hasLocation W21495513202 @default.
- W2149551320 hasLocation W21495513203 @default.
- W2149551320 hasLocation W21495513204 @default.
- W2149551320 hasOpenAccess W2149551320 @default.
- W2149551320 hasPrimaryLocation W21495513201 @default.
- W2149551320 hasRelatedWork W1521297879 @default.
- W2149551320 hasRelatedWork W197415996 @default.
- W2149551320 hasRelatedWork W2109705048 @default.
- W2149551320 hasRelatedWork W2128931134 @default.
- W2149551320 hasRelatedWork W2136763963 @default.
- W2149551320 hasRelatedWork W2149551320 @default.
- W2149551320 hasRelatedWork W2539985974 @default.
- W2149551320 hasRelatedWork W2547303667 @default.
- W2149551320 hasRelatedWork W2786030547 @default.
- W2149551320 hasRelatedWork W2940588515 @default.
- W2149551320 isParatext "false" @default.
- W2149551320 isRetracted "false" @default.
- W2149551320 magId "2149551320" @default.
- W2149551320 workType "article" @default.
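
The abstract describes breaking one long alignment into many short ones. The sketch below illustrates only that divide-and-align idea and is not the paper's HMM-based method: it swaps in a simpler technique, using words that occur exactly once in both texts as coarse anchors and Python's `difflib.SequenceMatcher` for the short segments between them. The function names (`unique_words`, `anchor_pairs`, `align`) and the sample sentences are hypothetical, chosen for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def unique_words(words):
    """Return {word: index} for words that occur exactly once in `words`."""
    counts = Counter(words)
    return {w: i for i, w in enumerate(words) if counts[w] == 1}


def anchor_pairs(truth, ocr):
    """Coarse pass: non-crossing (truth_idx, ocr_idx) pairs on shared unique words."""
    t_unique, o_unique = unique_words(truth), unique_words(ocr)
    # Shared unique words, listed once in truth order and once in OCR order.
    shared = [w for w in truth if w in t_unique and w in o_unique]
    by_ocr = sorted(shared, key=o_unique.get)
    # Matching blocks between the two orderings keep only anchors whose
    # relative order agrees in both texts.
    matcher = SequenceMatcher(None, shared, by_ocr, autojunk=False)
    anchors = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            w = shared[block.a + k]
            anchors.append((t_unique[w], o_unique[w]))
    return anchors


def align(truth, ocr):
    """Fine pass: align the short segments between consecutive anchors."""
    matched = []
    # A sentinel at the end makes the loop also cover the tail after the last anchor.
    bounds = anchor_pairs(truth, ocr) + [(len(truth), len(ocr))]
    t0 = o0 = 0
    for t1, o1 in bounds:
        seg = SequenceMatcher(None, truth[t0:t1], ocr[o0:o1], autojunk=False)
        for block in seg.get_matching_blocks():
            for k in range(block.size):
                matched.append((t0 + block.a + k, o0 + block.b + k))
        if t1 < len(truth):  # a real anchor, not the sentinel
            matched.append((t1, o1))
        t0, o0 = t1 + 1, o1 + 1
    return matched


# Hypothetical sample: ground truth versus OCR output with two misread words.
truth = "it was the best of times it was the worst of times".split()
ocr = "it was tbe best of times it was the worst of tirnes".split()
pairs = align(truth, ocr)
print(f"word accuracy: {len(pairs)}/{len(truth)}")
```

Because each segment between anchors is short, the quadratic cost of pairwise alignment applies only locally, which is the property the abstract relies on for book-length input. A large chunk missing from one text simply yields a segment with few or no matches rather than derailing the rest of the alignment.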