Matches in SemOpenAlex for { <https://semopenalex.org/work/W2258158787> ?p ?o ?g. }
Showing items 1 to 95 of 95 with 100 items per page.
- W2258158787 abstract "Mass digitization of historical documents is a challenging problem for optical character recognition (OCR) tools. Issues include noisy backgrounds and faded text due to aging, border/marginal noise, bleed-through, skewing, warping, as well as irregular fonts and page layouts. As a result, OCR tools often produce a large number of spurious bounding boxes (BBs) in addition to those that correspond to words in the document. This paper presents an iterative classification algorithm to automatically label BBs (i.e., as text or noise) based on their spatial distribution and geometry. The approach uses a rule-based classifier to generate initial text/noise labels for each BB, followed by an iterative classifier that refines the initial labels by incorporating local information to each BB, its spatial location, shape and size. When evaluated on a dataset containing over 72,000 manually-labeled BBs from 159 historical documents, the algorithm can classify BBs with 0.95 precision and 0.96 recall. Further evaluation on a collection of 6,775 documents with ground-truth transcriptions shows that the algorithm can also be used to predict document quality (0.7 correlation) and improve OCR transcriptions in 85% of the cases." @default.
- W2258158787 created "2016-06-24" @default.
- W2258158787 creator A5020221440 @default.
- W2258158787 creator A5020556428 @default.
- W2258158787 creator A5048070110 @default.
- W2258158787 creator A5061106701 @default.
- W2258158787 creator A5062423099 @default.
- W2258158787 creator A5065474822 @default.
- W2258158787 creator A5080078052 @default.
- W2258158787 creator A5083909674 @default.
- W2258158787 date "2015-02-18" @default.
- W2258158787 modified "2023-10-18" @default.
- W2258158787 title "Automatic Assessment of OCR Quality in Historical Documents" @default.
- W2258158787 cites W1699166917 @default.
- W2258158787 cites W1987366206 @default.
- W2258158787 cites W2001642682 @default.
- W2258158787 cites W2056471870 @default.
- W2258158787 cites W2088326204 @default.
- W2258158787 cites W2092449865 @default.
- W2258158787 cites W2109305763 @default.
- W2258158787 cites W2155381731 @default.
- W2258158787 cites W2162383502 @default.
- W2258158787 cites W2283184170 @default.
- W2258158787 cites W2811213684 @default.
- W2258158787 cites W41404523 @default.
- W2258158787 doi "https://doi.org/10.1609/aaai.v29i1.9487" @default.
- W2258158787 hasPublicationYear "2015" @default.
- W2258158787 type Work @default.
- W2258158787 sameAs 2258158787 @default.
- W2258158787 citedByCount "12" @default.
- W2258158787 countsByYear W22581587872016 @default.
- W2258158787 countsByYear W22581587872017 @default.
- W2258158787 countsByYear W22581587872018 @default.
- W2258158787 countsByYear W22581587872021 @default.
- W2258158787 countsByYear W22581587872022 @default.
- W2258158787 countsByYear W22581587872023 @default.
- W2258158787 crossrefType "journal-article" @default.
- W2258158787 hasAuthorship W2258158787A5020221440 @default.
- W2258158787 hasAuthorship W2258158787A5020556428 @default.
- W2258158787 hasAuthorship W2258158787A5048070110 @default.
- W2258158787 hasAuthorship W2258158787A5061106701 @default.
- W2258158787 hasAuthorship W2258158787A5062423099 @default.
- W2258158787 hasAuthorship W2258158787A5065474822 @default.
- W2258158787 hasAuthorship W2258158787A5080078052 @default.
- W2258158787 hasAuthorship W2258158787A5083909674 @default.
- W2258158787 hasBestOaLocation W22581587871 @default.
- W2258158787 hasConcept C111919701 @default.
- W2258158787 hasConcept C115961682 @default.
- W2258158787 hasConcept C146849305 @default.
- W2258158787 hasConcept C153180895 @default.
- W2258158787 hasConcept C154945302 @default.
- W2258158787 hasConcept C23123220 @default.
- W2258158787 hasConcept C2777737414 @default.
- W2258158787 hasConcept C2778371909 @default.
- W2258158787 hasConcept C41008148 @default.
- W2258158787 hasConcept C546480517 @default.
- W2258158787 hasConcept C63584917 @default.
- W2258158787 hasConcept C80797182 @default.
- W2258158787 hasConcept C81669768 @default.
- W2258158787 hasConcept C95623464 @default.
- W2258158787 hasConcept C99498987 @default.
- W2258158787 hasConceptScore W2258158787C111919701 @default.
- W2258158787 hasConceptScore W2258158787C115961682 @default.
- W2258158787 hasConceptScore W2258158787C146849305 @default.
- W2258158787 hasConceptScore W2258158787C153180895 @default.
- W2258158787 hasConceptScore W2258158787C154945302 @default.
- W2258158787 hasConceptScore W2258158787C23123220 @default.
- W2258158787 hasConceptScore W2258158787C2777737414 @default.
- W2258158787 hasConceptScore W2258158787C2778371909 @default.
- W2258158787 hasConceptScore W2258158787C41008148 @default.
- W2258158787 hasConceptScore W2258158787C546480517 @default.
- W2258158787 hasConceptScore W2258158787C63584917 @default.
- W2258158787 hasConceptScore W2258158787C80797182 @default.
- W2258158787 hasConceptScore W2258158787C81669768 @default.
- W2258158787 hasConceptScore W2258158787C95623464 @default.
- W2258158787 hasConceptScore W2258158787C99498987 @default.
- W2258158787 hasIssue "1" @default.
- W2258158787 hasLocation W22581587871 @default.
- W2258158787 hasOpenAccess W2258158787 @default.
- W2258158787 hasPrimaryLocation W22581587871 @default.
- W2258158787 hasRelatedWork W1580181063 @default.
- W2258158787 hasRelatedWork W2057045141 @default.
- W2258158787 hasRelatedWork W2059084567 @default.
- W2258158787 hasRelatedWork W2071402150 @default.
- W2258158787 hasRelatedWork W2295510835 @default.
- W2258158787 hasRelatedWork W2385728895 @default.
- W2258158787 hasRelatedWork W2486962493 @default.
- W2258158787 hasRelatedWork W3035897066 @default.
- W2258158787 hasRelatedWork W4293093933 @default.
- W2258158787 hasRelatedWork W566288409 @default.
- W2258158787 hasVolume "29" @default.
- W2258158787 isParatext "false" @default.
- W2258158787 isRetracted "false" @default.
- W2258158787 magId "2258158787" @default.
- W2258158787 workType "article" @default.
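
Each item in the listing above has the shape `- subject predicate object @graph.`, with literal objects in double quotes and identifiers left bare. A minimal sketch of reading such lines into tuples follows. The layout is inferred only from this listing and is not a documented SemOpenAlex export format, and the names `Quad`, `parse_line`, and `LINE_RE` are hypothetical, introduced here for illustration.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One listing item (hypothetical container, not a SemOpenAlex type)."""
    subject: str
    predicate: str
    obj: str
    is_literal: bool  # True when the object was written in double quotes
    graph: str


# Assumed line shape: "- <subject> <predicate> <object> @<graph>."
# The object group is greedy so that the last " @graph." on the line
# is taken as the graph label.
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*)\s+@(\S+?)\.$')


def parse_line(line: str) -> Optional[Quad]:
    """Return a Quad for a listing item, or None for any other line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, raw_obj, graph = match.groups()
    is_literal = (
        len(raw_obj) >= 2 and raw_obj.startswith('"') and raw_obj.endswith('"')
    )
    obj = raw_obj[1:-1] if is_literal else raw_obj
    return Quad(subject, predicate, obj, is_literal, graph)


# A few lines copied from the listing, plus one non-item line.
sample = [
    '- W2258158787 citedByCount "12" @default.',
    '- W2258158787 creator A5020221440 @default.',
    '- W2258158787 title "Automatic Assessment of OCR Quality in '
    'Historical Documents" @default.',
    'Showing items 1 to 95 of',
]

quads = [q for q in map(parse_line, sample) if q is not None]
for q in quads:
    print(q.predicate, '->', q.obj)
```

Header and pagination lines do not match the pattern and are skipped, so the same function can be run over the whole page text without pre-filtering.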