Matches in SemOpenAlex for { <https://semopenalex.org/work/W3116669097> ?p ?o ?g. }
Showing items 1 to 78 of 78 with 100 items per page.
- W3116669097 endingPage "87" @default.
- W3116669097 startingPage "74" @default.
- W3116669097 abstract "Books printed before 1800 present major problems for OCR. One of the main obstacles is the lack of diversity of historical fonts in training data. The OCR-D project, consisting of book historians and computer scientists, aims to address this deficiency by focussing on three major issues. Our first target was to create a tool that identifies font groups automatically in images of historical documents. We concentrated on Gothic font groups that were commonly used in German texts printed in the 15th and 16th centuries: the well-known Fraktur and the lesser-known Bastarda, Rotunda, Textura and Schwabacher. The tool was trained with 35,000 images and reaches an accuracy level of 98%. It can differentiate not only between the above-mentioned font groups but also Hebrew, Greek, Antiqua and Italic. It can also identify woodcut images and irrelevant data (book covers, empty pages, etc.). In a second step, we created an online training infrastructure (okralact), which allows for the use of various open source OCR engines such as Tesseract, OCRopus, Kraken and Calamari. At the same time, it facilitates training for specific models of font groups. The high accuracy of the recognition tool paves the way for the unprecedented opportunity to differentiate between the fonts used by individual printers. With more training data and further adjustments, the tool could help to fill a major gap in historical research." @default.
- W3116669097 created "2021-01-05" @default.
- W3116669097 creator A5004378974 @default.
- W3116669097 creator A5010354621 @default.
- W3116669097 creator A5021093135 @default.
- W3116669097 creator A5066134818 @default.
- W3116669097 creator A5074720526 @default.
- W3116669097 creator A5087093169 @default.
- W3116669097 date "2020-12-01" @default.
- W3116669097 modified "2023-09-23" @default.
- W3116669097 title "New Approaches to OCR for Early Printed Books" @default.
- W3116669097 doi "https://doi.org/10.36181/digitalia-00015" @default.
- W3116669097 hasPublicationYear "2020" @default.
- W3116669097 type Work @default.
- W3116669097 sameAs 3116669097 @default.
- W3116669097 citedByCount "1" @default.
- W3116669097 countsByYear W31166690972021 @default.
- W3116669097 crossrefType "journal-article" @default.
- W3116669097 hasAuthorship W3116669097A5004378974 @default.
- W3116669097 hasAuthorship W3116669097A5010354621 @default.
- W3116669097 hasAuthorship W3116669097A5021093135 @default.
- W3116669097 hasAuthorship W3116669097A5066134818 @default.
- W3116669097 hasAuthorship W3116669097A5074720526 @default.
- W3116669097 hasAuthorship W3116669097A5087093169 @default.
- W3116669097 hasBestOaLocation W31166690971 @default.
- W3116669097 hasConcept C115961682 @default.
- W3116669097 hasConcept C121684516 @default.
- W3116669097 hasConcept C136764020 @default.
- W3116669097 hasConcept C142362112 @default.
- W3116669097 hasConcept C153349607 @default.
- W3116669097 hasConcept C154775046 @default.
- W3116669097 hasConcept C154945302 @default.
- W3116669097 hasConcept C157657479 @default.
- W3116669097 hasConcept C166957645 @default.
- W3116669097 hasConcept C23123220 @default.
- W3116669097 hasConcept C2777737414 @default.
- W3116669097 hasConcept C41008148 @default.
- W3116669097 hasConcept C51632099 @default.
- W3116669097 hasConcept C546480517 @default.
- W3116669097 hasConcept C77658299 @default.
- W3116669097 hasConcept C95457728 @default.
- W3116669097 hasConceptScore W3116669097C115961682 @default.
- W3116669097 hasConceptScore W3116669097C121684516 @default.
- W3116669097 hasConceptScore W3116669097C136764020 @default.
- W3116669097 hasConceptScore W3116669097C142362112 @default.
- W3116669097 hasConceptScore W3116669097C153349607 @default.
- W3116669097 hasConceptScore W3116669097C154775046 @default.
- W3116669097 hasConceptScore W3116669097C154945302 @default.
- W3116669097 hasConceptScore W3116669097C157657479 @default.
- W3116669097 hasConceptScore W3116669097C166957645 @default.
- W3116669097 hasConceptScore W3116669097C23123220 @default.
- W3116669097 hasConceptScore W3116669097C2777737414 @default.
- W3116669097 hasConceptScore W3116669097C41008148 @default.
- W3116669097 hasConceptScore W3116669097C51632099 @default.
- W3116669097 hasConceptScore W3116669097C546480517 @default.
- W3116669097 hasConceptScore W3116669097C77658299 @default.
- W3116669097 hasConceptScore W3116669097C95457728 @default.
- W3116669097 hasIssue "2" @default.
- W3116669097 hasLocation W31166690971 @default.
- W3116669097 hasOpenAccess W3116669097 @default.
- W3116669097 hasPrimaryLocation W31166690971 @default.
- W3116669097 hasRelatedWork W1505678343 @default.
- W3116669097 hasRelatedWork W2020509212 @default.
- W3116669097 hasRelatedWork W2163126033 @default.
- W3116669097 hasRelatedWork W2187324811 @default.
- W3116669097 hasRelatedWork W2748952813 @default.
- W3116669097 hasRelatedWork W2761353030 @default.
- W3116669097 hasRelatedWork W3116669097 @default.
- W3116669097 hasRelatedWork W4226348049 @default.
- W3116669097 hasRelatedWork W4283809957 @default.
- W3116669097 hasRelatedWork W4320016076 @default.
- W3116669097 hasVolume "15" @default.
- W3116669097 isParatext "false" @default.
- W3116669097 isRetracted "false" @default.
- W3116669097 magId "3116669097" @default.
- W3116669097 workType "article" @default.