Matches in SemOpenAlex for { <https://semopenalex.org/work/W2601473681> ?p ?o ?g. }
Showing items 1 to 83 of 83 with 100 items per page.
- W2601473681 abstract "The Biodiversity Heritage Library (BHL) holds the largest collection of digitised legacy literature on biodiversity. Accessible as an online, fully-featured digital library, BHL stores bibliographic metadata for digital objects, allowing its users to issue keyword-based searches over the entire collection. Furthermore, owing to the application of optical character recognition (OCR) technology on scanned items (e.g., books, monographs, journals), textual content has been made available in machine-readable form, as well as automatically linked to taxonomic names in the Encyclopedia of Life (EOL). In the work presented herein, we report on our recent efforts aimed at the further advancement of the above-mentioned BHL functionalities. In terms of content rectification, the quality of available texts is being improved through the detection and correction of OCR-generated errors using an unsupervised statistical procedure incorporated into a desktop tool. In developing this tool, we are utilising the Google Books Ngram data sets as well as the accompanying Google Ngram Viewer. We are investigating two methods for error correction: lexical distance-based and context-based approaches. The former determines the best candidate unigram given only the features of an erroneous word. Context-based correction, in contrast, takes into account a word’s surrounding context. Meanwhile, in order to extend the current BHL features with semantic search capabilities, we are employing text mining solutions to automatically extract semantic metadata that capture a wide range of concepts apart from taxa. To this end, natural language processing (NLP) pipelines have been constructed using Argo ( http://argo.nactem.ac.uk ), a Web-based, graphical text mining workbench, in order to identify other biodiversity-relevant concepts, such as expressions pertaining to people, geographic locations, habitats, morphological characteristics and time. 
These pipelines, i.e., workflows, are built through the straightforward combination of several analytics (e.g., gazetteers and machine learning-based concept recognisers) which have been developed specifically for the biodiversity domain. The generated semantic metadata are displayed by the workbench’s graphical user interface that allows for the validation of annotations. Argo’s support for information interoperability is two-fold: firstly, its workflows can store their results in any of a number of standard encodings, e.g., XML Metadata Interchange (XMI) and Resource Description Framework (RDF) formats. Secondly, Argo includes facilities for deploying any of its workflows as Representational State Transfer (RESTful) Web services, thus rendering our NLP tools integrable with third-party applications similarly intending to enrich free-text biodiversity resources with automatically generated semantic metadata. Finally, to facilitate exploration and understanding of the documents and metadata which will be retrieved by semantic search, appropriate information visualisations are being designed. A key aspect of this design is driven by the need to allow users to interact with the visualisations in an analytical yet intuitive manner, enabling technical and non-technical users alike to discover and access various associations amongst BHL digital objects." @default.
- W2601473681 created "2017-04-07" @default.
- W2601473681 creator A5008782530 @default.
- W2601473681 creator A5013989913 @default.
- W2601473681 creator A5020378268 @default.
- W2601473681 creator A5050960711 @default.
- W2601473681 creator A5074068834 @default.
- W2601473681 creator A5077976343 @default.
- W2601473681 creator A5084852550 @default.
- W2601473681 date "2014-09-18" @default.
- W2601473681 modified "2023-09-28" @default.
- W2601473681 title "Enriching the legacy literature with OCR corrections and text-mined semantic metadata" @default.
- W2601473681 hasPublicationYear "2014" @default.
- W2601473681 type Work @default.
- W2601473681 sameAs 2601473681 @default.
- W2601473681 citedByCount "0" @default.
- W2601473681 crossrefType "journal-article" @default.
- W2601473681 hasAuthorship W2601473681A5008782530 @default.
- W2601473681 hasAuthorship W2601473681A5013989913 @default.
- W2601473681 hasAuthorship W2601473681A5020378268 @default.
- W2601473681 hasAuthorship W2601473681A5050960711 @default.
- W2601473681 hasAuthorship W2601473681A5074068834 @default.
- W2601473681 hasAuthorship W2601473681A5077976343 @default.
- W2601473681 hasAuthorship W2601473681A5084852550 @default.
- W2601473681 hasConcept C115961682 @default.
- W2601473681 hasConcept C136764020 @default.
- W2601473681 hasConcept C138885662 @default.
- W2601473681 hasConcept C148863701 @default.
- W2601473681 hasConcept C151730666 @default.
- W2601473681 hasConcept C154945302 @default.
- W2601473681 hasConcept C161191863 @default.
- W2601473681 hasConcept C189430467 @default.
- W2601473681 hasConcept C204321447 @default.
- W2601473681 hasConcept C23123220 @default.
- W2601473681 hasConcept C2779343474 @default.
- W2601473681 hasConcept C41008148 @default.
- W2601473681 hasConcept C41895202 @default.
- W2601473681 hasConcept C546480517 @default.
- W2601473681 hasConcept C86803240 @default.
- W2601473681 hasConcept C90805587 @default.
- W2601473681 hasConcept C93518851 @default.
- W2601473681 hasConceptScore W2601473681C115961682 @default.
- W2601473681 hasConceptScore W2601473681C136764020 @default.
- W2601473681 hasConceptScore W2601473681C138885662 @default.
- W2601473681 hasConceptScore W2601473681C148863701 @default.
- W2601473681 hasConceptScore W2601473681C151730666 @default.
- W2601473681 hasConceptScore W2601473681C154945302 @default.
- W2601473681 hasConceptScore W2601473681C161191863 @default.
- W2601473681 hasConceptScore W2601473681C189430467 @default.
- W2601473681 hasConceptScore W2601473681C204321447 @default.
- W2601473681 hasConceptScore W2601473681C23123220 @default.
- W2601473681 hasConceptScore W2601473681C2779343474 @default.
- W2601473681 hasConceptScore W2601473681C41008148 @default.
- W2601473681 hasConceptScore W2601473681C41895202 @default.
- W2601473681 hasConceptScore W2601473681C546480517 @default.
- W2601473681 hasConceptScore W2601473681C86803240 @default.
- W2601473681 hasConceptScore W2601473681C90805587 @default.
- W2601473681 hasConceptScore W2601473681C93518851 @default.
- W2601473681 hasOpenAccess W2601473681 @default.
- W2601473681 hasRelatedWork W1567585853 @default.
- W2601473681 hasRelatedWork W1858357318 @default.
- W2601473681 hasRelatedWork W2052978665 @default.
- W2601473681 hasRelatedWork W2059506441 @default.
- W2601473681 hasRelatedWork W2072836580 @default.
- W2601473681 hasRelatedWork W2096434924 @default.
- W2601473681 hasRelatedWork W2107991567 @default.
- W2601473681 hasRelatedWork W2116895936 @default.
- W2601473681 hasRelatedWork W2123906659 @default.
- W2601473681 hasRelatedWork W2156025625 @default.
- W2601473681 hasRelatedWork W2177799575 @default.
- W2601473681 hasRelatedWork W2397934268 @default.
- W2601473681 hasRelatedWork W2741528461 @default.
- W2601473681 hasRelatedWork W2751110430 @default.
- W2601473681 hasRelatedWork W3104793116 @default.
- W2601473681 hasRelatedWork W3159041084 @default.
- W2601473681 hasRelatedWork W42988114 @default.
- W2601473681 hasRelatedWork W611014183 @default.
- W2601473681 hasRelatedWork W2289058943 @default.
- W2601473681 hasRelatedWork W2596020497 @default.
- W2601473681 isParatext "false" @default.
- W2601473681 isRetracted "false" @default.
- W2601473681 magId "2601473681" @default.
- W2601473681 workType "article" @default.