Matches in SemOpenAlex for { <https://semopenalex.org/work/W2914274808> ?p ?o ?g. }
- W2914274808 abstract "Background Species occurrence records are very important in the biodiversity domain. While several available corpora contain only annotations of species names or habitats and geographical locations, there is no consolidated corpus that covers all types of entities necessary for extracting species occurrence from biodiversity literature. In order to alleviate this issue, we have constructed the COPIOUS corpus—a gold standard corpus that covers a wide range of biodiversity entities. Results Two annotators manually annotated the corpus with five categories of entities, i.e. taxon names, geographical locations, habitats, temporal expressions and person names. The overall inter-annotator agreement on 200 doubly-annotated documents is approximately 81.86% F-score. Amongst the five categories, the agreement on habitat entities was the lowest, indicating that this type of entity is complex. The COPIOUS corpus consists of 668 documents downloaded from the Biodiversity Heritage Library with over 26K sentences and more than 28K entities. Named entity recognisers trained on the corpus could achieve an F-score of 74.58%. Moreover, in recognising taxon names, our model performed better than two available tools in the biodiversity domain, namely the SPECIES tagger and the Global Name Recognition and Discovery. More than 1,600 binary relations of Taxon-Habitat, Taxon-Person, Taxon-Geographical locations and Taxon-Temporal expressions were identified by applying a pattern-based relation extraction system to the gold standard. Based on the extracted relations, we can produce a knowledge repository of species occurrences. Conclusion The paper describes in detail the construction of a gold standard named entity corpus for the biodiversity domain. An investigation of the performance of named entity recognition (NER) tools trained on the gold standard revealed that the corpus is sufficiently reliable and sizeable for both training and evaluation purposes. The corpus can be further used for relation extraction to locate species occurrences in literature—a useful task for monitoring species distribution and preserving the biodiversity." @default.
- W2914274808 created "2019-02-21" @default.
- W2914274808 creator A5050147609 @default.
- W2914274808 creator A5077976343 @default.
- W2914274808 creator A5086998532 @default.
- W2914274808 date "2019-01-22" @default.
- W2914274808 modified "2023-10-01" @default.
- W2914274808 title "COPIOUS: A gold standard corpus of named entities towards extracting species occurrence from biodiversity literature" @default.
- W2914274808 cites W1549142192 @default.
- W2914274808 cites W1895979660 @default.
- W2914274808 cites W1919684990 @default.
- W2914274808 cites W192665053 @default.
- W2914274808 cites W1964670939 @default.
- W2914274808 cites W1965893653 @default.
- W2914274808 cites W1987248861 @default.
- W2914274808 cites W1989517911 @default.
- W2914274808 cites W2009264880 @default.
- W2914274808 cites W2051044597 @default.
- W2914274808 cites W2057069926 @default.
- W2914274808 cites W2071879021 @default.
- W2914274808 cites W2080019484 @default.
- W2914274808 cites W2081580037 @default.
- W2914274808 cites W2099701465 @default.
- W2914274808 cites W2100276951 @default.
- W2914274808 cites W2100627415 @default.
- W2914274808 cites W2107347656 @default.
- W2914274808 cites W2124714582 @default.
- W2914274808 cites W2144578941 @default.
- W2914274808 cites W2159327312 @default.
- W2914274808 cites W2165488387 @default.
- W2914274808 cites W2166240318 @default.
- W2914274808 cites W2166628775 @default.
- W2914274808 cites W2227919965 @default.
- W2914274808 cites W2288418054 @default.
- W2914274808 cites W2518463060 @default.
- W2914274808 cites W2529235827 @default.
- W2914274808 cites W2593341166 @default.
- W2914274808 cites W2606273705 @default.
- W2914274808 cites W2606418983 @default.
- W2914274808 cites W2741379117 @default.
- W2914274808 cites W2802711901 @default.
- W2914274808 cites W2890631628 @default.
- W2914274808 cites W3122465213 @default.
- W2914274808 cites W619094263 @default.
- W2914274808 doi "https://doi.org/10.3897/bdj.7.e29626" @default.
- W2914274808 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/6351503" @default.
- W2914274808 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/30700967" @default.
- W2914274808 hasPublicationYear "2019" @default.
- W2914274808 type Work @default.
- W2914274808 sameAs 2914274808 @default.
- W2914274808 citedByCount "15" @default.
- W2914274808 countsByYear W29142748082019 @default.
- W2914274808 countsByYear W29142748082020 @default.
- W2914274808 countsByYear W29142748082021 @default.
- W2914274808 countsByYear W29142748082022 @default.
- W2914274808 countsByYear W29142748082023 @default.
- W2914274808 crossrefType "journal-article" @default.
- W2914274808 hasAuthorship W2914274808A5050147609 @default.
- W2914274808 hasAuthorship W2914274808A5077976343 @default.
- W2914274808 hasAuthorship W2914274808A5086998532 @default.
- W2914274808 hasBestOaLocation W29142748081 @default.
- W2914274808 hasConcept C130217890 @default.
- W2914274808 hasConcept C134306372 @default.
- W2914274808 hasConcept C154945302 @default.
- W2914274808 hasConcept C159985019 @default.
- W2914274808 hasConcept C18903297 @default.
- W2914274808 hasConcept C189592816 @default.
- W2914274808 hasConcept C192562407 @default.
- W2914274808 hasConcept C204321447 @default.
- W2914274808 hasConcept C204323151 @default.
- W2914274808 hasConcept C23123220 @default.
- W2914274808 hasConcept C33923547 @default.
- W2914274808 hasConcept C36503486 @default.
- W2914274808 hasConcept C41008148 @default.
- W2914274808 hasConcept C71640776 @default.
- W2914274808 hasConcept C86803240 @default.
- W2914274808 hasConceptScore W2914274808C130217890 @default.
- W2914274808 hasConceptScore W2914274808C134306372 @default.
- W2914274808 hasConceptScore W2914274808C154945302 @default.
- W2914274808 hasConceptScore W2914274808C159985019 @default.
- W2914274808 hasConceptScore W2914274808C18903297 @default.
- W2914274808 hasConceptScore W2914274808C189592816 @default.
- W2914274808 hasConceptScore W2914274808C192562407 @default.
- W2914274808 hasConceptScore W2914274808C204321447 @default.
- W2914274808 hasConceptScore W2914274808C204323151 @default.
- W2914274808 hasConceptScore W2914274808C23123220 @default.
- W2914274808 hasConceptScore W2914274808C33923547 @default.
- W2914274808 hasConceptScore W2914274808C36503486 @default.
- W2914274808 hasConceptScore W2914274808C41008148 @default.
- W2914274808 hasConceptScore W2914274808C71640776 @default.
- W2914274808 hasConceptScore W2914274808C86803240 @default.
- W2914274808 hasLocation W29142748081 @default.
- W2914274808 hasLocation W29142748082 @default.
- W2914274808 hasLocation W29142748083 @default.
- W2914274808 hasLocation W29142748084 @default.
- W2914274808 hasLocation W29142748085 @default.
- W2914274808 hasOpenAccess W2914274808 @default.
- W2914274808 hasPrimaryLocation W29142748081 @default.
- W2914274808 hasRelatedWork W12366621 @default.
- W2914274808 hasRelatedWork W1580469571 @default.