Matches in SemOpenAlex for { <https://semopenalex.org/work/W2139067830> ?p ?o ?g. }
- W2139067830 endingPage "28" @default.
- W2139067830 startingPage "28" @default.
- W2139067830 abstract "The identification of protein and gene names (PGNs) in the scientific literature requires semantic resources: terminological and lexical resources deliver the term candidates to PGN tagging solutions, and the gold standard corpora (GSCs) train them to identify term parameters and contextual features. Ideally, all three resources, i.e. corpora, lexica and taggers, cover the same domain knowledge, and thus support the identification of the same types of PGNs and cover all of them. Unfortunately, none of the three serves as a predominant standard, and for this reason it is worth exploring how these three resources comply with each other. We systematically compare different PGN taggers against publicly available corpora and analyze the impact of the included lexical resource on their performance. In particular, we determine the performance gains from false positive filtering, which contributes to the disambiguation of identified PGNs. In general, machine learning approaches (ML-Tag) for PGN tagging show higher F1-measure performance against the BioCreative-II and Jnlpba GSCs (exact matching), whereas the lexicon-based approaches (LexTag) in combination with disambiguation methods show better results on FsuPrge and PennBio. The ML-Tag solutions balance precision and recall, whereas the LexTag solutions have different precision and recall profiles at the same F1-measure across all corpora. Higher recall is achieved with larger lexical resources, which also introduce more noise (false positive results). The ML-Tag solutions certainly perform best if the test corpus is from the same GSC as the training corpus. As expected, the false negative errors characterize the test corpora, whereas the profiles of the false positive mistakes characterize the tagging solutions.
LexTag solutions that are based on a large terminological resource in combination with false positive filtering produce better results and, in addition, provide concept identifiers from a knowledge source, in contrast to ML-Tag solutions. The standard ML-Tag solutions achieve high performance, but not across all corpora, and thus should be trained using several different corpora to reduce possible biases. The LexTag solutions have different profiles for their precision and recall performance, but with similar F1-measure. This result is surprising and suggests that they cover a portion of the most common naming standards but cope differently with the term variability across the corpora. The false positive filtering applied to LexTag solutions does improve the results by increasing their precision without significantly compromising their recall. The harmonisation of the annotation schemes in combination with standardized lexical resources in the tagging solutions will enable their comparability and will pave the way for a shared standard." @default.
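The abstract's central quantities (precision, recall and F1 under exact matching, and the effect of false positive filtering) can be illustrated with a small sketch. This is not the authors' evaluation code: it assumes that a PGN mention is identified only by hypothetical (start, end) character offsets, that a prediction counts as correct only when both offsets equal a gold span, and that the "filter" is simply a hand-picked set of spurious spans. The helper names `evaluate_exact` and `f1_score` and all span values are invented for illustration.

```python
from typing import Set, Tuple

# Assumption: a mention is identified by its (start, end) character offsets;
# exact matching counts a prediction as correct only if both offsets equal a gold span.
Span = Tuple[int, int]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate_exact(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted spans against gold spans (exact match)."""
    tp = len(predicted & gold)   # true positives: tagged and in the gold standard
    fp = len(predicted - gold)   # false positives: tagged but not in the gold standard
    fn = len(gold - predicted)   # false negatives: gold mentions the tagger missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    # Toy data, invented for illustration only (not taken from any corpus).
    gold = {(0, 4), (10, 15), (20, 26), (30, 33)}

    # A hypothetical large lexicon finds every gold mention but also tags four spurious spans.
    lexicon_hits = gold | {(40, 43), (50, 52), (60, 64), (70, 75)}
    print(evaluate_exact(lexicon_hits, gold))  # precision 0.5, recall 1.0

    # Hypothetical false positive filter that discards three of the spurious spans:
    # precision rises while recall is unchanged.
    filtered = lexicon_hits - {(40, 43), (50, 52), (60, 64)}
    print(evaluate_exact(filtered, gold))  # precision 0.8, recall 1.0

    # Different precision/recall profiles can share the same F1.
    print(f1_score(0.5, 1.0), f1_score(1.0, 0.5))
```

The last line mirrors the observation that taggers with quite different precision and recall profiles can reach a similar F1-measure, because F1 is symmetric in precision and recall.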
- W2139067830 created "2016-06-24" @default.
- W2139067830 creator A5000020062 @default.
- W2139067830 creator A5006224704 @default.
- W2139067830 creator A5016481961 @default.
- W2139067830 creator A5017672883 @default.
- W2139067830 creator A5032447166 @default.
- W2139067830 creator A5043808311 @default.
- W2139067830 creator A5069397549 @default.
- W2139067830 creator A5088357141 @default.
- W2139067830 date "2013-01-01" @default.
- W2139067830 modified "2023-10-13" @default.
- W2139067830 title "Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources" @default.
- W2139067830 cites W137651377 @default.
- W2139067830 cites W1529842856 @default.
- W2139067830 cites W1578955001 @default.
- W2139067830 cites W1592053870 @default.
- W2139067830 cites W1959755538 @default.
- W2139067830 cites W1973646872 @default.
- W2139067830 cites W1976097579 @default.
- W2139067830 cites W1979225493 @default.
- W2139067830 cites W1994843778 @default.
- W2139067830 cites W2005058680 @default.
- W2139067830 cites W2007367068 @default.
- W2139067830 cites W2009341227 @default.
- W2139067830 cites W2024159593 @default.
- W2139067830 cites W2030390110 @default.
- W2139067830 cites W2032445314 @default.
- W2139067830 cites W2033237963 @default.
- W2139067830 cites W2039612385 @default.
- W2139067830 cites W2040298461 @default.
- W2139067830 cites W2042711932 @default.
- W2139067830 cites W2044420612 @default.
- W2139067830 cites W2045016337 @default.
- W2139067830 cites W2048140075 @default.
- W2139067830 cites W2050902903 @default.
- W2139067830 cites W2054558802 @default.
- W2139067830 cites W2070127972 @default.
- W2139067830 cites W2074640468 @default.
- W2139067830 cites W2094591616 @default.
- W2139067830 cites W2094726706 @default.
- W2139067830 cites W2097678794 @default.
- W2139067830 cites W2101265630 @default.
- W2139067830 cites W2107005506 @default.
- W2139067830 cites W2107580398 @default.
- W2139067830 cites W2108010925 @default.
- W2139067830 cites W2109487646 @default.
- W2139067830 cites W2112671334 @default.
- W2139067830 cites W2114361266 @default.
- W2139067830 cites W2116159459 @default.
- W2139067830 cites W2117479478 @default.
- W2139067830 cites W2117770626 @default.
- W2139067830 cites W2122545791 @default.
- W2139067830 cites W2123439964 @default.
- W2139067830 cites W2126276057 @default.
- W2139067830 cites W2127603354 @default.
- W2139067830 cites W2129113459 @default.
- W2139067830 cites W2130260860 @default.
- W2139067830 cites W2141869602 @default.
- W2139067830 cites W2142364412 @default.
- W2139067830 cites W2142741334 @default.
- W2139067830 cites W2144896636 @default.
- W2139067830 cites W2144949988 @default.
- W2139067830 cites W2150252713 @default.
- W2139067830 cites W2154139219 @default.
- W2139067830 cites W2154142897 @default.
- W2139067830 cites W2155461328 @default.
- W2139067830 cites W2155832627 @default.
- W2139067830 cites W2156111363 @default.
- W2139067830 cites W2157870551 @default.
- W2139067830 cites W2159335058 @default.
- W2139067830 cites W2159620792 @default.
- W2139067830 cites W2162461580 @default.
- W2139067830 cites W2162965868 @default.
- W2139067830 cites W2163107094 @default.
- W2139067830 cites W2168091548 @default.
- W2139067830 cites W2169918010 @default.
- W2139067830 cites W2263893547 @default.
- W2139067830 cites W239119325 @default.
- W2139067830 cites W61219374 @default.
- W2139067830 cites W2127869678 @default.
- W2139067830 doi "https://doi.org/10.1186/2041-1480-4-28" @default.
- W2139067830 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/4021975" @default.
- W2139067830 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/24112383" @default.
- W2139067830 hasPublicationYear "2013" @default.
- W2139067830 type Work @default.
- W2139067830 sameAs 2139067830 @default.
- W2139067830 citedByCount "14" @default.
- W2139067830 countsByYear W21390678302014 @default.
- W2139067830 countsByYear W21390678302015 @default.
- W2139067830 countsByYear W21390678302016 @default.
- W2139067830 countsByYear W21390678302017 @default.
- W2139067830 countsByYear W21390678302018 @default.
- W2139067830 countsByYear W21390678302020 @default.
- W2139067830 countsByYear W21390678302021 @default.
- W2139067830 countsByYear W21390678302022 @default.
- W2139067830 crossrefType "journal-article" @default.
- W2139067830 hasAuthorship W2139067830A5000020062 @default.