Matches in SemOpenAlex for { <https://semopenalex.org/work/W2955474241> ?p ?o ?g. }
- W2955474241 abstract "This study proposes a text similarity model to help biocuration efforts of the Conserved Domain Database (CDD). CDD is a curated resource that catalogs annotated multiple sequence alignment models for ancient domains and full-length proteins. These models allow for fast searching and quick identification of conserved motifs in protein sequences via Reverse PSI-BLAST. In addition, CDD curators prepare summaries detailing the function of these conserved domains and specific protein families, based on published peer-reviewed articles. To facilitate information access for database users, it is desirable to specifically identify the referenced articles that support the assertions of curator-composed sentences. Moreover, CDD curators desire an alert system that scans the newly published literature and proposes related articles of relevance to the existing CDD records. Our approach to address these needs is a text similarity method that automatically maps a curator-written statement to candidate sentences extracted from the list of referenced articles, as well as the articles in the PubMed Central database. To evaluate this proposal, we paired CDD description sentences with the top 10 matching sentences from the literature, which were given to curators for review. Through this exercise, we discovered that we were able to map the articles in the reference list to the CDD description statements with an accuracy of 77%. In the dataset that was reviewed by curators, we were able to successfully provide references for 86% of the curator statements. In addition, we suggested new articles for curator review, which were accepted by curators to be added into the reference list at an acceptance rate of 50%. Through this process, we developed a substantial corpus of similar sentences from biomedical articles on protein sequence, structure and function research, which constitute the CDD text similarity corpus.
This corpus contains 5159 sentence pairs judged for their similarity on a scale from 1 (low) to 5 (high) doubly annotated by four CDD curators. Curator-assigned similarity scores have a Pearson correlation coefficient of 0.70 and an inter-annotator agreement of 85%. To date, this is the largest biomedical text similarity resource that has been manually judged, evaluated and made publicly available to the community to foster research and development of text similarity algorithms." @default.
- W2955474241 created "2019-07-12" @default.
- W2955474241 creator A5002959499 @default.
- W2955474241 creator A5007644986 @default.
- W2955474241 creator A5008126200 @default.
- W2955474241 creator A5009507390 @default.
- W2955474241 creator A5026383671 @default.
- W2955474241 creator A5041025050 @default.
- W2955474241 creator A5049089682 @default.
- W2955474241 creator A5068777093 @default.
- W2955474241 creator A5083081872 @default.
- W2955474241 date "2019-01-01" @default.
- W2955474241 modified "2023-09-25" @default.
- W2955474241 title "PubMed Text Similarity Model and its application to curation efforts in the Conserved Domain Database" @default.
- W2955474241 cites W1964670939 @default.
- W2955474241 cites W1976325156 @default.
- W2955474241 cites W2004622806 @default.
- W2955474241 cites W2017608260 @default.
- W2955474241 cites W2020278455 @default.
- W2955474241 cites W2032443316 @default.
- W2955474241 cites W2044420612 @default.
- W2955474241 cites W2094726706 @default.
- W2955474241 cites W2101727078 @default.
- W2955474241 cites W2120939273 @default.
- W2955474241 cites W2145870108 @default.
- W2955474241 cites W2148130205 @default.
- W2955474241 cites W2166240318 @default.
- W2955474241 cites W2169929748 @default.
- W2955474241 cites W2171895522 @default.
- W2955474241 cites W2224056471 @default.
- W2955474241 cites W2559191318 @default.
- W2955474241 cites W2735784619 @default.
- W2955474241 cites W2952894920 @default.
- W2955474241 cites W2962985038 @default.
- W2955474241 cites W3105439152 @default.
- W2955474241 cites W4213009331 @default.
- W2955474241 doi "https://doi.org/10.1093/database/baz064" @default.
- W2955474241 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/6606757" @default.
- W2955474241 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/31267135" @default.
- W2955474241 hasPublicationYear "2019" @default.
- W2955474241 type Work @default.
- W2955474241 sameAs 2955474241 @default.
- W2955474241 citedByCount "10" @default.
- W2955474241 countsByYear W29554742412020 @default.
- W2955474241 countsByYear W29554742412021 @default.
- W2955474241 countsByYear W29554742412022 @default.
- W2955474241 countsByYear W29554742412023 @default.
- W2955474241 crossrefType "journal-article" @default.
- W2955474241 hasAuthorship W2955474241A5002959499 @default.
- W2955474241 hasAuthorship W2955474241A5007644986 @default.
- W2955474241 hasAuthorship W2955474241A5008126200 @default.
- W2955474241 hasAuthorship W2955474241A5009507390 @default.
- W2955474241 hasAuthorship W2955474241A5026383671 @default.
- W2955474241 hasAuthorship W2955474241A5041025050 @default.
- W2955474241 hasAuthorship W2955474241A5049089682 @default.
- W2955474241 hasAuthorship W2955474241A5068777093 @default.
- W2955474241 hasAuthorship W2955474241A5083081872 @default.
- W2955474241 hasBestOaLocation W29554742411 @default.
- W2955474241 hasConcept C103278499 @default.
- W2955474241 hasConcept C105795698 @default.
- W2955474241 hasConcept C115961682 @default.
- W2955474241 hasConcept C116834253 @default.
- W2955474241 hasConcept C134306372 @default.
- W2955474241 hasConcept C136764020 @default.
- W2955474241 hasConcept C154945302 @default.
- W2955474241 hasConcept C158154518 @default.
- W2955474241 hasConcept C165064840 @default.
- W2955474241 hasConcept C17744445 @default.
- W2955474241 hasConcept C199539241 @default.
- W2955474241 hasConcept C23123220 @default.
- W2955474241 hasConcept C2777026412 @default.
- W2955474241 hasConcept C33923547 @default.
- W2955474241 hasConcept C36503486 @default.
- W2955474241 hasConcept C41008148 @default.
- W2955474241 hasConcept C59822182 @default.
- W2955474241 hasConcept C77088390 @default.
- W2955474241 hasConcept C86803240 @default.
- W2955474241 hasConcept C91632574 @default.
- W2955474241 hasConceptScore W2955474241C103278499 @default.
- W2955474241 hasConceptScore W2955474241C105795698 @default.
- W2955474241 hasConceptScore W2955474241C115961682 @default.
- W2955474241 hasConceptScore W2955474241C116834253 @default.
- W2955474241 hasConceptScore W2955474241C134306372 @default.
- W2955474241 hasConceptScore W2955474241C136764020 @default.
- W2955474241 hasConceptScore W2955474241C154945302 @default.
- W2955474241 hasConceptScore W2955474241C158154518 @default.
- W2955474241 hasConceptScore W2955474241C165064840 @default.
- W2955474241 hasConceptScore W2955474241C17744445 @default.
- W2955474241 hasConceptScore W2955474241C199539241 @default.
- W2955474241 hasConceptScore W2955474241C23123220 @default.
- W2955474241 hasConceptScore W2955474241C2777026412 @default.
- W2955474241 hasConceptScore W2955474241C33923547 @default.
- W2955474241 hasConceptScore W2955474241C36503486 @default.
- W2955474241 hasConceptScore W2955474241C41008148 @default.
- W2955474241 hasConceptScore W2955474241C59822182 @default.
- W2955474241 hasConceptScore W2955474241C77088390 @default.
- W2955474241 hasConceptScore W2955474241C86803240 @default.
- W2955474241 hasConceptScore W2955474241C91632574 @default.
- W2955474241 hasLocation W29554742411 @default.
- W2955474241 hasLocation W29554742412 @default.
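
The abstract above reports a Pearson correlation coefficient of 0.70 between curator-assigned similarity scores on the 1 (low) to 5 (high) scale. As a minimal sketch of how such a coefficient is computed for two annotators scoring the same sentence pairs, the Python below implements the standard formula. The function name `pearson` and the two score lists are hypothetical and chosen only for illustration; they are not taken from the CDD text similarity corpus, and the record does not say how the authors computed their figure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores from two annotators for the same eight sentence pairs.
annotator_a = [1, 2, 3, 4, 5, 3, 2, 4]
annotator_b = [1, 3, 3, 5, 4, 2, 2, 4]

r = pearson(annotator_a, annotator_b)
print(f"Pearson r = {r:.2f}")
```

A value near 1 means the two annotators rank the sentence pairs similarly, while a value near 0 means their scores are largely unrelated. The 85% inter-annotator agreement quoted in the abstract is a separate measure whose definition is not given in this record, so it is not sketched here.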