Matches in SemOpenAlex for { <https://semopenalex.org/work/W4200445293> ?p ?o ?g. }
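The quad pattern above binds every predicate, object, and named graph stored for this work; each "- W4200445293 ..." line below is one such binding. As a minimal, hypothetical sketch of how this listing could be reproduced, the following Python snippet sends an equivalent SELECT query to a SemOpenAlex SPARQL endpoint. The endpoint URL and the GRAPH-wrapped form of the pattern are assumptions, not part of the original listing.

# Illustrative sketch only: the endpoint URL below is assumed, not confirmed by this page.
import requests

QUERY = """
SELECT ?p ?o ?g
WHERE {
  GRAPH ?g {
    <https://semopenalex.org/work/W4200445293> ?p ?o .
  }
}
"""

resp = requests.post(
    "https://semopenalex.org/sparql",  # assumed public SemOpenAlex endpoint
    data={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    # Each binding corresponds to one "- W4200445293 <predicate> <object>" line below.
    print(row["p"]["value"], row["o"]["value"], row.get("g", {}).get("value", ""))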
- W4200445293 endingPage "e27386" @default.
- W4200445293 startingPage "e27386" @default.
- W4200445293 abstract "Semantic textual similarity (STS) measures the degree of relatedness between sentence pairs. The Open Health Natural Language Processing (OHNLP) Consortium released an expertly annotated STS data set and called for the National Natural Language Processing Clinical Challenges. This work describes our entry, an ensemble model that leverages a range of deep learning (DL) models. Our team from the National Library of Medicine obtained a Pearson correlation of 0.8967 in an official test set during 2019 National Natural Language Processing Clinical Challenges/Open Health Natural Language Processing shared task and achieved a second rank. Although our models strongly correlate with manual annotations, annotator-level correlation was only moderate (weighted Cohen κ=0.60). We are cautious of the potential use of DL models in production systems and argue that it is more critical to evaluate the models in-depth, especially those with extremely high correlations. In this study, we benchmark the effectiveness and efficiency of top-ranked DL models. We quantify their robustness and inference times to validate their usefulness in real-time applications. We benchmarked five DL models, which are the top-ranked systems for STS tasks: Convolutional Neural Network, BioSentVec, BioBERT, BlueBERT, and ClinicalBERT. We evaluated a random forest model as an additional baseline. For each model, we repeated the experiment 10 times, using the official training and testing sets. We reported 95% CI of the Wilcoxon rank-sum test on the average Pearson correlation (official evaluation metric) and running time. We further evaluated Spearman correlation, R², and mean squared error as additional measures. Using only the official training set, all models obtained highly effective results. BioSentVec and BioBERT achieved the highest average Pearson correlations (0.8497 and 0.8481, respectively). BioSentVec also had the highest results in 3 of 4 effectiveness measures, followed by BioBERT. However, their robustness to sentence pairs of different similarity levels varies significantly. A particular observation is that BERT models made the most errors (a mean squared error of over 2.5) on highly similar sentence pairs. They cannot capture highly similar sentence pairs effectively when they have different negation terms or word orders. In addition, time efficiency is dramatically different from the effectiveness results. On average, the BERT models were approximately 20 times and 50 times slower than the Convolutional Neural Network and BioSentVec models, respectively. This results in challenges for real-time applications. Despite the excitement of further improving Pearson correlations in this data set, our results highlight that evaluations of the effectiveness and efficiency of STS models are critical. In future, we suggest more evaluations on the generalization capability and user-level testing of the models. We call for community efforts to create more biomedical and clinical STS data sets from different perspectives to reflect the multifaceted notion of sentence-relatedness." @default.
- W4200445293 created "2021-12-31" @default.
- W4200445293 creator A5013811383 @default.
- W4200445293 creator A5033862822 @default.
- W4200445293 creator A5040545882 @default.
- W4200445293 creator A5042874172 @default.
- W4200445293 creator A5083081872 @default.
- W4200445293 date "2021-12-30" @default.
- W4200445293 modified "2023-09-30" @default.
- W4200445293 title "Benchmarking Effectiveness and Efficiency of Deep Learning Models for Semantic Textual Similarity in the Clinical Domain: Validation Study" @default.
- W4200445293 cites W2250539671 @default.
- W4200445293 cites W2493916176 @default.
- W4200445293 cites W2525575452 @default.
- W4200445293 cites W2735784619 @default.
- W4200445293 cites W2751762827 @default.
- W4200445293 cites W2793350103 @default.
- W4200445293 cites W2888285200 @default.
- W4200445293 cites W2888557427 @default.
- W4200445293 cites W2889272240 @default.
- W4200445293 cites W2911489562 @default.
- W4200445293 cites W2940542551 @default.
- W4200445293 cites W2944400536 @default.
- W4200445293 cites W2963341956 @default.
- W4200445293 cites W2963804993 @default.
- W4200445293 cites W2963923670 @default.
- W4200445293 cites W2971258845 @default.
- W4200445293 cites W3009459039 @default.
- W4200445293 cites W3014227648 @default.
- W4200445293 cites W3017463390 @default.
- W4200445293 cites W3020931369 @default.
- W4200445293 cites W3023154087 @default.
- W4200445293 cites W3037063616 @default.
- W4200445293 cites W3094834348 @default.
- W4200445293 cites W3104033643 @default.
- W4200445293 cites W3104059174 @default.
- W4200445293 cites W3118454233 @default.
- W4200445293 cites W3161987470 @default.
- W4200445293 cites W4244393794 @default.
- W4200445293 doi "https://doi.org/10.2196/27386" @default.
- W4200445293 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/34967748" @default.
- W4200445293 hasPublicationYear "2021" @default.
- W4200445293 type Work @default.
- W4200445293 citedByCount "6" @default.
- W4200445293 countsByYear W42004452932022 @default.
- W4200445293 countsByYear W42004452932023 @default.
- W4200445293 crossrefType "journal-article" @default.
- W4200445293 hasAuthorship W4200445293A5013811383 @default.
- W4200445293 hasAuthorship W4200445293A5033862822 @default.
- W4200445293 hasAuthorship W4200445293A5040545882 @default.
- W4200445293 hasAuthorship W4200445293A5042874172 @default.
- W4200445293 hasAuthorship W4200445293A5083081872 @default.
- W4200445293 hasBestOaLocation W42004452931 @default.
- W4200445293 hasConcept C101601086 @default.
- W4200445293 hasConcept C104317684 @default.
- W4200445293 hasConcept C105795698 @default.
- W4200445293 hasConcept C108583219 @default.
- W4200445293 hasConcept C117220453 @default.
- W4200445293 hasConcept C119857082 @default.
- W4200445293 hasConcept C12868164 @default.
- W4200445293 hasConcept C130318100 @default.
- W4200445293 hasConcept C144133560 @default.
- W4200445293 hasConcept C148524875 @default.
- W4200445293 hasConcept C154945302 @default.
- W4200445293 hasConcept C159744936 @default.
- W4200445293 hasConcept C162324750 @default.
- W4200445293 hasConcept C162853370 @default.
- W4200445293 hasConcept C169258074 @default.
- W4200445293 hasConcept C169903167 @default.
- W4200445293 hasConcept C176217482 @default.
- W4200445293 hasConcept C185592680 @default.
- W4200445293 hasConcept C204321447 @default.
- W4200445293 hasConcept C206041023 @default.
- W4200445293 hasConcept C21547014 @default.
- W4200445293 hasConcept C2524010 @default.
- W4200445293 hasConcept C2777530160 @default.
- W4200445293 hasConcept C33923547 @default.
- W4200445293 hasConcept C41008148 @default.
- W4200445293 hasConcept C55078378 @default.
- W4200445293 hasConcept C55493867 @default.
- W4200445293 hasConcept C63479239 @default.
- W4200445293 hasConcept C81363708 @default.
- W4200445293 hasConcept C86251818 @default.
- W4200445293 hasConceptScore W4200445293C101601086 @default.
- W4200445293 hasConceptScore W4200445293C104317684 @default.
- W4200445293 hasConceptScore W4200445293C105795698 @default.
- W4200445293 hasConceptScore W4200445293C108583219 @default.
- W4200445293 hasConceptScore W4200445293C117220453 @default.
- W4200445293 hasConceptScore W4200445293C119857082 @default.
- W4200445293 hasConceptScore W4200445293C12868164 @default.
- W4200445293 hasConceptScore W4200445293C130318100 @default.
- W4200445293 hasConceptScore W4200445293C144133560 @default.
- W4200445293 hasConceptScore W4200445293C148524875 @default.
- W4200445293 hasConceptScore W4200445293C154945302 @default.
- W4200445293 hasConceptScore W4200445293C159744936 @default.
- W4200445293 hasConceptScore W4200445293C162324750 @default.
- W4200445293 hasConceptScore W4200445293C162853370 @default.
- W4200445293 hasConceptScore W4200445293C169258074 @default.
- W4200445293 hasConceptScore W4200445293C169903167 @default.
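The abstract above names Pearson correlation as the official evaluation metric of the shared task, with Spearman correlation, R², and mean squared error as additional measures. The short Python sketch below (illustrative only, not the authors' code) shows one way these four measures could be computed for gold versus predicted similarity scores; the evaluate function and the example scores are hypothetical.

# Illustrative sketch of the evaluation measures named in the abstract; not the authors' code.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate(gold, pred):
    # Compute Pearson, Spearman, R², and MSE for gold vs. predicted STS scores.
    gold = np.asarray(gold, dtype=float)
    pred = np.asarray(pred, dtype=float)
    mse = float(np.mean((gold - pred) ** 2))
    ss_res = float(np.sum((gold - pred) ** 2))
    ss_tot = float(np.sum((gold - gold.mean()) ** 2))
    return {
        "pearson": pearsonr(gold, pred)[0],   # official ranking metric in the shared task
        "spearman": spearmanr(gold, pred)[0],
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
    }

# Hypothetical gold STS scores (0-5 scale) and model predictions, for illustration only.
print(evaluate([0.0, 2.5, 4.0, 5.0], [0.4, 2.1, 3.8, 4.9]))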