Matches in SemOpenAlex for { <https://semopenalex.org/work/W4238787739> ?p ?o ?g. }
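The triple pattern above can be reproduced against the SemOpenAlex SPARQL endpoint. The sketch below only composes the request URL; the endpoint address `https://semopenalex.org/sparql` and the `format=json` parameter are assumptions to verify against the service's documentation before use.

```python
from urllib.parse import urlencode

# Assumed public SPARQL endpoint for SemOpenAlex -- verify before relying on it.
ENDPOINT = "https://semopenalex.org/sparql"

def build_query_url(work_iri: str) -> str:
    """Compose a GET URL asking for every (predicate, object) pair of the work."""
    query = f"SELECT ?p ?o WHERE {{ <{work_iri}> ?p ?o . }}"
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

url = build_query_url("https://semopenalex.org/work/W4238787739")
print(url)
```

Fetching the URL (e.g. with `urllib.request.urlopen`) would return the same 61 triples listed below in SPARQL JSON results form.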
Showing items 1 to 61 of 61, with 100 items per page.
- W4238787739 abstract "<sec> <title>BACKGROUND</title> Semantic textual similarity (STS) is a natural language processing (NLP) task that involves assigning a similarity score to 2 snippets of text based on their meaning. This task is particularly difficult in the domain of clinical text, which often features specialized language and the frequent use of abbreviations. </sec> <sec> <title>OBJECTIVE</title> We created an NLP system to predict similarity scores for sentence pairs as part of the Clinical Semantic Textual Similarity track in the 2019 n2c2/OHNLP Shared Task on Challenges in Natural Language Processing for Clinical Data. We subsequently sought to analyze the intermediary token vectors extracted from our models while processing a pair of clinical sentences to identify where and how representations of semantic similarity are built in transformer models. </sec> <sec> <title>METHODS</title> Given a clinical sentence pair, we take the average predicted similarity score across several independently fine-tuned transformers. In our model analysis we investigated the relationship between the final model’s loss and surface features of the sentence pairs and assessed the decodability and representational similarity of the token vectors generated by each model. </sec> <sec> <title>RESULTS</title> Our model achieved a correlation of 0.87 with the ground-truth similarity score, reaching 6th place out of 33 teams (with a first-place score of 0.90). In detailed qualitative and quantitative analyses of the model’s loss, we identified the system’s failure to correctly model semantic similarity when both sentence pairs contain details of medical prescriptions, as well as its general tendency to overpredict semantic similarity given significant token overlap. The token vector analysis revealed divergent representational strategies for predicting textual similarity between bidirectional encoder representations from transformers (BERT)–style models and XLNet. 
We also found that a large amount of information relevant to predicting STS can be captured using a combination of a classification token and the cosine distance between sentence-pair representations in the first layer of a transformer model that did not produce the best predictions on the test set. </sec> <sec> <title>CONCLUSIONS</title> We designed and trained a system that uses state-of-the-art NLP models to achieve very competitive results on a new clinical STS data set. As our approach uses no hand-crafted rules, it serves as a strong deep learning baseline for this task. Our key contribution is a detailed analysis of the model’s outputs and an investigation of the heuristic biases learned by transformer models. We suggest future improvements based on these findings. In our representational analysis we explore how different transformer models converge or diverge in their representation of semantic signals as the tokens of the sentences are augmented by successive layers. This analysis sheds light on how these “black box” models integrate semantic similarity information in intermediate layers, and points to new research directions in model distillation and sentence embedding extraction for applications in clinical NLP. </sec>" @default.
- W4238787739 created "2022-05-12" @default.
- W4238787739 creator A5007017103 @default.
- W4238787739 creator A5014600847 @default.
- W4238787739 creator A5072304991 @default.
- W4238787739 date "2020-07-31" @default.
- W4238787739 modified "2023-10-16" @default.
- W4238787739 title "Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis (Preprint)" @default.
- W4238787739 cites W2134302857 @default.
- W4238787739 doi "https://doi.org/10.2196/preprints.23099" @default.
- W4238787739 hasPublicationYear "2020" @default.
- W4238787739 type Work @default.
- W4238787739 citedByCount "0" @default.
- W4238787739 crossrefType "posted-content" @default.
- W4238787739 hasAuthorship W4238787739A5007017103 @default.
- W4238787739 hasAuthorship W4238787739A5014600847 @default.
- W4238787739 hasAuthorship W4238787739A5072304991 @default.
- W4238787739 hasBestOaLocation W42387877392 @default.
- W4238787739 hasConcept C103278499 @default.
- W4238787739 hasConcept C115961682 @default.
- W4238787739 hasConcept C121332964 @default.
- W4238787739 hasConcept C130318100 @default.
- W4238787739 hasConcept C154945302 @default.
- W4238787739 hasConcept C165801399 @default.
- W4238787739 hasConcept C204321447 @default.
- W4238787739 hasConcept C2777530160 @default.
- W4238787739 hasConcept C38652104 @default.
- W4238787739 hasConcept C41008148 @default.
- W4238787739 hasConcept C48145219 @default.
- W4238787739 hasConcept C62520636 @default.
- W4238787739 hasConcept C66322947 @default.
- W4238787739 hasConceptScore W4238787739C103278499 @default.
- W4238787739 hasConceptScore W4238787739C115961682 @default.
- W4238787739 hasConceptScore W4238787739C121332964 @default.
- W4238787739 hasConceptScore W4238787739C130318100 @default.
- W4238787739 hasConceptScore W4238787739C154945302 @default.
- W4238787739 hasConceptScore W4238787739C165801399 @default.
- W4238787739 hasConceptScore W4238787739C204321447 @default.
- W4238787739 hasConceptScore W4238787739C2777530160 @default.
- W4238787739 hasConceptScore W4238787739C38652104 @default.
- W4238787739 hasConceptScore W4238787739C41008148 @default.
- W4238787739 hasConceptScore W4238787739C48145219 @default.
- W4238787739 hasConceptScore W4238787739C62520636 @default.
- W4238787739 hasConceptScore W4238787739C66322947 @default.
- W4238787739 hasLocation W42387877391 @default.
- W4238787739 hasLocation W42387877392 @default.
- W4238787739 hasOpenAccess W4238787739 @default.
- W4238787739 hasPrimaryLocation W42387877391 @default.
- W4238787739 hasRelatedWork W11313625 @default.
- W4238787739 hasRelatedWork W11990303 @default.
- W4238787739 hasRelatedWork W13343750 @default.
- W4238787739 hasRelatedWork W13618705 @default.
- W4238787739 hasRelatedWork W14350660 @default.
- W4238787739 hasRelatedWork W149980 @default.
- W4238787739 hasRelatedWork W2060686 @default.
- W4238787739 hasRelatedWork W2061806 @default.
- W4238787739 hasRelatedWork W8300060 @default.
- W4238787739 hasRelatedWork W867563 @default.
- W4238787739 isParatext "false" @default.
- W4238787739 isRetracted "false" @default.
- W4238787739 workType "article" @default.
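The abstract above describes two simple ingredients of the system: averaging similarity predictions across independently fine-tuned transformers, and measuring cosine distance between sentence-pair representations. A minimal illustrative sketch, with made-up scores and vectors (not data from the paper):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def ensemble_score(per_model_scores):
    """Average the similarity scores predicted by several fine-tuned models."""
    return sum(per_model_scores) / len(per_model_scores)

# Hypothetical per-model predictions for one clinical sentence pair (0-5 scale).
print(ensemble_score([4.0, 4.5, 3.5]))          # mean of the three predictions
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical directions
```

Cosine *distance*, as used in the representational analysis, is simply `1 - cosine_similarity(a, b)`.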