Matches in SemOpenAlex for { <https://semopenalex.org/work/W4293083967> ?p ?o ?g. }
- W4293083967 endingPage "17" @default.
- W4293083967 startingPage "1" @default.
- W4293083967 abstract "Recent advancements in deep learning techniques have transformed the area of semantic text matching (STM). However, most state-of-the-art models are designed to operate with short documents such as tweets, user reviews, comments, and so on. These models have fundamental limitations when applied to long-form documents such as scientific papers, legal documents, and patents. When handling such long documents, there are three primary challenges: (i) the presence of different contexts for the same word throughout the document, (ii) small sections of contextually similar text between two documents, but dissimilar text in the remaining parts (this defies the basic understanding of “similarity”), and (iii) the coarse nature of a single global similarity measure which fails to capture the heterogeneity of the document content. In this article, we describe CoLDE: Contrastive Long Document Encoder, a transformer-based framework that addresses these challenges and allows for interpretable comparisons of long documents. CoLDE uses unique positional embeddings and a multi-headed chunkwise attention layer in conjunction with a supervised contrastive learning framework to capture similarity at three different levels: (i) high-level similarity scores between a pair of documents, (ii) similarity scores between different sections within and across documents, and (iii) similarity scores between different chunks in the same document and across other documents. These fine-grained similarity scores aid in better interpretability. We evaluate CoLDE on three long document datasets, namely ACL Anthology publications, Wikipedia articles, and USPTO patents. Besides outperforming the state-of-the-art methods on the document matching task, CoLDE is also robust to changes in document length and text perturbations and provides interpretable results. The code for the proposed model is publicly available at https://github.com/InterDigitalInc/CoLDE." @default.
- W4293083967 created "2022-08-26" @default.
- W4293083967 creator A5000202397 @default.
- W4293083967 creator A5005165729 @default.
- W4293083967 creator A5085720726 @default.
- W4293083967 creator A5087404235 @default.
- W4293083967 creator A5089322398 @default.
- W4293083967 date "2023-02-20" @default.
- W4293083967 modified "2023-10-12" @default.
- W4293083967 title "Supervised Contrastive Learning for Interpretable Long-Form Document Matching" @default.
- W4293083967 cites W1531333757 @default.
- W4293083967 cites W2028742638 @default.
- W4293083967 cites W2136189984 @default.
- W4293083967 cites W2178628967 @default.
- W4293083967 cites W2286300105 @default.
- W4293083967 cites W2470673105 @default.
- W4293083967 cites W2510530102 @default.
- W4293083967 cites W2536015822 @default.
- W4293083967 cites W2538374209 @default.
- W4293083967 cites W2539671052 @default.
- W4293083967 cites W2769216919 @default.
- W4293083967 cites W2885396331 @default.
- W4293083967 cites W2911997761 @default.
- W4293083967 cites W2950670227 @default.
- W4293083967 cites W2962807820 @default.
- W4293083967 cites W2963798744 @default.
- W4293083967 cites W2964110616 @default.
- W4293083967 cites W2970641574 @default.
- W4293083967 cites W2970726176 @default.
- W4293083967 cites W3019932981 @default.
- W4293083967 cites W3106298483 @default.
- W4293083967 cites W3108316907 @default.
- W4293083967 cites W3156636935 @default.
- W4293083967 cites W3173783447 @default.
- W4293083967 doi "https://doi.org/10.1145/3542822" @default.
- W4293083967 hasPublicationYear "2023" @default.
- W4293083967 type Work @default.
- W4293083967 citedByCount "0" @default.
- W4293083967 crossrefType "journal-article" @default.
- W4293083967 hasAuthorship W4293083967A5000202397 @default.
- W4293083967 hasAuthorship W4293083967A5005165729 @default.
- W4293083967 hasAuthorship W4293083967A5085720726 @default.
- W4293083967 hasAuthorship W4293083967A5087404235 @default.
- W4293083967 hasAuthorship W4293083967A5089322398 @default.
- W4293083967 hasBestOaLocation W42930839671 @default.
- W4293083967 hasConcept C103278499 @default.
- W4293083967 hasConcept C105795698 @default.
- W4293083967 hasConcept C115961682 @default.
- W4293083967 hasConcept C121332964 @default.
- W4293083967 hasConcept C130318100 @default.
- W4293083967 hasConcept C154945302 @default.
- W4293083967 hasConcept C162324750 @default.
- W4293083967 hasConcept C165064840 @default.
- W4293083967 hasConcept C165801399 @default.
- W4293083967 hasConcept C187736073 @default.
- W4293083967 hasConcept C204321447 @default.
- W4293083967 hasConcept C23123220 @default.
- W4293083967 hasConcept C2780451532 @default.
- W4293083967 hasConcept C2781067378 @default.
- W4293083967 hasConcept C33923547 @default.
- W4293083967 hasConcept C41008148 @default.
- W4293083967 hasConcept C62520636 @default.
- W4293083967 hasConcept C66322947 @default.
- W4293083967 hasConceptScore W4293083967C103278499 @default.
- W4293083967 hasConceptScore W4293083967C105795698 @default.
- W4293083967 hasConceptScore W4293083967C115961682 @default.
- W4293083967 hasConceptScore W4293083967C121332964 @default.
- W4293083967 hasConceptScore W4293083967C130318100 @default.
- W4293083967 hasConceptScore W4293083967C154945302 @default.
- W4293083967 hasConceptScore W4293083967C162324750 @default.
- W4293083967 hasConceptScore W4293083967C165064840 @default.
- W4293083967 hasConceptScore W4293083967C165801399 @default.
- W4293083967 hasConceptScore W4293083967C187736073 @default.
- W4293083967 hasConceptScore W4293083967C204321447 @default.
- W4293083967 hasConceptScore W4293083967C23123220 @default.
- W4293083967 hasConceptScore W4293083967C2780451532 @default.
- W4293083967 hasConceptScore W4293083967C2781067378 @default.
- W4293083967 hasConceptScore W4293083967C33923547 @default.
- W4293083967 hasConceptScore W4293083967C41008148 @default.
- W4293083967 hasConceptScore W4293083967C62520636 @default.
- W4293083967 hasConceptScore W4293083967C66322947 @default.
- W4293083967 hasFunder F4320306076 @default.
- W4293083967 hasIssue "2" @default.
- W4293083967 hasLocation W42930839671 @default.
- W4293083967 hasLocation W42930839672 @default.
- W4293083967 hasLocation W42930839673 @default.
- W4293083967 hasOpenAccess W4293083967 @default.
- W4293083967 hasPrimaryLocation W42930839671 @default.
- W4293083967 hasRelatedWork W2147243590 @default.
- W4293083967 hasRelatedWork W2153717697 @default.
- W4293083967 hasRelatedWork W2252005665 @default.
- W4293083967 hasRelatedWork W2380556669 @default.
- W4293083967 hasRelatedWork W2564015900 @default.
- W4293083967 hasRelatedWork W2922382144 @default.
- W4293083967 hasRelatedWork W2963829519 @default.
- W4293083967 hasRelatedWork W2974225181 @default.
- W4293083967 hasRelatedWork W3078371441 @default.
- W4293083967 hasRelatedWork W4288108740 @default.