Matches in SemOpenAlex for { <https://semopenalex.org/work/W4312491933> ?p ?o ?g. }
Showing items 1 to 86 of 86 with 100 items per page.
- W4312491933 abstract "Recently, the Allen Institute for Artificial Intelligence released the Semantic Scholar Open Research Corpus (S2ORC), one of the largest open-access scholarly big datasets with more than 130 million scholarly paper records. S2ORC contains a significant portion of automatically generated metadata. The metadata quality could impact downstream tasks such as citation analysis, citation prediction, and link analysis. In this project, we assess the document linking quality and estimate the document conflation rate for the S2ORC dataset. Using semi-automatically curated ground truth corpora, we estimated that the overall document linking quality is high, with 92.6% of documents correctly linking to six major databases, but the linking quality varies depending on subject domains. The document conflation rate is around 2.6%, meaning that about 97.4% of documents are unique. We further quantitatively compared three near-duplicate detection methods using the ground truth created from S2ORC. The experiments indicated that locality-sensitive hashing was the best method in terms of effectiveness and scalability, achieving high performance (F1=0.960) and a much reduced runtime. Our code and data are available at https://github.com/lamps-lab/docconflation." @default.
- W4312491933 created "2023-01-05" @default.
- W4312491933 creator A5001294898 @default.
- W4312491933 creator A5015568562 @default.
- W4312491933 creator A5075242841 @default.
- W4312491933 creator A5088754329 @default.
- W4312491933 date "2022-09-20" @default.
- W4312491933 modified "2023-10-18" @default.
- W4312491933 title "Scholarly big data quality assessment" @default.
- W4312491933 cites W1603719052 @default.
- W4312491933 cites W2038319167 @default.
- W4312491933 cites W2046977065 @default.
- W4312491933 cites W2168190036 @default.
- W4312491933 cites W2213054775 @default.
- W4312491933 cites W2570427954 @default.
- W4312491933 cites W2789337509 @default.
- W4312491933 cites W2945941062 @default.
- W4312491933 cites W3002924435 @default.
- W4312491933 cites W3015453090 @default.
- W4312491933 cites W3099977667 @default.
- W4312491933 cites W3102611879 @default.
- W4312491933 cites W3120655260 @default.
- W4312491933 doi "https://doi.org/10.1145/3558100.3563850" @default.
- W4312491933 hasPublicationYear "2022" @default.
- W4312491933 type Work @default.
- W4312491933 citedByCount "0" @default.
- W4312491933 crossrefType "proceedings-article" @default.
- W4312491933 hasAuthorship W4312491933A5001294898 @default.
- W4312491933 hasAuthorship W4312491933A5015568562 @default.
- W4312491933 hasAuthorship W4312491933A5075242841 @default.
- W4312491933 hasAuthorship W4312491933A5088754329 @default.
- W4312491933 hasBestOaLocation W43124919331 @default.
- W4312491933 hasConcept C111472728 @default.
- W4312491933 hasConcept C124101348 @default.
- W4312491933 hasConcept C130440534 @default.
- W4312491933 hasConcept C136764020 @default.
- W4312491933 hasConcept C138885662 @default.
- W4312491933 hasConcept C146849305 @default.
- W4312491933 hasConcept C153048206 @default.
- W4312491933 hasConcept C154945302 @default.
- W4312491933 hasConcept C23123220 @default.
- W4312491933 hasConcept C2522767166 @default.
- W4312491933 hasConcept C2778805511 @default.
- W4312491933 hasConcept C2779530757 @default.
- W4312491933 hasConcept C38652104 @default.
- W4312491933 hasConcept C41008148 @default.
- W4312491933 hasConcept C48044578 @default.
- W4312491933 hasConcept C75684735 @default.
- W4312491933 hasConcept C77088390 @default.
- W4312491933 hasConcept C93518851 @default.
- W4312491933 hasConcept C99138194 @default.
- W4312491933 hasConceptScore W4312491933C111472728 @default.
- W4312491933 hasConceptScore W4312491933C124101348 @default.
- W4312491933 hasConceptScore W4312491933C130440534 @default.
- W4312491933 hasConceptScore W4312491933C136764020 @default.
- W4312491933 hasConceptScore W4312491933C138885662 @default.
- W4312491933 hasConceptScore W4312491933C146849305 @default.
- W4312491933 hasConceptScore W4312491933C153048206 @default.
- W4312491933 hasConceptScore W4312491933C154945302 @default.
- W4312491933 hasConceptScore W4312491933C23123220 @default.
- W4312491933 hasConceptScore W4312491933C2522767166 @default.
- W4312491933 hasConceptScore W4312491933C2778805511 @default.
- W4312491933 hasConceptScore W4312491933C2779530757 @default.
- W4312491933 hasConceptScore W4312491933C38652104 @default.
- W4312491933 hasConceptScore W4312491933C41008148 @default.
- W4312491933 hasConceptScore W4312491933C48044578 @default.
- W4312491933 hasConceptScore W4312491933C75684735 @default.
- W4312491933 hasConceptScore W4312491933C77088390 @default.
- W4312491933 hasConceptScore W4312491933C93518851 @default.
- W4312491933 hasConceptScore W4312491933C99138194 @default.
- W4312491933 hasLocation W43124919331 @default.
- W4312491933 hasOpenAccess W4312491933 @default.
- W4312491933 hasPrimaryLocation W43124919331 @default.
- W4312491933 hasRelatedWork W1965294778 @default.
- W4312491933 hasRelatedWork W1992807924 @default.
- W4312491933 hasRelatedWork W2076264610 @default.
- W4312491933 hasRelatedWork W2148539344 @default.
- W4312491933 hasRelatedWork W2354642172 @default.
- W4312491933 hasRelatedWork W2361349944 @default.
- W4312491933 hasRelatedWork W2368437561 @default.
- W4312491933 hasRelatedWork W2548059104 @default.
- W4312491933 hasRelatedWork W4312491933 @default.
- W4312491933 hasRelatedWork W2396520157 @default.
- W4312491933 isParatext "false" @default.
- W4312491933 isRetracted "false" @default.
- W4312491933 workType "article" @default.
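
The abstract above reports that locality-sensitive hashing was the most effective and scalable of the three near-duplicate detection methods compared. The record does not say which LSH variant was used, so the following is only a minimal sketch of one common form, MinHash with banding, and not the authors' implementation (their code is at the repository linked in the abstract). The function names, the word 3-gram shingling, and the parameters `num_perm=64` and `bands=16` are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of word k-grams of a lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(shingle_set, num_perm=64):
    """Build a MinHash signature: one minimum per salted 64-bit hash function."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")  # a different salt simulates a different hash
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingle_set
        ))
    return signature


def candidate_pairs(signatures, bands=16):
    """Return pairs of ids whose signatures agree on at least one full band."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def estimated_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical records; "a" and "b" differ only in case and spacing.
    docs = {
        "a": "Scholarly big data quality assessment",
        "b": "scholarly  big data quality ASSESSMENT",
        "c": "A survey of deep learning methods",
    }
    sigs = {doc_id: minhash_signature(shingles(t)) for doc_id, t in docs.items()}
    for left, right in sorted(candidate_pairs(sigs)):
        print(left, right, estimated_jaccard(sigs[left], sigs[right]))
```

Banding is what makes the approach scale: only records that share a bucket are compared, instead of all pairs. Using more rows per band makes the candidate filter stricter, and using more bands makes it more permissive, so the two parameters would need tuning against a ground-truth set such as the one the abstract describes.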