Matches in SemOpenAlex for { <https://semopenalex.org/work/W2977458312> ?p ?o ?g. }
- W2977458312 abstract "Author name disambiguation (AND) is an important task in scientific data mining, and it has become increasingly challenging with the rapid growth of academic digital libraries. Performing AND for a large number of authors is computationally intensive. In particular, an author's name in MEDLINE is represented by the full last name and initials, such as Zhang S, which produces many identical strings that actually refer to different authors. In this paper, we propose an efficient algorithm for parallel AND computation. The proposed algorithm mainly addresses load balancing across many computing nodes and involves the following strategies: (1) Author-based load balancing, which splits the computation load among cores by author name label. (2) A matrix-based strategy, which calculates the pairwise similarity between publications, stores the scores in a matrix shared globally by all processes, and then groups the publications by breadth-first search. We combine the two strategies: the matrix-based strategy is applied to authors with a large number of documents, and author-based load balancing is applied to all other authors. We constructed a database of publications by Chinese authors from MEDLINE, the largest public database of biomedical literature (abstracts). For benchmark testing, we ran our algorithm on a dataset of 1 million publications on the Tianhe-2A supercomputer. First, we trained an AND classifier that achieves an F1 score of 98.1%. The serial computation time is estimated at approximately 246 hours, while the parallel execution time is approximately 66 hours with four cores on a single node (a speedup of 3.7x). Finally, we reduced the total parallel computing time for 1 million documents to about 2 hours, achieving a parallel efficiency of 65.8% with 200 cores on 90 nodes." @default.
- W2977458312 created "2019-10-10" @default.
- W2977458312 creator A5000803798 @default.
- W2977458312 creator A5036241469 @default.
- W2977458312 creator A5044053690 @default.
- W2977458312 creator A5051392171 @default.
- W2977458312 creator A5054335730 @default.
- W2977458312 creator A5085352453 @default.
- W2977458312 date "2019-08-01" @default.
- W2977458312 modified "2023-10-16" @default.
- W2977458312 title "Parallel Computing for Large-Scale Author Name Disambiguation in MEDLINE" @default.
- W2977458312 cites W1208937987 @default.
- W2977458312 cites W1539434515 @default.
- W2977458312 cites W1541691357 @default.
- W2977458312 cites W1979532552 @default.
- W2977458312 cites W1982912387 @default.
- W2977458312 cites W2007172042 @default.
- W2977458312 cites W2019465613 @default.
- W2977458312 cites W2039599657 @default.
- W2977458312 cites W2063326034 @default.
- W2977458312 cites W2090987348 @default.
- W2977458312 cites W2098365647 @default.
- W2977458312 cites W2131193521 @default.
- W2977458312 cites W2131871440 @default.
- W2977458312 cites W2136517229 @default.
- W2977458312 cites W2156281098 @default.
- W2977458312 cites W2162965868 @default.
- W2977458312 cites W2340568692 @default.
- W2977458312 cites W2785392915 @default.
- W2977458312 cites W2963863453 @default.
- W2977458312 cites W3098845338 @default.
- W2977458312 doi "https://doi.org/10.1109/hpcc/smartcity/dss.2019.00217" @default.
- W2977458312 hasPublicationYear "2019" @default.
- W2977458312 type Work @default.
- W2977458312 sameAs 2977458312 @default.
- W2977458312 citedByCount "1" @default.
- W2977458312 countsByYear W29774583122020 @default.
- W2977458312 crossrefType "proceedings-article" @default.
- W2977458312 hasAuthorship W2977458312A5000803798 @default.
- W2977458312 hasAuthorship W2977458312A5036241469 @default.
- W2977458312 hasAuthorship W2977458312A5044053690 @default.
- W2977458312 hasAuthorship W2977458312A5051392171 @default.
- W2977458312 hasAuthorship W2977458312A5054335730 @default.
- W2977458312 hasAuthorship W2977458312A5085352453 @default.
- W2977458312 hasConcept C11413529 @default.
- W2977458312 hasConcept C124952713 @default.
- W2977458312 hasConcept C13280743 @default.
- W2977458312 hasConcept C142362112 @default.
- W2977458312 hasConcept C154945302 @default.
- W2977458312 hasConcept C162324750 @default.
- W2977458312 hasConcept C164913051 @default.
- W2977458312 hasConcept C184898388 @default.
- W2977458312 hasConcept C185798385 @default.
- W2977458312 hasConcept C187736073 @default.
- W2977458312 hasConcept C205649164 @default.
- W2977458312 hasConcept C23123220 @default.
- W2977458312 hasConcept C2780451532 @default.
- W2977458312 hasConcept C41008148 @default.
- W2977458312 hasConcept C45374587 @default.
- W2977458312 hasConcept C513874922 @default.
- W2977458312 hasConcept C95623464 @default.
- W2977458312 hasConceptScore W2977458312C11413529 @default.
- W2977458312 hasConceptScore W2977458312C124952713 @default.
- W2977458312 hasConceptScore W2977458312C13280743 @default.
- W2977458312 hasConceptScore W2977458312C142362112 @default.
- W2977458312 hasConceptScore W2977458312C154945302 @default.
- W2977458312 hasConceptScore W2977458312C162324750 @default.
- W2977458312 hasConceptScore W2977458312C164913051 @default.
- W2977458312 hasConceptScore W2977458312C184898388 @default.
- W2977458312 hasConceptScore W2977458312C185798385 @default.
- W2977458312 hasConceptScore W2977458312C187736073 @default.
- W2977458312 hasConceptScore W2977458312C205649164 @default.
- W2977458312 hasConceptScore W2977458312C23123220 @default.
- W2977458312 hasConceptScore W2977458312C2780451532 @default.
- W2977458312 hasConceptScore W2977458312C41008148 @default.
- W2977458312 hasConceptScore W2977458312C45374587 @default.
- W2977458312 hasConceptScore W2977458312C513874922 @default.
- W2977458312 hasConceptScore W2977458312C95623464 @default.
- W2977458312 hasLocation W29774583121 @default.
- W2977458312 hasOpenAccess W2977458312 @default.
- W2977458312 hasPrimaryLocation W29774583121 @default.
- W2977458312 hasRelatedWork W12066792 @default.
- W2977458312 hasRelatedWork W12829028 @default.
- W2977458312 hasRelatedWork W14516383 @default.
- W2977458312 hasRelatedWork W149980 @default.
- W2977458312 hasRelatedWork W605621 @default.
- W2977458312 hasRelatedWork W6264993 @default.
- W2977458312 hasRelatedWork W6680660 @default.
- W2977458312 hasRelatedWork W6745161 @default.
- W2977458312 hasRelatedWork W728297 @default.
- W2977458312 hasRelatedWork W5208458 @default.
- W2977458312 isParatext "false" @default.
- W2977458312 isRetracted "false" @default.
- W2977458312 magId "2977458312" @default.
- W2977458312 workType "article" @default.
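The hybrid scheme in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the authors' implementation, and several parts are assumptions: the similarity threshold that turns the pairwise matrix into a graph, the n(n-1)/2 pair-count cost model, the `large` cutoff separating "many-document" name labels, the greedy longest-processing-time assignment of labels to cores, and all function names are hypothetical. The globally shared matrix and multi-node distribution are omitted; the matrix here is an ordinary in-memory list. The last helper only restates the standard speedup and efficiency definitions used to interpret the reported timings.

```python
import heapq
from collections import deque
from typing import Dict, List, Sequence, Tuple


def bfs_group(sim: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Group publications by breadth-first search over the graph whose edges
    are pairs with similarity >= threshold (hypothetical cutoff)."""
    n = len(sim)
    seen = [False] * n
    groups: List[List[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        group: List[int] = []
        while queue:
            i = queue.popleft()
            group.append(i)
            for j in range(n):
                if not seen[j] and sim[i][j] >= threshold:
                    seen[j] = True
                    queue.append(j)
        groups.append(sorted(group))
    return groups


def assign_authors(doc_counts: Dict[str, int], cores: int,
                   large: int) -> Tuple[List[str], List[List[str]]]:
    """Send labels with >= `large` documents to the matrix strategy; spread the
    rest over cores greedily, heaviest first, using n*(n-1)/2 pairs as cost."""
    matrix_labels = sorted(lbl for lbl, n in doc_counts.items() if n >= large)
    small = sorted(((n * (n - 1) // 2, lbl) for lbl, n in doc_counts.items()
                    if n < large), reverse=True)
    heap = [(0, c) for c in range(cores)]  # (current load, core id)
    buckets: List[List[str]] = [[] for _ in range(cores)]
    for cost, label in small:
        load, core = heapq.heappop(heap)  # least-loaded core
        buckets[core].append(label)
        heapq.heappush(heap, (load + cost, core))
    return matrix_labels, buckets


def speedup_and_efficiency(t_serial: float, t_parallel: float,
                           cores: int) -> Tuple[float, float]:
    """Speedup = T_serial / T_parallel; efficiency = speedup / cores."""
    speedup = t_serial / t_parallel
    return speedup, speedup / cores


if __name__ == "__main__":
    sim = [[1.0, 0.9, 0.1, 0.0],
           [0.9, 1.0, 0.2, 0.1],
           [0.1, 0.2, 1.0, 0.8],
           [0.0, 0.1, 0.8, 1.0]]
    print(bfs_group(sim, threshold=0.5))  # [[0, 1], [2, 3]]
    # Made-up document counts per name label, for illustration only.
    counts = {"Zhang S": 500, "Li J": 40, "Wang Y": 30, "Chen X": 10, "Liu H": 5}
    print(assign_authors(counts, cores=2, large=100))
    print(speedup_and_efficiency(246, 66, 4))  # about (3.73, 0.93)
```

With the abstract's single-node figures, 246 / 66 gives a speedup of about 3.7 on four cores, or roughly 93% efficiency. Sorting labels by descending cost before the greedy assignment keeps the heaviest name labels from landing on the same core late in the schedule.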