Matches in SemOpenAlex for { <https://semopenalex.org/work/W4281753295> ?p ?o ?g. }
- W4281753295 abstract "Background: Tools for accurately clustering biological sequences are among the most important tools in computational biology. Two pioneering tools for clustering sequences are CD-HIT and UCLUST, both of which are fast and consume reasonable amounts of memory; however, there is much room for improvement in terms of cluster quality. Motivated by this opportunity for improving cluster quality, we applied the mean shift algorithm in MeShClust v1.0. The mean shift algorithm is an instance of unsupervised learning. Its strong theoretical foundation guarantees convergence to the true cluster centers. Our implementation of the mean shift algorithm in MeShClust v1.0 was a step forward. In this work, we scale up the algorithm by adapting an out-of-core strategy while utilizing alignment-free identity scores in a new tool: MeShClust v3.0. Results: We evaluated CD-HIT, MeShClust v1.0, MeShClust v3.0, and UCLUST on 22 synthetic sets and five real sets. These data sets were designed or selected to test the tools in terms of scalability and of different similarity levels among the sequences comprising clusters. On the synthetic data sets, MeShClust v3.0 outperformed the related tools on all sets in terms of cluster quality. On two real data sets obtained from the human microbiome and maize transposons, MeShClust v3.0 outperformed the related tools by wide margins, achieving a 55%–300% improvement in cluster quality. On another set that includes degenerate viral sequences, MeShClust v3.0 came third. On two bacterial sets, MeShClust v3.0 was the only applicable tool because of the long sequences in these sets. MeShClust v3.0 requires more time and memory than the related tools; however, almost all personal computers at the time of this writing can accommodate these requirements. MeShClust v3.0 can estimate, with high accuracy, an important parameter that controls cluster membership.
Conclusions: These results demonstrate the high quality of clusters produced by MeShClust v3.0 and its ability to apply the mean shift algorithm to large data sets and long sequences. Because clustering tools are utilized in many studies, providing high-quality clusters will help in deriving accurate biological knowledge." @default.
- W4281753295 created "2022-06-13" @default.
- W4281753295 creator A5031037437 @default.
- W4281753295 date "2022-06-06" @default.
- W4281753295 modified "2023-10-18" @default.
- W4281753295 title "MeShClust v3.0: high-quality clustering of DNA sequences using the mean shift algorithm and alignment-free identity scores" @default.
- W4281753295 cites W1519266993 @default.
- W4281753295 cites W1980739145 @default.
- W4281753295 cites W1987971958 @default.
- W4281753295 cites W1989889539 @default.
- W4281753295 cites W1989936903 @default.
- W4281753295 cites W2017337590 @default.
- W4281753295 cites W2022686119 @default.
- W4281753295 cites W2029064186 @default.
- W4281753295 cites W2030644393 @default.
- W4281753295 cites W2047063122 @default.
- W4281753295 cites W2048986223 @default.
- W4281753295 cites W2051224630 @default.
- W4281753295 cites W2067191022 @default.
- W4281753295 cites W2068448872 @default.
- W4281753295 cites W2072022921 @default.
- W4281753295 cites W2074231493 @default.
- W4281753295 cites W2124351063 @default.
- W4281753295 cites W2129849193 @default.
- W4281753295 cites W2156125289 @default.
- W4281753295 cites W2228304976 @default.
- W4281753295 cites W2565824505 @default.
- W4281753295 cites W2761430568 @default.
- W4281753295 cites W2774657098 @default.
- W4281753295 cites W2948076807 @default.
- W4281753295 cites W2950589160 @default.
- W4281753295 cites W2962807110 @default.
- W4281753295 cites W2987559184 @default.
- W4281753295 cites W3084123916 @default.
- W4281753295 cites W3092563965 @default.
- W4281753295 cites W3103230670 @default.
- W4281753295 cites W3120540968 @default.
- W4281753295 cites W3128249532 @default.
- W4281753295 cites W3131716537 @default.
- W4281753295 cites W3183923930 @default.
- W4281753295 cites W3199306065 @default.
- W4281753295 cites W3209047475 @default.
- W4281753295 cites W4282923058 @default.
- W4281753295 doi "https://doi.org/10.1186/s12864-022-08619-0" @default.
- W4281753295 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/35668366" @default.
- W4281753295 hasPublicationYear "2022" @default.
- W4281753295 type Work @default.
- W4281753295 citedByCount "9" @default.
- W4281753295 countsByYear W42817532952022 @default.
- W4281753295 countsByYear W42817532952023 @default.
- W4281753295 crossrefType "journal-article" @default.
- W4281753295 hasAuthorship W4281753295A5031037437 @default.
- W4281753295 hasBestOaLocation W42817532951 @default.
- W4281753295 hasConcept C103278499 @default.
- W4281753295 hasConcept C11413529 @default.
- W4281753295 hasConcept C115961682 @default.
- W4281753295 hasConcept C124101348 @default.
- W4281753295 hasConcept C153180895 @default.
- W4281753295 hasConcept C154945302 @default.
- W4281753295 hasConcept C162324750 @default.
- W4281753295 hasConcept C164866538 @default.
- W4281753295 hasConcept C177264268 @default.
- W4281753295 hasConcept C199360897 @default.
- W4281753295 hasConcept C2777303404 @default.
- W4281753295 hasConcept C41008148 @default.
- W4281753295 hasConcept C48044578 @default.
- W4281753295 hasConcept C50522688 @default.
- W4281753295 hasConcept C73555534 @default.
- W4281753295 hasConcept C77088390 @default.
- W4281753295 hasConceptScore W4281753295C103278499 @default.
- W4281753295 hasConceptScore W4281753295C11413529 @default.
- W4281753295 hasConceptScore W4281753295C115961682 @default.
- W4281753295 hasConceptScore W4281753295C124101348 @default.
- W4281753295 hasConceptScore W4281753295C153180895 @default.
- W4281753295 hasConceptScore W4281753295C154945302 @default.
- W4281753295 hasConceptScore W4281753295C162324750 @default.
- W4281753295 hasConceptScore W4281753295C164866538 @default.
- W4281753295 hasConceptScore W4281753295C177264268 @default.
- W4281753295 hasConceptScore W4281753295C199360897 @default.
- W4281753295 hasConceptScore W4281753295C2777303404 @default.
- W4281753295 hasConceptScore W4281753295C41008148 @default.
- W4281753295 hasConceptScore W4281753295C48044578 @default.
- W4281753295 hasConceptScore W4281753295C50522688 @default.
- W4281753295 hasConceptScore W4281753295C73555534 @default.
- W4281753295 hasConceptScore W4281753295C77088390 @default.
- W4281753295 hasFunder F4320333010 @default.
- W4281753295 hasIssue "1" @default.
- W4281753295 hasLocation W42817532951 @default.
- W4281753295 hasLocation W42817532952 @default.
- W4281753295 hasLocation W42817532953 @default.
- W4281753295 hasLocation W42817532954 @default.
- W4281753295 hasLocation W42817532955 @default.
- W4281753295 hasOpenAccess W4281753295 @default.
- W4281753295 hasPrimaryLocation W42817532951 @default.
- W4281753295 hasRelatedWork W1457719682 @default.
- W4281753295 hasRelatedWork W1525643724 @default.
- W4281753295 hasRelatedWork W2067938758 @default.
- W4281753295 hasRelatedWork W2302028273 @default.
- W4281753295 hasRelatedWork W2364921833 @default.
- W4281753295 hasRelatedWork W2366792704 @default.