Matches in SemOpenAlex for { <https://semopenalex.org/work/W2103187775> ?p ?o ?g. }
- W2103187775 abstract "Fueled by rapid progress in high-throughput sequencing, the size of public sequence databases doubles every two years. Searching the ever larger and more redundant databases is getting increasingly inefficient. Clustering can help to organize sequences into homologous and functionally similar groups and can improve the speed, sensitivity, and readability of homology searches. However, because the clustering time is quadratic in the number of sequences, standard sequence search methods are becoming impracticable. Here we present a method to cluster large protein sequence databases such as UniProt within days down to 20%-30% maximum pairwise sequence identity. kClust owes its speed and sensitivity to an alignment-free prefilter that calculates the cumulative score of all similar 6-mers between pairs of sequences, and to a dynamic programming algorithm that operates on pairs of similar 4-mers. To increase sensitivity further, kClust can run in profile-sequence comparison mode, with profiles computed from the clusters of a previous kClust iteration. kClust is two to three orders of magnitude faster than clustering based on NCBI BLAST, and on multidomain sequences of 20%-30% maximum pairwise sequence identity it achieves comparable sensitivity and a lower false discovery rate. It also compares favorably to CD-HIT and UCLUST in terms of false discovery rate, sensitivity, and speed. kClust fills the need for a fast, sensitive, and accurate tool to cluster large protein sequence databases to below 30% sequence identity. kClust is freely available under GPL at http://toolkit.lmb.uni-muenchen.de/pub/kClust/." @default.
- W2103187775 created "2016-06-24" @default.
- W2103187775 creator A5010030898 @default.
- W2103187775 creator A5058804289 @default.
- W2103187775 creator A5063897974 @default.
- W2103187775 date "2013-08-15" @default.
- W2103187775 modified "2023-10-16" @default.
- W2103187775 title "kClust: fast and sensitive clustering of large protein sequence databases" @default.
- W2103187775 cites W1536270671 @default.
- W2103187775 cites W1995530145 @default.
- W2103187775 cites W2008856488 @default.
- W2103187775 cites W2015292449 @default.
- W2103187775 cites W2023046869 @default.
- W2103187775 cites W2041862730 @default.
- W2103187775 cites W2051210555 @default.
- W2103187775 cites W2055043387 @default.
- W2103187775 cites W2076048958 @default.
- W2103187775 cites W2082819719 @default.
- W2103187775 cites W2093830129 @default.
- W2103187775 cites W2097485877 @default.
- W2103187775 cites W2104174804 @default.
- W2103187775 cites W2112884978 @default.
- W2103187775 cites W2118597789 @default.
- W2103187775 cites W2124166542 @default.
- W2103187775 cites W2124351063 @default.
- W2103187775 cites W2125826054 @default.
- W2103187775 cites W2127261350 @default.
- W2103187775 cites W2128591967 @default.
- W2103187775 cites W2131412555 @default.
- W2103187775 cites W2135083016 @default.
- W2103187775 cites W2137488129 @default.
- W2103187775 cites W2137995988 @default.
- W2103187775 cites W2149023499 @default.
- W2103187775 cites W2149616469 @default.
- W2103187775 cites W2154139219 @default.
- W2103187775 cites W2155520400 @default.
- W2103187775 cites W2155606054 @default.
- W2103187775 cites W2156125289 @default.
- W2103187775 cites W2158714788 @default.
- W2103187775 cites W2167188257 @default.
- W2103187775 cites W2170747616 @default.
- W2103187775 cites W3216627664 @default.
- W2103187775 cites W4210623056 @default.
- W2103187775 cites W4235848672 @default.
- W2103187775 doi "https://doi.org/10.1186/1471-2105-14-248" @default.
- W2103187775 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/3843501" @default.
- W2103187775 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/23945046" @default.
- W2103187775 hasPublicationYear "2013" @default.
- W2103187775 type Work @default.
- W2103187775 sameAs 2103187775 @default.
- W2103187775 citedByCount "84" @default.
- W2103187775 countsByYear W21031877752014 @default.
- W2103187775 countsByYear W21031877752015 @default.
- W2103187775 countsByYear W21031877752016 @default.
- W2103187775 countsByYear W21031877752017 @default.
- W2103187775 countsByYear W21031877752018 @default.
- W2103187775 countsByYear W21031877752019 @default.
- W2103187775 countsByYear W21031877752020 @default.
- W2103187775 countsByYear W21031877752021 @default.
- W2103187775 countsByYear W21031877752022 @default.
- W2103187775 countsByYear W21031877752023 @default.
- W2103187775 crossrefType "journal-article" @default.
- W2103187775 hasAuthorship W2103187775A5010030898 @default.
- W2103187775 hasAuthorship W2103187775A5058804289 @default.
- W2103187775 hasAuthorship W2103187775A5063897974 @default.
- W2103187775 hasBestOaLocation W21031877751 @default.
- W2103187775 hasConcept C104317684 @default.
- W2103187775 hasConcept C11413529 @default.
- W2103187775 hasConcept C124101348 @default.
- W2103187775 hasConcept C127413603 @default.
- W2103187775 hasConcept C154945302 @default.
- W2103187775 hasConcept C167625842 @default.
- W2103187775 hasConcept C184898388 @default.
- W2103187775 hasConcept C202264299 @default.
- W2103187775 hasConcept C21200559 @default.
- W2103187775 hasConcept C24326235 @default.
- W2103187775 hasConcept C2778112365 @default.
- W2103187775 hasConcept C41008148 @default.
- W2103187775 hasConcept C41584329 @default.
- W2103187775 hasConcept C45484198 @default.
- W2103187775 hasConcept C54355233 @default.
- W2103187775 hasConcept C70721500 @default.
- W2103187775 hasConcept C72802188 @default.
- W2103187775 hasConcept C73555534 @default.
- W2103187775 hasConcept C77088390 @default.
- W2103187775 hasConcept C86803240 @default.
- W2103187775 hasConceptScore W2103187775C104317684 @default.
- W2103187775 hasConceptScore W2103187775C11413529 @default.
- W2103187775 hasConceptScore W2103187775C124101348 @default.
- W2103187775 hasConceptScore W2103187775C127413603 @default.
- W2103187775 hasConceptScore W2103187775C154945302 @default.
- W2103187775 hasConceptScore W2103187775C167625842 @default.
- W2103187775 hasConceptScore W2103187775C184898388 @default.
- W2103187775 hasConceptScore W2103187775C202264299 @default.
- W2103187775 hasConceptScore W2103187775C21200559 @default.
- W2103187775 hasConceptScore W2103187775C24326235 @default.
- W2103187775 hasConceptScore W2103187775C2778112365 @default.
- W2103187775 hasConceptScore W2103187775C41008148 @default.
- W2103187775 hasConceptScore W2103187775C41584329 @default.
- W2103187775 hasConceptScore W2103187775C45484198 @default.