Matches in SemOpenAlex for { <https://semopenalex.org/work/W2110340112> ?p ?o ?g. }
- W2110340112 abstract "An important problem in computational biology is the automatic detection of protein families (groups of homologous sequences). Clustering sequences into families is at the heart of most comparative studies dealing with protein evolution, structure, and function. Many methods have been developed for this task, and they perform reasonably well (over 0.88 of F-measure) when grouping proteins with high sequence identity. However, for highly diverged proteins the performance of these methods can be much lower, mainly because a common evolutionary origin is not deduced directly from sequence similarity. To the best of our knowledge, a systematic evaluation of clustering methods over distant homologous proteins is still lacking. We performed a comparative assessment of four clustering algorithms: Markov Clustering (MCL), Transitive Clustering (TransClust), Spectral Clustering of Protein Sequences (SCPS), and High-Fidelity clustering of protein sequences (HiFix), considering several datasets with different levels of sequence similarity. Two types of similarity measures, required by the clustering sequence methods, were used to evaluate the performance of the algorithms: the standard measure obtained from sequence–sequence comparisons, and a novel measure based on profile-profile comparisons, used here for the first time. The results reveal low clustering performance for the highly divergent datasets when the standard measure was used. However, the novel measure based on profile-profile comparisons substantially improved the performance of the four methods, especially when very low sequence identity datasets were evaluated. We also performed a parameter optimization step to determine the best configuration for each clustering method. We found that TransClust clearly outperformed the other methods for most datasets. This work also provides guidelines for the practical application of clustering sequence methods aimed at detecting accurately groups of related protein sequences." @default.
- W2110340112 created "2016-06-24" @default.
- W2110340112 creator A5012291023 @default.
- W2110340112 creator A5035499392 @default.
- W2110340112 creator A5069144453 @default.
- W2110340112 creator A5089115213 @default.
- W2110340112 date "2015-02-05" @default.
- W2110340112 modified "2023-10-10" @default.
- W2110340112 title "Evaluation and improvements of clustering algorithms for detecting remote homologous protein families" @default.
- W2110340112 cites W1981029985 @default.
- W2110340112 cites W1986075313 @default.
- W2110340112 cites W2015530520 @default.
- W2110340112 cites W2047606947 @default.
- W2110340112 cites W2051210555 @default.
- W2110340112 cites W2055043387 @default.
- W2110340112 cites W2058720919 @default.
- W2110340112 cites W2059223767 @default.
- W2110340112 cites W2082819719 @default.
- W2110340112 cites W2097263991 @default.
- W2110340112 cites W2097555181 @default.
- W2110340112 cites W2109820980 @default.
- W2110340112 cites W2110734043 @default.
- W2110340112 cites W2117077088 @default.
- W2110340112 cites W2124166542 @default.
- W2110340112 cites W2131681506 @default.
- W2110340112 cites W2135152200 @default.
- W2110340112 cites W2137745608 @default.
- W2110340112 cites W2145268834 @default.
- W2110340112 cites W2145358391 @default.
- W2110340112 cites W2152219892 @default.
- W2110340112 cites W2497532187 @default.
- W2110340112 cites W2611831635 @default.
- W2110340112 cites W4235848672 @default.
- W2110340112 doi "https://doi.org/10.1186/s12859-014-0445-4" @default.
- W2110340112 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/4339679" @default.
- W2110340112 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/25651949" @default.
- W2110340112 hasPublicationYear "2015" @default.
- W2110340112 type Work @default.
- W2110340112 sameAs 2110340112 @default.
- W2110340112 citedByCount "24" @default.
- W2110340112 countsByYear W21103401122015 @default.
- W2110340112 countsByYear W21103401122016 @default.
- W2110340112 countsByYear W21103401122017 @default.
- W2110340112 countsByYear W21103401122018 @default.
- W2110340112 countsByYear W21103401122019 @default.
- W2110340112 countsByYear W21103401122020 @default.
- W2110340112 countsByYear W21103401122022 @default.
- W2110340112 countsByYear W21103401122023 @default.
- W2110340112 crossrefType "journal-article" @default.
- W2110340112 hasAuthorship W2110340112A5012291023 @default.
- W2110340112 hasAuthorship W2110340112A5035499392 @default.
- W2110340112 hasAuthorship W2110340112A5069144453 @default.
- W2110340112 hasAuthorship W2110340112A5089115213 @default.
- W2110340112 hasBestOaLocation W21103401121 @default.
- W2110340112 hasConcept C10010492 @default.
- W2110340112 hasConcept C103278499 @default.
- W2110340112 hasConcept C104317684 @default.
- W2110340112 hasConcept C115961682 @default.
- W2110340112 hasConcept C124101348 @default.
- W2110340112 hasConcept C153180895 @default.
- W2110340112 hasConcept C154945302 @default.
- W2110340112 hasConcept C167625842 @default.
- W2110340112 hasConcept C17212007 @default.
- W2110340112 hasConcept C178180057 @default.
- W2110340112 hasConcept C22648726 @default.
- W2110340112 hasConcept C2776517306 @default.
- W2110340112 hasConcept C2778112365 @default.
- W2110340112 hasConcept C2780009758 @default.
- W2110340112 hasConcept C33704608 @default.
- W2110340112 hasConcept C41008148 @default.
- W2110340112 hasConcept C45484198 @default.
- W2110340112 hasConcept C47701112 @default.
- W2110340112 hasConcept C54355233 @default.
- W2110340112 hasConcept C55493867 @default.
- W2110340112 hasConcept C58773245 @default.
- W2110340112 hasConcept C73555534 @default.
- W2110340112 hasConcept C86803240 @default.
- W2110340112 hasConcept C94641424 @default.
- W2110340112 hasConceptScore W2110340112C10010492 @default.
- W2110340112 hasConceptScore W2110340112C103278499 @default.
- W2110340112 hasConceptScore W2110340112C104317684 @default.
- W2110340112 hasConceptScore W2110340112C115961682 @default.
- W2110340112 hasConceptScore W2110340112C124101348 @default.
- W2110340112 hasConceptScore W2110340112C153180895 @default.
- W2110340112 hasConceptScore W2110340112C154945302 @default.
- W2110340112 hasConceptScore W2110340112C167625842 @default.
- W2110340112 hasConceptScore W2110340112C17212007 @default.
- W2110340112 hasConceptScore W2110340112C178180057 @default.
- W2110340112 hasConceptScore W2110340112C22648726 @default.
- W2110340112 hasConceptScore W2110340112C2776517306 @default.
- W2110340112 hasConceptScore W2110340112C2778112365 @default.
- W2110340112 hasConceptScore W2110340112C2780009758 @default.
- W2110340112 hasConceptScore W2110340112C33704608 @default.
- W2110340112 hasConceptScore W2110340112C41008148 @default.
- W2110340112 hasConceptScore W2110340112C45484198 @default.
- W2110340112 hasConceptScore W2110340112C47701112 @default.
- W2110340112 hasConceptScore W2110340112C54355233 @default.
- W2110340112 hasConceptScore W2110340112C55493867 @default.
- W2110340112 hasConceptScore W2110340112C58773245 @default.
- W2110340112 hasConceptScore W2110340112C73555534 @default.