Matches in SemOpenAlex for { <https://semopenalex.org/work/W2105428289> ?p ?o ?g. }
- W2105428289 abstract "The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches. Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional-specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related, a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day. This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation. At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups." @default.
- W2105428289 created "2016-06-24" @default.
- W2105428289 creator A5010571747 @default.
- W2105428289 creator A5026383671 @default.
- W2105428289 creator A5061900671 @default.
- W2105428289 date "2012-06-22" @default.
- W2105428289 modified "2023-10-09" @default.
- W2105428289 title "Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures" @default.
- W2105428289 cites W1513400187 @default.
- W2105428289 cites W1580356236 @default.
- W2105428289 cites W1947762817 @default.
- W2105428289 cites W1958476169 @default.
- W2105428289 cites W1965420804 @default.
- W2105428289 cites W1982226220 @default.
- W2105428289 cites W1994261741 @default.
- W2105428289 cites W1995924392 @default.
- W2105428289 cites W2008116827 @default.
- W2105428289 cites W2008856488 @default.
- W2105428289 cites W2018300877 @default.
- W2105428289 cites W2018981978 @default.
- W2105428289 cites W2023385944 @default.
- W2105428289 cites W2027542060 @default.
- W2105428289 cites W2036667662 @default.
- W2105428289 cites W2037993016 @default.
- W2105428289 cites W2050363660 @default.
- W2105428289 cites W2058941563 @default.
- W2105428289 cites W2059284454 @default.
- W2105428289 cites W2074511635 @default.
- W2105428289 cites W2079959721 @default.
- W2105428289 cites W2085099772 @default.
- W2105428289 cites W2088339365 @default.
- W2105428289 cites W2092006899 @default.
- W2105428289 cites W2094406540 @default.
- W2105428289 cites W2098127859 @default.
- W2105428289 cites W2103660241 @default.
- W2105428289 cites W2104572985 @default.
- W2105428289 cites W2104601544 @default.
- W2105428289 cites W2109497857 @default.
- W2105428289 cites W2111373249 @default.
- W2105428289 cites W2116007667 @default.
- W2105428289 cites W2121714338 @default.
- W2105428289 cites W2124908487 @default.
- W2105428289 cites W2129621613 @default.
- W2105428289 cites W2134043769 @default.
- W2105428289 cites W2135083016 @default.
- W2105428289 cites W2137857555 @default.
- W2105428289 cites W2137991504 @default.
- W2105428289 cites W2139919097 @default.
- W2105428289 cites W2141885858 @default.
- W2105428289 cites W2143004134 @default.
- W2105428289 cites W2146213764 @default.
- W2105428289 cites W2151457629 @default.
- W2105428289 cites W2151831732 @default.
- W2105428289 cites W2156125289 @default.
- W2105428289 cites W2158020463 @default.
- W2105428289 cites W2158714788 @default.
- W2105428289 cites W2162566191 @default.
- W2105428289 cites W2162574056 @default.
- W2105428289 cites W2163507574 @default.
- W2105428289 cites W2163822406 @default.
- W2105428289 cites W2167188257 @default.
- W2105428289 cites W2168563301 @default.
- W2105428289 cites W2170249492 @default.
- W2105428289 cites W2327088997 @default.
- W2105428289 cites W2480680997 @default.
- W2105428289 cites W4210323379 @default.
- W2105428289 cites W4254275792 @default.
- W2105428289 doi "https://doi.org/10.1186/1471-2105-13-144" @default.
- W2105428289 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/3599474" @default.
- W2105428289 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/22726767" @default.
- W2105428289 hasPublicationYear "2012" @default.
- W2105428289 type Work @default.
- W2105428289 sameAs 2105428289 @default.
- W2105428289 citedByCount "11" @default.
- W2105428289 countsByYear W21054282892012 @default.
- W2105428289 countsByYear W21054282892014 @default.
- W2105428289 countsByYear W21054282892016 @default.
- W2105428289 countsByYear W21054282892017 @default.
- W2105428289 countsByYear W21054282892018 @default.
- W2105428289 countsByYear W21054282892021 @default.
- W2105428289 crossrefType "journal-article" @default.
- W2105428289 hasAuthorship W2105428289A5010571747 @default.
- W2105428289 hasAuthorship W2105428289A5026383671 @default.
- W2105428289 hasAuthorship W2105428289A5061900671 @default.
- W2105428289 hasBestOaLocation W21054282891 @default.
- W2105428289 hasConcept C10010492 @default.
- W2105428289 hasConcept C104317684 @default.
- W2105428289 hasConcept C107673813 @default.
- W2105428289 hasConcept C111350023 @default.
- W2105428289 hasConcept C124101348 @default.
- W2105428289 hasConcept C144292202 @default.
- W2105428289 hasConcept C154945302 @default.
- W2105428289 hasConcept C162324750 @default.
- W2105428289 hasConcept C167625842 @default.
- W2105428289 hasConcept C193252679 @default.
- W2105428289 hasConcept C207060522 @default.
- W2105428289 hasConcept C2778112365 @default.
- W2105428289 hasConcept C2986374874 @default.
- W2105428289 hasConcept C31170391 @default.
- W2105428289 hasConcept C34447519 @default.