Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386708656> ?p ?o ?g. }
- W4386708656 abstract "Proteins are key to all cellular processes and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy¹, and over 214 million predicted structures are available in the AlphaFold database². However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm—Foldseek cluster—that can cluster hundreds of millions of structures. Using this method, we have clustered all of the structures in the AlphaFold database, identifying 2.30 million non-singleton structural clusters, of which 31% lack annotations representing probable previously undescribed structures. Clusters without annotation tend to have few representatives covering only 4% of all proteins in the AlphaFold database. Evolutionary analysis suggests that most clusters are ancient in origin but 4% seem to be species specific, representing lower-quality predictions or examples of de novo gene birth. We also show how structural comparisons can be used to predict domain families and their relationships, identifying examples of remote structural similarity. On the basis of these analyses, we identify several examples of human immune-related proteins with putative remote homology in prokaryotic species, illustrating the value of this resource for studying protein function and evolution across the tree of life." @default.
- W4386708656 created "2023-09-14" @default.
- W4386708656 creator A5000750117 @default.
- W4386708656 creator A5010061723 @default.
- W4386708656 creator A5010970708 @default.
- W4386708656 creator A5016012053 @default.
- W4386708656 creator A5019985343 @default.
- W4386708656 creator A5042460017 @default.
- W4386708656 creator A5042635343 @default.
- W4386708656 creator A5050058340 @default.
- W4386708656 creator A5054254768 @default.
- W4386708656 creator A5075106193 @default.
- W4386708656 date "2023-09-13" @default.
- W4386708656 modified "2023-10-12" @default.
- W4386708656 title "Clustering-predicted structures at the scale of the known protein universe" @default.
- W4386708656 cites W1965827393 @default.
- W4386708656 cites W1976325156 @default.
- W4386708656 cites W2040233577 @default.
- W4386708656 cites W2051210555 @default.
- W4386708656 cites W2096942328 @default.
- W4386708656 cites W2102245393 @default.
- W4386708656 cites W2111270835 @default.
- W4386708656 cites W2138122982 @default.
- W4386708656 cites W2140673705 @default.
- W4386708656 cites W2142678478 @default.
- W4386708656 cites W2185402392 @default.
- W4386708656 cites W2303521084 @default.
- W4386708656 cites W2462773201 @default.
- W4386708656 cites W2474188283 @default.
- W4386708656 cites W2802099741 @default.
- W4386708656 cites W2806399642 @default.
- W4386708656 cites W2904141395 @default.
- W4386708656 cites W2912990896 @default.
- W4386708656 cites W2945231757 @default.
- W4386708656 cites W2950954328 @default.
- W4386708656 cites W2953008890 @default.
- W4386708656 cites W2972411752 @default.
- W4386708656 cites W2977180444 @default.
- W4386708656 cites W3028050938 @default.
- W4386708656 cites W3094967361 @default.
- W4386708656 cites W3095583226 @default.
- W4386708656 cites W3112376646 @default.
- W4386708656 cites W3137047737 @default.
- W4386708656 cites W3139151250 @default.
- W4386708656 cites W3164046276 @default.
- W4386708656 cites W3177828909 @default.
- W4386708656 cites W3183475563 @default.
- W4386708656 cites W3186179742 @default.
- W4386708656 cites W3211795435 @default.
- W4386708656 cites W3212912034 @default.
- W4386708656 cites W4200135473 @default.
- W4386708656 cites W4220838810 @default.
- W4386708656 cites W4281790889 @default.
- W4386708656 cites W4282939988 @default.
- W4386708656 cites W4294719209 @default.
- W4386708656 cites W4297494206 @default.
- W4386708656 cites W4300861364 @default.
- W4386708656 cites W4306353170 @default.
- W4386708656 cites W4308463927 @default.
- W4386708656 cites W4309681892 @default.
- W4386708656 cites W4311211228 @default.
- W4386708656 cites W4311579410 @default.
- W4386708656 cites W4317802023 @default.
- W4386708656 cites W4319593844 @default.
- W4386708656 cites W4324329034 @default.
- W4386708656 cites W4327550249 @default.
- W4386708656 cites W4375858802 @default.
- W4386708656 doi "https://doi.org/10.1038/s41586-023-06510-w" @default.
- W4386708656 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/37704730" @default.
- W4386708656 hasPublicationYear "2023" @default.
- W4386708656 type Work @default.
- W4386708656 citedByCount "2" @default.
- W4386708656 countsByYear W43867086562023 @default.
- W4386708656 crossrefType "journal-article" @default.
- W4386708656 hasAuthorship W4386708656A5000750117 @default.
- W4386708656 hasAuthorship W4386708656A5010061723 @default.
- W4386708656 hasAuthorship W4386708656A5010970708 @default.
- W4386708656 hasAuthorship W4386708656A5016012053 @default.
- W4386708656 hasAuthorship W4386708656A5019985343 @default.
- W4386708656 hasAuthorship W4386708656A5042460017 @default.
- W4386708656 hasAuthorship W4386708656A5042635343 @default.
- W4386708656 hasAuthorship W4386708656A5050058340 @default.
- W4386708656 hasAuthorship W4386708656A5054254768 @default.
- W4386708656 hasAuthorship W4386708656A5075106193 @default.
- W4386708656 hasBestOaLocation W43867086561 @default.
- W4386708656 hasConcept C103278499 @default.
- W4386708656 hasConcept C104317684 @default.
- W4386708656 hasConcept C115961682 @default.
- W4386708656 hasConcept C124101348 @default.
- W4386708656 hasConcept C136475424 @default.
- W4386708656 hasConcept C139489369 @default.
- W4386708656 hasConcept C144292202 @default.
- W4386708656 hasConcept C154945302 @default.
- W4386708656 hasConcept C164866538 @default.
- W4386708656 hasConcept C171897839 @default.
- W4386708656 hasConcept C199360897 @default.
- W4386708656 hasConcept C41008148 @default.
- W4386708656 hasConcept C41584329 @default.
- W4386708656 hasConcept C47701112 @default.
- W4386708656 hasConcept C54355233 @default.