Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386083982> ?p ?o ?g. }
- W4386083982 abstract "Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research." @default.
- W4386083982 created "2023-08-24" @default.
- W4386083982 creator A5001503348 @default.
- W4386083982 creator A5002987125 @default.
- W4386083982 creator A5021353263 @default.
- W4386083982 creator A5029140705 @default.
- W4386083982 creator A5050636351 @default.
- W4386083982 creator A5054232581 @default.
- W4386083982 creator A5054355245 @default.
- W4386083982 creator A5078906955 @default.
- W4386083982 creator A5089868160 @default.
- W4386083982 creator A5092677669 @default.
- W4386083982 date "2023-08-10" @default.
- W4386083982 modified "2023-09-27" @default.
- W4386083982 title "OpenProteinSet: Training data for structural biology at scale." @default.
- W4386083982 cites W1979762151 @default.
- W4386083982 cites W2020566094 @default.
- W4386083982 cites W2051210555 @default.
- W4386083982 cites W2061042699 @default.
- W4386083982 cites W2065921821 @default.
- W4386083982 cites W2069458148 @default.
- W4386083982 cites W2102461176 @default.
- W4386083982 cites W2107867854 @default.
- W4386083982 cites W2110483430 @default.
- W4386083982 cites W2126103104 @default.
- W4386083982 cites W2136799255 @default.
- W4386083982 cites W2140673705 @default.
- W4386083982 cites W2151457629 @default.
- W4386083982 cites W2166701319 @default.
- W4386083982 cites W2557595285 @default.
- W4386083982 cites W2737584047 @default.
- W4386083982 cites W2770647690 @default.
- W4386083982 cites W2780845733 @default.
- W4386083982 cites W2803360717 @default.
- W4386083982 cites W2890223884 @default.
- W4386083982 cites W2898210859 @default.
- W4386083982 cites W2913820882 @default.
- W4386083982 cites W2949867299 @default.
- W4386083982 cites W2952317511 @default.
- W4386083982 cites W2953008890 @default.
- W4386083982 cites W2972411752 @default.
- W4386083982 cites W2980789587 @default.
- W4386083982 cites W2984894304 @default.
- W4386083982 cites W2995514860 @default.
- W4386083982 cites W2999044305 @default.
- W4386083982 cites W3112376646 @default.
- W4386083982 cites W3132323068 @default.
- W4386083982 cites W3146944767 @default.
- W4386083982 cites W3158960338 @default.
- W4386083982 cites W3163965514 @default.
- W4386083982 cites W3177828909 @default.
- W4386083982 cites W3183475563 @default.
- W4386083982 cites W3186179742 @default.
- W4386083982 cites W3209435229 @default.
- W4386083982 cites W3211795435 @default.
- W4386083982 cites W4281790889 @default.
- W4386083982 cites W4288428465 @default.
- W4386083982 cites W4298148229 @default.
- W4386083982 cites W4300861364 @default.
- W4386083982 cites W4318071656 @default.
- W4386083982 cites W4319061788 @default.
- W4386083982 cites W4327550249 @default.
- W4386083982 cites W4365505983 @default.
- W4386083982 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/37608940" @default.
- W4386083982 hasPublicationYear "2023" @default.
- W4386083982 type Work @default.
- W4386083982 citedByCount "0" @default.
- W4386083982 crossrefType "posted-content" @default.
- W4386083982 hasAuthorship W4386083982A5001503348 @default.
- W4386083982 hasAuthorship W4386083982A5002987125 @default.
- W4386083982 hasAuthorship W4386083982A5021353263 @default.
- W4386083982 hasAuthorship W4386083982A5029140705 @default.
- W4386083982 hasAuthorship W4386083982A5050636351 @default.
- W4386083982 hasAuthorship W4386083982A5054232581 @default.
- W4386083982 hasAuthorship W4386083982A5054355245 @default.
- W4386083982 hasAuthorship W4386083982A5078906955 @default.
- W4386083982 hasAuthorship W4386083982A5089868160 @default.
- W4386083982 hasAuthorship W4386083982A5092677669 @default.
- W4386083982 hasConcept C104317684 @default.
- W4386083982 hasConcept C119145174 @default.
- W4386083982 hasConcept C119857082 @default.
- W4386083982 hasConcept C14036430 @default.
- W4386083982 hasConcept C144133560 @default.
- W4386083982 hasConcept C154945302 @default.
- W4386083982 hasConcept C155202549 @default.
- W4386083982 hasConcept C2778712577 @default.
- W4386083982 hasConcept C41008148 @default.
- W4386083982 hasConcept C47701112 @default.
- W4386083982 hasConcept C51632099 @default.
- W4386083982 hasConcept C55493867 @default.
- W4386083982 hasConcept C66746571 @default.
- W4386083982 hasConcept C78458016 @default.
- W4386083982 hasConcept C86803240 @default.
- W4386083982 hasConceptScore W4386083982C104317684 @default.
- W4386083982 hasConceptScore W4386083982C119145174 @default.
- W4386083982 hasConceptScore W4386083982C119857082 @default.
- W4386083982 hasConceptScore W4386083982C14036430 @default.
- W4386083982 hasConceptScore W4386083982C144133560 @default.
- W4386083982 hasConceptScore W4386083982C154945302 @default.
- W4386083982 hasConceptScore W4386083982C155202549 @default.