Matches in SemOpenAlex for { <https://semopenalex.org/work/W4376646091> ?p ?o ?g. }
- W4376646091 abstract "Abstract Stabilizing proteins is a fundamental challenge in protein engineering and is almost always a prerequisite for the development of industrial and pharmaceutical biotechnologies. Here we present Stability Oracle: a structure-based graph-transformer framework that achieves state-of-the-art performance on predicting the effect of a point mutation on a protein’s thermodynamic stability (ΔΔG). A strength of our model is its ability to identify stabilizing mutations, which often make up a small fraction of a protein’s mutational landscape. Our framework introduces several data and machine learning innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time. Stability Oracle is first pretrained on over 2M masked microenvironments and then fine-tuned using a novel data augmentation technique, Thermodynamic Permutations (TP), applied to a ∼120K curated subset of the mega-scale cDNA display proteolysis dataset. This technique increases the original 120K mutations to over 2M thermodynamically valid ΔΔG measurements to generate the first structure training set that samples and balances all 380 mutation types. By using the masked microenvironment paradigm, Stability Oracle does not require a second mutant structure and instead uses amino acid structural embeddings to represent a mutation. This architectural design accelerates training and inference times: we can both train on 2M instances with just 119 structures and generate deep mutational scan (DMS) predictions from only the wildtype structure. We benchmark Stability Oracle with both experimental and AlphaFold structures of all proteins on T2837, a test set that aggregates the common test sets (SSym, S669, p53, and Myoglobin) with all additional experimental data from proteins with over a 30% sequence similarity overlap. 
We used TP augmented T2837 to evaluate performance for engineering protein stability: Stability Oracle correctly identifies 48% of stabilizing mutations (ΔΔG < −0.5 kcal/mol) and 74% of its stabilizing predictions are indeed stabilizing (18% and 8% of predictions were neutral and destabilizing, respectively). For a fair comparison between sequence and structure-based fine-tuned deep learning models, we build on the Prostata framework and fine-tune the sequence embeddings of ESM2 on our training set (Prostata-IFML). A head-to-head comparison demonstrates that Stability Oracle outperforms Prostata-IFML on regression and classification even though the model is 548 times smaller and is pretrained with 4000 times fewer proteins, highlighting the advantages of learning from structures." @default.
- W4376646091 created "2023-05-17" @default.
- W4376646091 creator A5005669711 @default.
- W4376646091 creator A5006626709 @default.
- W4376646091 creator A5011339261 @default.
- W4376646091 creator A5022357356 @default.
- W4376646091 creator A5041966256 @default.
- W4376646091 creator A5078811278 @default.
- W4376646091 creator A5085061919 @default.
- W4376646091 creator A5089401594 @default.
- W4376646091 creator A5089627980 @default.
- W4376646091 date "2023-05-15" @default.
- W4376646091 modified "2023-10-01" @default.
- W4376646091 title "Stability Oracle: A Structure-Based Graph-Transformer for Identifying Stabilizing Mutations" @default.
- W4376646091 cites W2014159272 @default.
- W4376646091 cites W2023490488 @default.
- W4376646091 cites W2028113659 @default.
- W4376646091 cites W2031409259 @default.
- W4376646091 cites W2034937344 @default.
- W4376646091 cites W2056053449 @default.
- W4376646091 cites W2057197272 @default.
- W4376646091 cites W2064488723 @default.
- W4376646091 cites W2069313265 @default.
- W4376646091 cites W2100538849 @default.
- W4376646091 cites W2103385859 @default.
- W4376646091 cites W2103459989 @default.
- W4376646091 cites W2123000304 @default.
- W4376646091 cites W2136513422 @default.
- W4376646091 cites W2149580316 @default.
- W4376646091 cites W2153457180 @default.
- W4376646091 cites W2161151688 @default.
- W4376646091 cites W2171901881 @default.
- W4376646091 cites W2890223884 @default.
- W4376646091 cites W2898210859 @default.
- W4376646091 cites W2912783011 @default.
- W4376646091 cites W2916078022 @default.
- W4376646091 cites W2943495267 @default.
- W4376646091 cites W2950954328 @default.
- W4376646091 cites W2999754980 @default.
- W4376646091 cites W3002895420 @default.
- W4376646091 cites W3014805132 @default.
- W4376646091 cites W3014856454 @default.
- W4376646091 cites W3022117139 @default.
- W4376646091 cites W3035847987 @default.
- W4376646091 cites W3045100294 @default.
- W4376646091 cites W3048947632 @default.
- W4376646091 cites W3093098684 @default.
- W4376646091 cites W3104081477 @default.
- W4376646091 cites W3109312535 @default.
- W4376646091 cites W3136963718 @default.
- W4376646091 cites W3177828909 @default.
- W4376646091 cites W3186179742 @default.
- W4376646091 cites W3209435229 @default.
- W4376646091 cites W3210450413 @default.
- W4376646091 cites W3211681229 @default.
- W4376646091 cites W3214430077 @default.
- W4376646091 cites W3217015076 @default.
- W4376646091 cites W4200321201 @default.
- W4376646091 cites W4206007286 @default.
- W4376646091 cites W4224988655 @default.
- W4376646091 cites W4281790889 @default.
- W4376646091 cites W4285593053 @default.
- W4376646091 cites W4293089302 @default.
- W4376646091 cites W4293257826 @default.
- W4376646091 cites W4293475204 @default.
- W4376646091 cites W4309643848 @default.
- W4376646091 cites W4311200057 @default.
- W4376646091 cites W4311591895 @default.
- W4376646091 cites W4313219015 @default.
- W4376646091 cites W4313291879 @default.
- W4376646091 cites W4313530229 @default.
- W4376646091 cites W4327550249 @default.
- W4376646091 cites W4328114198 @default.
- W4376646091 cites W4362672617 @default.
- W4376646091 doi "https://doi.org/10.1101/2023.05.15.540857" @default.
- W4376646091 hasPublicationYear "2023" @default.
- W4376646091 type Work @default.
- W4376646091 citedByCount "2" @default.
- W4376646091 countsByYear W43766460912023 @default.
- W4376646091 crossrefType "posted-content" @default.
- W4376646091 hasAuthorship W4376646091A5005669711 @default.
- W4376646091 hasAuthorship W4376646091A5006626709 @default.
- W4376646091 hasAuthorship W4376646091A5011339261 @default.
- W4376646091 hasAuthorship W4376646091A5022357356 @default.
- W4376646091 hasAuthorship W4376646091A5041966256 @default.
- W4376646091 hasAuthorship W4376646091A5078811278 @default.
- W4376646091 hasAuthorship W4376646091A5085061919 @default.
- W4376646091 hasAuthorship W4376646091A5089401594 @default.
- W4376646091 hasAuthorship W4376646091A5089627980 @default.
- W4376646091 hasBestOaLocation W43766460911 @default.
- W4376646091 hasConcept C104317684 @default.
- W4376646091 hasConcept C112972136 @default.
- W4376646091 hasConcept C118615104 @default.
- W4376646091 hasConcept C119857082 @default.
- W4376646091 hasConcept C132525143 @default.
- W4376646091 hasConcept C154945302 @default.
- W4376646091 hasConcept C199360897 @default.
- W4376646091 hasConcept C2780069185 @default.
- W4376646091 hasConcept C33923547 @default.
- W4376646091 hasConcept C41008148 @default.
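The abstract names the Thermodynamic Permutations (TP) augmentation but does not spell out its mechanics. What follows is a minimal sketch, not the authors' code. It assumes TP relies on ΔG being a state function. Under that assumption, two measured mutations at the same site (wildtype→a and wildtype→b) imply the ΔΔG of a→b. The function name and input layout are hypothetical. If all 20 residue states are measured at a site, the sketch yields 20 × 19 = 380 ordered pairs, which matches the "380 mutation types" figure in the abstract.

```python
from itertools import permutations


def thermodynamic_permutations(wildtype, ddg_from_wt):
    """Hypothetical sketch of TP-style augmentation for a single site.

    wildtype:    one-letter code of the wildtype residue at this site
    ddg_from_wt: dict mapping mutant residue -> measured ddG(wildtype -> mutant)

    Returns a dict mapping (from_residue, to_residue) -> inferred ddG.
    """
    # ddG of every residue state relative to wildtype; wildtype itself is 0.
    rel = {wildtype: 0.0, **ddg_from_wt}
    # State-function assumption: ddG(a -> b) = ddG(wt -> b) - ddG(wt -> a).
    # This identity holds regardless of the sign convention chosen for ddG.
    return {(a, b): rel[b] - rel[a] for a, b in permutations(rel, 2)}


if __name__ == "__main__":
    pairs = thermodynamic_permutations("A", {"G": 1.0, "V": -0.5})
    for (src, dst), ddg in sorted(pairs.items()):
        print(f"{src}->{dst}: {ddg:+.2f}")
```

Each set of n measured mutations at a site expands to (n + 1) × n ordered pairs, including the reverse (mutant→wildtype) direction. This is consistent with the abstract's roughly 17-fold growth from about 120K measurements to over 2M.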