Matches in SemOpenAlex for { <https://semopenalex.org/work/W4225868104> ?p ?o ?g. }
- W4225868104 endingPage "1647" @default.
- W4225868104 startingPage "1629" @default.
- W4225868104 abstract "The emergence of SARS-CoV-2 variants stressed the demand for tools allowing to interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (pLMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids from the context of entire sequence regions. Here, we used pLM representations (embeddings) to predict sequence conservation and SAV effects without multiple sequence alignments (MSAs). Embeddings alone predicted residue conservation almost as accurately from single sequences as ConSeq using MSAs (two-state Matthews Correlation Coefficient, MCC, for ProtT5 embeddings of 0.596 ± 0.006 vs. 0.608 ± 0.006 for ConSeq). Inputting the conservation prediction along with BLOSUM62 substitution scores and pLM mask reconstruction probabilities into a simplistic logistic regression (LR) ensemble for Variant Effect Score Prediction without Alignments (VESPA) predicted SAV effect magnitude without any optimization on DMS data. Comparing predictions for a standard set of 39 DMS experiments to other methods (incl. ESM-1v, DeepSequence, and GEMME) revealed our approach as competitive with the state-of-the-art (SOTA) methods using MSA input. No method outperformed all others, neither consistently nor statistically significantly, independently of the performance measure applied (Spearman and Pearson correlation). Finally, we investigated binary effect predictions on DMS experiments for four human proteins. Overall, embedding-based methods have become competitive with methods relying on MSAs for SAV effect prediction at a fraction of the costs in computing/energy. Our method predicted SAV effects for the entire human proteome (~20 k proteins) within 40 min on one Nvidia Quadro RTX 8000. All methods and data sets are freely available for local and online execution through bioembeddings.com, https://github.com/Rostlab/VESPA, and PredictProtein." @default.
- W4225868104 created "2022-05-05" @default.
- W4225868104 creator A5002122017 @default.
- W4225868104 creator A5005408973 @default.
- W4225868104 creator A5015317247 @default.
- W4225868104 creator A5035722235 @default.
- W4225868104 creator A5064905883 @default.
- W4225868104 creator A5075726670 @default.
- W4225868104 creator A5088531553 @default.
- W4225868104 creator A5090410765 @default.
- W4225868104 date "2021-12-30" @default.
- W4225868104 modified "2023-10-14" @default.
- W4225868104 title "Embeddings from protein language models predict conservation and variant effects" @default.
- W4225868104 cites W1499450468 @default.
- W4225868104 cites W1683278196 @default.
- W4225868104 cites W1985818354 @default.
- W4225868104 cites W2017818880 @default.
- W4225868104 cites W2019032222 @default.
- W4225868104 cites W2057029228 @default.
- W4225868104 cites W2057271915 @default.
- W4225868104 cites W2058487877 @default.
- W4225868104 cites W2058568633 @default.
- W4225868104 cites W2059145105 @default.
- W4225868104 cites W2060588922 @default.
- W4225868104 cites W2063274819 @default.
- W4225868104 cites W2066001051 @default.
- W4225868104 cites W2068113423 @default.
- W4225868104 cites W2076357933 @default.
- W4225868104 cites W2079882489 @default.
- W4225868104 cites W2095318832 @default.
- W4225868104 cites W2097889307 @default.
- W4225868104 cites W2099589970 @default.
- W4225868104 cites W2102652793 @default.
- W4225868104 cites W2104418738 @default.
- W4225868104 cites W2109372707 @default.
- W4225868104 cites W2114886480 @default.
- W4225868104 cites W2121926265 @default.
- W4225868104 cites W2130479394 @default.
- W4225868104 cites W2136513422 @default.
- W4225868104 cites W2137736270 @default.
- W4225868104 cites W2137886330 @default.
- W4225868104 cites W2143210482 @default.
- W4225868104 cites W2143238378 @default.
- W4225868104 cites W2153153865 @default.
- W4225868104 cites W2155144535 @default.
- W4225868104 cites W2158714788 @default.
- W4225868104 cites W2160378127 @default.
- W4225868104 cites W2160995259 @default.
- W4225868104 cites W2161888332 @default.
- W4225868104 cites W2170747616 @default.
- W4225868104 cites W2211953232 @default.
- W4225868104 cites W2245592118 @default.
- W4225868104 cites W2508408872 @default.
- W4225868104 cites W2537556928 @default.
- W4225868104 cites W2582396271 @default.
- W4225868104 cites W2774216375 @default.
- W4225868104 cites W2883004550 @default.
- W4225868104 cites W2885278423 @default.
- W4225868104 cites W2889874867 @default.
- W4225868104 cites W2890223884 @default.
- W4225868104 cites W2898364362 @default.
- W4225868104 cites W2901527454 @default.
- W4225868104 cites W2950629294 @default.
- W4225868104 cites W2950954328 @default.
- W4225868104 cites W2953008890 @default.
- W4225868104 cites W2967474035 @default.
- W4225868104 cites W2980789587 @default.
- W4225868104 cites W2986717577 @default.
- W4225868104 cites W2987965949 @default.
- W4225868104 cites W2995514860 @default.
- W4225868104 cites W3010338076 @default.
- W4225868104 cites W3010879523 @default.
- W4225868104 cites W3022687324 @default.
- W4225868104 cites W3038248848 @default.
- W4225868104 cites W3038792485 @default.
- W4225868104 cites W3039901154 @default.
- W4225868104 cites W3042916618 @default.
- W4225868104 cites W3098471978 @default.
- W4225868104 cites W3111174583 @default.
- W4225868104 cites W3112376646 @default.
- W4225868104 cites W3118936575 @default.
- W4225868104 cites W3122018424 @default.
- W4225868104 cites W3144701084 @default.
- W4225868104 cites W3146944767 @default.
- W4225868104 cites W3157437194 @default.
- W4225868104 cites W3158518077 @default.
- W4225868104 cites W3161612534 @default.
- W4225868104 cites W3163595068 @default.
- W4225868104 cites W3166142427 @default.
- W4225868104 cites W3176307508 @default.
- W4225868104 cites W3177500196 @default.
- W4225868104 cites W3177828909 @default.
- W4225868104 cites W3179485843 @default.
- W4225868104 cites W3191896067 @default.
- W4225868104 cites W3196903168 @default.
- W4225868104 cites W3201299379 @default.
- W4225868104 cites W4225438928 @default.
- W4225868104 cites W4242765109 @default.