Matches in SemOpenAlex for { <https://semopenalex.org/work/W2904361469> ?p ?o ?g. }
- W2904361469 abstract "Potts statistical models have become a popular and promising way to analyze mutational covariation in protein multiple sequence alignments (MSAs) in order to understand protein structure, function, and fitness. But the statistical limitations of these models, which can have millions of parameters and are fit to MSAs of only thousands or hundreds of effective sequences using a procedure known as inverse Ising inference, are incompletely understood. In this work we predict how model quality degrades as a function of the number of sequences N, sequence length L, amino-acid alphabet size q, and the degree of conservation of the MSA, in different applications of the Potts models: in fitness predictions of individual protein sequences, in predictions of the effects of single-point mutations, in double mutant cycle predictions of epistasis, and in 3D contact prediction in protein structure. We show how as MSA depth N decreases an overfitting effect occurs such that sequences in the training MSA have overestimated fitness, and we predict the magnitude of this effect and discuss how regularization can help correct for it, using a regularization procedure motivated by statistical analysis of the effects of finite sampling. We find that as N decreases the quality of point-mutation effect predictions degrades least, fitness and epistasis predictions degrade more rapidly, and contact predictions are most affected. However, overfitting becomes negligible for MSA depths of more than a few thousand effective sequences, as often used in practice, and regularization becomes less necessary. We discuss the implications of these results for users of Potts covariation analysis." @default.
- W2904361469 created "2018-12-22" @default.
- W2904361469 creator A5033874771 @default.
- W2904361469 creator A5061246105 @default.
- W2904361469 date "2019-03-05" @default.
- W2904361469 modified "2023-10-14" @default.
- W2904361469 title "Influence of multiple-sequence-alignment depth on Potts statistical models of protein covariation" @default.
- W2904361469 cites W1514077080 @default.
- W2904361469 cites W1537712541 @default.
- W2904361469 cites W1558524274 @default.
- W2904361469 cites W1631314968 @default.
- W2904361469 cites W1861406683 @default.
- W2904361469 cites W1965839094 @default.
- W2904361469 cites W1974988689 @default.
- W2904361469 cites W1977436723 @default.
- W2904361469 cites W1979762151 @default.
- W2904361469 cites W1982583124 @default.
- W2904361469 cites W1989901670 @default.
- W2904361469 cites W2001438084 @default.
- W2904361469 cites W2008545402 @default.
- W2904361469 cites W2046763527 @default.
- W2904361469 cites W2048154580 @default.
- W2904361469 cites W2065921821 @default.
- W2904361469 cites W2070555657 @default.
- W2904361469 cites W2085036090 @default.
- W2904361469 cites W2087786915 @default.
- W2904361469 cites W2092572492 @default.
- W2904361469 cites W2107517961 @default.
- W2904361469 cites W2127593441 @default.
- W2904361469 cites W2130479394 @default.
- W2904361469 cites W2130710252 @default.
- W2904361469 cites W2137566700 @default.
- W2904361469 cites W2152471738 @default.
- W2904361469 cites W2156628505 @default.
- W2904361469 cites W2166701319 @default.
- W2904361469 cites W2245592118 @default.
- W2904361469 cites W2391698393 @default.
- W2904361469 cites W2416642098 @default.
- W2904361469 cites W2470360319 @default.
- W2904361469 cites W2470723632 @default.
- W2904361469 cites W2547766908 @default.
- W2904361469 cites W2551582013 @default.
- W2904361469 cites W2620385970 @default.
- W2904361469 cites W2735905224 @default.
- W2904361469 cites W2742834898 @default.
- W2904361469 cites W2780845733 @default.
- W2904361469 cites W2782950273 @default.
- W2904361469 cites W2792977793 @default.
- W2904361469 cites W2949867299 @default.
- W2904361469 cites W3099405535 @default.
- W2904361469 cites W3100163799 @default.
- W2904361469 doi "https://doi.org/10.1103/physreve.99.032405" @default.
- W2904361469 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/6508952" @default.
- W2904361469 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/30999494" @default.
- W2904361469 hasPublicationYear "2019" @default.
- W2904361469 type Work @default.
- W2904361469 sameAs 2904361469 @default.
- W2904361469 citedByCount "18" @default.
- W2904361469 countsByYear W29043614692019 @default.
- W2904361469 countsByYear W29043614692020 @default.
- W2904361469 countsByYear W29043614692021 @default.
- W2904361469 countsByYear W29043614692022 @default.
- W2904361469 countsByYear W29043614692023 @default.
- W2904361469 crossrefType "journal-article" @default.
- W2904361469 hasAuthorship W2904361469A5033874771 @default.
- W2904361469 hasAuthorship W2904361469A5061246105 @default.
- W2904361469 hasBestOaLocation W29043614692 @default.
- W2904361469 hasConcept C104317684 @default.
- W2904361469 hasConcept C121332964 @default.
- W2904361469 hasConcept C121864883 @default.
- W2904361469 hasConcept C154945302 @default.
- W2904361469 hasConcept C22019652 @default.
- W2904361469 hasConcept C2776135515 @default.
- W2904361469 hasConcept C2776214188 @default.
- W2904361469 hasConcept C2778112365 @default.
- W2904361469 hasConcept C33923547 @default.
- W2904361469 hasConcept C41008148 @default.
- W2904361469 hasConcept C50644808 @default.
- W2904361469 hasConcept C51329190 @default.
- W2904361469 hasConcept C54355233 @default.
- W2904361469 hasConcept C61727976 @default.
- W2904361469 hasConcept C86803240 @default.
- W2904361469 hasConcept C98925819 @default.
- W2904361469 hasConceptScore W2904361469C104317684 @default.
- W2904361469 hasConceptScore W2904361469C121332964 @default.
- W2904361469 hasConceptScore W2904361469C121864883 @default.
- W2904361469 hasConceptScore W2904361469C154945302 @default.
- W2904361469 hasConceptScore W2904361469C22019652 @default.
- W2904361469 hasConceptScore W2904361469C2776135515 @default.
- W2904361469 hasConceptScore W2904361469C2776214188 @default.
- W2904361469 hasConceptScore W2904361469C2778112365 @default.
- W2904361469 hasConceptScore W2904361469C33923547 @default.
- W2904361469 hasConceptScore W2904361469C41008148 @default.
- W2904361469 hasConceptScore W2904361469C50644808 @default.
- W2904361469 hasConceptScore W2904361469C51329190 @default.
- W2904361469 hasConceptScore W2904361469C54355233 @default.
- W2904361469 hasConceptScore W2904361469C61727976 @default.
- W2904361469 hasConceptScore W2904361469C86803240 @default.
- W2904361469 hasConceptScore W2904361469C98925819 @default.
- W2904361469 hasFunder F4320332161 @default.