Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387661504> ?p ?o ?g. }
Showing items 1 to 60 of 60 with 100 items per page.
- W4387661504 abstract "Biochemical constraints on the admissible amino acids at specific sites in proteins lead to heterogeneity of the amino acid substitution process over sites in alignments. It is well known that phylogenetic models of protein sequence evolution that do not account for site heterogeneity are prone to long-branch attraction (LBA) artifacts. Profile mixture models were developed to model heterogeneity of preferred amino acids at sites via a finite distribution of site classes each with a distinct set of equilibrium amino acid frequencies. However, it is unknown whether the large number of parameters in such models associated with the many amino acid frequency vectors can adversely affect tree topology estimates because of over-parameterization. Here we demonstrate theoretically that for long sequences, over-parameterization does not create problems for estimation with profile mixture models. Under mild conditions, tree, amino acid frequencies, and other model parameters converge to true values as sequence length increases, even when there are large numbers of components in the frequency profile distributions. Because large sample theory does not necessarily imply good behavior for shorter alignments we explore the performance of these models with short alignments simulated with tree topologies that are prone to LBA artifacts. We find that over-parameterization is not a problem for complex profile mixture models even when there are many amino acid frequency vectors. In fact, simple models with few site classes behave poorly. Interestingly, we also found that misspecification of the amino acid frequency vectors does not lead to increased LBA artifacts as long as the estimated cumulative distribution function of the amino acid frequencies at sites adequately approximates the true one. In contrast, misspecification of the amino acid exchangeability rates can severely negatively affect parameter estimation. Finally, we explore the effects of including in the profile mixture model an additional 'F-class' representing the overall frequencies of amino acids in the data set. Surprisingly, the F-class does not help parameter estimation significantly and can decrease the probability of correct tree estimation, depending on the scenario, even though it tends to improve likelihood scores." @default.
- W4387661504 created "2023-10-17" @default.
- W4387661504 creator A5021576212 @default.
- W4387661504 creator A5053095410 @default.
- W4387661504 creator A5064252541 @default.
- W4387661504 date "2023-10-16" @default.
- W4387661504 modified "2023-10-17" @default.
- W4387661504 title "Is Over-parameterization a Problem for Profile Mixture Models?" @default.
- W4387661504 doi "https://doi.org/10.1093/sysbio/syad063" @default.
- W4387661504 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/37843172" @default.
- W4387661504 hasPublicationYear "2023" @default.
- W4387661504 type Work @default.
- W4387661504 citedByCount "0" @default.
- W4387661504 crossrefType "journal-article" @default.
- W4387661504 hasAuthorship W4387661504A5021576212 @default.
- W4387661504 hasAuthorship W4387661504A5053095410 @default.
- W4387661504 hasAuthorship W4387661504A5064252541 @default.
- W4387661504 hasConcept C104317684 @default.
- W4387661504 hasConcept C113174947 @default.
- W4387661504 hasConcept C114614502 @default.
- W4387661504 hasConcept C177264268 @default.
- W4387661504 hasConcept C186060115 @default.
- W4387661504 hasConcept C193252679 @default.
- W4387661504 hasConcept C199360897 @default.
- W4387661504 hasConcept C2778112365 @default.
- W4387661504 hasConcept C33923547 @default.
- W4387661504 hasConcept C41008148 @default.
- W4387661504 hasConcept C515207424 @default.
- W4387661504 hasConcept C54355233 @default.
- W4387661504 hasConcept C86803240 @default.
- W4387661504 hasConceptScore W4387661504C104317684 @default.
- W4387661504 hasConceptScore W4387661504C113174947 @default.
- W4387661504 hasConceptScore W4387661504C114614502 @default.
- W4387661504 hasConceptScore W4387661504C177264268 @default.
- W4387661504 hasConceptScore W4387661504C186060115 @default.
- W4387661504 hasConceptScore W4387661504C193252679 @default.
- W4387661504 hasConceptScore W4387661504C199360897 @default.
- W4387661504 hasConceptScore W4387661504C2778112365 @default.
- W4387661504 hasConceptScore W4387661504C33923547 @default.
- W4387661504 hasConceptScore W4387661504C41008148 @default.
- W4387661504 hasConceptScore W4387661504C515207424 @default.
- W4387661504 hasConceptScore W4387661504C54355233 @default.
- W4387661504 hasConceptScore W4387661504C86803240 @default.
- W4387661504 hasLocation W43876615041 @default.
- W4387661504 hasLocation W43876615042 @default.
- W4387661504 hasOpenAccess W4387661504 @default.
- W4387661504 hasPrimaryLocation W43876615041 @default.
- W4387661504 hasRelatedWork W1997023990 @default.
- W4387661504 hasRelatedWork W2038264393 @default.
- W4387661504 hasRelatedWork W2059637021 @default.
- W4387661504 hasRelatedWork W2062354928 @default.
- W4387661504 hasRelatedWork W2069013776 @default.
- W4387661504 hasRelatedWork W2088120817 @default.
- W4387661504 hasRelatedWork W2113708670 @default.
- W4387661504 hasRelatedWork W2193237397 @default.
- W4387661504 hasRelatedWork W2362218761 @default.
- W4387661504 hasRelatedWork W4312203868 @default.
- W4387661504 isParatext "false" @default.
- W4387661504 isRetracted "false" @default.
- W4387661504 workType "article" @default.