Matches in SemOpenAlex for { <https://semopenalex.org/work/W3216367746> ?p ?o ?g. }
- W3216367746 endingPage "2367" @default.
- W3216367746 startingPage "2354" @default.
- W3216367746 abstract "Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.
Sequencing technologies are quickly transforming human genetic studies of complex traits. It is increasingly possible to obtain whole-genome sequence data on thousands of samples at manageable costs. 
As a result, the genome-wide study of rare variants (minor allele frequency [MAF] < 1%) and their contribution to disease susceptibility and phenotype variation is now feasible [1: Abecasis et al., 2010; 2: Nejentsev et al., 2009; 3: Rivas et al., 2011; 4: Abecasis et al., 2012]. In genetic studies of diseases or continuous phenotypes, rare variants are hard to assess individually because of the limited number of observations of each rare variant. Hence, to boost the power to detect a signal, evidence is usually aggregated across variants in blocks. When designing an aggregation method, there are three questions that are usually considered. First, across which biological units should variants be combined (e.g., genes); second, which variants within those units should be included [5: Majithia et al., 2014]; and third, which statistical model should be used? [6: Lee et al., 2014] Given the widespread observations of shared genetic risk factors across distinct diseases, there is also considerable motivation to use gene discovery approaches that leverage the information from multiple phenotypes jointly. In other words, rather than only aggregating variants that may have effects on a single phenotype, we can also bring together sets of phenotypes for which a single variant or set of variants might have effects. 
In this paper, we present a Bayesian multiple rare variants and phenotypes (MRP) model comparison approach for identifying rare-variant associations as an alternative to current, widely used univariate statistical tests. The MRP framework exploits correlation, scale, and/or direction of genetic effects in a broad range of rare-variant association study designs including case-control, multiple diseases and shared controls, a single continuous phenotype, multiple continuous phenotypes, or a mixture of case-control and multiple continuous phenotypes (Figure 1). MRP makes use of Bayesian model comparison whereby we compute a Bayes factor (BF) defined as the ratio of the marginal likelihoods under two models: (1) a null model where all genetic effects are zero and (2) an alternative model where factors such as correlation, scale, and direction of genetic effects are considered. For MRP, the BF represents the statistical evidence for a non-zero effect for a particular group of rare variants on the phenotype(s) of interest and can be used as an alternative to p values from traditional significance testing. While many large genetic consortia collect both raw genotype and phenotype data, in practice, sharing of individual genotype and phenotype data across groups is difficult to achieve. 
To address this, MRP can use summary statistics, such as estimates of effect size and corresponding standard errors from typical single-variant/single-phenotype linear or logistic regressions, as input. Furthermore, we use insights from Liu et al. (2014) [7] and Cichonska et al. (2016) [8], which suggest the use of additional summary statistics such as covariance estimates across variants and studies, respectively, for the lossless ability to detect gene-based association signals with summary statistics alone. Prior work has explored the use of model comparison and BFs in multi-trait settings. The model comparison in Stephens (2013) [9] is slightly different in usage. Whereas MRP can be used for meta-analysis and for combining signal across multiple variants within a block, the method explored in Stephens is used for distinguishing between direct and indirect associations. Each has strengths relative to the other. The multi-trait model comparison approach that is referenced in Pickrell et al. (2016) [10] focuses on the two-phenotype case. 
In other words, a null model (where the SNP is associated with neither trait) is compared to the alternatives of the SNP being associated with one, the other, or both traits. MRP can generalize beyond the two-phenotype case and assumes a more holistic prior across phenotypes by using correlation coefficients. Aggregation techniques rely on variant annotations to assign variants to groups for analysis. MRP allows for the inclusion of priors on the scale of effect sizes that can be adjusted depending on what type of variants are included in the analysis. For instance, protein-truncating variants (PTVs) [11: Rivas et al., 2013; 12: Rivas et al., 2015] are highly likely to be functional because they often disrupt the normal function of a gene. Additional deleteriousness metrics, such as MPC (a metric that combines subgenic constraints with variant-level data for deleteriousness prediction) [13: Samocha et al., 2017] and pLI (a metric derived from a comparison of the observed number of PTVs in a sample to the number expected in the absence of fitness effects, i.e., under neutrality, given an estimated mutation rate for the gene) [14: Fuller et al., 2019], can further attenuate or accentuate these granular signals. Furthermore, because PTVs typically abolish or severely alter gene function, there is particular interest in identifying protective PTV modifiers of human disease risk that may serve as targets for future therapeutics [15: Cohen et al., 2005; 16: Cohen et al., 2006; 17: Sullivan et al., 2012]. We therefore demonstrate how the MRP model comparison approach can improve discovery of such protective signals by modeling the direction of genetic effects; this prioritizes variants or genes that are consistent with protecting against disease. 
To evaluate the performance of MRP, we use simulations and compare it to other commonly used approaches. Some simple alternatives to MRP include univariate approaches for rare-variant association studies, including the sequence kernel association test (SKAT) [18: Wu et al., 2011] and the burden test [6: Lee et al., 2014], which are special cases of the MRP model comparison when we assign the prior correlation of genetic effects across different variants to be zero or one, respectively. We apply MRP to summary statistics computed on a tranche of n = 184,698 exomes for thousands of traits in the UK Biobank for which we have exome data for n ≥ 1,000 white British individuals, focusing on a meta-analysis context across six UK Biobank subpopulations as defined previously (material and methods) [19: Sinnott-Armstrong et al., 2021]. We additionally apply multi-phenotype MRP on clusters of biomarker traits within a single-population context (white British individuals). These analyses show that MRP recovers results from single-variant-single-phenotype association analyses while increasing the power to detect new rare-variant associations, including protective modifiers of disease risk. 
In this section, we provide an overview of the MRP model comparison approach. MRP models genome-wide association study (GWAS) summary statistics as being distributed according to one of two models: the null model, where the effect sizes across all studies for a group of variants and a group of phenotypes are zero, and the alternative model, where effect sizes are distributed according to a multivariate normal distribution with a non-zero mean and/or covariance matrix. MRP compares the evidence between the alternative model and the null model with a BF, which is the ratio of the marginal likelihoods under the two models given the observed data. To define the alternative model, we must specify the prior correlation structure, scale, and direction of the effect sizes. Let N be the number of individuals and K the number of phenotype measurements on each individual. 
Let M be the number of variants in a testing unit G, where G can be, for example, a gene, a pathway, or a network. Let S be the number of studies from which data is obtained; this data may be in the form of (1) raw genotypes and phenotypes or (2) summary statistics including linkage-disequilibrium (LD) coefficients, effect sizes, and corresponding standard errors. When considering multiple studies (S > 1), multiple rare variants (M > 1), and multiple phenotypes (K > 1), we define the prior correlation structure of the effect sizes as an SMK × SMK matrix, U. In practice, we define U as a Kronecker product of three sub-matrices:
• an S × S matrix $R_{\mathrm{study}}$ containing the correlations of genetic effects among studies that can model the level of heterogeneity in effect sizes between populations [20: Band et al., 2013];
• an M × M matrix $S_{\mathrm{var}}$ containing the covariances of genetic effects among genetic variants, which may reflect, e.g., the assumption that all the PTVs in a gene may have the same biological consequence [11: Rivas et al., 2013; 12: Rivas et al., 2015; 21: MacArthur et al., 2012] or prior information on scale of the effects obtained through integration of additional functional data [5: Majithia et al., 2014; 22: Findlay et al., 2014]; by assuming zero correlation of genetic effects, MRP becomes a dispersion test similar to C-alpha [23: Neale et al., 2011; 24: Clarke et al., 2013] and SKAT [18: Wu et al., 2011]; and
• a K × K matrix $R_{\mathrm{phen}}$ containing the correlations of genetic effects among phenotypes, which may be estimated from common variant data [25: Cotsapas et al., 2011; 26: Solovieff et al., 2013; 27: Bulik-Sullivan et al., 2015].
The variance-covariance matrix of the effect size estimates may be obtained from readily available summary statistics such as in-study LD matrices, effect size estimates (or log odds ratios), and the standard errors of the effect size estimates. MRP allows users to specify priors that reflect knowledge of the variants and phenotypes under study. For instance, we can define an independent effects model (IEM) where the effect sizes of different variants are not correlated at all. In this case, $S_{\mathrm{var}}$ is the identity matrix, and MRP behaves similarly to dispersion tests such as C-alpha [23: Neale et al., 2011; 24: Clarke et al., 2013] and SKAT [18: Wu et al., 2011]. We can also define a similar effects model (SEM) by setting every value of $R_{\mathrm{var}}$ to approximately 1, where $R_{\mathrm{var}}$ is the correlation matrix corresponding to covariance matrix $S_{\mathrm{var}}$. This model assumes that all variants under consideration have similar effect sizes (with, possibly, differences in scale, such as in the burden test). Such a model may be appropriate for PTVs, where each variant completely disrupts the function of the gene, leading to a gene knockout. The prior on the scale of effect sizes can be used to denote which variants may have larger effect sizes. 
For instance, emerging empirical genetic studies have shown that within a gene, PTVs may have stronger effects than missense variants [28: Do et al., 2015]. This can be reflected by adjusting the prior variances of effect sizes (σ) for different categories of variants. Finally, we can utilize a prior on the expected location/direction of effects to specify alternative models where we seek to identify variants with protective effects against disease. By default, we have assumed that the prior mean of genetic effects is zero, which makes it possible to analyze a large number of phenotypes without enumerating the prior mean across all phenotypes. To proactively identify genetic variants that are consistent with a protective profile for a disease, we can include a non-zero vector as a prior mean of genetic effects. For this, we can exploit information from Mendelian randomization studies of common variants, such as recent findings where rare protein-truncating loss-of-function variants in PCSK9 were found to decrease low-density lipoprotein (LDL) and triglyceride levels and decrease coronary artery disease risk [15: Cohen et al., 2005; 29: Do et al., 2013; 30: Crosby et al., 2014], to identify situations where such a prior is warranted. 
Applying MRP to variants from a testing unit G yields a BF for that testing unit that describes the evidence that rare variants in that testing unit have a non-zero effect on the traits used in the model. We can turn this evidence into probability via Bayes’ rule. Namely, a multiplication of prior-odds of association by BF transforms the prior-odds to posterior-odds. For example, if our prior probability for one particular gene to be associated with a phenotype is $10^{-4}$, then an observed BF of $10^{5}$ means that our posterior probability of association between the gene and the phenotype is over 90%. Although we see advantages in adopting a Bayesian interpretation for MRP, our approach could also be used in a frequentist context by using BF as a test statistic to compute p values. We consider the multivariate linear regression model $Y_{(N \times K)} = \Psi_{(N \times K)} + X_{(N \times M)} B_{(M \times K)} + E_{(N \times K)}$, where the matrices $Y = [y_{ik}]$, $X = [x_{im}]$, $B = [\beta_{mk}]$, and $E = [e_{ik}]$ describe the phenotype values ($y_{ik}$), copies of minor allele ($x_{im}$), variant-phenotype effects ($\beta_{mk}$), and residual errors ($e_{ik}$), for individual i, phenotype k, and variant m. We assume that each phenotype has been transformed to a standard normal distribution and that the columns of X have been centered, which means that the estimate for the intercept term $\Psi$ is 0 and independent of the estimate of B. We use vectorized notation where the rows of B form vector $\beta = (\beta_1, \ldots, \beta_M)^{\top}$ of length MK. We define the MRP model comparison as a BF between the alternative model, where at least one variant affects at least one phenotype, and the null model, where all variant-phenotype effects are zero. 
BF is the ratio of the marginal likelihoods for these two models: $\mathrm{BF} = \dfrac{\int_{\beta} p(\mathrm{Data} \mid \beta)\, p(\beta \mid \mathrm{ALT})\, d\beta}{\int_{\beta} p(\mathrm{Data} \mid \beta)\, p(\beta \mid \mathrm{NULL})\, d\beta}$, where Data can correspond either to the effect size estimates $\hat{\beta}$ and the estimated variance-covariance matrix of $\hat{\beta}$, $\hat{V}_{\beta}$, or to the original phenotypes and genotypes, $Y_{(N \times K)}$ and $X_{(N \times M)}$, and any other covariates that we want to regress out from the phenotypes. The prior distribution for the null model, $p(\beta \mid \mathrm{NULL})$, is simply the point mass at $\beta = 0$. A maximum likelihood estimator of B is given by the ordinary least-squares method $\hat{B} = (X^{\top} X)^{-1} X^{\top} Y$, which in vectorized form is denoted $\hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_M)^{\top}$. An estimator of the variance-covariance of $\hat{\beta}$ is given by $\hat{V}_{\beta} = (X^{\top} X)^{-1} \otimes \hat{V}_{Y}$, where $\hat{V}_{Y}$ is the estimated residual variance-covariance matrix of Y given X. Following Band et al. (2013) [20], we approximate the likelihood function of $\beta$ by a multivariate normal distribution with mean $\hat{\beta}$ and variance-covariance matrix $\hat{V}_{\beta}$. Note that by approximating $\hat{V}_{Y}$ via the trait correlation matrix, this likelihood approximation does not require access to the individual-level data X and Y but only to the summary data of effect sizes $\hat{\beta}$, LD-matrix $X^{\top} X$, and a trait correlation estimate. We construct the prior distribution $p(\beta \mid \mathrm{ALT})$ for the alternative model in three steps, allowing the user to specify correlations between effects of different variants on different traits across different studies. In a single study, the prior density for $\beta$ incorporates the expected correlation of genetic effects among a group of variants ($R_{\mathrm{var}}$) and among a group of phenotypes ($R_{\mathrm{phen}}$). 
In addition, we incorporate an expected spread of the effect size of each variant by scaling $R_{\mathrm{var}}$ as $S_{\mathrm{var}} = \Delta(\sigma_m)\, R_{\mathrm{var}}\, \Delta(\sigma_m)$, where $\Delta(\sigma_m)$ is a diagonal matrix with entries $\sigma_m$ determining the spread of the effect size distribution for each variant $m \le M$. Thus, we can model settings where, e.g., PTVs have larger effect sizes ($\sigma = 0.5$) than missense variants ($\sigma = 0.2$). Note that when $\sigma_m = 1$ for all m, then $S_{\mathrm{var}} = R_{\mathrm{var}}$. All in all, our prior density for $\beta$ under the alternative model is $\beta \mid \mathrm{ALT} \sim \mathcal{N}(0, U)$, where $U = S_{\mathrm{var}} \otimes R_{\mathrm{phen}}$. When we have data from multiple studies, we allow for possible differences in genetic effects across ethnicities or populations, extending the approximate BFs of Band et al. (2013) [20] and the summary statistics approach of RAREMETAL [7: Liu et al., 2014] from univariate to multivariate phenotypes. Let $\hat{\beta} = (\hat{\beta}_{s,m,k}) = (\hat{\beta}_{1,1,1}, \hat{\beta}_{1,1,2}, \ldots, \hat{\beta}_{1,1,K}, \hat{\beta}_{1,2,1}, \ldots, \hat{\beta}_{1,2,K}, \ldots, \hat{\beta}_{1,M,K}, \hat{\beta}_{2,1,1}, \ldots, \hat{\beta}_{S,M,K})$, where S is the number of studies, M is the number of variants, and K is the number of phenotypes. As with a single study, we incorporate the expected correlation of genetic effects between a pair of variants and a single phenotype by using the matrix $S_{\mathrm{var}}$, between a variant and a pair of phenotypes by using the matrix $R_{\mathrm{phen}}$, and we introduce the matrix $R_{\mathrm{study}}$ to specify a prior on the similarity in effect sizes across the studies. Thus, the prior is $\beta \sim \mathcal{N}(0, U)$, where $U = R_{\mathrm{study}} \otimes (S_{\mathrm{var}} \otimes R_{\mathrm{phen}})$. It is also straightforward to include a non-zero vector $\mu$ as a prior mean of genetic effects, in which case the prior is $\beta \sim \mathcal{N}(\mu, U)$. 
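We use this, for example, when screening for protective rare variants that have a pre-specified beneficial profile on a set of risk factors. The BF is the ratio of the marginal likelihoods between the alternative and the null model. The marginal likelihood for the alternative model is $\int_{\beta} p(\mathrm{Data} \mid \beta)\, p(\beta \mid \mathrm{ALT})\, d\beta = c \times \mathcal{N}(\hat{\beta}; \mu, \hat{V}_{\beta} + U)$ and the marginal likelihood for the null model is $\int_{\beta} p(\mathrm{Data} \mid \beta)\, p(\beta \mid \mathrm{NULL})\, d\beta = c \times \mathcal{N}(\hat{\beta}; 0, \hat{V}_{\beta})$. The BF is given by $\mathrm{BF}_{\mathrm{MRP}} = \dfrac{\det(\hat{V}_{\beta} + U)^{-\frac{1}{2}} \exp\left[-\frac{1}{2} (\hat{\beta} - \mu)^{\top} (\hat{V}_{\beta} + U)^{-1} (\hat{\beta} - \mu)\right]}{\det(\hat{V}_{\beta})^{-\frac{1}{2}} \exp\left[-\frac{1}{2} \hat{\beta}^{\top} \hat{V}_{\beta}^{-1} \hat{\beta}\right]}$. When $\mu = 0$, $\mathrm{BF}_{\mathrm{MRP}}$ is an increasing function of the following quadratic form: $Q(\hat{\beta}; \hat{V}_{\beta}, U) = \hat{\beta}^{\top} \left(\hat{V}_{\beta}^{-1} - (\hat{V}_{\beta} + U)^{-1}\right) \hat{\beta}$. Furthermore, this quadratic form is the only part of $\mathrm{BF}_{\mathrm{MRP}}$ that depends on $\hat{\beta}$. Thus, by deriving a distribution of $Q(\hat{\beta}; \hat{V}_{\beta}, U)$ under the null model, we can compute a p value (by using the Imhof, Davies, or Farebrother methods) when $\mathrm{BF}_{\mathrm{MRP}}$ is used as a test statistic. We include support for computing these p values in the software package for MRP. According to basic properties of quadratic forms of Gaussian variables, $Q(\hat{\beta}; \hat{V}_{\beta}, U) \sim \sum_{i=1}^{n} d_i \chi_i^2$, where each $\chi_i^2$ is an independent sample from a $\chi_1^2$ distribution (chi-square with one degree of freedom) and $d_i$ are the eigenvalues of the matrix $I - (\hat{V}_{\beta} + U)^{-1} \hat{V}_{\beta}$. The distribution function for a mixture of chi-squares can be numerically evaluated by the R package “CompQuadForm,” 31 Duchesne P. Lafaye de Micheaux P. Computing the distribution" @default.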
- W3216367746 created "2021-12-06" @default.
- W3216367746 creator A5003493606 @default.
- W3216367746 creator A5010394714 @default.
- W3216367746 creator A5010735748 @default.
- W3216367746 creator A5015832202 @default.
- W3216367746 creator A5019229964 @default.
- W3216367746 creator A5019646038 @default.
- W3216367746 creator A5032910777 @default.
- W3216367746 creator A5059511661 @default.
- W3216367746 creator A5060103420 @default.
- W3216367746 creator A5064582174 @default.
- W3216367746 creator A5073710029 @default.
- W3216367746 creator A5075792195 @default.
- W3216367746 date "2021-12-01" @default.
- W3216367746 modified "2023-09-27" @default.
- W3216367746 title "Bayesian model comparison for rare-variant association studies" @default.
- W3216367746 cites W1750145230 @default.
- W3216367746 cites W1843990841 @default.
- W3216367746 cites W1966775465 @default.
- W3216367746 cites W1972655542 @default.
- W3216367746 cites W1990430353 @default.
- W3216367746 cites W1991301405 @default.
- W3216367746 cites W1996299724 @default.
- W3216367746 cites W2014747507 @default.
- W3216367746 cites W2020040634 @default.
- W3216367746 cites W2021708309 @default.
- W3216367746 cites W2032633357 @default.
- W3216367746 cites W2050838276 @default.
- W3216367746 cites W2056165766 @default.
- W3216367746 cites W2056387470 @default.
- W3216367746 cites W2061100095 @default.
- W3216367746 cites W2063655230 @default.
- W3216367746 cites W2064994526 @default.
- W3216367746 cites W2067539811 @default.
- W3216367746 cites W2081890649 @default.
- W3216367746 cites W2096791516 @default.
- W3216367746 cites W2097804771 @default.
- W3216367746 cites W2101961436 @default.
- W3216367746 cites W2104549677 @default.
- W3216367746 cites W2106285889 @default.
- W3216367746 cites W2110090977 @default.
- W3216367746 cites W2122492792 @default.
- W3216367746 cites W2126510876 @default.
- W3216367746 cites W2128371599 @default.
- W3216367746 cites W2143602768 @default.
- W3216367746 cites W2155360150 @default.
- W3216367746 cites W2155766871 @default.
- W3216367746 cites W2159636096 @default.
- W3216367746 cites W2159998163 @default.
- W3216367746 cites W2163953557 @default.
- W3216367746 cites W2163969089 @default.
- W3216367746 cites W2170009139 @default.
- W3216367746 cites W2171777347 @default.
- W3216367746 cites W2195783463 @default.
- W3216367746 cites W2302979210 @default.
- W3216367746 cites W2401811918 @default.
- W3216367746 cites W2417483443 @default.
- W3216367746 cites W2522993044 @default.
- W3216367746 cites W2523530569 @default.
- W3216367746 cites W2550031966 @default.
- W3216367746 cites W2566116513 @default.
- W3216367746 cites W2609071159 @default.
- W3216367746 cites W2609218480 @default.
- W3216367746 cites W2616325230 @default.
- W3216367746 cites W2755166087 @default.
- W3216367746 cites W2789833091 @default.
- W3216367746 cites W2790653691 @default.
- W3216367746 cites W2801522398 @default.
- W3216367746 cites W2902741907 @default.
- W3216367746 cites W2910993873 @default.
- W3216367746 cites W2911684552 @default.
- W3216367746 cites W2920792207 @default.
- W3216367746 cites W2949112115 @default.
- W3216367746 cites W2950099124 @default.
- W3216367746 cites W2952013316 @default.
- W3216367746 cites W2952259804 @default.
- W3216367746 cites W2953080562 @default.
- W3216367746 cites W2962374818 @default.
- W3216367746 cites W2969726224 @default.
- W3216367746 cites W2980581406 @default.
- W3216367746 cites W2981754575 @default.
- W3216367746 cites W3006330340 @default.
- W3216367746 cites W3008618488 @default.
- W3216367746 cites W3011413344 @default.
- W3216367746 cites W3012200299 @default.
- W3216367746 cites W3012570809 @default.
- W3216367746 cites W3081528139 @default.
- W3216367746 cites W3082575318 @default.
- W3216367746 cites W3113925949 @default.
- W3216367746 cites W3116712500 @default.
- W3216367746 cites W3121239749 @default.
- W3216367746 cites W3162800482 @default.
- W3216367746 doi "https://doi.org/10.1016/j.ajhg.2021.11.005" @default.
- W3216367746 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/34822764" @default.
- W3216367746 hasPublicationYear "2021" @default.
- W3216367746 type Work @default.
- W3216367746 sameAs 3216367746 @default.
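
The Bayes factor described in the abstract text of this record can be illustrated with a short numerical sketch. The Python below is a minimal illustration under stated assumptions, not the authors' MRP software: the function names `build_prior`, `log_bf`, and `posterior_probability` are hypothetical, NumPy is assumed to be available, and all numbers in the demo are made-up toy values. It builds the prior covariance U as the Kronecker product given in the text, evaluates the log of BF_MRP as a difference of two Gaussian log densities (the shared constant c cancels), and converts a BF into a posterior probability through prior odds.

```python
import numpy as np


def build_prior(r_study, r_var, r_phen, sigma):
    """Prior covariance U = R_study (x) (S_var (x) R_phen).

    S_var = Delta(sigma) R_var Delta(sigma) rescales the variant
    correlation matrix by the per-variant spread sigma_m.
    """
    delta = np.diag(sigma)
    s_var = delta @ r_var @ delta
    return np.kron(r_study, np.kron(s_var, r_phen))


def log_bf(beta_hat, v_beta, u, mu=None):
    """log BF = log N(beta_hat; mu, V + U) - log N(beta_hat; 0, V).

    The (2*pi) normalising terms are identical in both densities and
    cancel, so only determinants and quadratic forms remain.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    mu = np.zeros_like(beta_hat) if mu is None else np.asarray(mu, dtype=float)
    v_alt = v_beta + u
    _, logdet_alt = np.linalg.slogdet(v_alt)
    _, logdet_null = np.linalg.slogdet(v_beta)
    resid = beta_hat - mu
    quad_alt = resid @ np.linalg.solve(v_alt, resid)
    quad_null = beta_hat @ np.linalg.solve(v_beta, beta_hat)
    return -0.5 * (logdet_alt - logdet_null) - 0.5 * (quad_alt - quad_null)


def posterior_probability(prior_prob, bf):
    """Bayes' rule on the odds scale: posterior odds = prior odds * BF."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = prior_odds * bf
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Toy setting (illustrative values only): 1 study, 2 variants, 2 phenotypes.
    r_study = np.ones((1, 1))
    r_phen = np.array([[1.0, 0.5], [0.5, 1.0]])
    sigma = [0.5, 0.2]  # e.g. a PTV and a missense variant, as in the text

    # Effect estimates ordered variant-major, phenotype-minor, matching U.
    beta_hat = np.array([0.30, 0.20, 0.10, 0.05])
    v_beta = np.kron(np.diag([0.01, 0.02]), r_phen)

    # Independent effects model: R_var is the identity matrix.
    u_iem = build_prior(r_study, np.eye(2), r_phen, sigma)
    # Similar effects model: every entry of R_var set to 1.
    u_sem = build_prior(r_study, np.ones((2, 2)), r_phen, sigma)

    print("log BF (independent effects):", log_bf(beta_hat, v_beta, u_iem))
    print("log BF (similar effects):   ", log_bf(beta_hat, v_beta, u_sem))
    print("posterior for prior 1e-4, BF 1e5:", posterior_probability(1e-4, 1e5))
```

Working on the log scale with `slogdet` and `solve` avoids forming explicit inverses and overflowing determinants when SMK is large; whether the published software makes the same choice is not stated in this record.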