Matches in SemOpenAlex for { <https://semopenalex.org/work/W1864883907> ?p ?o ?g. }
- W1864883907 endingPage "259" @default.
- W1864883907 startingPage "250" @default.
- W1864883907 abstract "Several methods have been proposed to estimate the variance in disease liability explained by large sets of genetic markers. However, current methods do not scale up well to large sample sizes. Linear mixed models require solving high-dimensional matrix equations, and methods that use polygenic scores are very computationally intensive. Here we propose a fast analytic method that uses polygenic scores, based on the formula for the non-centrality parameter of the association test of the score. We estimate model parameters from the results of multiple polygenic score tests based on markers with p values in different intervals. We estimate parameters by maximum likelihood and use profile likelihood to compute confidence intervals. We compare various options for constructing polygenic scores, based on nested or disjoint intervals of p values, weighted or unweighted effect sizes, and different numbers of intervals, in estimating the variance explained by a set of markers, the proportion of markers with effects, and the genetic covariance between a pair of traits. Our method provides nearly unbiased estimates and confidence intervals with good coverage, although estimation of the variance is less reliable when jointly estimated with the covariance. We find that disjoint p value intervals perform better than nested intervals, but the weighting did not affect our results. A particular advantage of our method is that it can be applied to summary statistics from single markers, and so can be quickly applied to large consortium datasets. Our method, named AVENGEME (Additive Variance Explained and Number of Genetic Effects Method of Estimation), is implemented in R software. 
Genome-wide association studies have been successful in identifying many variants linked to complex diseases. To date more than 6,000 have been found in more than 500 quantitative traits and common diseases in humans.1Robinson M.R. Wray N.R. Visscher P.M. Explaining additional genetic variation in complex traits. Trends Genet. 
2014; 30: 124-132. However, when considering the variance explained by the markers associated with any specific disease, there remains a large gap to match the heritability estimates obtained from family studies.2Maher B. Personal genomes: The case of the missing heritability. Nature. 2008; 456: 18-21. This observation has spurred the development of theories and investigations to explain the missing heritability, including copy-number variation,3Gamazon E.R. Cox N.J. Davis L.K. Structural architecture of SNP effects on complex traits. Am. J. Hum. Genet. 2014; 95: 477-489 rare variants,4Zuk O. Schaffner S.F. Samocha K. Do R. Hechter E. Kathiresan S. Daly M.J. Neale B.M. Sunyaev S.R. Lander E.S. Searching for missing heritability: designing rare variant association studies. Proc. Natl. Acad. Sci. USA. 2014; 111: E455-E464 epigenetics,5Furrow R.E. Christiansen F.B. Feldman M.W. Environment-sensitive epigenetics and the heritability of complex diseases. Genetics. 2011; 189: 1377-1387 and genetic interactions.6Zuk O. Hechter E. Sunyaev S.R. Lander E.S. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc. Natl. Acad. Sci. USA. 2012; 109: 1193-1198. It has become increasingly clear that a large portion of the missing heritability is represented on current genotyping products, but the associated markers are not statistically significant. Several approaches have been developed to estimate the heritability explained by a set of genetic markers that might not be individually associated. 
In the linear mixed model approach, the genetic value of each individual is treated as a random effect whose sample covariance matrix is derived from the relatedness matrix, which is estimated from the genotype data.7Yang J. Benyamin B. McEvoy B.P. Gordon S. Henders A.K. Nyholt D.R. Madden P.A. Heath A.C. Martin N.G. Montgomery G.W. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010; 42: 565-569. Solving this model gives an estimate of the additive genetic variance explained by the available genotypes, often called the “chip heritability.” Variations of this approach include multiple classes of variant with different effect size distributions,8Zhou X. Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 2012; 44: 821-824, 9Speed D. Balding D.J. MultiBLUP: improved SNP-based prediction for complex traits. Genome Res. 2014; 24: 1550-1557 regression of pair-wise phenotypic correlation on genetic correlation,10Golan D. Lander E.S. Rosset S. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl. Acad. Sci. USA. 2014; 111: E5272-E5281 and multivariate models to estimate genetic correlation between traits.11Maier R. Moser G. Chen G.B. Ripke S. Coryell W. Potash J.B. Scheftner W.A. Shi J. Weissman M.M. Hultman C.M. et al. Cross-Disorder Working Group of the Psychiatric Genomics Consortium. Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder. Am. J. Hum. Genet. 2015; 96: 283-294. Another approach uses polygenic scores to estimate chip heritability. 
Here, effect sizes for all markers are estimated in one sample of data, called the training sample. These effects are then used to construct a score for each subject in a second sample, called the target sample, as the weighted sum of genotypes across a set of markers. Originally, association of the score in the target sample was used to demonstrate the presence of missing heritability among an ensemble of markers.12Purcell S.M. Wray N.R. Stone J.L. Visscher P.M. O’Donovan M.C. Sullivan P.F. Sklar P. International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009; 460: 748-752. More recently, the strength of this association has been used to infer the chip heritability.13Stahl E.A. Wegmann D. Trynka G. Gutierrez-Achury J. Do R. Voight B.F. Kraft P. Chen R. Kallberg H.J. Kurreeman F.A. et al. Diabetes Genetics Replication and Meta-analysis Consortium, Myocardial Infarction Genetics Consortium. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 2012; 44: 483-489, 14Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013; 9: e1003348. A further approach uses empirical Bayes methods to estimate the chip heritability from the distribution of Z scores for individual markers.15So H.C. Li M. Sham P.C. Uncovering the total heritability explained by all true susceptibility variants in a genome-wide association study. Genet. Epidemiol. 2011; 35: 447-456. This has the advantage of requiring only summary statistics from standard association analysis. Finally, a very recent method of “LD scoring”16Bulik-Sullivan B. Finucane H. Anttila V. Gusev A. Day F.R. Perry J.R.B. Patterson N. 
et al. ReproGen Consortium, Psychiatric Genomics Consortium, Genetic Consortium for Anorexia Nervosa. An atlas of genetic correlations across human diseases and traits. bioRxiv. 2015; https://doi.org/10.1101/014498 estimates the chip heritability from the correlation between the marginal effect size of a marker and a measure of its linkage disequilibrium (LD) with other markers, also using only summary statistics. In general, the methods using linear mixed models are computationally expensive and require individual-level data to calculate the genetic relatedness matrix. Furthermore, many of these methods estimate only the chip heritability, but it is often of interest also to estimate the proportion of markers that affect a trait. This bears on the design of association studies, because it indicates the number and effect sizes of the associated markers remaining to be found. It is also relevant for the debate on the nature of evolution,17Orr H.A. The genetic theory of adaptation: a brief history. Nat. Rev. Genet. 2005; 6: 119-127, 18Pritchard J.K. Di Rienzo A. Adaptation - not by sweeps alone. Nat. Rev. Genet. 2010; 11: 665-667 because if a large number of variants affect a trait, mechanisms of selection by polygenic adaptation are possible, acting on standing variation without requiring new mutations.19Pritchard J.K. Pickrell J.K. Coop G. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr. Biol. 2010; 20: R208-R215. Methods for the estimation of the number of genes affecting a trait have been proposed since the early 20th century, including complex segregation analysis comparing single- and multi-locus models with or without polygenic background,20Lynch M. Walsh B. Genetics and Analysis of Quantitative Traits. 
Sinauer Associates, 1998 but only with the recent availability of dense genome-wide data has it become possible to assess the polygenic background itself. Linear mixed models have been extended to allow for a proportion of variants with effects,8Zhou X. Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 2012; 44: 821-824 but this remains computationally demanding. Polygenic scoring has also been used to estimate this proportion, but again with a computationally demanding procedure that uses repeated genome-wide simulations within a Bayesian sampling scheme.13Stahl E.A. Wegmann D. Trynka G. Gutierrez-Achury J. Do R. Voight B.F. Kraft P. Chen R. Kallberg H.J. Kurreeman F.A. et al. Diabetes Genetics Replication and Meta-analysis Consortium, Myocardial Infarction Genetics Consortium. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 2012; 44: 483-489. On the other hand, an analytic method for polygenic scores14Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013; 9: e1003348 estimates only one parameter among several defined in its model; therefore, it can estimate the proportion of variants with effects if the chip heritability is assumed to be known, or vice versa. Empirical Bayes methods are also available to estimate the proportion of markers with effects21Efron B. Tibshirani R. Storey J.D. Tusher V. Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc. 2001; 96: 1151-1160 but have not been adapted to jointly estimate this proportion with the chip heritability. Here we extend the analytic approach of Dudbridge14Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 
2013; 9: e1003348 to develop a fast analytic method based on polygenic scores for the joint estimation of chip heritability and the proportion of variants affecting the trait, and we further estimate the genetic covariance between two related traits. A particular advantage is that our method can be applied when only summary data is available for individual markers, and this allows our approach to be readily applied to the increasingly large datasets that are now being made available by study consortia. We consider the model presented by Dudbridge14Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013; 9: e1003348 in which a pair of standardized traits $Y = (Y_1, Y_2)'$ is expressed as a linear combination of $m$ genetic effects and an error term $E = (E_1, E_2)'$:
$$Y = \beta' G + E = \left(\sum_{i=1}^{m} \beta_{i1} G_i + E_1,\ \sum_{i=1}^{m} \beta_{i2} G_i + E_2\right)' \qquad \text{(Equation 1)}$$
where $G$ is an $m$ vector of coded genetic markers and $\beta$ an $m \times 2$ matrix of coefficients, with $E$ independent of $G$. Assuming that in two independent samples the estimates of the genetic effects are given, respectively, by $\hat\beta_{i1}$ and $\hat\beta_{i2}$, where $i = 1,\dots,m$, either set of estimates can then be used to create polygenic scores $\hat S_1 = \sum_{i=1}^{m} \hat\beta_{i2} G_i$ and $\hat S_2 = \sum_{i=1}^{m} \hat\beta_{i1} G_i$ to be tested for association with $Y_1$ and $Y_2$, respectively. Focusing without loss of generality on $\hat S_2$, the statistical properties of the test of association have been described.14Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013; 9: e1003348. In particular, the coefficient of determination between $\hat S_2$ and $Y_2$, i.e., the variance explained by the polygenic score in the regression of $Y_2$ on $\hat S_2$, is given by
$$R^2_{\hat S_2, Y_2} = \frac{m\,\mathrm{cov}(\hat\beta_{i1}, \beta_{i2})^2}{\mathrm{var}(\hat\beta_{i1})\,\mathrm{var}(Y_2)},$$
where the terms on the right-hand side are expressed analytically in terms of the following parameters. 
For study design: sample sizes of the two samples, $(n_1, n_2)$; number of variants in the marker panel, $m$, assumed to be uncorrelated; p value thresholds for selecting a marker into the score from the training sample, $(p_L, p_U)$; and for binary traits, population prevalences $(K_1, K_2)$ and case sampling fractions $(P_1, P_2)$. For genetic model: additive genetic variance in the training sample, $\sigma_1^2$; genetic covariance between training and testing samples, $\sigma_{12}$; and proportion of null markers with no effect on the trait in the training sample, $\pi_{01}$. The variance and covariance are marginal over all markers, so include the null markers with $\beta_{i1} = 0$ or $\beta_{i2} = 0$. The asymptotic non-centrality parameter of the $\chi^2_1$ test of association between $Y_2$ and $\hat S_2$ is given by $\lambda = n_2 R^2_{\hat S_2, Y_2} / (1 - R^2_{\hat S_2, Y_2})$; equivalently, the expectation of the Z (or t) test is $\mu = \sqrt{n_2 R^2_{\hat S_2, Y_2} / (1 - R^2_{\hat S_2, Y_2})}$ with the sign taken from the correlation between $Y_2$ and $\hat S_2$. Binary traits are assumed to arise from a liability threshold model, in which each subject has an unobserved trait, called the liability, that is normally distributed in the population. Subjects with liability greater than a fixed threshold have the trait. The same theory then holds when either $Y_1$ or $Y_2$ is binary as for when it is quantitative, assuming linear transformations between effects on the liability scale to effects on the observed (0/1) scale, and accounting for ascertainment in case/control studies. Specifically, each effect $\beta_{ij}$ on the liability scale corresponds to an effect $\beta_{ij}\,\phi(\tau_j)\,\frac{P_j(1-P_j)}{K_j(1-K_j)}$ on the observed binary scale,14Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013; 9: e1003348 where $\tau_j = \Phi^{-1}(1 - K_j)$ with $\phi$ and $\Phi$ the standard normal density and cumulative distribution functions, respectively. We aim to estimate the genetic model parameters $\sigma_1^2$, $\sigma_{12}$, and $\pi_{01}$ from the association test between $\hat S_2$ and $Y_2$. Previously it was shown14Dudbridge F. 
Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013; 9: e1003348 that one parameter could be estimated by solving for the value at which $\lambda$ equals the observed $\chi^2$ statistic. To estimate multiple parameters, we now propose using association tests of $Y_2$ with multiple polygenic scores constructed by selecting markers with different p value thresholds in the training data. We then fit parameters to the observed association tests by using maximum likelihood. Specifically, let $d_1,\dots,d_k$ denote a set of $k$ intervals within the unit interval, where $k$ is equal to or greater than the number of parameters to be estimated. For each $i = 1,\dots,k$, we select markers with p values falling in $d_i$, construct the corresponding polygenic score, and obtain its (signed) Z score ($Z_i$) for association with $Y_2$. The log-likelihood for $\sigma_1^2$, $\sigma_{12}$, and $\pi_{01}$ is then
$$\ell(\sigma_1^2, \sigma_{12}, \pi_{01}) = \sum_{i=1}^{k} \log \phi\left(Z_i - \mu(\sigma_1^2, \sigma_{12}, \pi_{01}; d_i)\right),$$
where $\mu(\sigma_1^2, \sigma_{12}, \pi_{01}; d_i)$ is the expectation of the Z test as described above, expressed explicitly as a function of the model parameters given selection interval $d_i$. Maximization of this log-likelihood yields estimates of the model parameters. Note that any of $\sigma_1^2$, $\sigma_{12}$, and $\pi_{01}$ could be held fixed while the other parameter(s) are estimated. An equivalent procedure estimates (using obvious notation) $\sigma_2^2$, $\sigma_{12}$, and $\pi_{02}$ by reversing the roles of the training and target samples. Furthermore, a bidirectional procedure can be used to simultaneously estimate up to five parameters ($\sigma_1^2$, $\sigma_2^2$, $\sigma_{12}$, $\pi_{01}$, and $\pi_{02}$) by fitting to the Z scores for association of both $\hat S_2$ with $Y_2$ and $\hat S_1$ with $Y_1$. The number of estimated parameters can be reduced by assuming that the genetic architectures are identical in the training and testing samples. This would occur if two samples are drawn from the same population with the same trait definitions, or if one sample is randomly split into training and target subsets. 
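The fitting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AVENGEME implementation: the function `toy_expected_z` is a hypothetical stand-in for $\mu(\cdot; d_i)$ (the analytic expectation from the cited Dudbridge model is not reproduced here), the names `quasi_loglik` and `grid_maximize` are invented for this example, and a coarse grid search replaces a proper numerical optimizer.

```python
import math
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[float, float]
Params = Tuple[float, ...]


def log_phi(x: float) -> float:
    """Log of the standard normal density."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def quasi_loglik(params: Params, z_scores: Sequence[float],
                 intervals: Sequence[Interval],
                 expected_z: Callable[[Params, Interval], float]) -> float:
    """Sum of log phi(Z_i - mu(params; d_i)) over the k selection intervals."""
    return sum(log_phi(z - expected_z(params, d))
               for z, d in zip(z_scores, intervals))


def grid_maximize(z_scores: Sequence[float], intervals: Sequence[Interval],
                  expected_z: Callable[[Params, Interval], float],
                  grid: List[Params]) -> Tuple[Params, float]:
    """Return the grid point with the largest quasi-log-likelihood."""
    best = max(grid, key=lambda p: quasi_loglik(p, z_scores, intervals, expected_z))
    return best, quasi_loglik(best, z_scores, intervals, expected_z)


def toy_expected_z(params: Params, interval: Interval) -> float:
    """HYPOTHETICAL placeholder for mu: linear in the interval's upper bound.

    It only makes the sketch runnable; it is not the paper's formula.
    """
    a, b = params
    return a + b * interval[1]


# Four disjoint p value intervals (k = 4 >= 2 parameters), noise-free toy Z scores.
intervals = [(0.0, 0.01), (0.01, 0.1), (0.1, 0.5), (0.5, 1.0)]
true_params = (2.0, 1.5)
z_scores = [toy_expected_z(true_params, d) for d in intervals]
grid = [(0.5 * i, 0.5 * j) for i in range(11) for j in range(11)]
estimate, max_loglik = grid_maximize(z_scores, intervals, toy_expected_z, grid)
```

In practice the placeholder would be swapped for the model-based expectation, and any parameter held fixed would simply be removed from the search grid.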
Then we can assume $\sigma_1^2 = \sigma_2^2 = \sigma_{12}$ and $\pi_{01} = \pi_{02}$, estimating just two parameters in either unidirectional or bidirectional analysis. Ours is not a proper likelihood because the Z scores $Z_i$ corresponding to the marker selection intervals are not independent. The presence of a marker in one interval determines its presence or absence in all other intervals, creating dependence between the corresponding scores, but this is not reflected in our likelihood. Furthermore, the bidirectional likelihood does not account for dependence between the scores calculated in each direction. We are therefore using a quasi-likelihood and will later use simulations to investigate its sensitivity to the assumption of independent likelihood contributions. Maximization of the log-likelihood is complicated by constraints on the range of $\sigma_{12}$. Because the absolute correlation between $\beta_{i1}$ and $\beta_{i2}$ must be no greater than 1, $|\sigma_{12}| \le \sigma_1 \sigma_2$. In the unidirectional estimation, $\sigma_2^2$ is not identified and we need only respect that $\sigma_2^2 \le 1$, giving the constraint $|\sigma_{12}| \le \sigma_1$. In the bidirectional estimation, we must also consider that the absolute correlation is no greater than 1 for the markers that have non-null effects in both training and target samples. Denoting this correlation as $\rho^*$, the correlation over all markers as $\rho$, and the proportion of markers with non-null effects in both samples as $\gamma \le 1 - \max(\pi_{01}, \pi_{02})$, we have
$$\rho^* = \frac{\sigma_{12}\,\gamma^{-1}}{\sqrt{\sigma_1^2 (1-\pi_{01})^{-1}\,\sigma_2^2 (1-\pi_{02})^{-1}}} = \frac{\rho \sqrt{(1-\pi_{01})(1-\pi_{02})}}{\gamma}$$
$$\sigma_{12} = \rho\,\sigma_1 \sigma_2 = \frac{\rho^*\,\gamma\,\sigma_1 \sigma_2}{\sqrt{(1-\pi_{01})(1-\pi_{02})}}$$
$$|\sigma_{12}| \le \frac{\left(1 - \max(\pi_{01}, \pi_{02})\right)\sigma_1 \sigma_2}{\sqrt{(1-\pi_{01})(1-\pi_{02})}}.$$
We maximize the likelihood numerically by nesting the maximization for $\sigma_{12}$ within that for the other parameters: for each proposed value of $\sigma_1^2$, $\sigma_2^2$, $\pi_{01}$, and $\pi_{02}$, we perform a univariate maximization for $\sigma_{12}$ subject to the constraint imposed by the proposed values. To obtain analytic confidence intervals, we use profile likelihood.22Davison A.R. Statistical Models. 
Cambridge University Press, Cambridge, 2003. For a general scalar parameter $\theta$, its profile log-likelihood function is $\ell_P(\theta) = \ell(\theta, \hat\vartheta(\theta))$ where $\hat\vartheta(\theta)$ is the maximum likelihood estimate of the remaining parameters in the model given $\theta$. Because for a regular model $2\left(\ell(\hat\theta, \hat\vartheta(\hat\theta)) - \ell(\theta, \hat\vartheta(\theta))\right) \xrightarrow{D} \chi^2_1$, for the estimated value $\hat\theta$ we obtain a $(1 - \alpha)$ confidence interval as the set $\{\theta : \ell_P(\theta) \ge \ell_P(\hat\theta) - \tfrac{1}{2}\chi^2_1(1-\alpha)\}$ where $\chi^2_1(1-\alpha)$ is the $1 - \alpha$ quantile point of the $\chi^2_1$ distribution. This procedure is used to obtain confidence intervals for each of $\sigma_1^2$, $\sigma_2^2$, $\sigma_{12}$, $\pi_{01}$, and $\pi_{02}$. Often it is the genetic correlation rather than the covariance between two traits that is of interest. Because the unidirectional estimation does not identify $\sigma_2^2$, the correlation cannot be estimated unless a value is assumed for $\sigma_2^2$. In the bidirectional estimation, the correlation and its confidence interval can be obtained via previously derived formulas.23Visscher P.M. Hemani G. Vinkhuyzen A.A. Chen G.B. Lee S.H. Wray N.R. Goddard M.E. Yang J. Statistical power to detect genetic (co)variance of complex traits using SNP data in unrelated samples. PLoS Genet. 2014; 10: e1004269. Association tests of polygenic scores can be calculated from summary data alone, as shown in the gtx package for R (see Web Resources). The regression of $Y_2$ on $\hat S_2$ has coefficient
$$\frac{\mathrm{cov}(Y_2, \hat S_2)}{\mathrm{var}(\hat S_2)} = \frac{\sum_j \mathrm{cov}(Y_2, \hat\beta_{1j} G_j)}{\sum_j \mathrm{var}(\hat\beta_{1j} G_j)} = \frac{\sum_j \hat\beta_{1j} \hat\beta_{2j}\,\mathrm{var}(G_j)}{\sum_j \hat\beta_{1j}^2\,\mathrm{var}(G_j)} \approx \frac{\sum_j \hat\beta_{1j} \hat\beta_{2j}\, s_{2j}^{-2}}{\sum_j \hat\beta_{1j}^2\, s_{2j}^{-2}}$$
where $s_{2j}^2$ is the sampling variance of $\hat\beta_{2j}$, assuming markers are uncorrelated. This is the inverse-variance weighted mean of $\hat\beta_{2j}/\hat\beta_{1j}$ and hence has sampling variance $1 / \sum_j \hat\beta_{1j}^2 s_{2j}^{-2}$. The Wald statistic,
$$\frac{\sum_j \hat\beta_{1j} \hat\beta_{2j}\, s_{2j}^{-2}}{\sqrt{\sum_j \hat\beta_{1j}^2\, s_{2j}^{-2}}}, \qquad \text{(Equation 2)}$$
is then calculated from summary effect sizes and standard errors for the individual markers. These data are frequently available from research consortia even when access to individual-level data is impractical.24Ehret G.B. Munroe P.B. Rice K.M. Bochud M. 
Johnson A.D. Chasman D.I. Smith A.V. Tobin M.D. Verwoert G.C. Hwang S.J. et al. International Consortium for Blood Pressure Genome-Wide Association Studies, CARDIoGRAM consortium, CKDGen Consortium, KidneyGen Consortium, EchoGen consortium, CHARGE-HF consortium. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature. 2011; 478: 103-109, 25Dastani Z. Hivert M.F. Timpson N. Perry J.R. Yuan X. Scott R.A. Henneman P. Heid I.M. Kizer J.R. Lyytikäinen L.P. et al. DIAGRAM+ Consortium, MAGIC Consortium, GLGC Investigators, MuTHER Consortium, DIAGRAM Consortium, GIANT Consortium, Global B Pgen Consortium, Procardis Consortium, MAGIC investigators, GLGC Consortium. Novel loci for adiponectin levels and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals. PLoS Genet. 2012; 8: e1002607. Our methods, named AVENGEME (Additive Variance Explained and Number of Genetic Effects Method of Estimation), are implemented in R software available from the authors. To study the statistical and operating characteristics of AVENGEME, we simulated genome-wide marker data under various genetic models. We based our simulations on four complex diseases studied by Stahl et al.,13Stahl E.A. Wegmann D. Trynka G. Gutierrez-Achury J. Do R. Voight B.F. Kraft P. Chen R. Kallberg H.J. Kurreeman F.A. et al. Diabetes Genetics Replication and Meta-analysis Consortium, Myocardial Infarction Genetics Consortium. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 2012; 44: 483-489 allowing direct comparisons with their ABPA method, which is conceptually similar to ours. 
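As a worked illustration of the summary-statistic score test (Equation 2 above), the following Python sketch computes the regression coefficient, its standard error, and the Wald Z statistic from per-marker effect estimates. The function name `score_test_from_summary` and the three-marker input are invented for illustration; the sketch assumes uncorrelated markers, as the text does, and is not the gtx or AVENGEME code.

```python
import math
from typing import Sequence, Tuple


def score_test_from_summary(beta1: Sequence[float], beta2: Sequence[float],
                            se2: Sequence[float]) -> Tuple[float, float, float]:
    """Polygenic score test from summary data, assuming uncorrelated markers.

    beta1: training-sample effect estimates used as score weights
    beta2: target-sample effect estimates for the same markers
    se2:   standard errors of the target-sample estimates
    Returns (regression coefficient, its standard error, Wald Z statistic).
    """
    if not (len(beta1) == len(beta2) == len(se2)) or len(beta1) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(s <= 0 for s in se2):
        raise ValueError("standard errors must be positive")
    # Numerator: sum of beta1 * beta2 / s2^2; denominator: sum of beta1^2 / s2^2.
    num = sum(b1 * b2 / (s * s) for b1, b2, s in zip(beta1, beta2, se2))
    den = sum(b1 * b1 / (s * s) for b1, s in zip(beta1, se2))
    if den == 0:
        raise ValueError("all score weights are zero")
    coef = num / den               # inverse-variance weighted mean of beta2 / beta1
    se = 1.0 / math.sqrt(den)      # square root of the sampling variance 1 / den
    z = num / math.sqrt(den)       # Wald statistic, equal to coef / se
    return coef, se, z


# Illustrative three-marker input (made-up numbers, not data from the study).
beta1 = [0.1, -0.2, 0.05]
beta2 = [0.08, -0.1, 0.0]
se2 = [0.05, 0.05, 0.1]
coef, se, z = score_test_from_summary(beta1, beta2, se2)
```

Restricting the three input lists to the markers whose training p values fall in a given interval yields the $Z_i$ for that interval.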
We also performed simulations based on three successively larger studies of schizophrenia.26Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014; 511: 421-427. The study design parameters and the genetic models used for our simulations are given in Table 1.

Table 1. Parameter Values for Studies of Four Diseases (ref. 13) and Three Studies of Schizophrenia (ref. 26)

| Parameter | RA | CD | MI | T2D | SCZ ISC | SCZ PGC1 | SCZ PGC2 |
|---|---|---|---|---|---|---|---|
| $n_1$ | 16,016 | 5,309 | 6,042 | 14,919 | 5,953 | 19,548 | 77,195 |
| $n_2$ | 12,078 | 6,785 | 4,861 | 4,862 | 5,120 | 5,120 | 5,120 |
| $m$ | 82,390 | 91,388 | 89,808 | 75,912 | 84,882 | 93,093 | 103,125 |
| $\sigma_1^2$ | 0.18 | 0.44 | 0.48 | 0.49 | – | – | 0.30 |
| $\pi_{01}$ | 0.973 | 0.972 | 0.980 | 0.962 | – | – | 0.95 |
| $P_1$ | 0.248 | 0.394 | 0.491 | 0.416 | 0.423 | 0.477 | 0.425 |
| $P_2$ | 0.126 | 0.273 | 0.396 | 0.396 | 0.515 | 0.515 | 0.515 |
| $K_1$ | 0.01 | 0.01 | 0.06 | 0.08 | 0.01 | 0.01 | 0.01 |
| $K_2$ | 0.01 | 0.01 | 0.06 | 0.08 | 0.01 | 0.01 | 0.01 |

Abbreviations are as follows: RA, rheumatoid arthritis; CD, celiac disease; MI, myocardial infarction; T2D, type II diabetes; SCZ, schizophrenia; ISC, International Schizophrenia Consortium; PGC, Psychiatric Genomics Consortium. Values of $\sigma_1^2$ and $\pi_{01}$ for RA, CD, MI, and T2D were estimated by Stahl et al.13Stahl E.A. Wegmann D. Trynka G. Gutierrez-Achury J. Do R. Voight B.F. Kraft P. Chen R. Kallberg H.J. Kurreeman F.A. et al. Diabetes Genetics Replication and Meta-analysis Consortium, Myocardial Infarction Genetics Consortium. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 2012; 44: 483-489 and subsequently used in our simulations. Those for SCZ are an approximation based on estimates from several studies and methods (Table 5).

For each genetic model, we simulated estimated effect sizes $\hat\beta_{1j}$ and $\hat\beta_{2j}$ independently for each marker, by drawing the true effects from the bivariate normal distribution in Equation 1 and adding independent sampling error to each effect. We then selected markers according to their p values in the training sample and used the summary statistic formula in Equation 2 to obtain tests of association for each polygenic score. We verified this approach for sample sizes up to 10K by explicitly simulating genotypes in case and control subjects as previously described.14Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 
2013; 9: e1003348. In brief, independent biallelic markers were defined with population minor allele frequencies uniformly distributed on (0.01, 0.5). Their effect sizes were drawn from the bivariate normal distribution such that the desired variances and covariances were attained. Allele frequencies were then derived for case and control subjects and genotypes simulated in each. Allelic odds ratios were then computed from the genotype counts. The results from the genotype simulations were indistinguishable from those from summary statistics, so we adopted the summary statistic method, which is much faster and easily scales up to very large sample sizes. Note that in our simulations, markers were assumed to be independent, i.e., in linkage equilibrium, as assumed by AVENGEME. We will later consider the effect of LD on our method. For the models in Table 1, we simulated 1,000 sets of polygenic score results and estimated the genetic model parameters via the unidirectional AVENGEME. This was done both when assuming $\sigma_1^2 = \sigma_{12}$ (which reflects the assumption that the two samples have the same genetic model), in which case AVENGEME estimates the two free parameters $\sigma_1^2$ and $\pi_{01}$, and when allowing $\sigma_1^2 \ne \sigma_{12}$, in which case AVENGEME estimates three free parameters. We evaluated the accuracy from the mean and SD of the parameter estimates and the coverage of the 95% confidence intervals. We the" @default.
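The summary-statistic simulation described in the abstract text above (true effects drawn from a bivariate normal distribution, plus independent sampling error) might be sketched as follows. Several details are assumptions made only for this sketch and are not stated in the source: null markers are taken to be null in both samples, non-null effects share the total variance equally in expectation, and the sampling error of each estimate has variance 1/n, as for a standardized quantitative trait. The function name and the parameter values in the usage lines are illustrative.

```python
import math
import random
from typing import List, Tuple


def simulate_summary_effects(m: int, n1: int, n2: int, var1: float, var2: float,
                             cov12: float, pi0: float, seed: int = 0
                             ) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Simulate true and estimated marker effects for two samples.

    ASSUMPTIONS (not from the source): a marker is null in both samples with
    probability pi0; non-null effects are bivariate normal with per-marker
    variances var / (m * (1 - pi0)); sampling error has variance 1 / n.
    Requires var1 > 0 and var2 > 0.
    """
    rng = random.Random(seed)
    n_effective = m * (1.0 - pi0)          # expected number of non-null markers
    rho = cov12 / math.sqrt(var1 * var2)   # correlation among non-null effects
    resid = math.sqrt(max(0.0, 1.0 - rho * rho))  # clamp guards against rounding
    true1, true2, est1, est2 = [], [], [], []
    for _ in range(m):
        if rng.random() < pi0:
            b1 = b2 = 0.0
        else:
            u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            b1 = math.sqrt(var1 / n_effective) * u
            b2 = math.sqrt(var2 / n_effective) * (rho * u + resid * v)
        true1.append(b1)
        true2.append(b2)
        est1.append(b1 + rng.gauss(0.0, 1.0 / math.sqrt(n1)))
        est2.append(b2 + rng.gauss(0.0, 1.0 / math.sqrt(n2)))
    return true1, true2, est1, est2


# Illustrative run: identical genetic model in both samples (cov12 equals var1).
true1, true2, est1, est2 = simulate_summary_effects(
    m=20000, n1=10000, n2=5000, var1=0.3, var2=0.3, cov12=0.3, pi0=0.95, seed=1)
total_var1 = sum(b * b for b in true1)
null_fraction = sum(1 for b in true1 if b == 0.0) / len(true1)
```

The estimated effects could then be converted to p values in the training sample, binned into selection intervals, and passed to a score test computed from summary statistics.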
- W1864883907 created "2016-06-24" @default.
- W1864883907 creator A5026743197 @default.
- W1864883907 creator A5067816983 @default.
- W1864883907 date "2015-08-01" @default.
- W1864883907 modified "2023-10-16" @default.
- W1864883907 title "A Fast Method that Uses Polygenic Scores to Estimate the Variance Explained by Genome-wide Marker Panels and the Proportion of Variants Affecting a Trait" @default.
- W1864883907 cites W1525285200 @default.
- W1864883907 cites W1980168725 @default.
- W1864883907 cites W1982952214 @default.
- W1864883907 cites W1986818660 @default.
- W1864883907 cites W1997338841 @default.
- W1864883907 cites W2020663825 @default.
- W1864883907 cites W2034046719 @default.
- W1864883907 cites W2045227722 @default.
- W1864883907 cites W2061623413 @default.
- W1864883907 cites W2070082005 @default.
- W1864883907 cites W2074089196 @default.
- W1864883907 cites W2098143048 @default.
- W1864883907 cites W2098597355 @default.
- W1864883907 cites W2101357408 @default.
- W1864883907 cites W2101605059 @default.
- W1864883907 cites W2102795115 @default.
- W1864883907 cites W2110808585 @default.
- W1864883907 cites W2113306639 @default.
- W1864883907 cites W2113381731 @default.
- W1864883907 cites W2115807273 @default.
- W1864883907 cites W2124204548 @default.
- W1864883907 cites W2134783591 @default.
- W1864883907 cites W2135370544 @default.
- W1864883907 cites W2139852368 @default.
- W1864883907 cites W2146415150 @default.
- W1864883907 cites W2148985208 @default.
- W1864883907 cites W2150730998 @default.
- W1864883907 cites W2153860431 @default.
- W1864883907 cites W2153968413 @default.
- W1864883907 cites W2155496693 @default.
- W1864883907 cites W2161633633 @default.
- W1864883907 doi "https://doi.org/10.1016/j.ajhg.2015.06.005" @default.
- W1864883907 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/4573448" @default.
- W1864883907 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/26189816" @default.
- W1864883907 hasPublicationYear "2015" @default.
- W1864883907 type Work @default.
- W1864883907 sameAs 1864883907 @default.
- W1864883907 citedByCount "201" @default.
- W1864883907 countsByYear W18648839072015 @default.
- W1864883907 countsByYear W18648839072016 @default.
- W1864883907 countsByYear W18648839072017 @default.
- W1864883907 countsByYear W18648839072018 @default.
- W1864883907 countsByYear W18648839072019 @default.
- W1864883907 countsByYear W18648839072020 @default.
- W1864883907 countsByYear W18648839072021 @default.
- W1864883907 countsByYear W18648839072022 @default.
- W1864883907 countsByYear W18648839072023 @default.
- W1864883907 crossrefType "journal-article" @default.
- W1864883907 hasAuthorship W1864883907A5026743197 @default.
- W1864883907 hasAuthorship W1864883907A5067816983 @default.
- W1864883907 hasBestOaLocation W18648839071 @default.
- W1864883907 hasConcept C104317684 @default.
- W1864883907 hasConcept C105795698 @default.
- W1864883907 hasConcept C106208931 @default.
- W1864883907 hasConcept C106934330 @default.
- W1864883907 hasConcept C121955636 @default.
- W1864883907 hasConcept C135763542 @default.
- W1864883907 hasConcept C141231307 @default.
- W1864883907 hasConcept C144133560 @default.
- W1864883907 hasConcept C153209595 @default.
- W1864883907 hasConcept C196083921 @default.
- W1864883907 hasConcept C199360897 @default.
- W1864883907 hasConcept C2910691881 @default.
- W1864883907 hasConcept C33923547 @default.
- W1864883907 hasConcept C41008148 @default.
- W1864883907 hasConcept C54355233 @default.
- W1864883907 hasConcept C70721500 @default.
- W1864883907 hasConcept C81941488 @default.
- W1864883907 hasConcept C86803240 @default.
- W1864883907 hasConceptScore W1864883907C104317684 @default.
- W1864883907 hasConceptScore W1864883907C105795698 @default.
- W1864883907 hasConceptScore W1864883907C106208931 @default.
- W1864883907 hasConceptScore W1864883907C106934330 @default.
- W1864883907 hasConceptScore W1864883907C121955636 @default.
- W1864883907 hasConceptScore W1864883907C135763542 @default.
- W1864883907 hasConceptScore W1864883907C141231307 @default.
- W1864883907 hasConceptScore W1864883907C144133560 @default.
- W1864883907 hasConceptScore W1864883907C153209595 @default.
- W1864883907 hasConceptScore W1864883907C196083921 @default.
- W1864883907 hasConceptScore W1864883907C199360897 @default.
- W1864883907 hasConceptScore W1864883907C2910691881 @default.
- W1864883907 hasConceptScore W1864883907C33923547 @default.
- W1864883907 hasConceptScore W1864883907C41008148 @default.
- W1864883907 hasConceptScore W1864883907C54355233 @default.
- W1864883907 hasConceptScore W1864883907C70721500 @default.
- W1864883907 hasConceptScore W1864883907C81941488 @default.
- W1864883907 hasConceptScore W1864883907C86803240 @default.
- W1864883907 hasIssue "2" @default.
- W1864883907 hasLocation W18648839071 @default.
- W1864883907 hasLocation W18648839072 @default.
- W1864883907 hasLocation W18648839073 @default.