Matches in SemOpenAlex for { <https://semopenalex.org/work/W1964998512> ?p ?o ?g. }
- W1964998512 endingPage "735" @default.
- W1964998512 startingPage "728" @default.
- W1964998512 abstract "Empirical evidence suggests that both common and rare variants contribute to complex disease etiology. Although the effects of common variants have been thoroughly assessed in recent genome-wide association studies (GWAS), our knowledge of the impact of rare variants on complex diseases remains limited. A number of methods have been proposed to test for rare variant association in sequencing-based studies, a study design that is becoming popular but is still not economically feasible. In contrast, few (if any) methods exist to detect rare variants in GWAS data, the data we have collected on thousands of individuals. Here we propose two methods, a weighted haplotype-based approach and an imputation-based approach, to test for the effect of rare variants with GWAS data. Both methods can incorporate external sequencing data when available. We evaluated our methods and compared them with methods proposed in the sequencing setting through extensive simulations. Our methods clearly show enhanced statistical power over existing methods for a wide range of population-attributable risk, percentage of disease-contributing rare variants, and proportion of rare alleles working in different directions. We also applied our methods to the IFIH1 region for the type 1 diabetes GWAS data collected by the Wellcome Trust Case-Control Consortium. Our methods yield p values on the order of 10⁻³, whereas the most significant p value from the existing methods is greater than 0.17. We thus demonstrate that the evaluation of rare variants with GWAS data is possible, particularly when public sequencing data are incorporated.
Recent studies suggest that rare variants play an important role in the etiology of complex traits [1: Cohen J.C., Kiss R.S., Pertsemlidis A., Marcel Y.L., McPherson R., Hobbs H.H. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science. 2004; 305: 869-872; 2: Nejentsev S., Walker N., Riches D., Egholm M., Todd J.A. Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes. Science. 2009; 324: 387-389], revealing that rare variants generally have larger genetic effects than common variants [3: Pritchard J.K. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 2001; 69: 124-137; 4: Pritchard J.K., Cox N.J. The allelic architecture of human disease genes: Common disease-common variant…or not? Hum. Mol. Genet. 2002; 11: 2417-2423; 5: Kryukov G.V., Pennacchio L.A., Sunyaev S.R. Most rare missense alleles are deleterious in humans: Implications for complex disease and association studies. Am. J. Hum. Genet. 2007; 80: 727-739; 6: Frazer K.A., Murray S.S., Schork N.J., Topol E.J. Human genetic variation and its contribution to complex traits. Nat. Rev. Genet. 2009; 10: 241-251]. There is also evidence that multiple rare variants together influence the risk of complex diseases, making it sensible to combine information across them. Although debate lingers between the two hypotheses for the genetics underlying complex traits, namely common disease-common variant and common disease-rare variant, the community has gradually reached a consensus that both common and rare variants contribute to the underlying genetic mechanism [7: Schork N.J., Murray S.S., Frazer K.A., Topol E.J. Common vs. rare allele hypotheses for complex diseases. Curr. Opin. Genet. Dev. 2009; 19: 212-219]. However, unlike common variants, whose impact on human diseases has been thoroughly evaluated in the recent wave of genome-wide association studies (GWAS), the impact of rare variants remains largely unevaluated.
Rare variants are attracting increasing attention from researchers for two major reasons. First, common variants identified through GWAS explain only a small proportion of the overall heritability, and rare variants hold the promise to explain some of the missing heritability [8: Gibson G. Hints of hidden heritability in GWAS. Nat. Genet. 2010; 42: 558-560; 9: Manolio T.A., Collins F.S., Cox N.J., Goldstein D.B., Hindorff L.A., Hunter D.J., McCarthy M.I., Ramos E.M., Cardon L.R., Chakravarti A., et al. Finding the missing heritability of complex diseases. Nature. 2009; 461: 747-753; 10: Maher B. Personal genomes: The case of the missing heritability. Nature. 2008; 456: 18-21]. Second, massively parallel sequencing technologies have made it feasible to search for rare variants [2; 11: Shendure J., Ji H.L. Next-generation DNA sequencing. Nat. Biotechnol. 2008; 26: 1135-1145]. In preparation for the coming wave of sequencing-based studies, a number of methods have been proposed to test for the effect of rare variants in aggregate [12: Li B.S., Leal S.M. Methods for detecting associations with rare variants for common diseases: Application to analysis of sequence data. Am. J. Hum. Genet. 2008; 83: 311-321; 13: Madsen B.E., Browning S.R. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009; 5: e1000384; 14: Zhu X.F., Feng T., Li Y.L., Lu Q., Elston R.C. Detecting rare variants for complex traits using family and unrelated data. Genet. Epidemiol. 2010; 34: 171-187; 15: Morris A.P., Zeggini E. An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet. Epidemiol. 2010; 34: 188-193; 16: Li Q., Zhang H., Yu K. Approaches for evaluating rare polymorphisms in genetic association studies. Hum. Hered. 2010; 69: 219-228; 17: Han F., Pan W. A data-adaptive sum test for disease association with multiple common or rare variants. Hum. Hered. 2010; 70: 42-54; 18: Price A.L., Kryukov G.V., de Bakker P.I., Purcell S.M., Staples J., Wei L.J., Sunyaev S.R. Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 2010; 86: 832-838]. However, whole-genome sequencing is still cost prohibitive, and only a few groups can afford to sequence a relatively small number of samples, limiting the statistical power to detect association.
On the other hand, little, if any, attention has been given to GWAS data for the evaluation of rare variants. There are good reasons for the lack of methods targeting GWAS data. Analysis of directly assayed rare variants is challenging statistically because methods developed for common variants are underpowered. Commercial genotyping panels employed by GWAS were designed to cover most of the common variants but have poor coverage of rare variants, making the analysis even more challenging. Now, with the publicly available data from the 1000 Genomes Project being rapidly generated and released [19: Kaiser J. DNA sequencing. A plan to capture human diversity in 1000 genomes. Science. 2008; 319: 395; 20: The 1000 Genomes Project. A map of human genome variation from population scale sequencing. Nature. 2010; 467: 1061-1073], an attempt to detect rare variants with GWAS data is worthwhile and holds promise before study-specific sequencing data become widely available. We note that, with GWAS data alone, extremely rare variants (for example, singletons or study population private variants) still cannot be evaluated.
Our focus is on the analysis of variants in the frequency range of 0.1%–5%, which have not been adequately assessed in GWAS but can be better captured either by haplotyping or with the aid of external sequencing data by multimarker imputation [21: Pe'er I., de Bakker P.I., Maller J., Yelensky R., Altshuler D., Daly M.J. Evaluating and improving power in whole-genome association studies using fixed marker sets. Nat. Genet. 2006; 38: 663-667; 22: Li Y., Willer C., Sanna S., Abecasis G. Genotype imputation. Annu. Rev. Genomics Hum. Genet. 2009; 10: 387-406; 23: Marchini J., Howie B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 2010; 11: 499-511]. Here we propose two methods to search for the aggregated effect of rare variants with GWAS data. Our approaches do not rely on the availability of external sequencing data, but they can incorporate such information when available. Moreover, our methods make no assumption on the direction of association of rare alleles with disease risk. We applied our methods, along with existing methods proposed in the sequencing context, to simulated data sets. Our methods demonstrated better performance across a wide range of scenarios with an average power improvement of 8.6% (31.6%) in the absence (presence) of external sequencing data. We also applied our methods to the Wellcome Trust Case-Control Consortium (WTCCC) type 1 diabetes (T1D [MIM 222100]) GWAS data set in the IFIH1 (MIM 606951) gene region, where both common and multiple rare variants have been found to influence the risk of T1D [2; 24: Barrett J.C., Clayton D.G., Concannon P., Akolkar B., Cooper J.D., Erlich H.A., Julier C., Morahan G., Nerup J., Nierras C., et al.; Type 1 Diabetes Genetics Consortium. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nat. Genet. 2009; 41: 703-707; 25: Smyth D.J., Cooper J.D., Bailey R., Field S., Burren O., Smink L.J., Guja C., Ionescu-Tirgoviste C., Widmer B., Dunger D.B., et al. A genome-wide association study of nonsynonymous SNPs identifies a type 1 diabetes locus in the interferon-induced helicase (IFIH1) region. Nat. Genet. 2006; 38: 617-619].
Our first test is a weighted haplotype test. Assume a sample of N diploid individuals is collected, among which $N_{cs}$ are affected cases and $N_{ct}$ are unaffected controls. Let m denote the number of genotyped markers in a region of interest. Further denote the haplotypes of the N individuals by $H = (H_1, H_2, \ldots, H_i, \ldots, H_N)^t$, where $H_i = \{H_{i,1}, H_{i,2}\}$ are the two haplotypes carried by the ith individual, consisting of the m markers in the region. For each individual i, we define a weighted haplotype score as follows:
$$\mathrm{WHS}_i = \sum_{j=1}^{2} W_{H_{i,j}},$$
in which the sum is taken over the two haplotypes of individual i. $W_h$ stands for the weight of haplotype h and is defined as
$$W_h = I(h \in C) \cdot (-1)^{I(h \in P)} \cdot S_h,$$
in which C is the set of disease-contributing haplotypes including both risk and protective haplotypes, P is the set of disease-protective haplotypes (note that P is a subset of C), and $S_h$ is a score assigned to haplotype h. Following the weighting scheme proposed by Madsen and Browning [13] for SNPs, we define $S_h$ as
$$S_h = \frac{1}{\sqrt{N_{ct} \cdot f_{ct,h} \cdot (1 - f_{ct,h})}},$$
in which $f_{ct,h}$ denotes the adjusted frequency of haplotype h among controls and is defined as
$$f_{ct,h} = \frac{C_{ct,h} + 1}{2(N_{ct} + 1)},$$
in which $C_{ct,h}$ is the number of copies of haplotype h among controls.
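The score definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the toy haplotype strings, and the sets C and P are hypothetical, and S_h is assumed to take the inverse square-root form of the Madsen and Browning weight so that rarer haplotypes receive larger weight.

```python
from collections import Counter
from math import sqrt


def adjusted_control_frequency(count_in_controls, n_controls):
    # f_ct,h = (C_ct,h + 1) / (2 * (N_ct + 1))
    return (count_in_controls + 1) / (2 * (n_controls + 1))


def haplotype_score(count_in_controls, n_controls):
    # S_h, ASSUMED here to be 1 / sqrt(N_ct * f * (1 - f))
    f = adjusted_control_frequency(count_in_controls, n_controls)
    return 1.0 / sqrt(n_controls * f * (1.0 - f))


def weighted_haplotype_score(hap_pair, contributing, protective,
                             control_counts, n_controls):
    # WHS_i: sum of W_h over the two haplotypes carried by one individual
    total = 0.0
    for h in hap_pair:
        if h not in contributing:
            continue  # I(h in C) = 0, so the haplotype adds nothing
        sign = -1.0 if h in protective else 1.0  # (-1)^I(h in P)
        total += sign * haplotype_score(control_counts.get(h, 0), n_controls)
    return total


# Toy data (hypothetical): four controls, each carrying two haplotypes.
controls = [('AAG', 'AAG'), ('AAG', 'ACG'), ('AAG', 'AAG'), ('TCG', 'AAG')]
control_counts = Counter(h for pair in controls for h in pair)
n_controls = len(controls)

C = {'ACG', 'TCG'}  # disease-contributing haplotypes (risk or protective)
P = {'TCG'}         # protective subset of C

score_risk = weighted_haplotype_score(('ACG', 'AAG'), C, P, control_counts, n_controls)
score_protective = weighted_haplotype_score(('TCG', 'AAG'), C, P, control_counts, n_controls)
score_mixed = weighted_haplotype_score(('ACG', 'TCG'), C, P, control_counts, n_controls)
```

In this toy setting a risk haplotype and a protective haplotype of equal control frequency cancel, which is the intended effect of the sign term.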
The rationale for using such a score is that a rare variant (most likely untyped in GWAS) is more likely to be tagged by a rare haplotype than by a common haplotype, and thus rare haplotypes should receive more weight in the analysis. To define the sets of disease-contributing and disease-protective haplotypes, we first split the data into a testing set and a training set and then compared the haplotype frequencies between cases and controls in the training set according to the formula below:
$$\begin{cases} h \in C & \text{if } \left| f^{tr}_{cs,h} - f^{tr}_{ct,h} \right| > \mu \sqrt{\dfrac{f^{tr}_{ct,h}\,(1 - f^{tr}_{ct,h})}{2 N^{tr}_{ct}}}, \\ h \in P & \text{if } f^{tr}_{cs,h} - f^{tr}_{ct,h} < -\mu \sqrt{\dfrac{f^{tr}_{ct,h}\,(1 - f^{tr}_{ct,h})}{2 N^{tr}_{ct}}}, \end{cases} \quad \text{(Equation 1)}$$
with tr standing for the training set. Here, μ is a constant that is determined by a prespecified type I error rate. For example, μ = 1.28 (1.64) corresponds to a type I error of 0.2 (0.1). Following Zhu et al. [14], we set μ = 1.28 and randomly selected 30% of the samples for training in the analysis. We note that by explicitly modeling the two sets of haplotypes as described above, we do not need to make assumptions about the direction of association between rare alleles and disease risk. Weighted haplotype scores are calculated in the testing set after identifying the two sets of haplotypes with the training set. To assess whether the rare variants are significantly associated with the disease, we can perform a standard Wilcoxon [26: Wilcoxon F. Individual comparisons by ranking methods. Biom. Bull. 1945; 1: 80-83] test on the weighted haplotype scores and assess the significance of the test by permutations. For each permuted data set, the training set and the testing set are obtained in the same fashion as for the original data set. Because typical GWAS data consist of genotypes rather than haplotypes, we need to infer haplotypes from unphased genotypes.
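The training-set rule in Equation 1 can be sketched as follows. This is a hedged illustration under stated assumptions: the function name and toy counts are hypothetical, and both case and control frequencies are computed with the adjusted form (count + 1) / (2(N + 1)), which the text defines explicitly only for controls.

```python
from math import sqrt


def classify_haplotypes(case_counts, control_counts, n_cases, n_controls, mu=1.28):
    # Returns (C, P): disease-contributing haplotypes and their protective subset.
    # ASSUMPTION: adjusted frequencies are used for cases as well as controls.
    contributing, protective = set(), set()
    for h in set(case_counts) | set(control_counts):
        f_cs = (case_counts.get(h, 0) + 1) / (2 * (n_cases + 1))
        f_ct = (control_counts.get(h, 0) + 1) / (2 * (n_controls + 1))
        threshold = mu * sqrt(f_ct * (1.0 - f_ct) / (2 * n_controls))
        diff = f_cs - f_ct
        if abs(diff) > threshold:
            contributing.add(h)   # h in C
        if diff < -threshold:
            protective.add(h)     # h in P, which is a subset of C
    return contributing, protective


# Toy training set (hypothetical): 100 cases and 100 controls, 200 chromosomes each.
case_counts = {'H1': 150, 'H2': 30, 'H3': 20}
control_counts = {'H1': 150, 'H2': 10, 'H3': 40}

contributing, protective = classify_haplotypes(case_counts, control_counts, 100, 100)
```

Here H2 is enriched in cases and H3 is depleted, so both enter C and only H3 enters P; H1 shows no frequency difference and is ignored in the score.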
This step can be done via standard phasing methods, including PHASE, fastPHASE, MaCH, and Beagle [22; 27: Stephens M., Scheet P. Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. Am. J. Hum. Genet. 2005; 76: 449-462; 28: Scheet P., Stephens M. A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 2006; 78: 629-644; 29: Browning S.R. Multilocus association mapping using variable-length Markov chains. Am. J. Hum. Genet. 2006; 78: 903-913]. We used MaCH, which allows the incorporation of external genotyping, haplotyping, or sequencing data. Our weighted haplotype approach can be applied to haplotypes consisting of GWAS markers alone or to haplotypes including additional markers via incorporation of external reference data.
Our second test is a weighted imputation dosage test. Following the notation defined above, we assume that there are a total of M markers genotyped or sequenced after the incorporation of one or more external data sets (e.g., the International HapMap Project [30: International HapMap Consortium. A haplotype map of the human genome. Nature. 2005; 437: 1299-1320; 31: Frazer K.A., Ballinger D.G., Cox D.R., Hinds D.A., Stuve L.L., Gibbs R.A., Belmont J.W., Boudreau A., Hardenbol P., Leal S.M., et al.; International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007; 449: 851-861] or the 1000 Genomes Project [19]). We have previously described a hidden Markov model-based method that imputes untyped markers in study samples by exploiting external data as reference, which was implemented in the software MaCH and has become standard in GWAS analysis [32: de Bakker P.I.W., Ferreira M.A.R., Jia X.M., Neale B.M., Raychaudhuri S., Voight B.F. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum. Mol. Genet. 2008; 17: R122-R128]. Let $D = (D_1, D_2, \ldots, D_i, \ldots, D_N)^t$ denote the dosage matrices across M markers for the N study subjects, in which $D_i = (D_{i,1}, D_{i,2}, \ldots, D_{i,j}, \ldots, D_{i,M})$ denotes the dosages of the ith individual. Here $D_{i,j}$ is the dosage for the ith individual at marker j, which is defined as the expected number of copies of the rare allele at marker j. Now we define the weighted dosage score for each individual i as
$$\mathrm{WDS}_i = \sum_{j=1}^{M} I(j \in M_C) \cdot (-1)^{I(j \in M_P)} \cdot D_{i,j},$$
in which the summation is taken over all M markers with genotype dosage scores. Here $M_C$ is the set of markers with the rare allele that contributes to disease risk, and $M_P$ is the set of markers with the rare allele that decreases disease risk. We define these two sets by examining the frequency difference between cases and controls, similar to Equation 1 for the weighted haplotype test. After obtaining the scores, the standard Wilcoxon test is applied to test for association with the disease, and its significance is assessed via permutation.
We compared our proposed methods with the following three methods proposed in the sequencing context. (1) The weighted SNP test (denoted by WS) [13] is a weighted-sum method in which rare alleles are aggregated and weighted according to a function of minor allele frequency among controls. Although the method was proposed as a test for “rare mutations,” it in fact sums over all markers, giving smaller weight to alleles with higher frequency. Although an omnibus region-based test that evaluates both common and rare variants is sometimes desired, here we are interested in a region-based test for rare variants only, assuming that common variants have been thoroughly evaluated by large-scale GWAS. Because of this, we compared our methods with both the originally proposed test (denoted by WSall) and a modified version of it (denoted by WSrare), in which only markers with minor allele frequency (MAF) < 5% are included. (2) Zhu and colleagues proposed a haplotype grouping method (denoted by HG) [14] that counts the number of rare risk haplotypes for each individual and uses Fisher's exact test. (3) We also applied the rare variant collapsing method (denoted by RVC) proposed by Li and Leal [12], which assigns each individual to one of two groups: carrying any rare allele or not. Together with case-control status, a 2 × 2 table is generated, and a standard test for contingency tables (e.g., the chi-square test for independence) is applied.
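As a small sketch of the weighted dosage score defined earlier for the second test, the following Python function sums signed dosages over the selected markers. The function name, marker indices, and dosage values are hypothetical, and the marker sets are taken as given rather than derived from training data.

```python
def weighted_dosage_score(dosages, contributing_markers, protective_markers):
    # WDS_i = sum_j I(j in M_C) * (-1)^I(j in M_P) * D_ij
    score = 0.0
    for j, dosage in enumerate(dosages):
        if j not in contributing_markers:
            continue  # marker j is outside M_C and contributes nothing
        sign = -1.0 if j in protective_markers else 1.0
        score += sign * dosage
    return score


# Toy imputed dosages for one individual at four markers (hypothetical values).
dosages = [0.1, 1.8, 0.0, 0.9]
M_C = {0, 1, 3}  # markers whose rare allele influences disease risk
M_P = {3}        # subset of M_C whose rare allele is protective

wds = weighted_dosage_score(dosages, M_C, M_P)
```

Because dosages are expected allele counts, uncertain imputations contribute fractional amounts instead of being rounded to hard genotype calls.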
Table 1 lists the above-described tests and their abbreviations.
Table 1. Abbreviation and Description of Tests Applied
Test Abbreviation | Description
WDS | Weighted dosage test on genotyped plus imputed SNPs with external sequencing data
WHS | Weighted haplotype test on genotyped plus imputed SNPs with external sequencing data
WHG | Weighted haplotype test on genotyped SNPs only
HG | Haplotype grouping test proposed by Zhu et al. [14]
WSall | Original weighted SNP test aggregating evidence over all (regardless of MAF) SNPs proposed by Madsen and Browning [13]
WSrare | Modified weighted SNP test aggregating evidence over rare (MAF < 5%) SNPs only
RVC | Rare variant collapsing method proposed by Li and Leal [12]
We simulated 10,000 chromosomes for a series of 100 regions of 1 Mb each with a coalescent model that mimics linkage disequilibrium (LD) in real data, accounts for variations in local recombination rates, and models population history, consistent with the HapMap CEU (CEPH people from Utah, USA) samples [33: Schaffner S.F., Foo C., Gabriel S., Reich D., Daly M.J., Altshuler D. Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 2005; 15: 1576-1583]. We then took a random subset of 1000 simulated chromosomes (i.e., 500 individuals) to serve as the external reference, mimicking the target sample size for the 1000 Genomes Project. To generate a set of GWAS markers in each region, we first randomly picked 120 chromosomes, mimicking Phase II HapMap CEU data. We then ascertained and thinned polymorphic sites to match the marker density and allele frequency spectrum of their real-data counterparts. Based on LD measures calculated with the 120 chromosomes, we selected a set of 100 SNPs for each region that included 90 tagSNPs tagging the largest number of SNPs and 10 additional SNPs picked at random among the remaining SNPs. The final set of retained SNPs (GWAS markers in the region) captured ∼78% of the common variants (MAF > 5%) at a conventional r² cutoff of 0.8, similar to the real-data performance of the Illumina HumanHap300 BeadChip SNP genotyping platform.
Within each simulated 1 Mb region, we picked an ∼50 kb region as the causal region in which we assume only rare variants (variants with population MAF between 0.1% and 5%) contribute to the disease risk. We randomly selected d% of the rare variants in the causal region to be causal, i.e., to influence disease risk. Among these rare variants, we further assume that r% of them increase disease risk, whereas the remaining (100 – r)% decrease disease risk. To ensure that each variant only has a small contribution to the overall disease risk, we followed a model similar to that proposed by Madsen and Browning [13]. Specifically, the contribution of each causal variant j to the overall genotype relative risk (GRR) is defined as
$$\mathrm{GRR}_j = \left( \frac{\mathrm{PAR}}{(1 - \mathrm{PAR}) \cdot \mathrm{MAF}_j} + 1 \right)^{(-1)^{I(\xi_j = 1)}},$$
in which PAR is the population attributable risk and $\xi_j = 1$ indicates that the rare allele of marker j decreases disease risk. Following Madsen and Browning [13], we used the same marginal PAR for each causal variant, which intrinsically assumes that alleles with lower frequency have higher GRR than alleles with higher frequency. In our 50 kb core region, there are ∼500 SNPs with MAF < 5%; the distributions of MAFs and GRRs (without loss of generality, assuming all rare alleles increase disease risk) are shown in Figure S1 available online. To generate the chromosomes for an individual, we randomly selected two chromosomes $\{H_1, H_2\}$ from the remaining 9000 chromosomes that were not selected as external reference. The disease status of the individual was assigned according to
$$P(\text{affected} \mid \{H_1, H_2\}) = f_0 \times \prod_{k=1}^{2} \prod_{j=1}^{m_c} \mathrm{GRR}_j^{\,I(H_{k,j} = a_j)},$$
in which $f_0$ is the baseline penetrance and was fixed at 10% in our simulations (1% and 5% were also evaluated and resulted in similar patterns but with slight power loss), $m_c$ is the number of causal SNPs, and $a_j$ is the rare allele of SNP j. Sampling was repeated until the desired number of cases and controls was reached. In our simulations, d took values from 10% to 50% in increments of 10%. Among the disease risk influencing loci, we set the value of r, the percentage of rare alleles increasing disease risk, at 5%, 20%, 50%, 80%, and 100%. For each of the 100 regions, two independent data sets with 1000 cases and 1000 controls were simulated with the model described above.
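The disease model above can be sketched in Python as follows. This is an illustrative sketch only: the function names and the encoding of haplotypes as strings of '0' and '1' (with '1' marking the rare allele) are assumptions, and the cap of the probability at 1.0 is a guard added here that the text does not describe.

```python
def genotype_relative_risk(par, maf, protective=False):
    # GRR_j = (PAR / ((1 - PAR) * MAF_j) + 1) ** (+1 or -1)
    grr = par / ((1.0 - par) * maf) + 1.0
    return 1.0 / grr if protective else grr


def disease_probability(hap1, hap2, causal, baseline=0.10):
    # causal: list of (marker index, rare allele, GRR) tuples
    # P(affected) = f0 * product of GRR_j over rare causal alleles carried
    p = baseline
    for hap in (hap1, hap2):
        for index, rare_allele, grr in causal:
            if hap[index] == rare_allele:
                p *= grr
    return min(p, 1.0)  # ASSUMED guard, not stated in the text


# Toy example (hypothetical): one causal risk variant with MAF 1% and PAR 0.5%.
grr_risk = genotype_relative_risk(par=0.005, maf=0.01)
grr_protective = genotype_relative_risk(par=0.005, maf=0.01, protective=True)
causal = [(1, '1', grr_risk)]

p_carrier = disease_probability('0100', '0000', causal)
p_noncarrier = disease_probability('0000', '0000', causal)
```

With a fixed marginal PAR, a lower MAF yields a larger GRR, which matches the statement that rarer alleles are assumed to have stronger effects.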
In addition, five independent null data sets of the same sample size were simulated, assuming no genetic effect, by randomly sampling 4000 chromosomes (i.e., 2000 individuals) from the pool of 9000 chromosomes. Average power was estimated based on the 100 regions, which represent a wide range of LD patterns. To account for local LD differences, we permuted each of the null sets 200 times to obtain a region-specific empirical significance threshold. For the weighted haplotype analysis, we considered two versions: WHG, which uses haplotypes consisting of GWAS SNPs only, and WHS, which uses haplotypes encompassing both genotyped and imputed SNPs. For both the weighted haplotype tests and the weighted dosage test, untyped SNPs with Rsq (estimated imputation quality) < 0.3 were discarded from subsequent analysis [22]. In all analyses, we used haplotypes reconstructed from the unphased genotypes and imputed genotypes for markers that are not included on the GWAS chip. Our methods (WHG, WHS, and WDS), together with WSall, WSrare, HG, and RVC, were applied to the 1000 null data sets within each region to determine the region-specific empirical significance threshold, ensuring the correct type I error rate of 0.05 for all tests.
Figure 1 shows the empirical power of our methods relative to the other four methods proposed in the sequencing context as a function of r, the proportion of rare alleles increasing disease risk, which ranges from 5% to 100%. We fixed PAR at 0.5% and d (percent of disease-influencing rare variants) at 50%. Although the synergy assumption is more reasonable for rarer alleles than for common alleles because rarer alleles tend to disrupt gene function, our knowledge regarding the direction of effect of rarer alleles is still limited. Therefore, methods robust to such an assumption are desirable.
Although all methods have decreased power when rare alleles work in different directions, our methods performed better by explicitly modeling the direction of association. For example, compared with the haplotype grouping (HG) method, the advantage of our weighted haplotype method (WHG, on GWAS SNPs only without the aid of external sequencing data) is more pronounced when a larger proportion of the rare alleles is protective: power gain is 9.1% when all of the rare alleles at disease-contributing loci increase disease risk, and the power gain increases to 20.7% when only 5% of the rare alleles increase disease risk. Our proposed tests increase power through two different mechanisms: by using haplotypes to better capture information for rare variants (mostly untyped in GWAS) and by using external sequencing data to impute rare variants. Let us consider the first mechanism by examining tests on GWAS data alone, namely WHG, HG, WSall, WSrare, and RVC. At the GWAS level, haplotype-based methods clearly show their advantages. Among the five methods, the two haplotype-based methods (WHG and HG) rank as the best two across the five scena" @default.
- W1964998512 created "2016-06-24" @default.
- W1964998512 creator A5001685590 @default.
- W1964998512 creator A5025429944 @default.
- W1964998512 creator A5032020402 @default.
- W1964998512 date "2010-11-01" @default.
- W1964998512 modified "2023-10-16" @default.
- W1964998512 title "To Identify Associations with Rare Variants, Just WHaIT: Weighted Haplotype and Imputation-Based Tests" @default.
- W1964998512 cites W1560200430 @default.
- W1964998512 cites W1980991473 @default.
- W1964998512 cites W1981012086 @default.
- W1964998512 cites W1987620105 @default.
- W1964998512 cites W1990102353 @default.
- W1964998512 cites W2015849111 @default.
- W1964998512 cites W2020040634 @default.
- W1964998512 cites W2024108088 @default.
- W1964998512 cites W2027266229 @default.
- W1964998512 cites W2027688494 @default.
- W1964998512 cites W2029387902 @default.
- W1964998512 cites W2031082289 @default.
- W1964998512 cites W2046247026 @default.
- W1964998512 cites W2053725906 @default.
- W1964998512 cites W2058148991 @default.
- W1964998512 cites W2061680337 @default.
- W1964998512 cites W2070082005 @default.
- W1964998512 cites W2070984858 @default.
- W1964998512 cites W2078663610 @default.
- W1964998512 cites W2082122713 @default.
- W1964998512 cites W2087036932 @default.
- W1964998512 cites W2091143313 @default.
- W1964998512 cites W2115837368 @default.
- W1964998512 cites W2119067354 @default.
- W1964998512 cites W2119279196 @default.
- W1964998512 cites W2129559300 @default.
- W1964998512 cites W2131889332 @default.
- W1964998512 cites W2153899794 @default.
- W1964998512 cites W2157929834 @default.
- W1964998512 cites W2161644980 @default.
- W1964998512 cites W2171777347 @default.
- W1964998512 cites W2217809488 @default.
- W1964998512 cites W4240204556 @default.
- W1964998512 cites W4252684946 @default.
- W1964998512 doi "https://doi.org/10.1016/j.ajhg.2010.10.014" @default.
- W1964998512 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/3014366" @default.
- W1964998512 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/21055717" @default.
- W1964998512 hasPublicationYear "2010" @default.
- W1964998512 type Work @default.
- W1964998512 sameAs 1964998512 @default.
- W1964998512 citedByCount "86" @default.
- W1964998512 countsByYear W19649985122012 @default.
- W1964998512 countsByYear W19649985122013 @default.
- W1964998512 countsByYear W19649985122014 @default.
- W1964998512 countsByYear W19649985122015 @default.
- W1964998512 countsByYear W19649985122016 @default.
- W1964998512 countsByYear W19649985122017 @default.
- W1964998512 countsByYear W19649985122018 @default.
- W1964998512 countsByYear W19649985122019 @default.
- W1964998512 countsByYear W19649985122020 @default.
- W1964998512 countsByYear W19649985122021 @default.
- W1964998512 countsByYear W19649985122023 @default.
- W1964998512 crossrefType "journal-article" @default.
- W1964998512 hasAuthorship W1964998512A5001685590 @default.
- W1964998512 hasAuthorship W1964998512A5025429944 @default.
- W1964998512 hasAuthorship W1964998512A5032020402 @default.
- W1964998512 hasBestOaLocation W19649985121 @default.
- W1964998512 hasConcept C104317684 @default.
- W1964998512 hasConcept C105795698 @default.
- W1964998512 hasConcept C180754005 @default.
- W1964998512 hasConcept C197754878 @default.
- W1964998512 hasConcept C33923547 @default.
- W1964998512 hasConcept C54355233 @default.
- W1964998512 hasConcept C58041806 @default.
- W1964998512 hasConcept C70721500 @default.
- W1964998512 hasConcept C86803240 @default.
- W1964998512 hasConcept C9357733 @default.
- W1964998512 hasConceptScore W1964998512C104317684 @default.
- W1964998512 hasConceptScore W1964998512C105795698 @default.
- W1964998512 hasConceptScore W1964998512C180754005 @default.
- W1964998512 hasConceptScore W1964998512C197754878 @default.
- W1964998512 hasConceptScore W1964998512C33923547 @default.
- W1964998512 hasConceptScore W1964998512C54355233 @default.
- W1964998512 hasConceptScore W1964998512C58041806 @default.
- W1964998512 hasConceptScore W1964998512C70721500 @default.
- W1964998512 hasConceptScore W1964998512C86803240 @default.
- W1964998512 hasConceptScore W1964998512C9357733 @default.
- W1964998512 hasIssue "5" @default.
- W1964998512 hasLocation W19649985121 @default.
- W1964998512 hasLocation W19649985122 @default.
- W1964998512 hasLocation W19649985123 @default.
- W1964998512 hasLocation W19649985124 @default.
- W1964998512 hasLocation W19649985125 @default.
- W1964998512 hasOpenAccess W1964998512 @default.
- W1964998512 hasPrimaryLocation W19649985121 @default.
- W1964998512 hasRelatedWork W1965396974 @default.
- W1964998512 hasRelatedWork W1981169569 @default.
- W1964998512 hasRelatedWork W1983642025 @default.
- W1964998512 hasRelatedWork W1990804418 @default.
- W1964998512 hasRelatedWork W1998686087 @default.