Matches in SemOpenAlex for { <https://semopenalex.org/work/W2976869024> ?p ?o ?g. }
- W2976869024 endingPage "772" @default.
- W2976869024 startingPage "763" @default.
- W2976869024 abstract "Large-scale multi-ethnic cohorts offer unprecedented opportunities to elucidate the genetic factors influencing complex traits related to health and disease among minority populations. At the same time, the genetic diversity in these cohorts presents new challenges for analysis and interpretation. We consider the utility of race and/or ethnicity categories in genome-wide association studies (GWASs) of multi-ethnic cohorts. We demonstrate that race/ethnicity information enhances the ability to understand population-specific genetic architecture. To address the practical issue that self-identified racial/ethnic information may be incomplete, we propose a machine learning algorithm that produces a surrogate variable, termed HARE. We use height as a model trait to demonstrate the utility of HARE and ethnicity-specific GWASs. Genome-wide association studies (GWASs) have become a powerful approach for exploring the genetic basis of complex phenotypes. While earlier studies focused on populations of predominantly European descent, recent efforts have aimed to substantially expand racial and ethnic diversity. 
The Million Veteran Program (MVP) [1: Gaziano J.M., Concato J., Brophy M., Fiore L., Pyarajan S., Breeling J., Whitbourne S., Deen J., Shannon C., Humphries D., et al. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 2016; 70: 214-223] represents a multi-ethnic cohort, which has enrolled more than 750,000 veteran volunteers, completed genotyping in more than 350,000 participants to date, and includes a wealth of phenotypes and health outcomes. Questions have arisen while performing GWASs in a multi-ethnic cohort regarding the definition and the use of an individual’s ancestry. Dense genotype data have enabled accurate estimation of individual ancestry [2: Falush D., Stephens M., Pritchard J.K. Inference of population structure using multilocus genotype data: dominant markers and null alleles. Mol. Ecol. Notes. 2007; 7: 574-578; 3: Tang H., Peng J., Wang P., Risch N.J. Estimation of individual admixture: analytical and study design considerations. Genet. Epidemiol. 2005; 28: 289-301; 4: Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006; 38: 904-909], which has been shaped by reproductive isolation and admixture through human history. At the same time, many studies also obtain racial/ethnic information on participants through questionnaires or electronic health records (EHR). In this paper, we will refer to this latter information as self-identified race/ethnicity (SIRE) to distinguish it from genetically inferred ancestry (GIA). A primary goal of multi-ethnic GWASs is to characterize ethnicity-specific trait loci or heterogeneous genetic effects across populations. 
An example of an ethnicity-specific locus is CD36 (MIM: 173510) for high-density lipoprotein cholesterol (HDL), for which the putative causal variant (rs2366858) is only polymorphic in populations of African descent [5: Coram M.A., Duan Q., Hoffmann T.J., Thornton T., Knowles J.W., Johnson N.A., Ochs-Balcom H.M., Donlon T.A., Martin L.W., Eaton C.B., et al. Genome-wide characterization of shared and distinct genetic components that influence blood lipid levels in ethnically diverse human populations. Am. J. Hum. Genet. 2013; 92: 904-916]. A well-known example of heterogeneous genetic effect is the APOE (MIM: 107741) e4 allele, which is polymorphic in many populations but confers greater risk of Alzheimer disease in Asians compared to other populations [6: Farrer L.A., Cupples L.A., Haines J.L., Hyman B., Kukull W.A., Mayeux R., Myers R.H., Pericak-Vance M.A., Risch N., van Duijn C.M.; APOE and Alzheimer Disease Meta Analysis Consortium. Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease. A meta-analysis. JAMA. 1997; 278: 1349-1356; 7: Liu M., Bian C., Zhang J., Wen F. Apolipoprotein E gene polymorphism and Alzheimer’s disease in Chinese population: a meta-analysis. Sci. Rep. 2014; 4: 4383]. The mechanisms underlying such heterogeneity are not well understood and may include unaccounted-for causal variants nearby or interaction with environmental or genetic factors that vary across populations. With the goal of effectively characterizing ethnicity-specific trait loci and interpreting heterogeneous genetic effects, we investigate the analytic issues related to ancestry, race, and ethnicity in multi-ethnic GWASs. To date, most GWASs stratify on SIRE and adjust for GIA within each SIRE group as covariates. 
The stratification by SIRE often implicitly occurs at the recruitment or genotyping stages, which focus on populations described by a single SIRE, such as Hispanics, Europeans/European Americans, African Americans/Afro-Caribbean, or East Asians, among others. Within each race/ethnicity, GIA is adjusted for as covariates to account for genetic structure within a SIRE [4; 8: Kang H.M., Sul J.H., Service S.K., Zaitlen N.A., Kong S.Y., Freimer N.B., Sabatti C., Eskin E. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 2010; 42: 348-354]. Results from these ethnicity-specific studies are combined through meta-analysis within an ethnicity or through trans-ethnic analysis across ethnicities [9: Morris A.P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 2011; 35: 809-822; 10: van Rooij F.J.A., Qayyum R., Smith A.V., Zhou Y., Trompet S., Tanaka T., Keller M.F., Chang L.C., Schmidt H., Yang M.L., et al.; BioBank Japan Project. Genome-wide Trans-ethnic Meta-analysis Identifies Seven Genetic Loci Influencing Erythrocyte Traits and a Role for RBPMS in Erythropoiesis. Am. J. Hum. Genet. 2017; 100: 51-63]. In contrast, in recent Biobank-based multi-ethnic cohort studies, participants are recruited, phenotyped, and genotyped according to a uniform protocol. For such studies, two analytic strategies can be considered. One approach, which we will refer to as mega-analysis, performs association mapping on the entire cohort, adjusting for population structure in the entire cohort using GIA. 
While simple to implement, results from such an analysis are difficult to interpret: a significant trait locus may be relevant in one racial/ethnic group, a few groups, or all groups. When the representation of ethnicities is unbalanced, the association results are likely driven by the group with the largest sample size. Furthermore, we show through simulation and analyses of real data that, compared to stratified analysis, mega-analysis often loses statistical power when the causal variant is minority specific or its allelic effect varies between populations. The alternative approach performs stratified analyses for each racial/ethnic group. In addition to the interpretability of association findings, this approach enables meaningful comparison between studies and meta-analysis across studies. However, the question remains how strata should be defined in a multi-ethnic cohort, in which participants are enrolled without restrictions based on race or ethnicity. We reason that SIRE and GIA have complementary strengths. In epidemiologic studies, there is a long history of stratifying on SIRE. This is because SIRE acts as a surrogate for an array of social, cultural, behavioral, and environmental variables, many of which are correlated with trait variation or disease risk [11: Burchard E.G., Ziv E., Coyle N., Gomez S.L., Tang H., Karter A.J., Mountain J.L., Pérez-Stable E.J., Sheppard D., Risch N. The importance of race and ethnic background in biomedical research and clinical practice. N. Engl. J. Med. 2003; 348: 1170-1175; 12: Risch N., Burchard E., Ziv E., Tang H. Categorization of humans in biomedical research: genes, race and disease. Genome Biol. 2002; 3: t2007; 13: Conomos M.P., Laurie C.A., Stilp A.M., Gogarten S.M., McHugh C.P., Nelson S.C., Sofer T., Fernández-Rhodes L., Justice A.E., Graff M., et al. Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos. Am. J. Hum. Genet. 2016; 98: 165-184]. Hence, stratifying on SIRE has the potential benefits of reducing heterogeneity of these non-genetic variables and decoupling the correlation between genetic and non-genetic factors. For genetic association studies, the SIRE categories recapitulate the continental-level genetic ancestry structure [14: Rosenberg N.A., Pritchard J.K., Weber J.L., Cann H.M., Kidd K.K., Zhivotovsky L.A., Feldman M.W. Genetic structure of human populations. Science. 2002; 298: 2381-2385; 15: Tang H., Quertermous T., Rodriguez B., Kardia S.L., Zhu X., Brown A., Pankow J.S., Province M.A., Hunt S.C., Boerwinkle E., et al. Genetic structure, self-identified race/ethnicity, and confounding in case-control association studies. Am. J. Hum. Genet. 2005; 76: 268-275; 16: Li J.Z., Absher D.M., Tang H., Southwick A.M., Casto A.M., Ramachandran S., Cann H.M., Barsh G.S., Feldman M., Cavalli-Sforza L.L., Myers R.M. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008; 319: 1100-1104]; therefore, population-specific trait variants are likely to be enriched in one or a few SIRE groups. However, SIRE can be incomplete and of varying accuracy depending on the source. In MVP, SIRE is derived from direct responses to survey questionnaires and from text mining of the Department of Veterans Affairs EHR. This leaves 3.67% of the participants without any SIRE information; additionally, inconsistency occurs when consolidating multiple sources. Missing and imperfect SIRE is expected in most multi-ethnic EHR-based biobank cohorts. 
In contrast, GIA, in the form of principal components or admixture proportions, can be estimated for every GWAS participant. Previous population genetic studies have demonstrated that GIA and self-identified racial/ethnic information have a high correlation, but one does not unambiguously determine the other. Specifically, in admixed groups such as African Americans and Hispanics, genetic ancestries vary continuously among individuals along axes that represent admixture proportions; defining strata based on GIA requires thresholds that are often ad hoc [17: Klarin D., Damrauer S.M., Cho K., Sun Y.V., Teslovich T.M., Honerlaw J., Gagnon D.R., DuVall S.L., Li J., Peloso G.M., et al.; Global Lipids Genetics Consortium; Myocardial Infarction Genetics (MIGen) Consortium; Geisinger-Regeneron DiscovEHR Collaboration; VA Million Veteran Program. Genetics of blood lipids among ∼300,000 multi-ethnic participants of the Million Veteran Program. Nat. Genet. 2018; 50: 1514-1523]. Conversely, the distribution of ancestry proportions may partially overlap between different racial/ethnic groups, which then cannot be separated based on GIA alone [18: Banda Y., Kvale M.N., Hoffmann T.J., Hesselson S.E., Ranatunga D., Tang H., Sabatti C., Croen L.A., Dispensa B.P., Henderson M., et al. Characterizing race/ethnicity and genetic ancestry for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort. Genetics. 2015; 200: 1285-1295; 19: Bryc K., Durand E.Y., Macpherson J.M., Reich D., Mountain J.L. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am. J. Hum. Genet. 2015; 96: 37-53]. Motivated by these practical challenges, we propose a supervised learning algorithm that defines a categorical stratification variable in a multi-ethnic GWAS. 
The variable, termed HARE (harmonized ancestry and race/ethnicity), uses GIA to refine SIRE for genetic association studies in three ways: identify individuals whose SIRE is likely inaccurate, reconcile conflicts among multiple SIRE sources, and impute missing racial/ethnic information when the predictive confidence is high. We describe the relationship between HARE, race/ethnicity, and genetic ancestry in MVP, a representative US-based multi-ethnic cohort. Using HARE as the stratifying variable, we investigate the effectiveness of detecting ethnicity-specific trait loci through simulation as well as analysis of height as a model trait in the MVP. The goal of HARE is to define strata for ethnicity-specific GWAS analyses. The computation of HARE consisted of two components: first, in the “training” step, a support vector machine (SVM) model was built, which learned the correspondence between GIA and SIRE; second, in the “assignment” step, HARE was determined based on SIRE, GIA, and the output from the SVM. The assignment follows the decision tree of Figure 1. The SVM used GIA (the top 30 PCs in our analysis) as predictors and SIRE as the response. Because SIRE is a multi-class categorical variable, we first trained a one-versus-one classifier with a radial basis function kernel for every pair of categories. These binary classifiers were then combined using a pairwise coupling model to produce a multi-class probability vector for each individual [20: Wu T.-F., Lin C.-J., Weng R.C. Probability Estimates for Multi-class Classification by Pairwise Coupling. J. Mach. Learn. Res. 2004; 5: 975-1005]. The individual classifiers had two tuning parameters: the inverse variance of the kernel, γ, controls the radius of influence exerted by a single training sample, while the regularization constant, C, encourages sparse models. These parameters were optimized by searching a two-dimensional grid and using 5-fold cross-validation. 
More details are given in the caption of Figure S1. In the MVP analysis, SIRE took four values: Hispanic, non-Hispanic Asian, non-Hispanic black, and non-Hispanic white, as described below. Given an individual’s genetic PCs, the multi-class SVM outputs a probability vector, $(P_1, \ldots, P_K)$ with $\sum_{l=1}^{K} P_l = 1$, representing the predicted membership probability for each of the $K$ distinct categories. Let $L_1$ denote the stratum corresponding to the highest predicted probability, $L_2$ be the stratum corresponding to the second highest predicted probability, and so on, such that $P_{L_1} \ge P_{L_2} \ge \ldots \ge P_{L_K}$. For individuals whose SIRE is non-missing and consistent across records, let $P_{\mathrm{SIRE}}$ denote the predicted probability corresponding to SIRE. For each individual, HARE is assigned according to the decision tree in Figure 1, or equivalently, as:
$$\mathrm{HARE} = \begin{cases} \mathrm{SIRE}, & \text{if SIRE is non-missing and } P_{L_1}/P_{\mathrm{SIRE}} \le t_1; \\ L_1, & \text{if SIRE is missing and } P_{L_1}/P_{L_2} > t_2; \\ \text{Missing}, & \text{otherwise.} \end{cases}$$
Note that when SIRE is non-missing and strongly contradicts GIA, we set HARE as missing rather than re-assigning the individual according to the predicted stratum $L_1$. HARE may therefore be unassigned (missing) for some individuals. We set $t_1 = 40$ and $t_2 = 20$; lower $t_1$ and higher $t_2$ will result in more individuals with missing HARE, through removing more outliers and assigning fewer individuals, respectively. All results presented in this paper used the SVM trained on the top 30 PCs. Comparing the assignment using 30 PCs versus 20 PCs revealed a discordance rate of 1.3%. This level of consistency is not surprising: because higher PCs tend to describe finer-level population structure, they are less informative for the four major HARE groups. On the other hand, if a PC were entirely uninformative, it would be ignored during the SVM training. Naturally, including many unnecessary PCs will increase the computational burden. Therefore, we recommend using an upper limit of the PCs that are relevant; specifically, for major race/ethnicity strata, 30 PCs suffice. 
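The assignment rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name assign_hare, the LABELS tuple, and the use of None for a missing value are all hypothetical, and the sketch covers only the two cases written in the formula (SIRE consistent and non-missing, or SIRE missing). The branch of the Figure 1 decision tree that reconciles conflicting SIRE sources is not reproduced here because the text does not spell it out.

```python
from typing import Optional, Sequence

# Hypothetical label order; the probability vector is assumed to follow it.
LABELS = ('Hispanic', 'non-Hispanic Asian', 'non-Hispanic black', 'non-Hispanic white')


def assign_hare(probs: Sequence[float],
                labels: Sequence[str],
                sire: Optional[str],
                t1: float = 40.0,
                t2: float = 20.0) -> Optional[str]:
    # Rank strata by predicted probability, highest first (L1, L2, ...).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    p_l1, p_l2 = probs[order[0]], probs[order[1]]

    if sire is not None:
        # SIRE is kept unless GIA strongly contradicts it.
        # P_L1 / P_SIRE <= t1 is written as a product to avoid dividing by zero.
        p_sire = probs[labels.index(sire)]
        return sire if p_l1 <= t1 * p_sire else None

    # SIRE missing: impute L1 only when it clearly dominates the runner-up,
    # i.e. P_L1 / P_L2 > t2.
    return labels[order[0]] if p_l1 > t2 * p_l2 else None


# Illustrative calls with made-up probability vectors.
confident = assign_hare([0.01, 0.01, 0.01, 0.97], LABELS, None)
ambiguous = assign_hare([0.60, 0.30, 0.05, 0.05], LABELS, None)
```

With the default thresholds, the first call imputes the top stratum because 0.97/0.01 exceeds 20, while the second leaves HARE missing because 0.60/0.30 does not.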
The two thresholds, t1 and t2, control the stringencies with which the outliers are removed and individuals without SIRE are assigned a HARE. Varying these parameter values in a wide range from 0 to 100, we found that the HARE assignment was quite stable (Figure S2). This analysis also provides practical guidance in choosing the thresholds. In our study, we chose the values of 40 and 20, respectively, based on the visual inspection that the slope of the curves is fairly shallow at these values. The MVP, launched in 2011 by the Department of Veterans Affairs Office of Research and Development, was a nation-wide research program aiming to acquire new biological insights and to elucidate the genetic basis of diseases, with the ultimate goal of further refining precision medicine in Veterans Affairs health care [1]. MVP participants consented to a blood draw and to have their DNA extracted for genomic profiling and linked to their full electronic health record within the VA. Both MVP Biobank and this analysis were approved by the VA institutional review boards. Unless otherwise noted, analyses presented in this paper included 351,820 MVP participants, who were genotyped using a customized Affymetrix Axiom Biobank array of 723,305 variants. For GIA, the top 30 PCs were computed using the program FlashPCA [21: Abraham G., Qiu Y., Inouye M. FlashPCA2: principal component analysis of Biobank-scale genotype datasets. Bioinformatics. 2017; 33: 2776-2778] on an extended genotype dataset that included all MVP participants and an additional 2,504 individuals from the 1000 Genomes Phase 3 data [22: Sudmant P.H., Rausch T., Gardner E.J., Handsaker R.E., Abyzov A., Huddleston J., Zhang Y., Ye K., Jun G., Fritz M.H., et al.; 1000 Genomes Project Consortium. An integrated map of structural variation in 2,504 human genomes. Nature. 2015; 526: 75-81]. To aid interpretation, we also estimated individual ancestry proportions using the program ADMIXTURE [23: Alexander D.H., Novembre J., Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009; 19: 1655-1664] with K = 5 and augmented the MVP participants with individuals from 1000 Genomes Phase 3 data that approximated European (GBR), African (YRI/LWK), East Asian (CHB), South Asian (GIH/PJL), and Native American (PEL) ancestral populations. We note that this admixture analysis was designed to qualitatively complement the PCA analysis: as the 1000 Genomes individuals included in this analysis did not fully represent ancestry diversity in MVP, various model assumptions in ADMIXTURE were violated; therefore, we caution against quantitative interpretation of the estimated admixture proportions. SIRE in MVP was derived based on information collected from the VA Corporate Data Warehouse (CDW) and the MVP Baseline Survey (MVP-BS). Overall, ∼60% of participants had consistent SIRE, while the remaining participants either had no SIRE or had multiple and inconsistent responses among two or more SIRE determinations in CDW and the MVP-BS. Because our goal was to define ethnicity-specific strata for subsequent GWAS analyses, we focused on defining four groups (Hispanics, non-Hispanic Asian, non-Hispanic black, and non-Hispanic white), which have moderately large sample sizes for adequately powered genetic association analysis. For this reason, we set the SIRE of individuals whose responses were not in one of these four categories as “missing,” which included American Indian, Alaska Native, Native Hawaiian, Other Pacific Islanders, and multi-race/ethnicity responses. 
To train the SVM model described above, we constructed a training dataset that included 201,931 individuals whose SIRE was unambiguous and was one of the four groups. The top 30 PCs were used as predictors. To reduce the influence of a few outliers on the SVM model, we repeated the SVM training step once after removing 1,547 individuals for whom the predicted most likely group was not the same as SIRE. Thus, the final SVM model used to compute the predicted probability vectors was based on 200,384 individuals. The assignment of HARE followed the decision tree in Figure 1. Because our training dataset did not include American Indian, Alaska Native, Native Hawaiians, and Pacific Islanders, the HARE for individuals reporting SIRE entirely from one of these populations was set to missing. Altogether, 6,257 individuals had missing HARE. As an assessment of its statistical accuracy, we applied the SVM trained on the 201,931 individuals, described above, to a non-overlapping set of 27,974 MVP participants, for whom SIRE was available and genotyping was completed on the same Affymetrix array at a later date. PCs of these individuals were calculated by projecting onto the axes determined based on the main cohort [21]. Thus, these individuals were not used in any step during the training of the SVM model. We then assigned strata assuming either their SIRE is known or unknown, and we compared these assignments with the actual SIRE. We performed simulation studies to characterize the statistical power for detecting minority-specific trait variants using HARE-stratified analysis as compared to that of a mega-analysis approach. In brief, we first selected a minority-specific causal variant as described below. A quantitative phenotype was then simulated using the program GCTA [24: Yang J., Lee S.H., Goddard M.E., Visscher P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011; 88: 76-82] and the MVP genotype data, according to the genotype at the causal variant and assuming that it explains a specific proportion of the phenotypic variance, h². The causal variant was then removed from the dataset, and SNPs within a ±100K base pair (bp) window of the causal variant were scanned for association using PLINK [25: Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., Sham P.C. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007; 81: 559-575]. Thus, the genotype data used for the association analysis represented realistic LD patterns both within ethnicity and between ethnicities. For each causal variant, a total of 100 phenotypes were simulated for each specific h², and power was defined as the proportion of simulations in which at least one tag SNP near the causal variant was associated with the phenotype at p < 5 × 10^−8. This process was repeated for ten different values of h² ranging from 0.0001 to 0.01. To eliminate population stratification, in HARE-stratified analysis, the top 10 PCs calculated within a HARE stratum were adjusted for as covariates. For the mega-analysis, the top 20 PCs computed on the entire cohort were adjusted for as covariates. To select causal variants, we considered rare and common causal variants separately because the LD patterns around these causal variants are likely to differ. For rare causal variants, we randomly selected 125 unlinked SNPs such that the minor allele frequency (MAF) was less than 1% in one HARE minority stratum while absent in all other HARE strata; these included 105 variants that were polymorphic only in non-Hispanic black and 20 that were polymorphic only in Hispanics. 
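The rare-variant criterion just described (polymorphic with MAF below 1% in one minority stratum, absent everywhere else) can be illustrated with a short filter. This is a minimal sketch on made-up data, not the pipeline used in the study: the function name, the nested-dictionary input layout, and the variant and stratum identifiers are hypothetical, and the random sampling and LD-based unlinking steps mentioned in the text are not shown.

```python
from typing import Dict, List


def population_specific_rare(maf_by_variant: Dict[str, Dict[str, float]],
                             target: str,
                             max_maf: float = 0.01) -> List[str]:
    # Keep variants that are polymorphic but rare in the target stratum
    # and monomorphic (MAF == 0) in every other stratum.
    selected = []
    for variant, mafs in maf_by_variant.items():
        rare_in_target = 0.0 < mafs.get(target, 0.0) < max_maf
        absent_elsewhere = all(maf == 0.0
                               for stratum, maf in mafs.items()
                               if stratum != target)
        if rare_in_target and absent_elsewhere:
            selected.append(variant)
    return selected


# Toy per-stratum allele frequencies (hypothetical values).
toy_mafs = {
    'v1': {'black': 0.005, 'white': 0.0, 'hispanic': 0.0},    # rare, black only
    'v2': {'black': 0.005, 'white': 0.001, 'hispanic': 0.0},  # also seen in white
    'v3': {'black': 0.200, 'white': 0.0, 'hispanic': 0.0},    # common, not rare
    'v4': {'black': 0.0, 'white': 0.0, 'hispanic': 0.003},    # rare, Hispanic only
}
black_specific = population_specific_rare(toy_mafs, 'black')
hispanic_specific = population_specific_rare(toy_mafs, 'hispanic')
```

In this toy input only v1 passes for the first stratum and only v4 for the second; v3 is population-specific but too common, which is the situation the relaxed common-variant thresholds address.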
Requiring a causal variant to have an MAF > 10% in one minority population while monomorphic in all other strata yielded very few SNPs. Therefore, we relaxed the population-specific criterion and instead looked for relatively common variants that preferentially occur in one stratum. Specifically, we selected (1) 103 variants with MAF > 0.1 in non-Hispanic black, MAF ≤ 5 × 10^−4 in non-Hispanic white, and MAF ≤ 1 × 10^−2 in Hispanics and (2) 3 variants with MAF > 0.1 in Hispanics, MAF ≤ 5 × 10^−4 in non-Hispanic white, and MAF ≤ 2 × 10^−3 in non-Hispanic black. Of 351,820 MVP participants, 342,883 had height measurements after excluding extreme outliers (height < 48 or > 99 inches) and amputees. We then took the average of measurements that were made within 3 years from an individual’s enrollment date, excluding measures more than 3 inches from the individual’s average height. Multi-ethnic GWASs using stratified analysis and mega-analysis were performed within each HARE stratum and in the entire cohort, respectively, using the same strategy to control for population stratification as described in the Simulation section above. We also performed fixed-effects, inverse-variance weighted meta-analysis combining four HARE groups using PLINK [25]. Significant SNPs within 1 Mb were considered as the same locus. For validation, we compared GWAS results in MVP with the UKB GWAS [26: Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018; 562: 203-209] for height in 452K individuals of European ancestry, and with WHI, which included 8,149 African American women [27: Carty C.L., Johnson N.A., Hutter C.M., Reiner A.P., Peters U., Tang H., Kooperberg C. Genome-wide association study of body height in African Americans: the Women’s Health Initiative SNP Health Association Resource (SHARe). Hum. Mol. Genet. 2012; 21: 711-720]. Of 351,820 individuals, all but 6,257 (1.78%) were assigned to one of the four non-overlapping HARE groups: Hispanics, non-Hispanic white, non-Hispanic black, and non-Hispanic Asian (Table 1). Figures 2 and 3 compare GIA and HARE; the interpretation of the genetic PCs is assisted using a model-based admixture analysis, which included Europeans, Africans, Native Americans, East Asians, and South Asians as the ancestral individuals (see Material and Methods). The first two PCs, computed using the genotypes of the entire MVP cohort, represented the variation of African (PC1) and Native American/Asian (PC2) ancestry. As expected, the ancestries of individuals in the non-Hispanic black group varied along PC1, which described the difference between European ancestry and African ancestry (Figures 2B, 3B, and S3B). Likewise, Hispanic individuals showed varying proportions of European, African, and Native American ancestry (Figures 2C, 3C, and S3C). The non-Hispanic Asian group consisted of two components, corresponding to the East and South Asian populations, respectively, according to the admixture analysis (Figures 2D, 3D, and S3D). Interestingly, European admixture (greater than 20%) was inferred in 12% (n = 364) of the individuals in the HARE non-Hispanic Asian group. Among this group, 46% (n = 166) of individuals had “Asian” as the only SIRE information; an additional 25% (n = 91) indicated both Asian and European ancestries. This likely reflected recent admixture between Asian Americans and European Americans. 
Although it would have been feasible to train the support vector machine to learn East Asian and South Asian as two separate HARE categories, we chose to group them into one stratum because the statistical power of subsequent genetic association analysis would likely be low in this group due to relatively small sample size (n = 3,054).
Table 1. Comparison between HARE and SIRE among 351,820 MVP Participants (rows: HARE; columns: SIRE)
HARE               | Non-Hispanic White | Non-Hispanic Black | Hispanics | Non-Hispanic Asian | Missing | Total
Non-Hispanic White | 163,267            | 0                  | 0         | 0                  | 85,240  | 248,507
Non-Hispanic Black | 0                  | 25,830             | 0         | 0                  | 42,325  | 68,155
Hispanics          | 0                  | 0                  | 10,306    | 0                  | 15,541  | 25,847
Non-Hispanic Asian | 0                  | 0                  | 0         | 1,449              | 1,605   | 3,054
Missing            | 400                | 110                | 546       | 23                 | 5,178   | 6,257
Total              | 163,667            | 25,940             | 10,852    | 1,472              | 149,889 | 351,820
Figure 3. The First Two P" @default.
- W2976869024 created "2019-10-03" @default.
- W2976869024 creator A5000171296 @default.
- W2976869024 creator A5000776451 @default.
- W2976869024 creator A5001817812 @default.
- W2976869024 creator A5002083105 @default.
- W2976869024 creator A5004869170 @default.
- W2976869024 creator A5005406697 @default.
- W2976869024 creator A5007316657 @default.
- W2976869024 creator A5007930206 @default.
- W2976869024 creator A5008371415 @default.
- W2976869024 creator A5008652100 @default.
- W2976869024 creator A5009849057 @default.
- W2976869024 creator A5012460833 @default.
- W2976869024 creator A5012561743 @default.
- W2976869024 creator A5013325742 @default.
- W2976869024 creator A5016029663 @default.
- W2976869024 creator A5017210235 @default.
- W2976869024 creator A5018192048 @default.
- W2976869024 creator A5018592922 @default.
- W2976869024 creator A5018888370 @default.
- W2976869024 creator A5019951862 @default.
- W2976869024 creator A5020635790 @default.
- W2976869024 creator A5022753411 @default.
- W2976869024 creator A5024385881 @default.
- W2976869024 creator A5025945146 @default.
- W2976869024 creator A5027614153 @default.
- W2976869024 creator A5029336572 @default.
- W2976869024 creator A5030005932 @default.
- W2976869024 creator A5031053113 @default.
- W2976869024 creator A5031196288 @default.
- W2976869024 creator A5031243886 @default.
- W2976869024 creator A5033741175 @default.
- W2976869024 creator A5034976085 @default.
- W2976869024 creator A5036055219 @default.
- W2976869024 creator A5037130521 @default.
- W2976869024 creator A5037435182 @default.
- W2976869024 creator A5037581593 @default.
- W2976869024 creator A5039218212 @default.
- W2976869024 creator A5039284931 @default.
- W2976869024 creator A5040303985 @default.
- W2976869024 creator A5043407581 @default.
- W2976869024 creator A5044845889 @default.
- W2976869024 creator A5045163489 @default.
- W2976869024 creator A5047754354 @default.
- W2976869024 creator A5048214696 @default.
- W2976869024 creator A5048579153 @default.
- W2976869024 creator A5050647477 @default.
- W2976869024 creator A5050838527 @default.
- W2976869024 creator A5052350137 @default.
- W2976869024 creator A5052574584 @default.
- W2976869024 creator A5053040763 @default.
- W2976869024 creator A5053149684 @default.
- W2976869024 creator A5053476437 @default.
- W2976869024 creator A5054351828 @default.
- W2976869024 creator A5054862199 @default.
- W2976869024 creator A5055532732 @default.
- W2976869024 creator A5055937650 @default.
- W2976869024 creator A5057380214 @default.
- W2976869024 creator A5057965099 @default.
- W2976869024 creator A5057987348 @default.
- W2976869024 creator A5058198453 @default.
- W2976869024 creator A5058352184 @default.
- W2976869024 creator A5058776349 @default.
- W2976869024 creator A5060224824 @default.
- W2976869024 creator A5061226924 @default.
- W2976869024 creator A5064365204 @default.
- W2976869024 creator A5064619090 @default.
- W2976869024 creator A5067607372 @default.
- W2976869024 creator A5068163407 @default.
- W2976869024 creator A5068376158 @default.
- W2976869024 creator A5070906689 @default.
- W2976869024 creator A5070937807 @default.
- W2976869024 creator A5072777068 @default.
- W2976869024 creator A5073138616 @default.
- W2976869024 creator A5074828414 @default.
- W2976869024 creator A5075376681 @default.
- W2976869024 creator A5076868271 @default.
- W2976869024 creator A5077310935 @default.
- W2976869024 creator A5077980742 @default.
- W2976869024 creator A5078314054 @default.
- W2976869024 creator A5078664314 @default.
- W2976869024 creator A5079444927 @default.
- W2976869024 creator A5081943294 @default.
- W2976869024 creator A5082672076 @default.
- W2976869024 creator A5082687541 @default.
- W2976869024 creator A5082890040 @default.
- W2976869024 creator A5084614356 @default.
- W2976869024 creator A5084970543 @default.
- W2976869024 creator A5085002382 @default.
- W2976869024 creator A5086293553 @default.
- W2976869024 creator A5086562929 @default.
- W2976869024 creator A5087830284 @default.
- W2976869024 creator A5089147353 @default.
- W2976869024 creator A5091051814 @default.
- W2976869024 creator A5091896275 @default.
- W2976869024 date "2019-10-01" @default.
- W2976869024 modified "2023-10-17" @default.