Matches in SemOpenAlex for { <https://semopenalex.org/work/W2142243300> ?p ?o ?g. }
- W2142243300 endingPage "1231" @default.
- W2142243300 startingPage "1221" @default.
- W2142243300 abstract "The common-variant/common-disease model predicts that most risk alleles underlying complex health-related traits are common and, therefore, old and found in multiple populations, rather than being rare or population specific. Accordingly, there is widespread interest in assessing the population structure of common alleles. However, such assessments have been confounded by analysis of data sets with bias toward ascertainment of common alleles (e.g., HapMap and Perlegen) or in which a relatively small number of genes and/or populations were sampled. The aim of this study was to examine the structure of common variation ascertained in major U.S. populations, by resequencing the exons and flanking regions of 3,873 genes in 154 chromosomes from European, Latino/Hispanic, Asian, and African Americans generated by the Genaissance Resequencing Project. The frequency distributions of private and common single-nucleotide polymorphisms (SNPs) were measured, and the extent to which common SNPs were shared across populations was analyzed using several different estimators of population structure. Most SNPs that were common in one population were present in multiple populations, but SNPs common in one population were frequently not common in other populations. Moreover, SNPs that were common in two or more populations often differed significantly in frequency from one population to another, particularly in comparisons of African Americans versus other U.S. populations. These findings indicate that, even if the bulk of alleles underlying complex health-related traits are common SNPs, geographic ancestry might well be an important predictor of whether a person carries a risk allele.
Health is primarily determined by conditions that are both common and have a complex pattern of inheritance (i.e., risk is influenced by a combination of several different genetic and environmental factors). A popular model of the genetic architecture of common disease posits that the minor-allele frequencies (MAFs) of genetic variants influencing susceptibility are often also common (i.e., ⩾5%) and that such alleles are therefore old and found in multiple populations, rather than being rare and population specific.
This model is known as the common-variant/common-disease (CV/CD) hypothesis (1. Chakravarti A. Population genetics—making sense out of sequence. Nat Genet 1999, 21:56-60; 2. Lander ES. The new genomics: global views of biology. Science 1996, 274:536-539; 3. Pritchard JK. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet 2001, 69:124-137; 4. Reich DE, Lander ES. On the allelic spectrum of human disease. Trends Genet 2001, 17:502-510). Substantial efforts have been made to characterize how common alleles, particularly SNPs, are distributed among populations, so that it can be tested whether common variants influence susceptibility to common diseases. This matters because the extent to which common alleles explain risk of common disease across populations depends partly on how often alleles common in one population are also common, or at least shared, in other populations (5. Ioannidis JP, Ntzani EE, Trikalinos TA. ‘Racial’ differences in genetic effects for complex diseases. Nat Genet 2004, 36:1312-1318). Only a relatively small number of alleles associated with complex disease have been reported. Some alleles putatively associated with complex disease are common and occur at similar frequencies among populations (6. Lohmueller KE, Mauney MM, Reich D, Braverman JM. Variants associated with common disease are not unusually differentiated in frequency across populations. Am J Hum Genet 2006, 78:130-136). Others are common in only a single population or differ significantly in frequency among groups. Examples include alleles that influence risk for atherosclerosis (7. Cohen J, Pertsemlidis A, Kotowski IK, Graham R, Garcia CK, Hobbs HH. Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nat Genet 2005, 37:161-165), hypertension (8. Rosskopf D, Manthey I, Siffert W. Identification and ethnic distribution of major haplotypes in the gene GNB3 encoding the G-protein β3 subunit. Pharmacogenetics 2002, 12:209-220) and acquired immunodeficiency syndrome (9. Gonzalez E, Dhanda R, Bamshad M, Mummidi S, Geevarghese R, Catano G, Anderson SA, Walter EA, Stephan KT, Hammer MF, et al. Global survey of genetic variation in CCR5, RANTES, and MIP-1α: impact on the epidemiology of the HIV-1 pandemic. Proc Natl Acad Sci USA 2001, 98:5199-5204), as well as some drug responses (10. Tate SK, Goldstein DB. Will tomorrow’s medicines work for everyone? Nat Genet 2004, 36:S34-S42). It remains to be determined how far such differences explain overall variation in heritable disease risk across populations.
A frequent claim about human population structure is that most common variation is shared among all populations (11. Rebbeck TR, Halbert CH, Sankar P. Genetics, epidemiology, and cancer disparities: is it black and white? J Clin Oncol 2006, 24:2164-2169; 12. Weigmann K. Racial medicine: here to stay? The success of the International HapMap Project and other initiatives may help to overcome racial profiling in medicine, but old habits die hard. EMBO Rep 2006, 7:246-249; 13. Abecasis G, Tam PK, Bustamante CD, Ostrander EA, Scherer SW, Chanock SJ, Kwok PY, Brookes AJ. Human Genome Variation 2006: emerging views on structural variation and large-scale SNP analysis. Nat Genet 2007, 39:153-155). The claim depends on how population boundaries are defined. It is often supported by citing comparisons of SNP frequencies between pairs of populations in the HapMap and Perlegen data. Analyses of these data indicated that common SNPs were frequently both shared and common among populations of predominantly African, Asian, and European ancestry (14. The International HapMap Consortium. A haplotype map of the human genome. Nature 2005, 437:1299-1320; 15. Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. Whole-genome patterns of common DNA variation in three human populations. Science 2005, 307:1072-1079). However, population-genetics analysis was not the intended goal of either the HapMap or the Perlegen project. The ascertainment strategies used by each project also oversampled common, shared SNPs (16. Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 2005, 15:1496-1502; 17. Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG. Measures of human population structure show heterogeneity among genomic regions. Genome Res 2005, 15:1468-1476). Other projects avoided this ascertainment bias by resequencing the entire sample from which SNP frequencies were estimated.
These projects include the Environmental Genome Project (EGP) (18. Livingston RJ, von Niederhausern A, Jegga AG, Crawford DC, Carlson CS, Rieder MJ, Gowrisankar S, Aronow BJ, Weiss RB, Nickerson DA. Pattern of sequence variation across 213 environmental response genes. Genome Res 2004, 14:1821-1831) and the Seattle SNP project (19. Carlson CS, Eberle MA, Rieder MJ, Smith JD, Kruglyak L, Nickerson DA. Additional SNPs and linkage-disequilibrium analyses are necessary for whole-genome association studies in humans. Nat Genet 2003, 33:518-521; 20. Crawford DC, Carlson CS, Rieder MJ, Carrington DP, Yi Q, Smith JD, Eberle MA, Kruglyak L, Nickerson DA. Haplotype diversity across 100 candidate genes for inflammation, lipid metabolism, and blood pressure regulation in two populations. Am J Hum Genet 2004, 74:610-622). They also include the Applera SNP project (21. Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, et al. Natural selection on protein-coding genes in the human genome. Nature 2005, 437:1153-1157) and the ENCyclopedia of DNA Elements (ENCODE) project (22. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 2004, 306:636-640). However, the design of each of these studies also limited comparisons of common coding-SNP variation across U.S. populations (table 1). For example, the EGP used the Polymorphism Discovery Resource, in which the sample identities are unknown, which rules out comparisons across populations. The Seattle SNP and Applera SNP projects resequenced samples only from self-identified African Americans and European Americans, and did not include Asian Americans or Latino/Hispanic Americans.
Furthermore, with the exception of Applera, all these projects resequenced a relatively modest number of genes. Several also concentrated on genes with similar functional properties (e.g., genes involved in inflammation or immune defense).

Table 1. Comparison of Samples among Different Resequencing Projects
Project | No. of chromosomes | EA (a) | AfA (a) | AsA (a) | HA (a) | No. of genes resequenced
Seattle SNP | 94 | 46 | 48 | … | … | 100
EGP | 180 | … | … | … | … | 213
Applera SNP | 78 | 40 | 38 | … | … | 11,624
ENCODE | 128 | 32 | 32 (b) | 32 (b) | … | 10 × 500-kb regions
GRP | 152 | 40 | 40 | 38 | 34 | 3,873
(a) EA = European American; AfA = African American; AsA = Asian American; HA = Latino/Hispanic American. Population columns give the number of chromosomes sampled from each population; for example, the 20 European Americans in the GRP contribute 40 chromosomes. … = not included or unknown.
(b) Samples were ascertained from native populations, not U.S. populations.

We estimated how often common SNPs ascertained by resequencing are shared among major U.S. populations. For this purpose we analyzed the Genaissance Resequencing Project (GRP) SNP frequency data from 3,873 genes on 152 chromosomes (∼14 Mb of DNA sequence per individual) from self-identified African, Asian, Latino/Hispanic, and European Americans (23. Salisbury BA, Pungliya M, Choi JY, Jiang R, Sun XJ, Stephens JC. SNP and haplotype variation in the human genome. Mutat Res 2003, 526:53-61; 24. Schneider JA, Pungliya MS, Choi JY, Jiang R, Sun XJ, Salisbury BA, Stephens JC. DNA variability of human genes. Mech Ageing Dev 2003, 124:17-25; 25. Stephens JC, Schneider JA, Tanguay DA, Choi J, Acharya T, Stanley SE, Jiang R, Messer CJ, Chew A, Han JH, et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science 2001, 293:489-493). The correspondence between notions of race and population structure inferred from explicit genetic data is controversial. We nevertheless used these population labels because they are the labels used by the National Institutes of Health (NIH), the U.S. Food and Drug Administration, and many, if not most, biomedical researchers. Insofar as these labels capture information about genetic ancestry, it is of substantial biomedical interest to understand the distribution of common variation across populations so defined.
The data set consisted of genotypes for 3,873 genes in 76 unrelated individuals (152 chromosomes): 20 European Americans, 17 Latino/Hispanic Americans, 19 East Asian Americans, and 20 African Americans. Genotypes were ascertained by resequencing the following regions of each gene:
- each exon, including the coding regions, 5′ UTR, and 3′ UTR;
- up to 100 bp upstream and downstream of each exon;
- up to 1,000 bp upstream of the transcription start site;
- 100 bp downstream of the termination codon.
All samples were obtained with institutional review board approval from individuals of self-identified group membership who participated in the GRP (refs. 23-25). Individuals were sampled from two locations in the United States: Anaheim, CA, and Miami, FL.
Sampling 40 chromosomes in a population provides a 95% probability of detecting a SNP with a true population MAF ⩾5% (i.e., the common polymorphisms in which we are most interested). The data were provided as anonymous genotypes. The identities of the resequenced genes and the location of each SNP were therefore unknown to M.B. and S.L.G., the two authors responsible for the analysis. This ruled out analyses that require such information (e.g., stratifying estimates of SNP sharing by functional and/or structural similarities among genes). A blood sample was obtained from each individual, and lymphocytes were immortalized as Epstein-Barr virus–transformed cell lines. Genomic DNA was extracted using standard techniques and served as the template for all subsequent PCRs. Sequencing reactions used Applied Biosystems BigDye Terminator chemistry, essentially following the manufacturer’s protocol, and were analyzed on ABI Prism 3700/3730 DNA Analyzers. Each polymorphism was confirmed by sequencing both strands of DNA. After initial data processing on the ABI instruments, sequence trace files were reanalyzed with the Phred program. Phred assigns each base call a quality value that estimates the probability that the call is correct. The quality value is Q = −10 log10(P), where P is the probability that the base call is incorrect. A Phred value of 20 therefore corresponds to a 99% probability that the base call is accurate, and a Phred value of 30 to a 99.9% probability. A minimum Phred value of 20 was used as a threshold. Reads were assembled into a consensus sequence with the Phrap program, and potential polymorphisms were identified using the PolyPhred program.
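Both the Phred scale and the detection claim above can be checked numerically. The following is a minimal Python sketch, not the pipeline used in the study. It assumes the standard Phred definition Q = −10 log10(P_error). For detection, it assumes a simple binomial model: each sampled chromosome carries the minor allele independently with probability equal to the MAF, and calls are error-free. The function names are hypothetical. Under this simple model, 40 chromosomes give roughly an 87% chance of observing at least one copy of an allele at 5% frequency. About 59 chromosomes are needed to reach 95%, so the 95% figure quoted above may rest on different assumptions than this model.

```python
import math

def phred_error_prob(q: float) -> float:
    # Phred scale: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)
    return 10.0 ** (-q / 10.0)

def phred_accuracy(q: float) -> float:
    # Probability that the base call is correct
    return 1.0 - phred_error_prob(q)

def detection_probability(n_chromosomes: int, maf: float) -> float:
    # Assumed binomial model: each chromosome carries the minor allele
    # independently with probability maf, and calls are error-free.
    # Detection means at least one sampled chromosome carries the minor allele.
    return 1.0 - (1.0 - maf) ** n_chromosomes

def chromosomes_needed(target: float, maf: float) -> int:
    # Smallest n for which detection_probability(n, maf) >= target
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - maf))

if __name__ == '__main__':
    for q in (20, 30):
        print(q, round(phred_accuracy(q), 4))
    print(round(detection_probability(40, 0.05), 3))
    print(chromosomes_needed(0.95, 0.05))
```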
All sequence assemblies (i.e., reads plus consensus sequence and tagged polymorphisms) were then compiled into one Consed project for review. Potential polymorphisms were catalogued, and their original trace files were reviewed manually. The final list of verified polymorphisms was loaded into a database for further review. Sample mix-ups were controlled in three ways:
(1) several triads (i.e., parents and offspring) were included and genotyped on each sequencing plate;
(2) the identity of each sample was confirmed with a subset of the Combined DNA Index System microsatellite markers each time a new master plate of DNA was generated;
(3) a “null sample” was placed in the same well on each sequencing plate to ensure that each plate was oriented properly.
Hardy-Weinberg equilibrium (HWE) for each SNP within each population was assessed by comparing observed and expected heterozygosities, with significance tested against a χ2 distribution. Genotype frequencies differed significantly from HWE (i.e., P<.05) for <5% of SNPs, which is roughly the proportion expected by chance at this significance level. This result suggests that there were no gross systematic errors in base calls and/or sample mix-ups. Genotypes were available for 96.5%–99.9% (mean 98.4%) of sites per individual. For each SNP, the minor allele was defined as the allele with the lower frequency in the total chromosome sample. To assess the degree of allele sharing among populations, we first determined the number and proportion of SNPs that were common (MAF ⩾5% or ⩾10%) in each population. We used both Spearman rank and Pearson correlation coefficients to calculate the correlation between the MAFs of each pair of populations.
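The following is a minimal Python sketch of the per-SNP quantities described above. It assumes that genotypes for a biallelic SNP are summarized as counts of the three genotype classes (AA, AB, BB). For such a SNP, the Pearson χ2 statistic for HWE can be written as n·F², where F = 1 − H_obs/H_exp compares observed and expected heterozygosity. This matches the heterozygosity-based comparison in the text. The 1-degree-of-freedom P value uses the identity P(χ² > x) = erfc(√(x/2)). Function names are hypothetical, and this is not the authors' code.

```python
import math

def allele_freq(n_aa: int, n_ab: int, n_bb: int) -> float:
    # Frequency of allele A from genotype counts of a biallelic SNP
    n = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * n)

def minor_allele_freq(n_aa: int, n_ab: int, n_bb: int) -> float:
    # The minor allele is the one with the lower frequency in the sample
    p = allele_freq(n_aa, n_ab, n_bb)
    return min(p, 1.0 - p)

def hwe_test(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    # Compare observed and expected heterozygosity. For a biallelic SNP,
    # chi2 = n * F**2 equals the Pearson chi-square over the three genotype classes.
    n = n_aa + n_ab + n_bb
    p = allele_freq(n_aa, n_ab, n_bb)
    h_exp = 2.0 * p * (1.0 - p)
    if h_exp == 0.0:
        return 0.0, 1.0  # monomorphic: nothing to test
    h_obs = n_ab / n
    f = 1.0 - h_obs / h_exp
    chi2 = n * f * f
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def is_common(maf: float, threshold: float = 0.05) -> bool:
    # Common SNP: MAF at or above the threshold (0.05 or 0.10 in the text)
    return maf >= threshold

if __name__ == '__main__':
    chi2, p = hwe_test(30, 40, 30)
    print(f'chi2={chi2:.3f} P={p:.4f}')
    print(minor_allele_freq(50, 20, 30), is_common(0.4))
```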
We used simulations to generate expected values of four statistics for populations of similar sample size: the proportion of SNPs shared, minor-allele frequency differences, Spearman rank correlation coefficients of minor-allele frequencies, and pairwise FST values. To build each simulated data set, we randomly sampled 40 individuals without replacement from the total sample of 76, using Floyd’s ordered hash table algorithm as implemented in the SURVEYSELECT procedure of SAS 9.1.3. The 40 individuals were then randomly allocated into two groups of 20 for analysis. The sample size of 20 individuals per group matches the maximum sample size of each population for which empirical data were available. Reported values and SDs were generated from 1,000 such simulated data sets. Contour plots were constructed using the kernel-density estimation procedure of SAS 9.1.3, with a bandwidth multiplier of 1 and a 60×60 grid. We performed a principal-components analysis and a distance-based cluster analysis. The distance metric was the number of sequence differences between each pair of individuals. Eigenvalues for the principal-components analysis are shown in figure 1A. For cluster analysis we used the unweighted pair group method with arithmetic mean (UPGMA), as implemented in SAS and PHYLIP (26. Felsenstein J. PHYLIP (Phylogeny Inference Package) release 3.6. Department of Genome Sciences, University of Washington, Seattle, 2004). Following Calinski and Harabasz (27. Calinski T, Harabasz J. A dendrite method for cluster analysis. Commun Statistics 1974, 3:1-27), we took the number of clusters to be the value at which the pseudo F statistic was maximized (fig. 1B). The distance matrix, principal-components analysis, and pseudo F statistics were generated in SAS 9.1.3.
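The resampling scheme used to build the null expectations can be sketched in Python as follows. This illustration rests on stated assumptions and is not the SAS procedure the authors used. Genotypes are assumed to be coded as minor-allele counts (0, 1, or 2) in an individuals-by-SNPs matrix, and a SNP counts as shared when the minor allele is observed in both groups. Python's random.sample stands in for Floyd's algorithm in SURVEYSELECT; both draw a simple random sample without replacement. All names are hypothetical.

```python
import random
import statistics
from typing import Sequence

# Rows are individuals, columns are SNPs; values are minor-allele counts (0, 1, 2)
Genotypes = Sequence[Sequence[int]]

def carries_minor_allele(genos: Genotypes, rows: Sequence[int], snp: int) -> bool:
    return any(genos[i][snp] > 0 for i in rows)

def proportion_shared(genos: Genotypes, group_a: Sequence[int], group_b: Sequence[int]) -> float:
    # Among SNPs observed in at least one group, the fraction observed in both
    seen = shared = 0
    for snp in range(len(genos[0])):
        in_a = carries_minor_allele(genos, group_a, snp)
        in_b = carries_minor_allele(genos, group_b, snp)
        if in_a or in_b:
            seen += 1
            if in_a and in_b:
                shared += 1
    return shared / seen if seen else float('nan')

def null_shared(genos: Genotypes, n_draw: int = 40, n_reps: int = 1000,
                seed: int = 1) -> tuple[float, float]:
    # Draw n_draw individuals without replacement, split them at random into
    # two equal groups, record the sharing statistic, and repeat n_reps times.
    rng = random.Random(seed)
    half = n_draw // 2
    values = []
    for _ in range(n_reps):
        drawn = rng.sample(range(len(genos)), n_draw)  # draw order is random
        values.append(proportion_shared(genos, drawn[:half], drawn[half:]))
    return statistics.mean(values), statistics.stdev(values)

if __name__ == '__main__':
    # Toy data only: 76 individuals x 200 SNPs with random genotypes
    toy_rng = random.Random(0)
    toy = [[toy_rng.choice((0, 0, 0, 1, 2)) for _ in range(200)] for _ in range(76)]
    mean, sd = null_shared(toy, n_reps=100)
    print(f'mean={mean:.3f} SD={sd:.3f}')
```

The other simulated statistics (MAF differences, Spearman ρ, pairwise FST) would be computed the same way, substituting the statistic for proportion_shared inside the loop.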
A radial tree depicting the relationships between individuals was drawn in TREEVIEW (28. Page RD. TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 1996, 12:357-358). For the model-based cluster analysis, we used STRUCTURE 2.0 (29. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics 2000, 155:945-959) with the correlated allele-frequency model (30. Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 2003, 164:1567-1587). From the 63,127 SNPs, we selected those in the top 10th percentile of expected heterozygosity. This measure is readily available, and data from Pritchard et al. suggest that markers with high expected heterozygosity are informative for inferring population structure (31. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet 2000, 67:170-181). The selected markers had low pairwise linkage disequilibrium. The STRUCTURE run used the following settings: admixture model, correlated markers, K=1–6, a burn-in period of 100,000 iterations, and 100,000 repetitions after burn-in. Figure 1C shows the estimated log-likelihood of the probability of the data over the range of K. For each biallelic locus, Wright’s locus-by-locus fixation index, FST, was estimated using

$$F_{ST} = 1 - \frac{\frac{1}{J}\sum_{j=1}^{J} 2p_j(1-p_j)}{2\bar{p}(1-\bar{p})},$$

where $p_j$ is the MAF in population $j$, $J$ is the number of populations, and $\bar{p}$ is the MAF across all $J$ populations (32. Hartl DL, Clark AG. Principles of population genetics. Sinauer Associates, Sunderland, MA, 1997). Total FST is expressed as an average over all alleles.

A total of 63,127 SNPs were identified in 3,873 genes (data available at the Bamshad Lab Web site). Of these SNPs, 24,982 (39.6%) were singletons, meaning that the minor allele was observed on only one chromosome (fig. 2). Of all singletons, 45% (11,244) were observed in African Americans, and Asian Americans had the fewest singletons (table 2). More than half of all SNPs (35,385, or 56%) were private, that is, observed in only one population (table 2). Most private SNPs were rare: 70.6% were singletons, and 99% were observed at a frequency of <5% (table 2). We also computed, for each group, the percentage of all nonsingleton SNPs found in that group (i.e., the number of SNPs in a population divided by the total number of SNPs identified in all populations combined). This percentage ranged from 50% in Asian Americans to 83% in African Americans.

Table 2. Summary of Private Genetic Variation in U.S. Populations
Values are numbers of private SNPs; percentages of each population's private SNPs are given in parentheses.
SNP frequency | AfA | EA | AsA | HA | Total
Singletons | 11,244 | 5,157 | 4,230 | 4,351 | 24,982
MAF ⩾5% | 7,498 (40) | 1,579 (23) | 820 (16) | 506 (10) | 10,403 (29)
MAF ⩾10% | 2,297 (12) | 712 (11) | 263 (5) | 99 (2) | 3,371 (10)
Total | 18,742 | 6,736 | 5,050 | 4,857 | 35,385
Note. AfA = African American; EA = European American; AsA = Asian American; HA = Latino/Hispanic American.

African Americans had the highest absolute number of SNPs with an MAF of either ⩾5% or ⩾10% (i.e., common SNPs). For every three common SNPs in European Americans, there were four in African Americans. For SNPs with an MAF ⩾10%, pairwise population comparisons showed that 67%–96% of SNPs common in one population were present in both populations (fig. 3A). However, only 44%–72% of such SNPs were common in both populations (fig. 3C).
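The following is a minimal Python sketch of three methods steps: the per-locus FST estimator given in the methods, its average over loci, and the expected-heterozygosity filter used to choose markers for STRUCTURE. It makes several assumptions not stated in the text:
- Populations are weighted equally, so p̄ is the unweighted mean of the p_j. This equals the pooled frequency when sample sizes are equal, as in the simulations.
- Total FST is the arithmetic mean of the per-locus values.
- Loci that are monomorphic across populations are skipped.
Function names are hypothetical.

```python
import math
from typing import Sequence

def expected_het(p: float) -> float:
    # Expected heterozygosity of a biallelic locus with allele frequency p
    return 2.0 * p * (1.0 - p)

def fst_locus(freqs: Sequence[float]) -> float:
    # freqs[j] = frequency of the minor allele in population j
    n_pops = len(freqs)
    p_bar = sum(freqs) / n_pops
    h_total = expected_het(p_bar)
    if h_total == 0.0:
        return math.nan  # monomorphic across populations: FST undefined
    h_within = sum(expected_het(p) for p in freqs) / n_pops
    return 1.0 - h_within / h_total

def total_fst(loci: Sequence[Sequence[float]]) -> float:
    # Average of per-locus FST values, skipping undefined loci
    values = [v for v in (fst_locus(f) for f in loci) if not math.isnan(v)]
    return sum(values) / len(values)

def top_het_markers(p_total: Sequence[float], fraction: float = 0.10) -> list[int]:
    # Indices of SNPs in the top fraction by expected heterozygosity
    # computed from the total-sample allele frequency
    k = max(1, math.ceil(len(p_total) * fraction))
    ranked = sorted(range(len(p_total)), key=lambda i: expected_het(p_total[i]), reverse=True)
    return sorted(ranked[:k])

if __name__ == '__main__':
    print(fst_locus([0.1, 0.3]))  # frequency difference between populations gives FST > 0
    print(total_fst([[0.5, 0.5], [0.0, 1.0]]))
    print(top_het_markers([0.01, 0.5, 0.2, 0.3, 0.05, 0.1, 0.4, 0.02, 0.15, 0.25], 0.2))
```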
These findings were similar when SNPs with an MAF ⩾5% in at least one population were compared (fig. 3B and 3D). Of the 23,220 SNPs with an MAF ⩾10% in at least one population, 7,436 (32%) were common in all four populations, and 13,285 (57%) were present in all four populations. Common SNPs were also often not shared between African Americans and the other U.S. populations. These results indicate that, in this sample of U.S. populations, slightly more than half of all common SNPs are shared among all four populations, but two-thirds of them are not common in all populations. Even when alleles are common in two or more groups, it is important to know whether their frequencies differ substantially between groups. Indeed, the frequencies of common alleles are predicted to vary more than those of rare alleles. We assessed how similar common-SNP frequencies were among populations by estimating the pairwise correlation coefficient between the frequencies of SNPs with an MAF ⩾10% in both populations. The MAFs of common SNPs varied widely between groups (table 3). They were most highly correlated between Latino/Hispanic Americans and European Americans (ρ=0.84) and least correlated between African Americans and Asian Americans (ρ=0.26). Contour plots showed that pairwise correlation coefficients were consistently lower between African Americans and non–African American populations (fig. 4). Results were similar when SNPs with an MAF ⩾5% were compared between populations. This is partly because African Americans carry more high-frequency SNPs, which produces greater differences in MAF between populations (fig. 5).
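The pairwise comparisons above can be sketched in Python as follows. They cover two quantities: the fraction of SNPs common in one population that are present or common in a second, and the rank correlation of MAFs for SNPs with an MAF ⩾10% in both populations. This implementation is illustrative and is not the authors' SAS code. It assumes that each population is represented by a list of per-SNP MAFs aligned across populations, and that a SNP is present when its MAF is above zero. Spearman's ρ is computed as the Pearson correlation of ranks, with tied values given their average rank. Function names are hypothetical.

```python
import math
from typing import Sequence

def sharing_fractions(maf_a: Sequence[float], maf_b: Sequence[float],
                      threshold: float = 0.10) -> tuple[float, float]:
    # Among SNPs common in population A: fraction present in B, fraction also common in B
    common_a = [b for a, b in zip(maf_a, maf_b) if a >= threshold]
    if not common_a:
        return math.nan, math.nan
    present = sum(1 for b in common_a if b > 0) / len(common_a)
    common = sum(1 for b in common_a if b >= threshold) / len(common_a)
    return present, common

def average_ranks(values: Sequence[float]) -> list[float]:
    # Ranks start at 1; tied values share the mean of the ranks they span
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # correlation undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)

def spearman_common(maf_a: Sequence[float], maf_b: Sequence[float],
                    threshold: float = 0.10) -> float:
    # Spearman rank correlation restricted to SNPs common in both populations
    pairs = [(a, b) for a, b in zip(maf_a, maf_b) if a >= threshold and b >= threshold]
    if len(pairs) < 2:
        return math.nan
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(average_ranks(xs), average_ranks(ys))
```

Computing ρ as the Pearson correlation of average ranks handles ties exactly. This matters here because MAFs estimated from about 40 chromosomes per population take relatively few distinct values.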
Therefore, rare alleles contributing to common disease might be less likely to be found in multiple populations, whereas common alleles influencing risk of common disease are likely to vary more in frequency among groups.

Table 3. MAF Correlation of Common SNPs between Populations
Population | AfA | EA | AsA | HA
AfA | … | .36 | .26 | .45
EA | .19 | … | .58 | .84
AsA | .084 | .21 | … | .62
HA | .28 | .65 | .25 | …
Note. AfA = African American; AsA = Asian American; EA = European American; HA = Latino/Hispanic American. Above the diagonal: correlation between SNPs with an MAF ⩾10%, where MAF is defined for the whole sample (n=152 chromosomes). Below the diagonal: Spearman rank correlation between SNPs with an MAF ⩾10% in both populations, where MAF is defined in each subpopulation.

Figure 5. Measures of SNP sharing among Latino/Hispanic (HA), African (AfA), Asian (AsA), and European (EA) Americans. In all panels, the x-axis represents overlapping bins (i.e., >0.05 represents all SNPs with MAF >0.05), and MAF is calculated across all 152 chromosomes. When two populations are compared, MAF is calculated separately for each population. A, Pairwise comparisons of the proportion of SNPs shared between populations. B, Mean differences of pairwise comparisons of MAF between SNPs. C, Spearman rank correlation coefficients among pairwise comparisons of MAF between SNPs. D, Pairwise FST estimates between SNPs. In each panel, the solid black line represents the mean value. The dotted lines indicate the CI of values estimated from 1,000 data sets in which individuals were randomly distributed into pairs of populations (see text for details).
ns = nonsingletons.

We next examined how MAF relates to the sharing of SNPs among groups. For each pair of populations, we estimated the proportion of SNPs shared (i.e., present in both populations), for SNPs with frequencies ranging from all “nonsingletons” to ⩾40% in either population. In every pairwise comparison, the proportion of SNPs shared between groups was substantially lower than that shared between groups of randomly sorted individuals. The higher a SNP’s frequency, the more likely it was to be shared between populations (fig. 5A). Indeed, >95% of SNPs with an MAF ⩾20% and 99% of SNPs with an MAF ⩾30% were shared between pairs of populations. However, although higher-frequency SNPs were shared between groups more often, they also showed a greater mean difference in frequency between populations (fig. 5B). Accordingly, the correlation of MAFs between groups decreased as SNP frequency increased (fig. 5C). As a consequence, population structure among pairs of groups, as estimated by Wright’s fixation index (FST), is more pronounced when calculated using SNPs with a higher allele frequency (fig. 5D). The difference between the frequencies of a given SNP in two populations was therefore positively correlated with MAF, and the magnitude of this correlation varied among pairwise population comparisons. For each comparison (i.e., fig. 5A–5D), we assessed the departure from expectations under a simple model in which individuals mated at random (i.e., no population structure). Accordingly, we created 1,000 data sets in which individuals were randomly a" @default.
- W2142243300 created "2016-06-24" @default.
- W2142243300 creator A5004844213 @default.
- W2142243300 creator A5004929872 @default.
- W2142243300 creator A5032182699 @default.
- W2142243300 creator A5064174809 @default.
- W2142243300 creator A5064571261 @default.
- W2142243300 date "2007-12-01" @default.
- W2142243300 modified "2023-10-16" @default.
- W2142243300 title "The Structure of Common Genetic Variation in United States Populations" @default.
- W2142243300 cites W1480933312 @default.
- W2142243300 cites W1969176674 @default.
- W2142243300 cites W1972673599 @default.
- W2142243300 cites W1989539292 @default.
- W2142243300 cites W1994199626 @default.
- W2142243300 cites W1996767070 @default.
- W2142243300 cites W1997759222 @default.
- W2142243300 cites W2020501414 @default.
- W2142243300 cites W2021699930 @default.
- W2142243300 cites W2026583751 @default.
- W2142243300 cites W2028818536 @default.
- W2142243300 cites W2033082339 @default.
- W2142243300 cites W2034580949 @default.
- W2142243300 cites W2037653027 @default.
- W2142243300 cites W2039323256 @default.
- W2142243300 cites W2052937797 @default.
- W2142243300 cites W2055222883 @default.
- W2142243300 cites W2059545122 @default.
- W2142243300 cites W2064994526 @default.
- W2142243300 cites W2066614521 @default.
- W2142243300 cites W2072517581 @default.
- W2142243300 cites W2079501979 @default.
- W2142243300 cites W2080267054 @default.
- W2142243300 cites W2097103181 @default.
- W2142243300 cites W2098126593 @default.
- W2142243300 cites W2100315218 @default.
- W2142243300 cites W2100988728 @default.
- W2142243300 cites W2105096129 @default.
- W2142243300 cites W2110343195 @default.
- W2142243300 cites W2122004833 @default.
- W2142243300 cites W2124300302 @default.
- W2142243300 cites W2129559300 @default.
- W2142243300 cites W2131798051 @default.
- W2142243300 cites W2134070988 @default.
- W2142243300 cites W2145371252 @default.
- W2142243300 cites W2158489424 @default.
- W2142243300 cites W2217809488 @default.
- W2142243300 cites W2247766769 @default.
- W2142243300 cites W4254753097 @default.
- W2142243300 doi "https://doi.org/10.1086/522239" @default.
- W2142243300 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/2276358" @default.
- W2142243300 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/17999361" @default.
- W2142243300 hasPublicationYear "2007" @default.
- W2142243300 type Work @default.
- W2142243300 sameAs 2142243300 @default.
- W2142243300 citedByCount "49" @default.
- W2142243300 countsByYear W21422433002012 @default.
- W2142243300 countsByYear W21422433002013 @default.
- W2142243300 countsByYear W21422433002014 @default.
- W2142243300 countsByYear W21422433002015 @default.
- W2142243300 countsByYear W21422433002016 @default.
- W2142243300 countsByYear W21422433002017 @default.
- W2142243300 countsByYear W21422433002018 @default.
- W2142243300 countsByYear W21422433002019 @default.
- W2142243300 countsByYear W21422433002020 @default.
- W2142243300 countsByYear W21422433002021 @default.
- W2142243300 crossrefType "journal-article" @default.
- W2142243300 hasAuthorship W2142243300A5004844213 @default.
- W2142243300 hasAuthorship W2142243300A5004929872 @default.
- W2142243300 hasAuthorship W2142243300A5032182699 @default.
- W2142243300 hasAuthorship W2142243300A5064174809 @default.
- W2142243300 hasAuthorship W2142243300A5064571261 @default.
- W2142243300 hasBestOaLocation W21422433001 @default.
- W2142243300 hasConcept C104317684 @default.
- W2142243300 hasConcept C121332964 @default.
- W2142243300 hasConcept C2778334786 @default.
- W2142243300 hasConcept C44870925 @default.
- W2142243300 hasConcept C54355233 @default.
- W2142243300 hasConcept C68873052 @default.
- W2142243300 hasConcept C78458016 @default.
- W2142243300 hasConcept C86803240 @default.
- W2142243300 hasConceptScore W2142243300C104317684 @default.
- W2142243300 hasConceptScore W2142243300C121332964 @default.
- W2142243300 hasConceptScore W2142243300C2778334786 @default.
- W2142243300 hasConceptScore W2142243300C44870925 @default.
- W2142243300 hasConceptScore W2142243300C54355233 @default.
- W2142243300 hasConceptScore W2142243300C68873052 @default.
- W2142243300 hasConceptScore W2142243300C78458016 @default.
- W2142243300 hasConceptScore W2142243300C86803240 @default.
- W2142243300 hasIssue "6" @default.
- W2142243300 hasLocation W21422433001 @default.
- W2142243300 hasLocation W21422433002 @default.
- W2142243300 hasLocation W21422433003 @default.
- W2142243300 hasLocation W21422433004 @default.
- W2142243300 hasOpenAccess W2142243300 @default.
- W2142243300 hasPrimaryLocation W21422433001 @default.
- W2142243300 hasRelatedWork W1613639794 @default.
- W2142243300 hasRelatedWork W1641042124 @default.