Matches in SemOpenAlex for { <https://semopenalex.org/work/W2952862024> ?p ?o ?g. }
- W2952862024 endingPage "726" @default.
- W2952862024 startingPage "707" @default.
- W2952862024 abstract "Most population isolates examined to date were founded from a single ancestral population. Consequently, there is limited knowledge about the demographic history of admixed population isolates. Here we investigate genomic diversity of recently admixed population isolates from Costa Rica and Colombia and compare their diversity to a benchmark population isolate, the Finnish. These Latin American isolates originated during the 16th century from admixture between a few hundred European males and Amerindian females, with a limited contribution from African founders. We examine whole-genome sequence data from 449 individuals, ascertained as families to build multigenerational pedigrees, with a mean sequencing depth of coverage of approximately 36×. We find that Latin American isolates have increased genetic diversity relative to the Finnish. However, there is an increase in the amount of identity by descent (IBD) segments in the Latin American isolates relative to the Finnish. The increase in IBD segments is likely a consequence of a very recent and severe population bottleneck during the founding of the admixed population isolates. Furthermore, the proportion of the genome that falls within a long run of homozygosity (ROH) in Costa Rican and Colombian individuals is significantly greater than that in the Finnish, suggesting more recent consanguinity in the Latin American isolates relative to that seen in the Finnish. Lastly, we find that recent consanguinity increased the number of deleterious variants found in the homozygous state, which is relevant if deleterious variants are recessive. Our study suggests that there is no single genetic signature of a population isolate.
The use of population isolates to map Mendelian and complex diseases has been a key feature of medical genomics. In addition to experiencing the bottleneck involved with the migration out of Africa, some populations underwent subsequent bottlenecks and remained in relative seclusion afterward. These populations formed present-day isolates.1Peltonen L. Palotie A. Lange K. Use of population isolates for mapping complex traits. Nat. Rev. Genet. 
2000; 1: 182-190Crossref PubMed Scopus (298) Google Scholar The genomes of population isolates are thought to exhibit several hallmark features of genetic variation. Due to bottlenecks associated with their founding, it is thought that isolates should carry lower levels of genetic diversity and lower haplotype diversity than closely related non-isolated populations. Drift experienced by isolates is magnified by small population size, which generates more linkage disequilibrium (LD) than in non-isolated populations. In addition to increased LD, individuals from isolated populations tend to share more regions of the genome identical by descent (IBD) due to small population size. Further, due to the isolation after founding and recent mating practices, isolates may have larger regions of the genome found in runs of homozygosity (ROHs) due to recent inbreeding. Lastly, bottlenecks and inbreeding should impact patterns of deleterious variation.2Lohmueller K.E. Indap A.R. Schmidt S. Boyko A.R. Hernandez R.D. Hubisz M.J. Sninsky J.J. White T.J. Sunyaev S.R. Nielsen R. et al.Proportionally more deleterious genetic variation in European than in African populations.Nature. 2008; 451: 994-997Crossref PubMed Scopus (266) Google Scholar, 3Charlesworth D. Willis J.H. The genetics of inbreeding depression.Nat. Rev. Genet. 2009; 10: 783-796Crossref PubMed Scopus (897) Google Scholar, 4Lohmueller K.E. The impact of population demography and selection on the genetic architecture of complex traits.PLoS Genet. 2014; 10 (e1004379.5)Crossref Scopus (74) Google Scholar Consequently, one would predict that individuals from isolates will have fewer segregating sites, and the remaining deleterious variants will be segregating at a higher frequency.5Simons Y.B. Turchin M.C. Pritchard J.K. Sella G. The deleterious mutation load is insensitive to recent population history.Nat. Genet. 
2014; 46: 220-224Crossref PubMed Scopus (137) Google Scholar Indeed, genomic studies over the last decade have documented several of these signatures.6Service S. DeYoung J. Karayiorgou M. Roos J.L. Pretorious H. Bedoya G. Ospina J. Ruiz-Linares A. Macedo A. Palha J.A. et al.Magnitude and distribution of linkage disequilibrium in population isolates and implications for genome-wide association studies.Nat. Genet. 2006; 38: 556-560Crossref PubMed Scopus (170) Google Scholar, 7Lim E.T. Würtz P. Havulinna A.S. Palta P. Tukiainen T. Rehnström K. Esko T. Mägi R. Inouye M. Lappalainen T. et al.Sequencing Initiative Suomi (SISu) ProjectDistribution and medical impact of loss-of-function variants in the Finnish founder population.PLoS Genet. 2014; 10: e1004494Crossref PubMed Scopus (190) Google Scholar, 8Xue Y. Mezzavilla M. Haber M. McCarthy S. Chen Y. Narasimhan V. Gilly A. Ayub Q. Colonna V. Southam L. et al.Enrichment of low-frequency functional variants revealed by whole-genome sequencing of multiple isolated European populations.Nat. Commun. 2017; 8: 15927Crossref PubMed Scopus (25) Google Scholar However, it is known that not all isolates share the same demographic history. Therefore, it is essential that we understand how the factors shaping genetic variation in a population are influenced by the unique demographic history of the population. One archetypal human population isolate that has been extensively studied is the Finnish.7Lim E.T. Würtz P. Havulinna A.S. Palta P. Tukiainen T. Rehnström K. Esko T. Mägi R. Inouye M. Lappalainen T. et al.Sequencing Initiative Suomi (SISu) ProjectDistribution and medical impact of loss-of-function variants in the Finnish founder population.PLoS Genet. 2014; 10: e1004494Crossref PubMed Scopus (190) Google Scholar, 9Kittles R.A. Perola M. Peltonen L. Bergen A.W. Aragon R.A. Virkkunen M. Linnoila M. Goldman D. Long J.C. Dual origins of Finns revealed by Y chromosome haplotype variation.Am. J. Hum. Genet. 
1998; 62: 1171-1179, 10Peltonen L. Jalanko A. Varilo T. Molecular genetics of the Finnish disease heritage. Hum. Mol. Genet. 1999; 8: 1913-1923, 11Wang S.R. Agarwala V. Flannick J. Chiang C.W.K. Altshuler D. Hirschhorn J.N. GoT2D Consortium. Simulation of Finnish population history, guided by empirical genetic data, to assess power of rare-variant tests in Finland. Am. J. Hum. Genet. 2014; 94: 710-720. Finland was populated through two separate major migrations. Briefly, a small number of founders, relative isolation, serial bottlenecks, and recent expansion in Finland have allowed drift to play a large role in shaping the gene pool of this population.11Wang S.R. Agarwala V. Flannick J. Chiang C.W.K. Altshuler D. Hirschhorn J.N. GoT2D Consortium. Simulation of Finnish population history, guided by empirical genetic data, to assess power of rare-variant tests in Finland. Am. J. Hum. Genet. 2014; 94: 710-720. This demographic history has increased the prevalence of rare heritable Mendelian diseases in Finland, which has made this population particularly fruitful for identifying disease-associated variants.10Peltonen L. Jalanko A. Varilo T. Molecular genetics of the Finnish disease heritage. Hum. Mol. Genet. 1999; 8: 1913-1923, 12de la Chapelle A. Wright F.A. Linkage disequilibrium mapping in isolated populations: the example of Finland revisited. Proc. Natl. Acad. Sci. USA. 1998; 95: 12416-12423. Most of the studies in Finland employed LD mapping in affected families and well-curated genealogical records to identify causal and candidate variants.10Peltonen L. Jalanko A. Varilo T. 
Molecular genetics of the Finnish disease heritage.Hum. Mol. Genet. 1999; 8: 1913-1923Crossref PubMed Scopus (301) Google Scholar More recently, it has been possible to apply population-based linkage analyses to identify disease-associated variants as an alternative to genome-wide association studies (GWASs)13Martin A.R. Karczewski K.J. Kerminen S. Kurki M.I. Sarin A.-P. Artomov M. Eriksson J.G. Esko T. Genovese G. Havulinna A.S. et al.Haplotype sharing provides insights into fine-scale population history and disease in Finland.Am. J. Hum. Genet. 2018; 102: 760-775Abstract Full Text Full Text PDF PubMed Scopus (17) Google Scholar due to the availability of whole-genome sequence data in conjunction with extensive electronic health records. A number of studies have shown that power to detect causal variants can be improved by studying population isolates other than the Finnish.8Xue Y. Mezzavilla M. Haber M. McCarthy S. Chen Y. Narasimhan V. Gilly A. Ayub Q. Colonna V. Southam L. et al.Enrichment of low-frequency functional variants revealed by whole-genome sequencing of multiple isolated European populations.Nat. Commun. 2017; 8: 15927Crossref PubMed Scopus (25) Google Scholar, 14Panoutsopoulou K. Hatzikotoulas K. Xifara D.K. Colonna V. Farmaki A.-E. Ritchie G.R.S. Southam L. Gilly A. Tachmazidou I. Fatumo S. et al.Genetic characterization of Greek population isolates reveals strong genetic drift at missense and trait-associated variants.Nat. Commun. 2014; 5: 5345Crossref PubMed Scopus (32) Google Scholar, 15Nakatsuka N. Moorjani P. Rai N. Sarkar B. Tandon A. Patterson N. Bhavani G.S. Girisha K.M. Mustak M.S. Srinivasan S. et al.The promise of discovering population-specific disease-associated genes in South Asia.Nat. Genet. 2017; 49: 1403-1407Crossref PubMed Scopus (37) Google Scholar, 16Pedersen C.T. Lohmueller K.E. Grarup N. Bjerregaard P. Hansen T. Siegismund H.R. Moltke I. Albrechtsen A. 
The effect of an extreme and prolonged population bottleneck on patterns of deleterious variation: insights from the Greenlandic Inuit.Genetics. 2017; 205: 787-801Crossref PubMed Scopus (22) Google Scholar For example, the Greenlandic Inuit experienced an extreme bottleneck which caused a depletion of rare variants and segregating sites in their genome.16Pedersen C.T. Lohmueller K.E. Grarup N. Bjerregaard P. Hansen T. Siegismund H.R. Moltke I. Albrechtsen A. The effect of an extreme and prolonged population bottleneck on patterns of deleterious variation: insights from the Greenlandic Inuit.Genetics. 2017; 205: 787-801Crossref PubMed Scopus (22) Google Scholar The remaining segregating variants are maintained at higher allele frequencies and a larger proportion of these SNPs are deleterious when compared to non-isolated populations. Another study of South Asian populations showed similar results. Specifically, South Asian populations have experienced more severe founder effects than the Finnish,15Nakatsuka N. Moorjani P. Rai N. Sarkar B. Tandon A. Patterson N. Bhavani G.S. Girisha K.M. Mustak M.S. Srinivasan S. et al.The promise of discovering population-specific disease-associated genes in South Asia.Nat. Genet. 2017; 49: 1403-1407Crossref PubMed Scopus (37) Google Scholar thus creating an excess of rare alleles associated with recessive disease. A study of European population isolates compared the isolates with the closest non-isolated population from similar geographic regions8Xue Y. Mezzavilla M. Haber M. McCarthy S. Chen Y. Narasimhan V. Gilly A. Ayub Q. Colonna V. Southam L. et al.Enrichment of low-frequency functional variants revealed by whole-genome sequencing of multiple isolated European populations.Nat. Commun. 2017; 8: 15927Crossref PubMed Scopus (25) Google Scholar and found that the total number of segregating sites was depleted across all isolates relative to the comparison non-isolate. 
Of the sites that were segregating in isolates, between ∼30,000 and 122,000 sites existed at an appreciable frequency (minor allele frequency [MAF] > 5.6%), while remaining rare (MAF < 1.4%) in all the non-isolate population samples.8Xue Y. Mezzavilla M. Haber M. McCarthy S. Chen Y. Narasimhan V. Gilly A. Ayub Q. Colonna V. Southam L. et al.Enrichment of low-frequency functional variants revealed by whole-genome sequencing of multiple isolated European populations.Nat. Commun. 2017; 8: 15927Crossref PubMed Scopus (25) Google Scholar The authors surmised that these common and low-frequency variants could be useful in GWASs for novel associations, as they included SNPs that had been previously associated with cardio-metabolic traits.8Xue Y. Mezzavilla M. Haber M. McCarthy S. Chen Y. Narasimhan V. Gilly A. Ayub Q. Colonna V. Southam L. et al.Enrichment of low-frequency functional variants revealed by whole-genome sequencing of multiple isolated European populations.Nat. Commun. 2017; 8: 15927Crossref PubMed Scopus (25) Google Scholar, 17Tachmazidou I. Dedoussis G. Southam L. Farmaki A.-E. Ritchie G.R. Xifara D.K. Matchan A. Hatzikotoulas K. Rayner N.W. Chen Y. et al.UK10K consortiumA rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates.Nat. Commun. 2013; 4: 2872Crossref PubMed Scopus (58) Google Scholar While there have been many studies of genetic variation in population isolates, the studies described above have focused on populations where the founders all came from the same ancestral population. The founders of Latin American population isolates have come from distinct continental populations. We sampled individuals from mountainous regions of Costa Rica and Colombia where geographic barriers resulted in populations remaining isolated since their founding in the 16th and 17th centuries, until the mid-20th century.18Carvajal-Carmona L.G. Ophoff R. Service S. Hartiala J. Molina J. Leon P. Ospina J. Bedoya G. 
Freimer N. Ruiz-Linares A. Genetic demography of Antioquia (Colombia) and the central valley of Costa Rica.Hum. Genet. 2003; 112: 534-541Crossref PubMed Scopus (92) Google Scholar Both groups share a similar demographic history, having originated primarily from admixture between a few hundred European males and Amerindian females, with a limited contribution from African founders. After the founding event, both populations experienced a subsequent bottleneck and then a recent expansion, within the last 300 years, wherein the expansion increased the population size more than 1,000-fold since the initial founding event.18Carvajal-Carmona L.G. Ophoff R. Service S. Hartiala J. Molina J. Leon P. Ospina J. Bedoya G. Freimer N. Ruiz-Linares A. Genetic demography of Antioquia (Colombia) and the central valley of Costa Rica.Hum. Genet. 2003; 112: 534-541Crossref PubMed Scopus (92) Google Scholar The effect that admixture has had on overall patterns of genetic variation in isolates remains elusive, and it remains unclear whether these populations share the typical genomic signatures seen in population isolates. While the small founding population size could reduce diversity, because the Costa Rican and Colombian isolates were founded from multiple diverse populations, they could potentially have increased in diversity relative to other population isolates. Lastly, the impact of admixture on deleterious variation also remains unclear. To better understand patterns of genetic variation in admixed isolated populations, we compared the Colombian and Costa Rican population isolates to a benchmark isolate, the Finnish, as well as other 1000 Genomes Project populations.191000 Genomes Project ConsortiumA global reference for human genetic variation.Nature. 2015; 526: 68-74Crossref PubMed Scopus (4820) Google Scholar We observe that relative to the Finnish, Latin American isolates have increased genetic diversity but an excess of IBD segments. 
Moreover, we detect an increase in the proportion of an individual’s genome that falls within a long ROH in Latin American isolates relative to all other sampled populations and an enrichment of deleterious variation within these long ROHs. Demographic simulations and analysis of extended pedigrees indicate that the enrichment of long ROHs is primarily a consequence of recent inbreeding in Latin American isolates. Next, we examine the relationship between the proportion of European, Native American, and African ancestry and the amount of the genome within an ROH, as well as the relationship to an individual’s pedigree inbreeding coefficient. Further, we examine demography across both recent and ancient timescales in these isolates. Our work sheds light on how the distinct demographic histories of population isolates affect both genetic diversity and the distribution of deleterious variation across the genome. Our study included 10 Costa Rican (CR) and 12 Colombian (CO) multi-generational pedigrees ascertained to include individuals affected by bipolar disorder 1. The sampled families are clumped geographically to some degree, and it is worth noting that the Central Valley of Costa Rica and Antioquia are population isolates but each population contains several million people. In Costa Rica there is only one psychiatric hospital, and the Antioquia Department of Colombia has few hospitals, so most case subjects were originally identified in the largest hospital in a city of more than 3 million people. More extensive details about the curation of pedigree data and clinical assessments of diagnosis can be found in Fears et al.20Fears S.C. Service S.K. Kremeyer B. Araya C. Araya X. Bejarano J. Ramirez M. Castrillón G. Gomez-Franco J. Lopez M.C. et al.Multisystem component phenotypes of bipolar disorder for genetic investigations of extended pedigrees.JAMA Psychiatry. 
2014; 71: 375-387Crossref PubMed Scopus (63) Google Scholar We defined unrelated individuals as those who are at most third-degree relatives. We chose this threshold of relatedness because the families from CR and CO are known to be cryptically related. We used KING21Manichaikul A. Mychaleckyj J.C. Rich S.S. Daly K. Sale M. Chen W.-M. Robust relationship inference in genome-wide association studies.Bioinformatics. 2010; 26: 2867-2873Crossref PubMed Scopus (638) Google Scholar to identify 30 unrelated individuals from CR and CO. 24 of the 30 unrelated individuals in CO are founders in the pedigree and 15 of the 30 unrelated individuals in CR are founders, and each family sampled is represented by at least one individual, but some families had as many as seven individuals. The algorithm implemented in KING estimates familial relationships by modeling the genetic distance between a pair of individuals as a function of allele frequency and kinship coefficient, assuming that SNPs are in Hardy-Weinberg equilibrium. Further, we also used PC-AiR22Conomos M.P. Miller M.B. Thornton T.A. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness.Genet. Epidemiol. 2015; 39: 276-293Crossref PubMed Scopus (75) Google Scholar and PC-Relate23Conomos M.P. Reiner A.P. Weir B.S. Thornton T.A. Model-free estimation of recent genetic relatedness.Am. J. Hum. Genet. 2016; 98: 127-148Abstract Full Text Full Text PDF PubMed Scopus (86) Google Scholar to estimate relatedness as these two methods are robust to population structure, cryptic relatedness, and admixture. We found that 28 of the 30 CO unrelated individuals and 26 of the 30 CR unrelated individuals were contained in the list of unrelated individuals from PC-AiR.22Conomos M.P. Miller M.B. Thornton T.A. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness.Genet. Epidemiol. 
2015; 39: 276-293Crossref PubMed Scopus (75) Google Scholar Complete overlap was not expected because we retained third-degree relatives when using KING to allow for cryptic relatedness of families sampled from Costa Rica and Colombia due to their demographic history. Lastly, we used KING to identify 30 unrelated individuals from the following 1000 Genomes Project191000 Genomes Project ConsortiumA global reference for human genetic variation.Nature. 2015; 526: 68-74Crossref PubMed Scopus (4820) Google Scholar populations: Yoruba (YRI), CEPH-European (CEU), Finnish (FIN), Colombian (CLM), Peruvian (PEL), Puerto Rican (PUR), and Mexican from Los Angeles (MXL). We used these 30 unrelated individuals per population for all analyses unless otherwise stated (Figure S1). We generated a joint variant call file (VCF) containing single-nucleotide polymorphisms (SNPs) from two separate datasets. The first dataset contained 210 whole-genome sequences sampled from the aforementioned 1000 Genomes Project populations.191000 Genomes Project ConsortiumA global reference for human genetic variation.Nature. 2015; 526: 68-74Crossref PubMed Scopus (4820) Google Scholar The second dataset contained 449 whole-genome sequences from Costa Rican and Colombian individuals. Variants in the second dataset were called following the GATK best practices pipeline24DePristo M.A. Banks E. Poplin R. Garimella K.V. Maguire J.R. Hartl C. Philippakis A.A. del Angel G. Rivas M.A. Hanna M. et al.A framework for variation discovery and genotyping using next-generation DNA sequencing data.Nat. Genet. 2011; 43: 491-498Crossref PubMed Scopus (5381) Google Scholar with the HaplotypeCaller of GATK. All multi-allelic SNVs and variants that failed Variant Quality Score Recalibration were removed. Genotypes with genotype quality score ≤ 20 were set to missing. 
Further quality control on variants was performed using a logistic regression model that was trained to predict the probability of each variant having good or poor sequencing quality. Individuals with poor sequencing quality and possible sample mix-ups were removed, and all sequenced individuals had high genotype concordance rate between whole-genome sequences and genotypes from microarray data. All sequenced individuals had consistency between the reported sex and sex determined from X chromosome as well as between empirical estimates of kinship and theoretical estimates. More information on sequencing and quality control procedures is discussed in Sul et al.25Sul J.H. Service S.K. Huang A.Y. Ramensky V. Hwang S.-G. Teshiba T.M. Park Y. Ori A.P.S. Zhang Z. Mullins N. et al.Contribution of common and rare variants to bipolar disorder susceptibility in extended pedigrees from population isolates.bioRxiv. 2018; https://doi.org/10.1101/363267Crossref Google Scholar We used the following protocol to merge these two datasets. First, we used guidelines from the 1000 Genomes Project strict mask to filter the Costa Rican and Colombian VCFs as well as the 1000 Genomes Project VCFs. Then, we used GATK to remove sites from both sets of VCFs that were not bi-allelic SNPs or monomorphic. Next, we merged the 1000 Genomes Project VCFs with the Costa Rican and Colombian VCFs into a single joint-VCF for each chromosome. We used only autosomes for our analyses. Lastly, we filtered the merged joint-VCF to only contain sites that were present in at least 90% of individuals. There were a total of 57,597,196 SNPs and 1,891,453,144 monomorphic sites in the final dataset. We ensured that the merged datasets were comparable by examining the number of derived putatively neutral alleles across the 30 unrelated individuals in all sampled populations and we found few differences between populations, which is consistent with theory5Simons Y.B. Turchin M.C. Pritchard J.K. Sella G. 
The deleterious mutation load is insensitive to recent population history. Nat. Genet. 2014; 46: 220-224 (Figure S2). We computed two measures of genetic diversity from sites called across all 30 unrelated individuals from each population: pi (π) and Watterson’s theta (θw). The average number of pairwise differences per site (π) was calculated across the genome as: $\pi = \frac{n}{n-1} \cdot \frac{\sum_{i=1}^{L} 2 p_i (1 - p_i)}{L}$, where n is the total number of chromosomes sampled, p_i is the frequency of a given allele at site i, and L is the length in base pairs of the sampled region. θw was computed by counting the number of segregating sites and dividing by Watterson’s constant, the (n−1)th harmonic number.26Watterson G.A. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 1975; 7: 256-276. Site frequency spectra were generated using the 30 unrelated individuals from each population. SNPs with missing data were removed from these analyses. In total, 16 of the 57,597,196 SNPs were removed due to missing data. We calculated LD between pairs of SNPs for all unrelated individuals. First, we applied a filter to remove SNPs that were not at a frequency of at least 10% across all populations. Next, pairwise r² values were calculated using VCFTools.27Danecek P. Auton A. Abecasis G. Albers C.A. Banks E. DePristo M.A. Handsaker R.E. Lunter G. Marth G.T. Sherry S.T. et al. 1000 Genomes Project Analysis Group. The variant call format and VCFtools. Bioinformatics. 2011; 27: 2156-2158. SNP pairs were then binned according to the physical distance (bp) between them, and r² was averaged within each bin. To detect regions of the genome that have shared IBD segments between pairs of individuals, we first removed singleton SNPs in each population since singletons are not informative about IBD. 
Then, we called IBD segments using IBDSeq.28Browning B.L. Browning S.R. Detecting identity by descent and estimating genotype error rates in sequence data. Am. J. Hum. Genet. 2013; 93: 840-851. IBDSeq is a likelihood-based method that is designed to detect IBD segments in unphased sequence data. We chose to use IBDSeq because other methods that require computational phasing could be biased when applied to Latin American population isolates, as they do not have a publicly available reference population to aid in phasing. We compared IBDSeq to two well-known methods, Beagle29Browning B.L. Browning S.R. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics. 2013; 194: 459-471 and GERMLINE,30Gusev A. Lowe J.K. Stoffel M. Daly M.J. Altshuler D. Breslow J.L. Friedman J.M. Pe’er I. Whole population, genome-wide mapping of hidden relatedness. Genome Res. 2009; 19: 318-326 to determine whether it was feasible to use IBDSeq on an admixed population (Figure S3). Data for Beagle and GERMLINE were phased beforehand with SHAPEIT31Delaneau O. Marchini J. Zagury J.-F. A linear complexity phasing method for thousands of genomes. Nat. Methods. 2011; 9: 179-181 (see Web Resources) using the 1000 Genomes as the reference panel. Beagle produced the shortest IBD segments while GERMLINE produced the longest IBD segments. IBDSeq produced segments with a length distribution similar to what we observed in Beagle, though the average segment length was slightly larger, which we expected given that IBDSeq was created to call longer segments that would have previously been broken up when using Beagle for phasing. We used the default parameters for IBDSeq. Next, we filtered the pooled IBD segments to remove artifacts. 
First, we calculated the physical distance spanned by each IBD segment. Then, we totaled the number of SNPs that fell within each segment. We observed an appreciable number of IBD segments that were extremely long but sparsely covered by SNPs (Figure S4). IBD segments were removed if the proportion of the IBD segment covered by SNPs was not within one standard deviation (0.0043) of the mean proportion covered (0.0221) across all IBD segments (Figure S4). Strong deviations from the mean could indicate that the IBD segment spans a region of the genome with low mappability where we are only calling the SNPs at the outer ends of the segment. Therefore, the true segment length might be much shorter than what is being calculated by IBDSeq. Lastly, we converted from physical distance to genetic distance using the deCODE genetic map.32Kong Thorleifsson G. Gudbjartsson D.F. Masson G. Sigurdsson A. Jonasdottir A. Walters G.B." @default.
- W2952862024 created "2019-06-27" @default.
- W2952862024 creator A5003629587 @default.
- W2952862024 creator A5003743615 @default.
- W2952862024 creator A5006895356 @default.
- W2952862024 creator A5007441251 @default.
- W2952862024 creator A5008702983 @default.
- W2952862024 creator A5010001310 @default.
- W2952862024 creator A5012270886 @default.
- W2952862024 creator A5012938030 @default.
- W2952862024 creator A5015005946 @default.
- W2952862024 creator A5015061675 @default.
- W2952862024 creator A5018018083 @default.
- W2952862024 creator A5019533423 @default.
- W2952862024 creator A5025278422 @default.
- W2952862024 creator A5026769077 @default.
- W2952862024 creator A5026996483 @default.
- W2952862024 creator A5029198223 @default.
- W2952862024 creator A5031917253 @default.
- W2952862024 creator A5035750579 @default.
- W2952862024 creator A5037488409 @default.
- W2952862024 creator A5037586820 @default.
- W2952862024 creator A5037666695 @default.
- W2952862024 creator A5049183914 @default.
- W2952862024 creator A5053185744 @default.
- W2952862024 creator A5055092029 @default.
- W2952862024 creator A5062316104 @default.
- W2952862024 creator A5062711796 @default.
- W2952862024 creator A5064604619 @default.
- W2952862024 creator A5065925140 @default.
- W2952862024 creator A5066467126 @default.
- W2952862024 creator A5069955843 @default.
- W2952862024 creator A5070041808 @default.
- W2952862024 creator A5073660758 @default.
- W2952862024 creator A5073888048 @default.
- W2952862024 creator A5075562934 @default.
- W2952862024 creator A5081207747 @default.
- W2952862024 creator A5081260924 @default.
- W2952862024 creator A5081870542 @default.
- W2952862024 creator A5085758031 @default.
- W2952862024 creator A5085936676 @default.
- W2952862024 creator A5088295039 @default.
- W2952862024 creator A5089089314 @default.
- W2952862024 date "2018-11-01" @default.
- W2952862024 modified "2023-10-17" @default.
- W2952862024 title "Understanding the Hidden Complexity of Latin American Population Isolates" @default.
- W2952862024 cites W1502735596 @default.
- W2952862024 cites W1554372034 @default.
- W2952862024 cites W1564788673 @default.
- W2952862024 cites W1828904010 @default.
- W2952862024 cites W1857512534 @default.
- W2952862024 cites W1919419508 @default.
- W2952862024 cites W1975720635 @default.
- W2952862024 cites W1980722139 @default.
- W2952862024 cites W1985849948 @default.
- W2952862024 cites W1989280600 @default.
- W2952862024 cites W1989923963 @default.
- W2952862024 cites W1994975196 @default.
- W2952862024 cites W1995148706 @default.
- W2952862024 cites W1995664705 @default.
- W2952862024 cites W1996723946 @default.
- W2952862024 cites W1996801244 @default.
- W2952862024 cites W1998771425 @default.
- W2952862024 cites W2001036195 @default.
- W2952862024 cites W2003424072 @default.
- W2952862024 cites W2005935256 @default.
- W2952862024 cites W2007325254 @default.
- W2952862024 cites W2012294035 @default.
- W2952862024 cites W2019794729 @default.
- W2952862024 cites W2024642024 @default.
- W2952862024 cites W2026187919 @default.
- W2952862024 cites W2026323344 @default.
- W2952862024 cites W2026416497 @default.
- W2952862024 cites W2029983700 @default.
- W2952862024 cites W2033288453 @default.
- W2952862024 cites W2035786794 @default.
- W2952862024 cites W2046223043 @default.
- W2952862024 cites W2046474981 @default.
- W2952862024 cites W2047299569 @default.
- W2952862024 cites W2052235739 @default.
- W2952862024 cites W2055485782 @default.
- W2952862024 cites W2057149047 @default.
- W2952862024 cites W2061539393 @default.
- W2952862024 cites W2064022143 @default.
- W2952862024 cites W2065035179 @default.
- W2952862024 cites W2065637989 @default.
- W2952862024 cites W2068811635 @default.
- W2952862024 cites W2070589766 @default.
- W2952862024 cites W2076658681 @default.
- W2952862024 cites W2082967637 @default.
- W2952862024 cites W2087232033 @default.
- W2952862024 cites W2087437533 @default.
- W2952862024 cites W2090216893 @default.
- W2952862024 cites W2091808830 @default.
- W2952862024 cites W2096303238 @default.
- W2952862024 cites W2098066250 @default.
- W2952862024 cites W2098482403 @default.
- W2952862024 cites W2100751001 @default.