Matches in SemOpenAlex for { <https://semopenalex.org/work/W3150556068> ?p ?o ?g. }
- W3150556068 endingPage "2111" @default.
- W3150556068 startingPage "2099" @default.
- W3150556068 abstract "The integration of genomic data into health systems offers opportunities to identify genomic factors underlying the continuum of rare and common disease. We applied a population-scale haplotype association approach based on identity-by-descent (IBD) in a large multi-ethnic biobank to a spectrum of disease outcomes derived from electronic health records (EHRs) and uncovered a risk locus for liver disease. We used genome sequencing and in silico approaches to fine-map the signal to a non-coding variant (c.2784−12T>C) in the gene ABCB4. In vitro analysis confirmed the variant disrupted splicing of the ABCB4 pre-mRNA. 
Four of five homozygotes had evidence of advanced liver disease, and there was a significant association with liver disease among heterozygotes, suggesting the variant is linked to increased risk of liver disease in an allele dose-dependent manner. Population-level screening revealed the variant to be at a carrier rate of 1.95% in Puerto Rican individuals, likely as the result of a Puerto Rican founder effect. This work demonstrates that integrating EHR and genomic data at a population scale can facilitate strategies for understanding the continuum of genomic risk for common diseases, particularly in populations underrepresented in genomic medicine. Genetic identification of monogenic disease historically relied on tracking the co-segregation of genomic segments and disease state through familial pedigrees, in a process known as linkage mapping.1Donahue R.P. Bias W.B. Renwick J.H. McKusick V.A. Probable assignment of the Duffy blood group locus to chromosome 1 in man.Proc. Natl. Acad. Sci. USA. 1968; 61: 949-955Google Scholar,2McKusick V.A. Current trends in mapping human genes.FASEB J. 1991; 5: 12-20Google Scholar This approach is typically followed by localized sequencing to reveal the disease-causing variant and confirmatory functional studies in vitro or in animal models. This strategy has been used successfully throughout the late 20th century to uncover thousands of loci underlying suspected, rare genetic disorders.3Claussnitzer M. Cho J.H. Collins R. Cox N.J. Dermitzakis E.T. Hurles M.E. Kathiresan S. Kenny E.E. Lindgren C.M. MacArthur D.G. et al.A brief history of human disease genetics.Nature. 2020; 577: 179-189Google Scholar More recently, next generation sequencing technologies have led to the identification of the genetic etiology of disease through the direct sequencing of patient exomes and genomes in close pedigree structures.4Biesecker L.G. Green R.C. Diagnostic clinical genome and exome sequencing.N. Engl. J. Med. 
2014; 371: 1170Google Scholar Genomic technologies have also been applied in health systems to uncover unknown pathogenic variants and streamline diagnosis5Turro E. Astle W.J. Megy K. Gräf S. Greene D. Shamardina O. Allen H.L. Sanchis-Juan A. Frontini M. Thys C. et al.NIHR BioResource for the 100,000 Genomes ProjectWhole-genome sequencing of patients with rare diseases in a national health system.Nature. 2020; 583: 96-102Google Scholar and to refine our understanding of the penetrance and frequency of pathogenic variants at a population level.6Abul-Husn N.S. Manickam K. Jones L.K. Wright E.A. Hartzel D.N. Gonzaga-Jauregui C. O’Dushlaine C. Leader J.B. Lester Kirchner H. Lindbuchler D.M. et al.Genetic identification of familial hypercholesterolemia within a single U.S. health care system.Science. 2016; 354: 354Google Scholar However, the preponderance of genome sequencing and genomic medicine research have been performed in populations of European descent, and there is a lag in genomic sequence data available for, and studies directed at, understanding monogenic disorders in non-European populations.7Popejoy A.B. Fullerton S.M. Genomics is failing on diversity.Nature. 2016; 538: 161-164Google Scholar The growth of large-scale biobanks linked to health systems data in recent years has opened avenues to uncovering the etiology of monogenic disorders.8Abul-Husn N.S. Kenny E.E. Personalized Medicine and the Power of Electronic Health Records.Cell. 2019; 177: 58-69Google Scholar With some exceptions,9Van Hout C.V. Tachmazidou I. Backman J.D. Hoffman J.D. Liu D. Pandey A.K. Gonzaga-Jauregui C. Khalid S. Ye B. Banerjee N. et al.Geisinger-Regeneron DiscovEHR CollaborationRegeneron Genetics CenterExome sequencing and characterization of 49,960 individuals in the UK Biobank.Nature. 2020; 586: 749-756Google Scholar,10Schwartz M.L.B. McCormick C.Z. Lazzeri A.L. Lindbuchler D.M. Hallquist M.L.G. Manickam K. Buchanan A.H. Rahm A.K. Giovanni M.A. Frisbie L. 
et al.A Model for Genome-First Care: Returning Secondary Genomic Findings to Participants and Their Healthcare Providers in a Large Research Cohort.Am. J. Hum. Genet. 2018; 103: 328-337Google Scholar the majority of genomic data generated in biobanks worldwide is on low-cost genotype arrays rather than genome sequencing and many biobanks are designed for population-based recruitment rather than being disease or pedigree focused. However, by leveraging array data in population-based biobanks, it is possible to calculate haplotypes of the genome that have been co-inherited from a recent common ancestor identical-by-descent.11Browning S.R. Browning B.L. Identity by descent between distant relatives: detection and applications.Annu. Rev. Genet. 2012; 46: 617-633Google Scholar Using this strategy, genealogical relationships can be captured locally along the genome among distantly or putatively unrelated members of a population, which are particularly enriched in founder populations.12Browning S.R. Thompson E.A. Detecting rare variant associations by identity-by-descent mapping in case-control studies.Genetics. 2012; 190: 1521-1531Google Scholar, 13Gauvin H. Moreau C. Lefebvre J.-F. Laprise C. Vézina H. Labuda D. Roy-Gagnon M.-H. Genome-wide patterns of identity-by-descent sharing in the French Canadian founder population.Eur. J. Hum. Genet. 2014; 22: 814-821Google Scholar, 14Thompson E.A. Identity by descent: variation in meiosis, across genomes, and in populations.Genetics. 2013; 194: 301-326Google Scholar, 15Te Meerman G.J. Van der Meulen M.A. Sandkuijl L.A. Perspectives of identity by descent (IBD) mapping in founder populations.Clin. Exp. Allergy. 1995; 25: 97-102Google Scholar Identical-by-descent haplotypes have the potential to harbor rare alleles that are not directly ascertained on genotyping arrays, facilitating association mapping of rare variants even when they are not directly observed,12Browning S.R. Thompson E.A. 
Detecting rare variant associations by identity-by-descent mapping in case-control studies.Genetics. 2012; 190: 1521-1531Google Scholar or are too rare or population-private to be readily imputable with currently existing reference panels; this approach is known as IBD mapping (identity-by-descent mapping).15Te Meerman G.J. Van der Meulen M.A. Sandkuijl L.A. Perspectives of identity by descent (IBD) mapping in founder populations.Clin. Exp. Allergy. 1995; 25: 97-102Google Scholar,16Houwen R.H. Baharloo S. Blankenship K. Raeymaekers P. Juyn J. Sandkuijl L.A. Freimer N.B. Genome screening by searching for shared segments: mapping a gene for benign recurrent intrahepatic cholestasis.Nat. Genet. 1994; 8: 380-386Google Scholar This property of IBD makes it especially useful for rare variant-based associations in diverse and understudied founder populations, for which deep genome-sequencing datasets may not be available. Furthermore, previous studies have leveraged EHR data in concert with genomic data to demonstrate the ubiquity and potential under-recognition of monogenic forms of disease in patient populations.6Abul-Husn N.S. Manickam K. Jones L.K. Wright E.A. Hartzel D.N. Gonzaga-Jauregui C. O’Dushlaine C. Leader J.B. Lester Kirchner H. Lindbuchler D.M. et al.Genetic identification of familial hypercholesterolemia within a single U.S. health care system.Science. 2016; 354: 354Google Scholar,17Bastarache L. Hughey J.J. Hebbring S. Marlo J. Zhao W. Ho W.T. Van Driest S.L. McGregor T.L. Mosley J.D. Wells Q.S. et al.Phenotype risk scores identify patients with unrecognized Mendelian disease patterns.Science. 2018; 359: 1233-1239Google Scholar We previously applied IBD mapping to height in a Puerto Rican (PR) founder population in New York City and identified a monogenic variant underlying the skeletal disorder Steel syndrome,18Belbin G.M. Odgis J. Sorokin E.P. Yee M.-C. Kohli S. Glicksberg B.S. Gignoux C.R. Wojcik G.L. Van Vleck T. Jeff J.M. 
et al.Genetic identification of a common collagen disease in puerto ricans via identity-by-descent mapping in a health system.eLife. 2017; 6: 6Google Scholar demonstrating the power of population-based strategies for elucidating monogenic disorders. Here we expand our previous approach by systematically associating IBD haplotypes with the full spectrum of EHR derived phenotypes in the large founder population of PR and PR-descent participants in the diverse, multi-ethnic BioMe biobank in New York City. We performed a phenome-wide association study (PheWAS)19Denny J.C. Ritchie M.D. Basford M.A. Pulley J.M. Bastarache L. Brown-Gentry K. Wang D. Masys D.R. Roden D.M. Crawford D.C. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations.Bioinformatics. 2010; 26: 1205-1210Google Scholar of identical-by-descent haplotypes under a recessive model in the PR founder population and identified a significant association between homologous IBD sharing at the locus 7q21.12 and severe liver disease. Fine-mapping of the identical-by-descent haplotypic region uncovered a rare variant in the gene ABCB4 (ABCB4: c.2784−12T>C; rs201498350; MIM: 171060), variants in which are known to play a causal role in multiple forms of hepatobiliary disease.20Reichert M.C. Lammert F. ABCB4 Gene Aberrations in Human Liver Disease: An Evolving Spectrum.Semin. Liver Dis. 2018; 38: 299-307Google Scholar In vitro analysis demonstrated that this variant disrupted splicing, leading to an ABCB4 protein product lacking exon 23. Manual chart review of these individuals revealed evidence of severe liver diseases in four of five homozygotes. We also investigated the impact of harboring one copy of c.2784−12T>C via a combination of PheWAS, analysis of liver function tests, and manual chart review, revealing both an elevation of serum liver enzyme levels and an increased risk of liver disease in heterozygotes. 
Furthermore, population-level screening revealed the variant to be common in PR (carrier rate of ∼1.9%) while rare (< 1%) in other global populations. These analyses provide a methodological framework for bridging statistical genetics and clinical genomics and demonstrate that EHR-embedded, population-level research can elucidate the continuum of genomic risk for liver disease. Study participants were recruited from the BioMe Biobank Program of The Charles Bronfman Institute for Personalized Medicine at Mount Sinai Medical Center from 2007 onward. The BioMe Biobank Program (Institutional Review Board 07–0529) operates under a Mount Sinai Institutional Review Board-approved research protocol. All study participants provided written informed consent. Genotyping, quality control, and merging of array data across the OMNI and MEGA platforms was performed as described in detail in Vishnu et al.21Vishnu A. Belbin G.M. Wojcik G.L. Bottinger E.P. Gignoux C.R. Kenny E.E. Loos R.J.F. The role of country of birth, and genetic and self-identified ancestry, in obesity susceptibility among African and Hispanic Americans.Am. J. Clin. Nutr. 2019; 110: 16-23Google Scholar In brief, we performed standard quality control for variants based on missingness, heterozygosity, and Hardy-Weinberg equilibrium using PLINKv.1.9.22Purcell S. Neale B. Todd-Brown K. Thomas L. Ferreira M.A.R. Bender D. Maller J. Sklar P. de Bakker P.I.W. Daly M.J. Sham P.C. PLINK: a tool set for whole-genome association and population-based linkage analyses.Am. J. Hum. Genet. 2007; 81: 559-575Google Scholar,23Chang C.C. Chow C.C. Tellier L.C. Vattikuti S. Purcell S.M. Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets.Gigascience. 2015; 4: 7Google Scholar We removed samples that were duplicated across both arrays and subset data to the intersect of variants present on both platforms (n = 461,677 SNPs; n = 21,692 individuals). 
Subsequent removal of palindromic sites with a missingness rate of >1% left a total of 377,799 SNPs and 25,750 individuals for downstream analysis. Phasing was performed per chromosome with the EAGLEv.2.0.524Loh P.-R. Danecek P. Palamara P.F. Fuchsberger C. Reshef Y.A. Finucane H.K. Schoenherr S. Forer L. McCarthy S. Abecasis G.R. et al.Reference-based phasing using the Haplotype Reference Consortium panel.Nat. Genet. 2016; 48: 1443-1448Google Scholar software using the genetic map (hg19) that is included in the EAGLEv.2.0.5 package. An additional two individuals were excluded during the phasing process because they had a per-chromosome missingness rate of greater than 10% for at least one autosome, leaving n = 25,748 individuals in total. Phased output from EAGLE was filtered to a MAF of ≥1% and converted to PLINK format using fcGENE.25Roshyara N.R. Scholz M. fcGENE: a versatile tool for processing and transforming SNP datasets.PLoS ONE. 2014; 9: e97589Google Scholar This was used as input for the GERMLINE algorithm.26Gusev A. Lowe J.K. Stoffel M. Daly M.J. Altshuler D. Breslow J.L. Friedman J.M. Pe’er I. Whole population, genome-wide mapping of hidden relatedness.Genome Res. 2009; 19: 318-326Google Scholar We ran GERMLINE over each autosome across all individuals simultaneously using the following flags: “-min_m 3 -err_hom 0 -err_het 2 -bits 25 -haploid.” For quality control, IBD segments that overlapped with low-complexity regions were excluded, along with IBD segments that fell within regions of excessive IBD sharing (which we defined as regions of the genome where the level of pairwise IBD sharing exceeded three standard deviations above the genome-wide mean). We summed IBD haplotypes along the genome for all n = 25,748 participants and used these sums to construct an adjacency matrix in which each node represented a BioMe participant and each weighted edge represented the pairwise sum of IBD sharing between a given pair of individuals. 
After first excluding edges between pairs sharing ≥1,500 cM of their genome IBD, we employed the InfoMap algorithm27Rosvall M. Bergstrom C.T. Maps of random walks on complex networks reveal community structure.Proc. Natl. Acad. Sci. USA. 2008; 105: 1118-1123Google Scholar,28Rosvall M. Axelsson D. Bergstrom C.T. The map equation.Eur. Phys. J. Spec. Top. 2009; 178: 13-23Google Scholar as implemented in the iGraph package (R v.3.2.0) to uncover communities of individuals enriched for IBD sharing. We uncovered a community of N = 5,100 individuals that, on the basis of self-reported labels, we refer to hereafter as the Puerto Rican ancestry identical-by-descent community. We first clustered IBD haplotypes inferred via GERMLINE into homologous cliques using the DASH29Gusev A. Kenny E.E. Lowe J.K. Salit J. Saxena R. Kathiresan S. Altshuler D.M. Friedman J.M. Breslow J.L. Pe’er I. DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation.Am. J. Hum. Genet. 2011; 88: 706-717Google Scholar advanced (dash_adv) algorithm across all BioMe participants, including the following additional parameters: “-win 250000 -r2 1.” We then extracted the Puerto Rican community (N = 5,100) from the DASH output and recoded individuals who were homozygous for a given IBD clique as “1” and those who were heterozygous or who were not members of the clique as “0.” We then used this recoded genotype as the primary predictor variable for an IBD-based phenome-wide association that was modeled using an implementation of the Saddle Point approximation30Dey R. Schmidt E.M. Abecasis G.R. Lee S. A Fast and Accurate Algorithm to Test for Binary Phenotypes and Its Application to PheWAS.Am. J. Hum. Genet. 2017; 101: 37-49Google Scholar (using the R package “SPAtest,” R v.3.2.0), with age and sex included as covariates. For each test, one individual from each pair of directly related individuals was excluded prior to association, preferentially excluding “controls” over “cases” for each ICD-9 code. 
Alignment and variant calling of whole-genome sequence (WGS) data was performed using the pipeline provided by Linderman et al.31Linderman M.D. Brandt T. Edelmann L. Jabado O. Kasai Y. Kornreich R. Mahajan M. Shah H. Kasarskis A. Schadt E.E. Analytical validation of whole exome and whole genome sequencing for clinical applications.BMC Med. Genomics. 2014; 7: 20Google Scholar Further variant annotation was performed using Variant Effect Predictor. These annotations were then intersected with the WGS data for the three homozygotes using an in-house python script. A phenome-wide association of ABCB4 c.2784−12T>C carrier status was conducted using the SAIGE software21Vishnu A. Belbin G.M. Wojcik G.L. Bottinger E.P. Gignoux C.R. Kenny E.E. Loos R.J.F. The role of country of birth, and genetic and self-identified ancestry, in obesity susceptibility among African and Hispanic Americans.Am. J. Clin. Nutr. 2019; 110: 16-23Google Scholar,32Zhou W. Nielsen J.B. Fritsche L.G. Dey R. Gabrielsen M.E. Wolford B.N. LeFaive J. VandeHaar P. Gagliano S.A. Gifford A. et al.Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies.Nat. Genet. 2018; 50: 1335-1341Google Scholar for a total of n = 4,903 Puerto Rican ancestry participants (homozygous individuals were excluded). ICD-9 billing codes served as the phenotypic outcome, and we included age, sex, and the first five principal components (PCs) as covariates, as well as a general relatedness matrix (GRM) to account for relatedness. The association analysis was restricted to ICD-9 codes for which three or more affected individuals were present among carriers (n = 550 ICD-9 codes). Outpatient values for nine laboratory tests for liver enzymes and liver function were extracted from EHRs. For each individual, the median value was taken for each trait. 
Individuals were stratified according to sex, and outliers that fell more than four standard deviations from the sex-specific population median were excluded. Sex-specific values were subsequently log-transformed and converted to z-scores (mean 0, standard deviation 1) before the data were recombined. These z-scores were then used as the phenotypic outcome in a linear model that included age as a covariate. Related individuals were excluded from the analysis, as were the five individuals who were homozygous for the ABCB4 c.2784−12T>C variant. Manual chart review was performed by a physician blinded to the subject’s ABCB4 c.2784−12T>C carrier status. Subjects with hepatitis C causing viral hepatitis were excluded from further analyses. Text search was performed for “liver disease,” “fatty liver,” “NAFLD,” “fibrosis,” “steatosis,” “sclerosing cholangitis,” and “cirrhosis.” A review of all prior abdominal imaging was performed, specifically assessing for phrases such as “nodular” or “hyperechogenic” liver. If any of these searches yielded a positive result, then clinical notes, alcohol history, BMI, liver function tests, FibroScan results, and any liver biopsies were reviewed to establish the etiology and severity of the subject’s liver disease. A two-tailed Fisher’s exact test was performed to assess for associations between carrier status and the presence of any non-viral liver disease, and a p value of < 0.05 was considered significant. We amplified (PrimeSTAR GXL DNA Polymerase, Takara Bio) and cloned a 4,340 bp ABCB4 genomic region from exons 22 to exons 24 into the pCR2.1-TOPO vector (TOPO TA cloning kit, Invitrogen) using the following forward and reverse primers: 5′-GCGATCGCC ATG GTG TCT TTG ACC CAG GAA AGA AA-3′ and 5′-ACG CGT AGA ACT GGC ATG TCC TAG AGC C-3′. 
Sequence verified pCR2.1-TOPO with this fragment was used as a template to re-amplify the insert (PrimeSTAR GXL DNA Polymerase, Takara Bio) using the following forward and reverse primers: 5′-CAC TTG GCG ATC GCC ATG GTG TCT TTG ACC CAG GAA AGA A-3′ and 5′-GAT AAC ACG CGT AGA ACT GGC ATG TCC TAG AGC C-3′. The primers introduce a 5′ AsiSI/SgfI and 3′ MluI restriction site (bold and underlined) that were used for cloning the fragment into the pCMV6-entry vector (Origene). The c.2784−12T>C variant was introduced using site-directed mutagenesis (Q5 Site-Directed Mutagenesis kit, NEB) with the following oligonucleotides primers: Q5-Fw 5′-AGTATACTGAcTTGCTTTTCAG-3′ (mutated nucleotide in lower case) and Q5-Rev 5′-TGTAACCATCTCTTCAGC-3′. The wild-type and variant pCMV6-ABCB4 were sequenced to confirm the absence and presence of the variant. Both vectors were transfected into HEK293 cells using Lipofectamine 2000. After 24 h, cells were lysed in QIAzol and RNA isolated (RNeasy mini kit, QIAGEN). RNA was used for cDNA synthesis (SuperScript IV First-strand Synthesis System, Invitrogen) after which the splicing of exons 22–24 was studied using PCR. Because HEK293 cells express low levels of native ABCB4, we used the forward primer annealing in exon 22 used for cloning and a reverse primer on the MYC-DDK tag of the pCMV6 vector: DDK reverse 5′-CCT TAT CGT CGT CAT CCT TGT AAT CC-3′. All PCR fragments were Sanger sequenced to confirm their identity. We previously inferred IBD sharing across BioMe and used IBD haplotypes to cluster individuals into communities linked by recent shared ancestry, as described in Belbin et al.33Belbin G.M. Wenric S. Cullina S. Glicksberg B.S. Moscati A. Wojcik G.L. Shemirani R. Beckmann N.D. Cohain A. Sorokin E.P. et al.Towards a fine-scale population health monitoring system.Cell. 
2019; 184: 2068-2083Google Scholar By using this method, we identified a community of individuals of Puerto Rican (PR) ancestry and observed elevated IBD sharing within this group, suggestive of a founder effect. We clustered IBD haplotypes locally along the genome by homology and identified 4,526,956 homologous IBD-clusters within the PR population. Examining the frequency spectrum of these haplotypic alleles, we observed most to be rare (median haplotypic frequency = 0.04%, Figure S1). We hypothesized that we might be able to leverage these haplotypic alleles as proxies for unobserved rare variants in an association testing framework designed for discovery of monogenic recessive disorders (Figure 1). To systematically explore the relationship between haplotypic alleles and EHR-derived health outcomes, we performed a PheWAS under a recessive model implementing the Saddle Point Approximation, which accommodates rare observations and instances of extreme case-control imbalance. Because our method depends on leveraging cryptic relatedness, we applied our approach specifically within PR BioMe participants, on the basis of previous observations of a founder effect within this group.18Belbin G.M. Odgis J. Sorokin E.P. Yee M.-C. Kohli S. Glicksberg B.S. Gignoux C.R. Wojcik G.L. Van Vleck T. Jeff J.M. et al.Genetic identification of a common collagen disease in puerto ricans via identity-by-descent mapping in a health system.eLife. 2017; 6: 6Google Scholar In our model, the haplotypic alleles served as the primary predictor variable and ICD-9 billing codes served as the outcome variable. We restricted analysis to 754 haplotypic alleles for which there were at least 3 observations of individuals that were homozygous for a shared IBD haplotype, and we systematically tested these for association against each ICD-9 code (n = 3,679,520 tests in total). 
Only one association achieved study-wide significance (SWS, threshold: p < 1.4 × 10^−8), an association at a haplotypic allele at 7q21.12 (p < 2.9 × 10^−9, haplotypic frequency = 0.7%) (Figure 2A). The significant haplotypic allele represented 3 individuals who were each homozygous for a homologous segment of IBD at the region, and all of whom had an EHR record of the rare ICD-9 code “571.6” (which encodes biliary cirrhosis). While not study-wide significant, the haplotypic allele was also associated with the ICD-9 code “576.1” (which encodes cholangitis; p < 9.9 × 10^−8). In addition to the three individuals who were homozygous for the IBD haplotype at 7q21.12, n = 70 individuals carried the haplotype in the heterozygous state. The significant haplotypic allele spanned a large interval (minimum shared boundary: chr7:86,817,459–90,407,237) (Figure 2B) and contained 21 known genes. To fine-map the signal, we performed whole-genome sequencing of all three homozygous carriers and characterized variants that fell within the minimum shared boundary of the haplotypic allele. Under the hypothesis that the causal variant would be rare, we filtered to retain only variants with a global minor allele frequency of <1% (in any population group from gnomAD or 1000 Genomes; Table S1). We identified a total of 195 variants that were shared in the homozygous state between all three individuals, none of which represented non-synonymous coding variation. We found 24 sites homozygous in all three individuals that were also present in ClinVar (Table S2). Intersecting this list with the allele frequency data, only one variant had a MAF of <1% across all population databases, a single nucleotide variant (rs201498350, GenBank: NM_000443.4; ABCB4: c.2784−12T>C); this variant had been asserted as “likely pathogenic” for “progressive familial intrahepatic cholestasis, type 1” (PFIC1; MIM: 211600) by a single submitter. The ABCB4 c.2784−12T>C variant has a CADD34Rentzsch P. Witten D. Cooper G.M. 
Shendure J. Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome.Nucleic Acids Res. 2019; 47: D886-D894Google Scholar score of 15.9 and a spliceAI35Jaganathan K. Kyriazopoulou Panagiotopoulou S. McRae J.F. Darbandi S.F. Knowles D. Li Y.I. Kosmicki J.A. Arbelaez J. Cui W. Schwartz G.B. et al.Predicting Splicing from Primary Sequence with Deep Learning.Cell. 2019; 176: 535-548.e24Google Scholar score of 0.39 (interpreted as the probability of causing a splice acceptor loss). This variant occurs in a polypyrimidine tract 12 bp from the 3′ splice site of intron 22. The natural occurrence of ABCB4 mRNAs (GenBank: NM_018850.2) that lack exon 23 indicates that this splice site is weak and prone to exon skipping. This is further supported by our observation when examining HEK293 cells that ABCB4 cDNA fragments are expressed both with and without exon 23 (Figure S2). Skipping of exon 23 leads to a 141-bp deletion and likely encodes for a non-functional protein due to the deletion of 47 amino acids (929 to 975), which encompasses the majority of transmembrane helix 11 and the last extracellular loop. To test whether ABCB4 c.2784−12T>C affects splicing of exon 23, we cloned a genomic region of ABCB4 containing exons 22 to 24 in an expression vector and expressed this fragment in HEK293 cells (Figure 3). RT-PCR shows that the resulting pre-mRNA fragment is spliced into mRNA with and without exon 23. In this assay, the mRNA without exon 23 is more abundant than the mRNA with exon 23. Mutating the consensus T at the −12 position of intron 22 into the less-favored pyrimidine C further decreases splicing efficiency at this acceptor site. Mutating it to the purine G appears to prevent splicing completely. Our results show that the splice acceptor site of intron 22 is weak and that the c.2784−12T>C variant increases skipping of exon 23. 
Subsequent to the discovery of the c.2784−12T>C variant, we obtained exome sequencing data for a larger dataset of unrelated BioMe participants (N = 28,344). This included N = 4,332 PR participants who were in the original discovery dataset, and N = 1,015 independent PR participants. Leveraging off-target exome sequencing reads in the independent dataset, we identified two additional participants who were homozygous for the c.2784−12T>C variant. A subject domain expert performed manual chart review of all five homozygotes. Evaluation of outpatient measures of serum liver enzyme levels and liver function tests revealed significant elevation of measures consistent with liver disease (Table 1). Four of the five homozygotes were found to have a diagnosis of cirrhosis on chart review, and the fifth had liver steatosis on imaging. Each homozygote had a distinct etiology of their liver disease: alcohol-associated cirrhosis, primary sclerosing cholangitis, primary biliary cholangitis (with possible component of alcohol-associated liver disease), and cryptogenic cirrhosis. Two had undergone liver transplant and one was found to hav" @default.
- W3150556068 created "2021-04-13" @default.
- W3150556068 creator A5009285146 @default.
- W3150556068 creator A5029045377 @default.
- W3150556068 creator A5033642313 @default.
- W3150556068 creator A5046822527 @default.
- W3150556068 creator A5060860411 @default.
- W3150556068 creator A5063029982 @default.
- W3150556068 creator A5070483300 @default.
- W3150556068 creator A5074665059 @default.
- W3150556068 creator A5084156654 @default.
- W3150556068 creator A5086266728 @default.
- W3150556068 creator A5087349100 @default.
- W3150556068 creator A5091349510 @default.
- W3150556068 date "2021-11-01" @default.
- W3150556068 modified "2023-09-27" @default.
- W3150556068 title "Leveraging health systems data to characterize a large effect variant conferring risk for liver disease in Puerto Ricans" @default.
- W3150556068 cites W1565651933 @default.
- W3150556068 cites W1590537655 @default.
- W3150556068 cites W1732767131 @default.
- W3150556068 cites W1810783817 @default.
- W3150556068 cites W1971957145 @default.
- W3150556068 cites W1972586062 @default.
- W3150556068 cites W1975541219 @default.
- W3150556068 cites W1984935905 @default.
- W3150556068 cites W1988952847 @default.
- W3150556068 cites W1990010616 @default.
- W3150556068 cites W1990139627 @default.
- W3150556068 cites W1994331453 @default.
- W3150556068 cites W1996030209 @default.
- W3150556068 cites W1996240187 @default.
- W3150556068 cites W2017511982 @default.
- W3150556068 cites W2023854996 @default.
- W3150556068 cites W2030988632 @default.
- W3150556068 cites W2038204679 @default.
- W3150556068 cites W2041575399 @default.
- W3150556068 cites W2045335698 @default.
- W3150556068 cites W2045921037 @default.
- W3150556068 cites W2051904003 @default.
- W3150556068 cites W2056770102 @default.
- W3150556068 cites W2060116186 @default.
- W3150556068 cites W2068378322 @default.
- W3150556068 cites W2081836052 @default.
- W3150556068 cites W2099085143 @default.
- W3150556068 cites W2099686444 @default.
- W3150556068 cites W2108293406 @default.
- W3150556068 cites W2113105800 @default.
- W3150556068 cites W2113697014 @default.
- W3150556068 cites W2113743121 @default.
- W3150556068 cites W2119468178 @default.
- W3150556068 cites W2123623546 @default.
- W3150556068 cites W2124209874 @default.
- W3150556068 cites W2131894220 @default.
- W3150556068 cites W2147234914 @default.
- W3150556068 cites W2148874079 @default.
- W3150556068 cites W2161633633 @default.
- W3150556068 cites W2162327089 @default.
- W3150556068 cites W2164998314 @default.
- W3150556068 cites W2168933019 @default.
- W3150556068 cites W2172134989 @default.
- W3150556068 cites W2529241974 @default.
- W3150556068 cites W2531587846 @default.
- W3150556068 cites W2555379171 @default.
- W3150556068 cites W2567297878 @default.
- W3150556068 cites W2789833091 @default.
- W3150556068 cites W2791466251 @default.
- W3150556068 cites W2898210835 @default.
- W3150556068 cites W2905452503 @default.
- W3150556068 cites W2909194804 @default.
- W3150556068 cites W2923757114 @default.
- W3150556068 cites W2948958588 @default.
- W3150556068 cites W2950099124 @default.
- W3150556068 cites W2951672808 @default.
- W3150556068 cites W2951886963 @default.
- W3150556068 cites W2952535009 @default.
- W3150556068 cites W2976146569 @default.
- W3150556068 cites W2984144402 @default.
- W3150556068 cites W2999574254 @default.
- W3150556068 cites W3021883529 @default.
- W3150556068 cites W3029078495 @default.
- W3150556068 cites W3029661147 @default.
- W3150556068 cites W3038072559 @default.
- W3150556068 cites W3094550675 @default.
- W3150556068 doi "https://doi.org/10.1016/j.ajhg.2021.09.016" @default.
- W3150556068 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/8595966" @default.
- W3150556068 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/34678161" @default.
- W3150556068 hasPublicationYear "2021" @default.
- W3150556068 type Work @default.
- W3150556068 sameAs 3150556068 @default.
- W3150556068 citedByCount "4" @default.
- W3150556068 countsByYear W31505560682022 @default.
- W3150556068 countsByYear W31505560682023 @default.
- W3150556068 crossrefType "journal-article" @default.
- W3150556068 hasAuthorship W3150556068A5009285146 @default.
- W3150556068 hasAuthorship W3150556068A5029045377 @default.
- W3150556068 hasAuthorship W3150556068A5033642313 @default.
- W3150556068 hasAuthorship W3150556068A5046822527 @default.
- W3150556068 hasAuthorship W3150556068A5060860411 @default.