Matches in SemOpenAlex for { <https://semopenalex.org/work/W2891885918> ?p ?o ?g. }
- W2891885918 endingPage "200.e6" @default.
- W2891885918 startingPage "187" @default.
- W2891885918 abstract "• Whippet, a new method for the rapid and accurate profiling of alternative splicing
• Whippet reliably detects and quantifies complex alternative splicing events
• Approximately one-third of human genes simultaneously express multiple major isoforms
• Complex splicing events are conserved, tissue regulated, and more prevalent in cancer
Alternative splicing (AS) is a widespread process underlying the generation of transcriptomic and proteomic diversity and is frequently misregulated in human disease. Accordingly, an important goal of biomedical research is the development of tools capable of comprehensively, accurately, and efficiently profiling AS. Here, we describe Whippet, an easy-to-use RNA-seq analysis method that rapidly (with hardware requirements compatible with a laptop) models and quantifies AS events of any complexity without loss of accuracy. Using an entropic measure of splicing complexity, Whippet reveals that one-third of human protein coding genes produce transcripts with complex AS events involving co-expression of two or more principal splice isoforms. We observe that high-entropy AS events are more prevalent in tumor relative to matched normal tissues and correlate with increased expression of proto-oncogenic splicing factors. Whippet thus affords the rapid and accurate analysis of AS events of any complexity, and as such will facilitate future biomedical research. 
High-throughput RNA sequencing (RNA-seq) technologies are producing vast repositories of transcriptome profiling data at an ever-expanding pace (Silvester et al., 2018). This explosion in data has enabled genome-wide investigations of the role of alternative splicing (AS) in gene regulation and its dysregulation in human diseases and disorders. Initial investigations using RNA-seq data revealed that ∼95% of human multi-exon gene transcripts undergo AS (Pan et al., 2008, Wang et al., 2008). These and more recent studies analyzing ribosome-engaged transcripts and quantitative mass spectrometry data suggest that AS is a major process underlying the generation of transcriptomic and proteomic complexity (Floor and Doudna, 2016, Liu et al., 2017, Sterne-Weiler et al., 2013, Weatheritt et al., 2016; reviewed in Blencowe, 2017). Furthermore, numerous AS events belonging to co-regulated and evolutionarily conserved exon networks have been shown to provide critical functions in diverse processes (Baralle and Giudice, 2017, Tapial et al., 2017). 
A major challenge confronting genome-wide investigations of AS is that existing methods for analyzing RNA-seq data require extensive computational resources and expertise. For example, widely employed tools involve alignment of reads to a transcriptome or reference genome, followed by quantification by downstream methods that estimate percent spliced in (PSI, Ψ) values for each AS event, such as cassette exons, alternative 5′ and 3′ splice sites, and retained introns. These steps can be time consuming and typically present a bottleneck when analyzing large datasets. Recent developments in transcript expression quantification have circumvented traditional alignment steps by extracting k-mers (i.e., all possible sequences of length k) from reads to identify possible transcripts of origin. Such methods can decrease processing times by 10- to 100-fold (Bray et al., 2016, Patro et al., 2017). However, their accuracy relies on whole “transcript-level” annotation models (i.e., models that record the precise location of intron and exon boundaries, and spliced junctions, for all transcripts), which are incomplete for the majority of species, and inconsistent among even the best-annotated species. The lack of complete annotation models can thus confound the accurate detection and quantification of AS events when using transcript-level methods. More widely used methods for RNA-seq analysis, focusing on the local detection and quantification of AS events, are referred to below as “event-level” approaches (Figure S1A; Katz et al., 2010, Tapial et al., 2017, Wang et al., 2017). 
These methods can achieve considerable accuracy for simple AS events (Vaquero-Garcia et al., 2016), yet existing tools are computationally inefficient in comparison with transcript-level methods, and most utilize predetermined simple binary models (i.e., a single alternative exon surrounded by two constitutive exons), making them poorly suited for the analysis of complex AS patterns. In light of these challenges, an important goal for understanding how transcriptomes shape biological processes is to develop methods capable of accurately analyzing simple and complex AS patterns with high efficiency. To address these challenges, we have developed Whippet, an easy-to-use, event-level software tool for the accurate and efficient detection and quantification of AS events of any complexity. 
Whippet has computational requirements compatible with a laptop computer and is capable of analyzing reads streamed from web-accessible data files by entering a file accession number. Another feature of Whippet is that it uses an entropic measure of AS to facilitate the accurate profiling of AS. We demonstrate the utility of Whippet in the discovery of previously uncharacterized AS complexity in vertebrate transcriptomes associated with the regulation of tandem domains and other protein sequence features, as well as a remarkable increase in AS complexity in cancer transcriptomes. Whippet models transcriptome structure by building “contiguous splice graphs” (CSGs). These are directed graphs whose nodes are non-overlapping exonic sequences and whose edges (i.e., connections between nodes) represent splice junctions or adjacent exonic regions (Figures 1A and 1B). Splice graphs allow single isoforms to be represented as paths through nodes in the graph (Heber et al., 2002, Trapnell et al., 2010, Vaquero-Garcia et al., 2016). 
Whippet’s CSGs extend the concept of splice graphs to a lightweight data structure that indexes the transcriptome for fast and modular alignment of raw RNA-seq reads across splice junctions (Figures 1B and 1C). To facilitate indexing, Whippet defines incoming and outgoing boundary types (e.g., 5′ or 3′ splice sites or transcription start or end sites; refer to Figure 1B legend for details) that specify the theoretical connectivity through the CSG for each node (Figures 1B and S1B). For each 5′ or 3′ splice site boundary, Whippet’s CSG index records an upstream or downstream k-mer, respectively, so as to enable efficient spliced read alignment across all possible splice junctions; this includes junctions that do not occur within annotated transcripts but which combine annotated donor or acceptor splice sites (Figures 1B–1D, S1C, and S1D; see STAR Methods for details). For example, Whippet’s CSG index for the human genome hg19 build can represent AS events from >1.3 million exon-exon junctions in >2.3 billion theoretically possible isoform paths, whereas only ∼100,000 of these paths are found in GENCODE v25 TSL1 annotated transcripts. After alignment, a Whippet AS event is defined as the collective set of a node’s skipping or connecting edges (e.g., edge 1-3 skips node 2, and edges 1-2 and 2-3 connect to node 2 in Figure 1E; see STAR Methods). When enumerating paths through a node’s AS event, it is possible that multiple paths share common (i.e., ambiguous) edges (e.g., edges 1-2 and 3-4 are shared among multiple paths in Figure 1E). Therefore, to accurately quantify all AS events, the proportional abundance of each path is determined using maximum likelihood estimation by the expectation-maximization (EM) algorithm (see STAR Methods). The percent spliced in (PSI, Ψ; range 0.0 to 1.0) value of a node is then calculated as the sum of the proportional abundance of the paths containing the node (Figure 1E). 
To assess Whippet’s accuracy, we compared its Ψ values with those measured from RT-PCR data and commonly used RNA-seq event-level analysis tools (Irimia et al., 2014, Katz et al., 2010, Wang et al., 2017, Vaquero-Garcia et al., 2016), which quantify Ψ using reads that directly map to an AS event, as well as transcript-level tools (Trincado et al., 2018), which estimate Ψ based on reads mapping across entire transcripts (see Methods S1 and Figures S2A–S2G for details of mapping benchmarking). RT-PCR-derived and RNA-seq-derived Ψ values were both from adult mouse liver and cerebellum, as well as from stimulated and unstimulated human Jurkat T cell line samples (Vaquero-Garcia et al., 2016). 
Notably, Whippet and the other event-level tools display ∼2.5-fold lower median error profiles compared to transcript-level methods, including SUPPA2 (Trincado et al., 2018) and Whippet_TPM, an approach developed in the present study to afford direct comparisons of transcript-level Ψ estimates that maintain Whippet’s node definitions (Figures 2A, S2H, S3A, and S3B; Table S1; STAR Methods). Benchmarking against RT-PCR Ψ values, while informative, is limited by the relatively small sample set (n = 162), the types of the events assessed, and possible intrinsic technical biases introduced by PCR. To address this, we assessed the accuracy of Whippet relative to other tools when comparing their Ψ values against synthetic (i.e., “ground truth”) Ψ values simulated from RNA-seq data obtained from a reference transcriptome annotation (GENCODE v25 TSL1 for hg19; STAR Methods). In contrast to results from benchmarking against RT-PCR data, we find that transcript-level methods perform with similar accuracy to event-level approaches, including Whippet, when using simulated RNA-seq data (compare Figures 2A and 2B). This discrepancy is likely due to the artificial nature of the simulation, where the exact transcript annotations used to generate the reads are provided to the quantification software. 
In the analysis of RNA-seq data from biological samples, the quantification software will likely be challenged by discrepancies between the annotation model and the set of true transcripts present in the sample (e.g., Figure 2C shows that a large percentage of alternative splice junctions in vertebrate species are not annotated in Ensembl). To investigate such effects, we simulated RNA-seq reads with ground-truth Ψ values using one annotation set (RefSeq Release 84 for hg19) and created an index database for each quantification program using another annotation set (GENCODE v25 TSL1 for hg19). Notably, in this comparison (and the inverse comparison in Figure S3C) there is a 2- to 2.5-fold increase in error rate for estimating Ψ values using transcript-level methods, but minimal change in error rate for any of the event-level tools, including Whippet (Figures 2B and S3D). We conclude that differences in transcript reference annotations can confound estimates for Ψ values when using transcript-level methods, whereas event-based methods are largely insensitive to this issue. The analyses so far used widely employed transcript annotations from human and mouse, which are among the most complete for any species. To assess Whippet’s performance when analyzing species with less extensively annotated transcripts, we applied it to RNA-seq data (Brawand et al., 2011) from five of the same tissues from gorilla, chimp, opossum, and chicken as well as from mouse and human. While ∼12% of alternative exon-exon junctions aligned by Whippet in human and mouse are unannotated, the percentage of unannotated AS junctions is in the range of 40%–80% in the other species (Figure 2C). 
These observations further indicate that transcript-level tools, and event-level tools reliant on annotated AS events, fail to detect a considerable amount of unannotated transcript diversity in vertebrates. In contrast, Whippet can detect and accurately quantify AS events involving numerous unannotated splice junctions represented by pairings of combinations of splice sites from its CSG indices (see also below). The benchmarks described so far focus on “simple” AS events, such as single-cassette alternative exons flanked by pre-defined constitutive exons that have binary splicing outcomes. However, many AS events involve splice sites that are variably paired with two or more other sites. Whippet provides output metrics designed to quantify such AS complexity in two related ways. First, it classifies AS events into discrete bins of complexity based on the number of enumerated paths from the event (i.e., n = ⌈log_2(paths)⌉ such that K(n) can produce at most 2^n spliced outcomes for K1, …, K6; Figure 2D). Second, it calculates a Ψ-dependent measure of AS complexity using Shannon’s entropy (i.e., entropy = −Σ_i Ψ_i log_2 Ψ_i such that the maximum entropy for an event in K(n) is n; Figures 2E, S4A, and S4B). This entropic measure conveniently formalizes the total number of possible outcomes for an event and the degree of their proportional contribution to the transcriptome in a read-depth- and read-length-independent manner (Figures S4C and S4D). To assess whether Whippet accurately quantifies AS events with increasing degrees of complexity and entropy, we simulated RNA-seq datasets and corresponding Ψ values for events in the formalized categories (K1, …, K6) of increasing complexity and distributed entropy (Figures 2D, 2E, and S4E). In contrast to other methods tested, the accuracy of Whippet-derived estimates for Ψ does not decrease as the complexity and entropy of the simulated AS events increase. 
This difference in performance arises because Whippet has the unique feature among the event-level approaches tested of employing the EM algorithm to assign reads that are ambiguously shared between multiple paths through high-entropy AS events. This capability translates into a ∼2- to 3-fold greater accuracy for Whippet in the quantification of K2-K6 events than for other tested methods (Figures 2E, 2F, and S4F). To further assess Whippet’s performance relative to other methods, we next investigated whether transcript-level methods potentially achieve comparable accuracy when provided with a predefined annotation set that comprehensively represents complex events. To test this, we built a transcript annotation set from combinatorial Whippet graph paths (N4 annotation file, STAR Methods). While this annotation set allows SUPPA2 to detect unannotated AS events, its error rate in estimating Ψ values is still 4-fold higher than Whippet’s (Figures 2F, S4E, and S4F). To experimentally validate Whippet-derived predictions of high AS-event entropy, RNA-seq data (Raj et al., 2014) from mouse neuroblastoma (N2a) cells were analyzed and 10 events with different predicted degrees of entropy and complexity involving tandem arrays of alternative exons were tested by RT-PCR (STAR Methods). Notably, 56/61 (91.8%) of the amplified spliced products were predicted by Whippet, whereas five (8.2%) of the expected isoforms were not detected. Of the detected products, 32 (52.5%) are consistent with annotated isoforms and 24 (39.3%) correspond to novel isoforms (Figures 2G and S5A). 
Collectively, these data demonstrate that Whippet is an accurate method for the analysis of both simple and complex AS events. To assess Whippet’s efficiency, we benchmarked speed and memory usage relative to published AS quantification methods. When analyzing several paired-end RNA-seq datasets from HeLa cells with increasing read depth (∼15 M, ∼25 M, and ∼50 M), Whippet quantifies AS from a raw paired-end 25 M RNA-seq read dataset in 43 minutes while using less than 1.5 GB of memory on a typical cluster node with a single core (Dual-Core AMD Opteron(tm) Processor 8218, 2.5 GHz, 60GB RAM, 1,024KB cache). This represents a considerable increase in performance over other tested event-level tools, and is of comparable performance to transcript-level methods (Figures 2H, S5B, and S5C; Table S2). For example, MISO, the most highly cited event-level tool, in combination with the read aligner STAR, took days and used 30 GB of memory to analyze the same data (Figures 2H and S5C), whereas the fastest transcript-level methods took approximately 20 minutes. It is important to note that when provided with annotation sets for complex AS events (e.g., N4 annotation file) the runtime and memory usage of transcript-level methods were greater than that of Whippet (Figures 2H and S5C). Moreover, on a personal laptop with a solid-state hard drive (Macbook Pro 3.1 GHz Intel i7), Whippet quantified the ∼25 M dataset in 15 minutes using downloaded data files and in 31 minutes when streaming data from the internet after inputting the SRA identifier. The considerably longer time taken to analyze the same data by MISO and some of the other event-level tools may be influenced by the hardware used to run these programs. The unique features of Whippet thus obviate the use of high-performance computational clusters for the quantitative profiling of AS using RNA-seq data. 
Taken together with the assessment of accuracy, the results indicate that Whippet offers advantages over other methods in terms of its capacity to reliably and efficiently detect and quantify AS events. Because previously described tools were not designed for the formalized quantitative profiling of AS complexity, we used Whippet to investigate the prevalence and possible biological relevance of high-complexity AS events in mammalian transcriptomes. To this end, we applied Whippet to an analysis of 60 diverse human and mouse tissue RNA-seq datasets (Table S3; Figures 3A and S6A). Remarkably, of more than 13,000 analyzed human protein coding genes, 42.68% harbor an AS event predicted to have an entropy >1.0 (i.e., two or more expressed isoforms) in at least one tissue (Figure S6B; see STAR Methods). Moreover, 4,101 (30.1%) of these genes co-express at least two major isoforms at similar levels in one or more of the same tissues (Figures 3B and S6C; STAR Methods). The majority (∼20%) of events are predicted to undergo substantial tissue-dependent changes in splicing entropy (Figure 3C) without concurrent changes in expression of the corresponding genes (Figure 3D; R^2 = 0.074, Pearson correlation). These results contrast with previous proposals that the vast majority of mammalian genes express a single major splice variant (Gonzàlez-Porta et al., 2013, Tress et al., 2017), and instead are consistent with data indicating that a substantial fraction of genes express multiple major isoforms either within or between different cell and tissue types (Tapial et al., 2017, Vaquero-Garcia et al., 2016, Wang et al., 2008). 
However, new isoforms generated by high-entropy AS events detected by Whippet further increase the estimated fraction of genes predicted to express multiple major isoforms compared to previous estimates (e.g., up to ∼40% versus ∼18% in Tapial et al., 2017). 
Supporting the possible biological relevance of these AS events, the corresponding genes are enriched in functions associated with the cytoskeleton, extracellular matrix organization, cell communication, signaling, and muscle biology (Figure 3E, p values < 0.05; FDR corrected). To further investigate the possible significance of high-entropy AS events detected by Whippet, we analyzed their evolutionary conservation using RNA-seq data from six of the same tissues from seven vertebrate species (Brawand et al., 2011), comparing entropy values for the orthologous exons (1,304 “low-entropy” [<1.0] and 369 “high-entropy” [>1.5] exons; Figures 4A, S6D, and S6E) in each species. This revealed a significantly greater concordance in both Ψ and entropy values for orthologous AS events between the analyzed species than expected by chance when compared to randomly permuted sets of exons from the same data (Figures 4B and 4C, low-entropy AS events: p < 2.2 × 10^−16; high-entropy AS events: p < 4.3 × 10^−4, Kolmogorov-Smirnov test; Figures S6F and S6G; see STAR Methods). Thus, overall, the degree of entropy of low- and high-complexity AS events detected and quantified by Whippet is conserved across vertebrate species, implying that these patterns may often be functionally important. We next asked whether these events are potentially translated. Due to the extremely limited coverage of currently available mass spectrometry data (Blencowe, 2017" @default.
- W2891885918 created "2018-09-27" @default.
- W2891885918 creator A5031170324 @default.
- W2891885918 creator A5032292978 @default.
- W2891885918 creator A5065522795 @default.
- W2891885918 creator A5084433086 @default.
- W2891885918 creator A5091334907 @default.
- W2891885918 date "2018-10-01" @default.
- W2891885918 modified "2023-10-16" @default.
- W2891885918 title "Efficient and Accurate Quantitative Profiling of Alternative Splicing Patterns of Any Complexity on a Laptop" @default.
- W2891885918 cites W1537923221 @default.
- W2891885918 cites W1829675031 @default.
- W2891885918 cites W1897931464 @default.
- W2891885918 cites W1964362163 @default.
- W2891885918 cites W1983275320 @default.
- W2891885918 cites W1996572869 @default.
- W2891885918 cites W1999574084 @default.
- W2891885918 cites W2000203435 @default.
- W2891885918 cites W2002354671 @default.
- W2891885918 cites W2004622806 @default.
- W2891885918 cites W2009258895 @default.
- W2891885918 cites W2009302210 @default.
- W2891885918 cites W2011920499 @default.
- W2891885918 cites W2012889542 @default.
- W2891885918 cites W2026771976 @default.
- W2891885918 cites W2034691490 @default.
- W2891885918 cites W2051119339 @default.
- W2891885918 cites W2065128082 @default.
- W2891885918 cites W2081604575 @default.
- W2891885918 cites W2096173343 @default.
- W2891885918 cites W2100779000 @default.
- W2891885918 cites W2102046311 @default.
- W2891885918 cites W2102278945 @default.
- W2891885918 cites W2104224685 @default.
- W2891885918 cites W2112876600 @default.
- W2891885918 cites W2132340288 @default.
- W2891885918 cites W2133528590 @default.
- W2891885918 cites W2136549268 @default.
- W2891885918 cites W2137799805 @default.
- W2891885918 cites W2138773756 @default.
- W2891885918 cites W2139480055 @default.
- W2891885918 cites W2140191982 @default.
- W2891885918 cites W2140729960 @default.
- W2891885918 cites W2141458291 @default.
- W2891885918 cites W2143447881 @default.
- W2891885918 cites W2154537630 @default.
- W2891885918 cites W2155804510 @default.
- W2891885918 cites W2156407548 @default.
- W2891885918 cites W2156561767 @default.
- W2891885918 cites W2159730169 @default.
- W2891885918 cites W2169456326 @default.
- W2891885918 cites W2179438025 @default.
- W2891885918 cites W2226665541 @default.
- W2891885918 cites W2255131131 @default.
- W2891885918 cites W2258573009 @default.
- W2891885918 cites W2323326409 @default.
- W2891885918 cites W2338046461 @default.
- W2891885918 cites W2346350699 @default.
- W2891885918 cites W2410306399 @default.
- W2891885918 cites W2554777987 @default.
- W2891885918 cites W2566979164 @default.
- W2891885918 cites W2592811885 @default.
- W2891885918 cites W2593821392 @default.
- W2891885918 cites W2610621384 @default.
- W2891885918 cites W2611211782 @default.
- W2891885918 cites W2612976808 @default.
- W2891885918 cites W2741879909 @default.
- W2891885918 cites W2750771238 @default.
- W2891885918 cites W2751663599 @default.
- W2891885918 cites W2768838077 @default.
- W2891885918 cites W2775051339 @default.
- W2891885918 cites W2949317881 @default.
- W2891885918 cites W2953020124 @default.
- W2891885918 doi "https://doi.org/10.1016/j.molcel.2018.08.018" @default.
- W2891885918 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/30220560" @default.
- W2891885918 hasPublicationYear "2018" @default.
- W2891885918 type Work @default.
- W2891885918 sameAs 2891885918 @default.
- W2891885918 citedByCount "104" @default.
- W2891885918 countsByYear W28918859182018 @default.
- W2891885918 countsByYear W28918859182019 @default.
- W2891885918 countsByYear W28918859182020 @default.
- W2891885918 countsByYear W28918859182021 @default.
- W2891885918 countsByYear W28918859182022 @default.
- W2891885918 countsByYear W28918859182023 @default.
- W2891885918 crossrefType "journal-article" @default.
- W2891885918 hasAuthorship W2891885918A5031170324 @default.
- W2891885918 hasAuthorship W2891885918A5032292978 @default.
- W2891885918 hasAuthorship W2891885918A5065522795 @default.
- W2891885918 hasAuthorship W2891885918A5084433086 @default.
- W2891885918 hasAuthorship W2891885918A5091334907 @default.
- W2891885918 hasBestOaLocation W28918859181 @default.
- W2891885918 hasConcept C104317684 @default.
- W2891885918 hasConcept C111919701 @default.
- W2891885918 hasConcept C187191949 @default.
- W2891885918 hasConcept C194583182 @default.
- W2891885918 hasConcept C2780008327 @default.
- W2891885918 hasConcept C36823959 @default.
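The two complexity measures named in the abstract above (the bin index n = ceil(log2(paths)) and the Shannon entropy of the Ψ values) can be illustrated with a minimal sketch. This is not Whippet's code: the function names `complexity_class` and `splicing_entropy` are hypothetical, and the sketch assumes the Ψ_i passed to the entropy function are the proportional abundances of an event's paths and therefore sum to 1.

```python
import math


def complexity_class(num_paths: int) -> int:
    """Return n = ceil(log2(num_paths)), the index of the K(n) complexity bin.

    Hypothetical helper: an event with at most 2**n enumerated paths falls in K(n).
    """
    if num_paths < 1:
        raise ValueError("an event needs at least one path")
    # For positive integers, (x - 1).bit_length() equals ceil(log2(x)) exactly,
    # which avoids floating-point rounding near powers of two.
    return (num_paths - 1).bit_length()


def splicing_entropy(path_psi: list[float]) -> float:
    """Shannon entropy, -sum(psi_i * log2(psi_i)), over path proportions.

    Assumption: path_psi holds proportional abundances that sum to 1.
    Zero-abundance paths contribute nothing to the sum.
    """
    if any(p < 0 for p in path_psi):
        raise ValueError("proportions must be non-negative")
    if not math.isclose(sum(path_psi), 1.0, abs_tol=1e-6):
        raise ValueError("proportions must sum to 1")
    return -sum(p * math.log2(p) for p in path_psi if p > 0)
```

Under these assumptions, a binary event with two equally used paths falls in K1 and has entropy 1, while an event dominated by one path has entropy near 0, which matches the statement that the maximum entropy for an event in K(n) is n.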