SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4285333698> ?p ?o ?g. }

Showing items 1 to 49 of 49 with 100 items per page.

W4285333698 abstract "Abstract
Most bacterial ORFs are identified by automated prediction algorithms. However, these algorithms often fail to identify ORFs lacking canonical features such as a length of >50 codons or the presence of an upstream Shine-Dalgarno sequence. Here, we use ribosome profiling approaches to identify actively translated ORFs in Mycobacterium tuberculosis. Most of the ORFs we identify have not been previously described, indicating that the M. tuberculosis transcriptome is pervasively translated. The newly described ORFs are predominantly short, with many encoding proteins of ≤50 amino acids. Codon usage of the newly discovered ORFs suggests that most have not been subject to purifying selection and hence are unlikely to contribute to cell fitness. Nevertheless, we identify 90 new ORFs (median length of 52 codons) that bear the hallmarks of purifying selection. Thus, our data suggest that pervasive translation of short ORFs in Mycobacterium tuberculosis serves as a rich source for the evolution of new functional proteins.

Editor's evaluation
The use of ribosome profiling in this study allowed for the identification of translated regions of the Mycobacterium tuberculosis genome, identifying new genomic regions that undergo active translation. A select set of these appears to have been subject to purifying evolutionary selection, suggesting that this pervasive translation of short genetic regions serves as the basis for the evolution of new proteins/protein functions. https://doi.org/10.7554/eLife.73980.sa0

eLife digest
How can you predict which proteins an organism can make?
To answer this question, scientists often use computer programs that scan the genetic information of a species for open reading frames – a type of DNA sequence that codes for a protein. However, very short genes and overlapping genes are often missed by these searches. Mycobacteria are a group of bacteria that includes the species Mycobacterium tuberculosis, which causes tuberculosis. Previous work has predicted several thousand open reading frames for M. tuberculosis, but Smith et al. used a different approach to determine whether there could be more. They focused on ribosomes, the cellular structures that assemble a specific protein by reading the instructions provided by the corresponding gene. Examining the sections of genetic code that ribosomes were processing in M. tuberculosis uncovered hundreds of new open reading frames, most of which carried the instructions to make very short proteins. A closer look suggested that only 90 of these proteins were likely to have a useful role in the life of the bacteria, which could open new doors in tuberculosis research. The rest of the sequences showed no evidence of having evolved a useful job, yet they were still manufactured by the mycobacteria. This pervasive production could help the bacteria adapt to quickly changing environments by evolving new, functional proteins.

Introduction
The canonical mode of bacterial translation initiation begins with the association of a 30S ribosomal subunit, initiator tRNA, and initiation factors with the ribosome binding site of an mRNA (Laursen et al., 2005). Binding of the 30S initiation complex to the mRNA involves base-pairing interactions between the mRNA Shine-Dalgarno (S-D) sequence, located a short distance upstream of the start codon, and the anti-S-D sequence in the 16S ribosomal RNA (rRNA). Local mRNA secondary structure around the ribosome binding site can reduce interaction with the 30S initiation complex.
Translation initiates at a start codon, typically AUG; less frequently, translation initiates at GUG or UUG, and in rare instances at AUC, AUU, or AUA (Gvozdjak and Samanta, 2020; Hecht et al., 2017). Hence, the likelihood of translation initiation at a given sequence depends on the sequence upstream of the start codon, the degree of secondary structure in the region surrounding the start codon, and the identity of the start codon. Because the canonical mechanism requires a 5’ untranslated region (UTR) that includes the S-D sequence, mRNAs translated by this mechanism are referred to as ‘leadered’. By contrast, ‘leaderless’ translation initiation occurs on mRNAs that lack a 5’ UTR, such that the transcription start site (TSS) and translation start codon coincide. The mechanism of leaderless translation initiation is poorly understood. Until recently, there were few known examples of leaderless mRNAs; leaderless translation in the model bacterium Escherichia coli was shown to be rare and inefficient (Moll et al., 2002; Romero et al., 2014; Shell et al., 2015). However, recent studies indicate that leaderless translation initiation is a prevalent and robust mechanism in many bacterial and archaeal species (Beck and Moll, 2018). We and others showed that ~25% of all mRNAs in Mycobacterium smegmatis and Mycobacterium tuberculosis (Mtb) are leaderless (Cortes et al., 2013; Shell et al., 2015). Moreover, our data suggested that any RNA with a 5’ AUG or GUG will be efficiently translated using the leaderless mechanism in M. smegmatis (Shell et al., 2015). Bacterial open reading frames (ORFs) are typically identified from genome sequences using automated prediction algorithms (Besemer and Borodovsky, 2005; Delcher et al., 2007; Hyatt et al., 2010). Among the criteria used by these algorithms are ORF length and the presence of a S-D sequence.
Hence, they often fail to identify non-canonical ORFs, including overlapping ORFs (Burge and Karlin, 1998), leaderless ORFs (Beck and Moll, 2018; Lomsadze et al., 2018), and short ORFs (sORFs), which encode small proteins of 50 or fewer amino acids (most algorithms have a lower size limit of 50 codons). Recent studies have revealed hundreds of sORFs in diverse bacterial species (Orr et al., 2020; Sberro et al., 2019; Storz et al., 2014; Stringer et al., 2021; VanOrsdel et al., 2018; Weaver et al., 2019). Some sORFs encode functional small proteins that contribute to cell fitness, whereas other sORFs function as cis-acting regulators. In eukaryotes, there have been reports of ‘pervasive translation’ of thousands of unannotated sORFs, likely due to the imperfect specificity of the translation machinery (Ingolia et al., 2014; Ruiz-Orera et al., 2018; Wacholder et al., 2021). The function, if any, of these sORFs and their encoded proteins is unclear, and they are rarely subject to purifying selection (Ruiz-Orera et al., 2018; Wacholder et al., 2021). Nonetheless, pervasively translated eukaryotic sORFs may function as ‘proto-genes’ that, over the course of evolution, can acquire a function promoting cell fitness, a process referred to as ‘de novo gene birth’ (Blevins et al., 2021; Carvunis et al., 2012; Ruiz-Orera et al., 2018; Vakirlis et al., 2018; Vakirlis et al., 2020). Ribosome profiling (Ribo-seq) is a powerful experimental approach to identify the translated regions of mRNAs by mapping ribosome-protected RNA fragments (Ingolia et al., 2009). Ribo-RET is a modified form of Ribo-seq in which bacterial cells are treated with the antibiotic retapamulin before lysis; retapamulin traps bacterial ribosomes at sites of translation initiation, whereas elongating ribosomes are free to complete translation (Meydan et al., 2019).
Thus, Ribo-RET facilitates the identification of overlapping ORFs by limiting the signal to start codons (Meydan et al., 2018; Meydan et al., 2019). Ribo-RET was recently applied to E. coli, revealing start codons for many previously undescribed ORFs (Meydan et al., 2019; Stringer et al., 2021; Weaver et al., 2019), including sORFs and ORFs positioned in frame with annotated ORFs, such that the translated protein is an isoform of the previously described protein. Here, we use a combination of Ribo-seq and Ribo-RET to map translated ORFs in Mtb. We detect thousands of robustly translated, previously undescribed sORFs from leaderless and leadered mRNAs. We also identify hundreds of ORFs that have start codons upstream or downstream of those for annotated genes, in the same reading frame. We conclude that the Mtb transcriptome is pervasively translated, with spurious translation initiation occurring at many sites. We also identify a subset of novel sORFs that appear to be under purifying selection, suggesting that these ORFs, or the proteins they encode, contribute to cell fitness. Thus, our data suggest that pervasive translation of sORFs in Mtb serves as a rich source for the evolution of functional genes.

Results

Hundreds of actively translated sORFs from leaderless mRNAs
Mtb has a genome of 4,411,532 bp, with 3989 annotated protein-coding genes (RefSeq annotation). Two previous studies of Mtb identified 1285 transcription start sites (TSSs) for which the associated transcript begins with the sequence ‘RUG’ (R = A or G; Supplementary file 1A; Cortes et al., 2013; Shell et al., 2015), suggesting that these transcripts correspond to leaderless mRNAs (Shell et al., 2015). Of the 1285 TSSs associated with a 5’ RUG, 577 match the start codons of protein-coding genes included in the current genome annotation, as previously noted (Cortes et al., 2013; Shell et al., 2015).
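As a minimal sketch of the first step described above, the Python functions below flag TSSs whose transcript begins with a 5’ RUG (ATG or GTG in genomic DNA). They then label each one as coinciding with an annotated start codon or not. The sketch assumes plus-strand, 0-based coordinates and a set of annotated start positions, and all names are hypothetical. It does not separate isoform from novel candidates, which requires locating the downstream stop codon.

```python
# Hypothetical sketch: find candidate leaderless mRNAs among mapped TSSs.
# Assumptions: plus strand only, 0-based coordinates, genome given as a DNA
# string (so the RNA motif RUG appears as ATG or GTG).


def starts_with_rug(genome: str, tss: int) -> bool:
    '''True if the transcript beginning at tss starts with ATG or GTG.'''
    codon = genome[tss:tss + 3].upper()
    return len(codon) == 3 and codon[0] in 'AG' and codon[1:] == 'TG'


def classify_leaderless_tss(genome: str, tss_positions: list[int],
                            annotated_starts: set[int]) -> dict[int, str]:
    '''Map each RUG-initiated TSS to annotated or unannotated.

    TSSs that do not begin with RUG are skipped (not leaderless candidates).
    '''
    labels = {}
    for tss in tss_positions:
        if not starts_with_rug(genome, tss):
            continue
        labels[tss] = 'annotated' if tss in annotated_starts else 'unannotated'
    return labels


if __name__ == '__main__':
    toy_genome = 'CCATGAAATAGCCGTGCCCTGA'
    # TSS 2 starts with ATG at an annotated start; TSS 13 starts with GTG and
    # has no annotated start; TSS 5 starts with AAA and is skipped.
    print(classify_leaderless_tss(toy_genome, [2, 13, 5], annotated_starts={2}))
```

The toy call returns `{2: 'annotated', 13: 'unannotated'}`. The minus strand would need reverse-complemented coordinates, which this sketch omits.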
A further 338 of the RUG-associated TSSs correspond to putative ORFs whose start codons are unannotated but whose stop codons match those of annotated genes; we refer to this architecture as ‘isoform’, since translation of these putative ORFs would generate N-terminally extended or truncated isoforms of annotated proteins. We note that some isoform ORFs likely reflect mis-annotations, as has been suggested previously (Cortes et al., 2013; Shell et al., 2015). Lastly, 370 of the 1285 RUG-associated TSSs correspond to putative ORFs whose start and stop codons do not match those of any annotated gene; we refer to these as putative ‘novel’ ORFs. To determine whether the putative isoform and novel leaderless ORFs are actively translated, we performed Ribo-seq in Mtb. Note that all genome-scale data described in this manuscript can be viewed in our interactive genome browser (https://mtb.wadsworth.org/). We first assessed ribosome occupancy profiles for leadered ORFs that are present in the current genome annotation. Consistent with previous studies (Oh et al., 2011; Woolstenhulme et al., 2015), we observed enrichment of ribosome occupancy at the start and stop codons of annotated, leadered ORFs; the 3’ ends of ribosome-protected RNA fragments are enriched 15 nt downstream of start codons and 12 nt downstream of stop codons (Figure 1A). We note that there are also smaller peaks and troughs of Ribo-seq signal precisely at start and stop codons. These are likely attributable to sequence biases associated with library preparation, which are highlighted when groups of similar sequences (e.g. start/stop codons) are aligned (see Materials and methods). We next assessed ribosome occupancy profiles for the 577 leaderless ORFs that are present in the current genome annotation. As expected, we observed an enrichment of ribosome-protected RNA fragments with 3’ ends positioned 12 nt downstream of stop codons (Figure 1B), consistent with the profile observed for leadered ORFs.
However, 3’ ends of ribosome-protected RNA fragments were not enriched 15 nt downstream of the start codons of the 577 annotated leaderless ORFs. Instead, we observed enrichment spread across the region ~25–35 nt downstream of leaderless start codons (Figure 1B). This suggests either that ribosomes at leaderless ORF start codons behave differently from those at leadered ORF start codons, or that the ribosome-protected fragments are too small to be represented in the RNA library; this observation is consistent with a previous study (Sawyer et al., 2021). Analysis of leaderless start codons, which by definition coincide with TSSs, is further confounded by non-random Ribo-seq signal that we consistently observed at the TSSs of non-leaderless transcripts (Figure 1—figure supplement 1), albeit to a lesser extent than at leaderless gene starts.

Figure 1. Ribo-seq data support the translation of hundreds of isoform and novel ORFs from leaderless mRNAs. (A) Metagene plot showing normalized Ribo-seq sequence read coverage for untreated cells in the regions around start (left graph) and stop codons (right graph) of previously annotated, leadered ORFs. Note that sequence read coverage is plotted only for the 3’ ends of reads, since these are consistently positioned relative to the ribosome P-site (Woolstenhulme et al., 2015). Data are shown for two biological replicate experiments. The schematics show the position of initiating/terminating ribosomes, highlighting the expected site of ribosome occupancy enrichment at the downstream edge of the ribosome. (B) Equivalent data to (A) but for putative annotated, leaderless ORFs. (C) Equivalent data to (A) but for putative novel, leaderless ORFs. (D) Equivalent data to (A) but for putative isoform, leaderless ORFs. Only data for start codons are shown because the same stop codon is used by both an annotated ORF and the corresponding isoform ORF.
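The metagene plots in Figure 1 average footprint 3’-end coverage across many ORFs aligned on their start or stop codons. A minimal Python sketch of that kind of aggregation follows. The per-window normalization (each window scaled to sum to 1 before averaging), the window sizes, and the function name are assumptions for illustration. They are not necessarily the normalization used for the figures.

```python
# Hypothetical sketch of a metagene profile built from read 3'-end counts.
# Assumptions: plus strand only; coverage[i] is the number of footprint 3' ends
# at genome position i; each window is scaled to sum to 1 before averaging.


def metagene(coverage: list[float], anchors: list[int],
             upstream: int = 20, downstream: int = 40) -> list[float]:
    '''Mean normalized coverage at offsets -upstream..+downstream of anchors.'''
    width = upstream + downstream + 1
    totals = [0.0] * width
    used = 0
    for pos in anchors:
        lo, hi = pos - upstream, pos + downstream
        if lo < 0 or hi >= len(coverage):
            continue  # window runs off the sequence
        window = coverage[lo:hi + 1]
        window_sum = sum(window)
        if window_sum == 0:
            continue  # no reads, nothing to normalize
        for i, value in enumerate(window):
            totals[i] += value / window_sum
        used += 1
    return [t / used for t in totals] if used else totals


if __name__ == '__main__':
    coverage = [0.0] * 200
    starts = [50, 120]
    for s in starts:
        coverage[s + 15] += 10  # simulated initiating ribosomes (+15 nt)
        coverage[s + 30] += 2   # simulated elongating ribosomes
    profile = metagene(coverage, starts)
    peak_offset = profile.index(max(profile)) - 20  # convert index to offset
    print(peak_offset)  # 15
```

Anchoring on stop codons instead of start codons gives the corresponding stop-codon profile, where the text reports a peak 12 nt downstream.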
We reasoned that if the putative leaderless isoform and novel ORFs are actively translated, they would exhibit ribosome occupancy profiles similar to those of the leaderless annotated ORFs. Indeed, this was the case, with similar relative occupancy of ribosomes undergoing translation initiation and termination at start/stop codons (Figure 1C–D; we did not analyze isoform ORF stop codons because they are shared with those of annotated ORFs). Thus, our data are consistent with active translation of the majority of the 370 putative novel ORFs from leaderless mRNAs. Strikingly, 268 of the leaderless novel ORFs are sORFs. We conclude that Mtb has hundreds of actively translated sORFs on leaderless mRNAs.

Ribo-RET identifies sites of translation initiation in Mtb
While there are likely >1000 leaderless mRNAs in Mtb, most mRNAs are leadered (Cortes et al., 2013; Sawyer et al., 2021; Shell et al., 2015). Given that our data support the existence of >300 novel ORFs translated from the 5’ ends of leaderless mRNAs, we speculated that there are many more unannotated ORFs translated from leadered initiation codons. While sites of leaderless translation initiation can be readily identified from TSS maps, identification of novel leadered ORFs is more challenging. Translated leadered ORFs generate signal in Ribo-seq datasets. However, identification of novel ORFs from Ribo-seq data is confounded by (i) the potential for artifactual signal in 5’ UTRs due to the binding of RNA-binding proteins (Ji et al., 2016), and (ii) masking of signal by overlapping ORFs on the same strand. To circumvent these problems, we performed Ribo-RET in Mtb to specifically map sites of translation initiation. We aligned the ribosome-protected RNA fragment sequences to the Mtb genome to identify ‘Initiation-Enriched Ribosome Footprints’ (IERFs), sites of ribosome occupancy that exceed the local background (Supplementary file 1B).
Specifically, IERFs correspond to genomic coordinates with ribosome occupancy coverage that exceeds an arbitrarily defined threshold (5.5 reads per million) and is at least 10-fold higher than the mean ribosome occupancy coverage in the region from 50 nt upstream to 50 nt downstream. We hypothesized that most IERFs correspond to sites of translation initiation. In support of this idea, there is a strong enrichment of IERF 3’ ends 15 nt downstream of the start codons of annotated, leadered genes. This enrichment is substantially greater than that observed for Ribo-seq data from cells grown without retapamulin treatment (Figure 2A; Figure 2—figure supplement 1).

Figure 2. Ribo-RET of M. tuberculosis identifies sites of translation initiation. (A) Metagene plot showing normalized Ribo-seq and Ribo-RET sequence read coverage (single replicate for each; data indicate the position of ribosome footprint 3’ ends) in the region from –50 to +100 nt relative to the start codons of annotated, leadered ORFs. (B) Heatmap showing the enrichment of eight selected trinucleotide sequences, for regions upstream of IERFs, relative to control regions. Expected positions of start codons and S-D sequences are indicated below the heatmap.

We determined the abundance of all trinucleotide sequences in the 40 nt regions upstream of IERF 3’ ends. There is a >2-fold enrichment of ATG, GTG, and TTG (likely start codons), but not CTG, ATT, or ATC, 15 nt upstream of IERF 3’ ends. There is also an enrichment of AGG and GGA (components of a consensus AGGAGGU Shine-Dalgarno sequence) 22–31 nt upstream of IERF 3’ ends (Figure 2B). We also observed >1.5-fold enrichment of ATG and GTG 14, 16, 17, and 18 nt upstream of IERF 3’ ends. The enrichment and position of start codon and Shine-Dalgarno-like sequence features upstream of IERFs are consistent with IERFs marking sites of translation initiation.
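The IERF criteria stated above translate into a simple per-position filter. The Python sketch below illustrates them under assumptions not given in the text:

- footprint 3’-end counts on one strand;
- conversion to reads per million (RPM) using a supplied library size;
- a background window that includes the position itself;
- windows truncated at the sequence ends.

Function names are hypothetical.

```python
# Hypothetical sketch of IERF calling from footprint 3'-end counts (one strand).
# Criteria from the text: coverage > 5.5 RPM and >= 10-fold the mean coverage
# from 50 nt upstream to 50 nt downstream. Window details are assumptions.


def to_rpm(counts: list[int], total_mapped_reads: int) -> list[float]:
    '''Convert raw counts to reads per million mapped reads.'''
    scale = 1e6 / total_mapped_reads
    return [c * scale for c in counts]


def call_ierfs(rpm: list[float], min_rpm: float = 5.5,
               min_fold: float = 10.0, flank: int = 50) -> list[int]:
    '''Return positions passing both the absolute and the local-fold filters.'''
    # Prefix sums give each window mean in O(1), so the scan is linear.
    prefix = [0.0]
    for value in rpm:
        prefix.append(prefix[-1] + value)
    hits = []
    for pos, value in enumerate(rpm):
        if value <= min_rpm:
            continue
        lo, hi = max(0, pos - flank), min(len(rpm), pos + flank + 1)
        local_mean = (prefix[hi] - prefix[lo]) / (hi - lo)
        if value >= min_fold * local_mean:
            hits.append(pos)
    return hits


if __name__ == '__main__':
    counts = [1] * 300
    counts[150] = 400  # sharp peak, as might occur at a trapped start codon
    counts[200] = 30   # modest bump that fails the fold-over-background test
    rpm = to_rpm(counts, total_mapped_reads=100_000)
    print(call_ierfs(rpm))  # [150]
```

The prefix-sum design keeps the scan linear in genome length, which matters at the scale of a 4.4 Mb genome.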
We observed a strong enrichment of A/T immediately 3’ of the IERFs, i.e. on the other side of the site cleaved by micrococcal nuclease (MNase) during the Ribo-RET procedure. ‘A’ was found most frequently (53% of IERFs), and ‘G’ least frequently (2% of IERFs; Figure 2—figure supplement 2). This sequence bias is likely not due to a biological phenomenon. It more likely reflects the sequence preference of MNase, which is known to display sequence bias when cutting DNA (Dingwall et al., 1981) and RNA (Woolstenhulme et al., 2015). The sequence bias is apparent in the complete Ribo-RET libraries, with 74% of sequenced ribosome-protected fragments having an ‘A’ or ‘U’ 3’ of the upstream MNase site. Given that the genomic A/T content of Mtb is only 34%, inefficient RNA processing by MNase likely led to an underrepresentation of some G/C-rich translation initiation sites in the Ribo-RET data. This may also explain the extended footprints (>15 nt) in G/C-rich contexts (see Discussion). This sequence bias also likely favors cleavage precisely at exposed start codons, which are strongly enriched for A/T bases, creating more RNA library fragments that end in these sequences (e.g. enriched Ribo-seq signal precisely at start codons in Figure 2A).

Identification of putative ORFs from Ribo-RET data
A total of 1994 IERFs were found in both replicate experiments (Supplementary file 1B). Of these IERFs, 71% (1406) were associated with a potential ATG or GTG start codon 14–18 nt upstream of their 3’ ends, or a potential TTG start codon 15 nt upstream of their 3’ ends (Supplementary file 1C). This is a far higher proportion than expected by chance (17%). Thus, these 1406 IERFs correspond to the start codons of putative ORFs, with an overall estimated false discovery rate (FDR) of 9% (see Materials and methods for details).
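The start-codon rule described above (ATG or GTG 14–18 nt upstream of an IERF 3’ end, or TTG exactly 15 nt upstream) can be sketched as follows. The text does not say how to choose between several qualifying codons. The tie-break used here (closest to 15 nt, then ATG before GTG before TTG) is therefore an assumption, as are the plus-strand, 0-based coordinates and the function name.

```python
# Hypothetical sketch: assign a putative start codon to an IERF.
# A codon 'd nt upstream' of the 3' end is taken to start at position
# ierf_3p_end - d (plus strand, 0-based). The tie-break is an assumption.
from typing import Optional

CODON_RANK = {'ATG': 0, 'GTG': 1, 'TTG': 2}


def assign_start_codon(genome: str, ierf_3p_end: int) -> Optional[tuple[int, str]]:
    '''Return (start position, codon) for the best candidate, or None.'''
    candidates = []
    for distance in range(14, 19):  # 14..18 nt upstream
        pos = ierf_3p_end - distance
        if pos < 0:
            continue
        codon = genome[pos:pos + 3].upper()
        allowed = ('ATG', 'GTG', 'TTG') if distance == 15 else ('ATG', 'GTG')
        if codon in allowed:
            candidates.append((abs(distance - 15), CODON_RANK[codon], pos, codon))
    if not candidates:
        return None
    _, _, pos, codon = min(candidates)
    return pos, codon


if __name__ == '__main__':
    # ATG starts 18 nt and GTG starts 15 nt upstream of a 3' end at 28.
    toy = 'C' * 10 + 'ATG' + 'GTG' + 'C' * 20
    print(assign_start_codon(toy, 28))  # (13, 'GTG')
```

Because the distance criterion is compared first, the GTG at the canonical 15 nt position wins over the ATG 18 nt upstream in the toy example.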
Of these putative ORFs, 34% (478; FDR of 0.3%) precisely match previously annotated ORFs; 27% (373; FDR of 9%) overlap and are in frame with previously annotated ORFs (i.e. isoform ORFs); and 39% (555; FDR of 15%) are novel ORFs, with no match to a previously annotated stop codon. A total of 112 novel ORFs were found entirely in regions presently designated as intergenic; the remaining novel ORFs overlap partly or completely with annotated genes in sense and/or antisense orientations (Figure 3A; Supplementary file 1C). Strikingly, 77% (430) of the novel ORFs we identified are sORFs, with 48 novel ORFs consisting of only a start and stop codon (Supplementary file 1C), an architecture recently described in E. coli (Meydan et al., 2019).

Figure 3. Features of higher-confidence ORFs identified by Ribo-RET. (A) Distribution of different classes of ORFs identified by Ribo-RET. The pie chart shows the proportion of identified ORFs in each class. Isoform ORFs are further classified based on whether they are longer (‘N-terminal extension’) or shorter (‘N-terminal truncation’) than the corresponding annotated ORF. Novel ORFs are further classified based on their overlap with annotated genes. ‘Sense’, ‘antisense’, and ‘mixed’ refer to whether the overlapping gene(s) is/are in the sense, antisense, or both (multiple overlapping genes) orientations with respect to the novel ORF. ‘Fully’ and ‘Partially’ indicate whether all or only some of the novel ORF overlaps annotated genes. (B) Strip plot showing the ΔG for the predicted minimum free energy structures for the regions from –40 to +20 nt relative to putative start codons for the different classes of ORF, and for a set of 500 random sequences. Median values are indicated by horizontal lines.
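The ORF classes used here can be expressed compactly:

- annotated: start and stop both match an annotation;
- isoform: only the stop codon matches;
- novel: no stop-codon match.

An sORF additionally encodes 50 or fewer amino acids. The Python sketch below extends a start codon to the first in-frame stop and applies those definitions. The plus-strand, 0-based coordinates, the (start, stop) annotation format, and all names are assumptions.

```python
# Hypothetical sketch: extend a putative start codon to its in-frame stop and
# classify the ORF against an annotation of (start, stop) pairs, where both
# coordinates are 0-based first-base positions on the plus strand.
from dataclasses import dataclass
from typing import Optional

STOP_CODONS = {'TAA', 'TAG', 'TGA'}


@dataclass
class OrfCall:
    start: int
    stop: int
    category: str    # 'annotated', 'isoform', or 'novel'
    length_aa: int   # codons before the stop, i.e. encoded protein length

    @property
    def is_sorf(self) -> bool:
        return self.length_aa <= 50

    @property
    def start_stop_only(self) -> bool:
        return self.length_aa == 1  # start codon immediately followed by stop


def find_stop(genome: str, start: int) -> Optional[int]:
    '''Position of the first in-frame stop codon at or after start, or None.'''
    for pos in range(start, len(genome) - 2, 3):
        if genome[pos:pos + 3].upper() in STOP_CODONS:
            return pos
    return None


def describe_orf(genome: str, start: int,
                 annotated: set[tuple[int, int]]) -> Optional[OrfCall]:
    stop = find_stop(genome, start)
    if stop is None:
        return None  # runs off the sequence without a stop codon
    if (start, stop) in annotated:
        category = 'annotated'
    elif stop in {s for _, s in annotated}:
        category = 'isoform'  # shared stop, different start
    else:
        category = 'novel'
    return OrfCall(start, stop, category, (stop - start) // 3)


if __name__ == '__main__':
    toy = 'ATGATGAAATAA' + 'C' + 'ATGTGA'
    annotation = {(0, 9)}
    for s in (0, 3, 13):
        orf = describe_orf(toy, s, annotation)
        print(s, orf.category, orf.length_aa, orf.is_sorf, orf.start_stop_only)
```

In the toy genome:

- start 0 is the annotated ORF;
- start 3 is an N-terminally truncated isoform sharing its stop codon;
- start 13 is a novel ORF consisting of only a start and a stop codon.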
We reasoned that if the isoform ORFs and novel ORFs are genuine, they should have S-D sequences upstream. Their start codons should also each be associated with a region of reduced RNA secondary structure, as has been described for ORFs in other bacterial species (Baez et al., 2019; Del Campo et al., 2015). As we had observed for the set of all IERFs, regions upstream of isoform ORFs and novel ORFs are associated with an enrichment of AGG and GGA sequences in the expected location of a S-D sequence (Figure 3—figure supplement 1). This enrichment is lower than for annotated genes. However, a S-D sequence was likely a contributing criterion in computationally predicting the initiation codons of annotated genes. We also assessed the level of RNA secondary structure around the start codons of all the putative ORFs identified by Ribo-RET. The predicted secondary structure for a set of random genomic sequences was significantly higher than that around the start of the identified annotated, novel, or isoform ORFs (Mann-Whitney U test P < 2.2e-16 in all cases; Figure 3B). Moreover, the predicted secondary structure around the start of the annotated ORFs was only modestly, albeit significantly, higher than that of novel ORFs (Mann-Whitney U test P = 1e-3). Collectively, the ORFs identified from Ribo-RET data exhibit the expected features of genuine translation initiation sites.

ORFs identified by Ribo-RET are actively translated in untreated cells
To determine whether isoform ORFs and novel ORFs are actively and fully translated in cells not treated with retapamulin, we analyzed Ribo-seq data generated from cells grown without drug treatment. We assessed ribosome occupancy for annotated, novel, and isoform ORFs identified by Ribo-RET.
As for the predicted leaderless ORFs, we reasoned that expressed leadered ORFs would be associated with increased ribosome occupancy at start and stop codons, as exemplified by previously annotated, leadered ORFs (Figure 1A; Oh et al., 2011; Woolstenhulme et al., 2015). Accordingly, annotated ORFs identified by Ribo-RET were strongly enriched for Ribo-seq signal 15 nt downstream of their start codons and 12 nt downstream of their stop codons (Figure 4A–B). We observed similar Ribo-seq enrichment profiles at the start and stop codons of novel ORFs, and downstream of the start codons of isoform ORFs (Figure 4A and C–D). We did not observe these enrichment profiles for a set of mock ORFs (Figure 4—figure supplement 1A). Moreover, we did not observe enrichment of RNA-seq signal at start/stop codons, ruling out biases associated with library construction (Figure 4—figure supplement 1B–D). Overall, our data are consistent with most Ribo-RET-predicted isoform and novel ORFs being actively translated from start to stop codon, independent of retapamulin treatment.

Figure 4. Ribo-seq data support the translation of hundreds of isoform and novel ORFs identified by Ribo-RET. (A) Ribo-seq and Ribo-RET sequence read coverage (read 3’ ends) across two genomic regions, showing examples of putative ORFs in the annotated (blue arrow), novel (orange arrow), and isoform (green arrow) categories. ORFs identified by Ribo-RET are shown with a black outline. (B) Metagene plot showing normalized Ribo-seq sequence read coverage (data indicate the position of ribosome footprint 3’ ends) for untreated cells in the regions around start (left graph) and stop codons (right graph) of ORFs predicted from Ribo-RET profiles that correspond to previously annotated genes. (C) Equivalent data to (B) but for putative novel ORFs identified from Ribo-RET data. (D) Equivalent data to (B) but for putative isoform ORFs identified from Ribo-RET data.
Only data for start codons are shown because the same stop codon is used by both an annotated ORF and the corresponding isoform ORF.

Identification of lower-confidence ORFs from Ribo-RET data
In addition to the 1994 IERFs present in both replicates of Ribo-RET data, 4216 IERFs were found only in the first replicate dataset. This dataset was associated with a stronger enrichment of ribosome occupancy at start codons (compare Figure 2A and Figure 2—figure supplement 1). Strikingly, 2791 (66%) of the IERFs found only in the first Ribo-RET dataset were associated with a potential start codon 14–18 nt upstream of their 3’ ends (Supplementary file 1C; see Materials and methods for details). This proportion is far higher than expected by chance (17%) and similar to that observed for IERFs found in both replicate Ribo-RET datasets (70%). We refer to ORFs identified only from the first Ribo-RET dataset as ‘lower-confidence’ ORFs, reflecting their marginally higher FDRs; we refer to ORFs identified from both Ribo-RET datasets as ‘higher-confidence’ ORFs. Of the lower-confidence ORFs, 22% (614; FDR of 0.6%) are annotated, 29% (801; FDR of 10%) are isoform, and 49% (1372; FDR of 16%) are novel. Of the novel lower-confidence ORFs, 77% (1061) are sORFs, with 120 consisting of only a start and stop codon (Figure 4—figure supplement 2A), mirroring the proportions observed in the higher-confidence dataset. Regions upstream of lower-confidence annotated, novel, and isoform ORFs are associated with an enrichment of AGG and GGA sequences in the expected location of a Shine-Dalgarno sequence (Figure 4—figure supplement 2B). The predicted secondary structure for a set of random genomic sequences was significantly higher than that around the start of the lower-confidence annotated ORFs, novel ORFs, and isoform ORFs (Mann-Whitney U test P < 2.2e-16 in all cases; Figure 4—figure supplement 2C).
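The trinucleotide analyses above compare how often a given trinucleotide occurs at each position of aligned upstream regions with how often it occurs in control regions. This applies both to Figure 2B and to the S-D enrichment upstream of higher- and lower-confidence ORFs. A minimal Python sketch of such a positional enrichment follows. How the control regions were chosen is not stated here, so they are simply an input; the pseudocount and all names are assumptions.

```python
# Hypothetical sketch of positional k-mer enrichment. Each window is a string of
# equal length whose last base is the IERF 3' end, so a k-mer starting at index
# i begins len(window) - 1 - i nt upstream of that 3' end.
from collections import Counter


def positional_counts(windows: list[str], k: int = 3) -> list[Counter]:
    '''For each start offset, count the k-mers found there across all windows.'''
    n_offsets = len(windows[0]) - k + 1
    counts = [Counter() for _ in range(n_offsets)]
    for window in windows:
        window = window.upper()
        for i in range(n_offsets):
            counts[i][window[i:i + k]] += 1
    return counts


def positional_enrichment(test_windows: list[str], control_windows: list[str],
                          kmer: str, pseudocount: float = 1.0) -> list[float]:
    '''Per-offset ratio of kmer frequency in test vs control windows.'''
    k = len(kmer)
    test = positional_counts(test_windows, k)
    control = positional_counts(control_windows, k)
    n_test, n_control = len(test_windows), len(control_windows)
    return [((t[kmer] + pseudocount) / n_test) /
            ((c[kmer] + pseudocount) / n_control)
            for t, c in zip(test, control)]


if __name__ == '__main__':
    # Toy 40-nt windows: ATG placed 15 nt upstream of the 3' end in each test window.
    test = ['C' * 24 + 'ATG' + 'C' * 13 for _ in range(5)]
    control = ['C' * 40 for _ in range(5)]
    scores = positional_enrichment(test, control, 'ATG')
    best = max(range(len(scores)), key=scores.__getitem__)
    print(40 - 1 - best, round(scores[best], 2))
```

The pseudocount keeps the ratio finite when a trinucleotide is absent from the control windows at a given offset. The enrichment values therefore depend on that choice.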
Moreover, the predicted secondary structure around the start of the lower-confidence annotated ORFs was not significantly higher than that of the lower-confidence novel ORFs (Mann-Whitney U test P = 0.22). Lastly, we examined ribosome occupancy at the start and stop codons of the lower-confidence ORFs using our Ribo-seq data from cells grown without drug treatment. Lower-confidence annotated, novel, and isoform ORFs were strongly enriched for Ribo-seq signal 15 nt downstream of their start codons and 12 nt downstream of their stop codons (Figure 4—figure supplement 2D–F). Collectively, the lower-confidence ORFs exhibit the characteristics of actively translated regions.

Novel ORFs tend to be weakly transcribed but efficiently translated
To investigate how efficiently novel ORFs are expressed, we determined RNA levels from RNA-seq data, and ribosome occupancy levels from Ribo-seq data, for all annotated and novel ORFs detected in this study (leaderless and leadered ORFs). We also determined RNA and ribosome occupancy levels for putatively untranslated regions of 1854 control transcripts (see Materials and methods for details). For novel ORFs, we analyzed only the 871 ORFs for which ≥50 nt of the ORF is ≥30 nt from an annotated gene on the same strand, to avoid overlapping signal from other ORFs. As a group, novel ORFs have lower RNA levels and lower ribosome occupancy levels than the 1670 annotated ORFs (Figure 5A, top panel; Figure 5—figure supplement 1A, top panel; Figure 5—figure supplement 1B–C). By contrast, the non-coding control transcripts as a group have RNA levels similar to those of novel ORFs, but lower ribosome occupancy levels (Figure 5A, lower panels; Figure 5—figure supplement 1A, lower panels; Figure 5—figure supplement 1B–C). To estimate the ribosome occupancy per transcript, we determined the ratio of Ribo-seq reads to RNA-seq reads for each region analyzed (Figure 5B; Supplementary file 1, tabs A + C).
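The per-region comparison described above normalizes counts as RPM per nucleotide (as in Figure 5A). It then takes a Ribo-seq to RNA-seq ratio as a proxy for ribosome occupancy per transcript. A minimal Python sketch follows. The function names are hypothetical, and the exact counting rules (for example, which reads are assigned to a region) are assumptions not specified here.

```python
# Hypothetical sketch of per-region normalization and a Ribo-seq/RNA-seq ratio.


def rpm_per_nt(read_count: int, library_size: int, region_length: int) -> float:
    '''Reads per million mapped reads, divided by region length (nt).'''
    return read_count / library_size * 1e6 / region_length


def occupancy_per_transcript(ribo_count: int, ribo_library: int,
                             rna_count: int, rna_library: int,
                             region_length: int) -> float:
    '''Ribo-seq RPM per nt divided by RNA-seq RPM per nt for one region.'''
    rna = rpm_per_nt(rna_count, rna_library, region_length)
    if rna == 0:
        return float('nan')  # undefined without RNA-seq reads in the region
    return rpm_per_nt(ribo_count, ribo_library, region_length) / rna


if __name__ == '__main__':
    # Two toy regions with the same RNA level but different ribosome loading.
    print(occupancy_per_transcript(200, 10_000_000, 100, 20_000_000, 150))
    print(occupancy_per_transcript(20, 10_000_000, 100, 20_000_000, 150))
```

Region length cancels in the ratio. It only matters for the RPM-per-nucleotide values themselves, which is why RNA level and ribosome loading can be separated as in the comparison above.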
As a group, novel ORFs have only slightly lower ribosome occupancy per transcript than annotated ORFs, while both novel and annotated ORFs have markedly higher ribosome occupancy per transcript than the control non-coding transcripts. We conclude that RNA levels for novel ORFs tend to be lower than those for annotated ORFs. Nevertheless, novel ORFs are translated with similar efficiency to annotated ORFs and are thus clearly distinct from non-coding transcripts. The overall lower expression of novel ORFs relative to annotated ORFs is also reflected by lower Ribo-RET occupancy at their start codons (Figure 5—figure supplement 2).

Figure 5. Novel ORFs are efficiently translated. (A) Pairwise comparison of normalized RNA-seq and Ribo-seq coverage for annotated, novel and non-coding control transcripts. Reads are plotted as RPM per nucleotide using a single replicate of each dataset for reads aligned to the reference genome at their 3’ ends. The categories compared are: (i) annotated ORFs (higher-confidence and lower-confidence ORFs detected by Ribo-RET, and leaderless ORFs; blue datapoints), (ii) novel ORFs (higher-confidence and lower-confidence ORFs detected by Ribo-RET and leaderless ORFs, for regions" @default.
W4285333698 created "2022-07-14" @default.
W4285333698 creator A5083302085 @default.
W4285333698 date "2021-11-04" @default.
W4285333698 modified "2023-09-26" @default.
W4285333698 title "Decision letter: Pervasive translation in Mycobacterium tuberculosis" @default.
W4285333698 doi "https://doi.org/10.7554/elife.73980.sa1" @default.
W4285333698 hasPublicationYear "2021" @default.
W4285333698 type Work @default.
W4285333698 citedByCount "0" @default.
W4285333698 crossrefType "peer-review" @default.
W4285333698 hasAuthorship W4285333698A5083302085 @default.
W4285333698 hasBestOaLocation W42853336981 @default.
W4285333698 hasConcept C104317684 @default.
W4285333698 hasConcept C105580179 @default.
W4285333698 hasConcept C142724271 @default.
W4285333698 hasConcept C149364088 @default.
W4285333698 hasConcept C2777975735 @default.
W4285333698 hasConcept C2781069245 @default.
W4285333698 hasConcept C41008148 @default.
W4285333698 hasConcept C54355233 @default.
W4285333698 hasConcept C71924100 @default.
W4285333698 hasConcept C86803240 @default.
W4285333698 hasConceptScore W4285333698C104317684 @default.
W4285333698 hasConceptScore W4285333698C105580179 @default.
W4285333698 hasConceptScore W4285333698C142724271 @default.
W4285333698 hasConceptScore W4285333698C149364088 @default.
W4285333698 hasConceptScore W4285333698C2777975735 @default.
W4285333698 hasConceptScore W4285333698C2781069245 @default.
W4285333698 hasConceptScore W4285333698C41008148 @default.
W4285333698 hasConceptScore W4285333698C54355233 @default.
W4285333698 hasConceptScore W4285333698C71924100 @default.
W4285333698 hasConceptScore W4285333698C86803240 @default.
W4285333698 hasLocation W42853336981 @default.
W4285333698 hasOpenAccess W4285333698 @default.
W4285333698 hasPrimaryLocation W42853336981 @default.
W4285333698 hasRelatedWork W2042252940 @default.
W4285333698 hasRelatedWork W2062409682 @default.
W4285333698 hasRelatedWork W2110949356 @default.
W4285333698 hasRelatedWork W2748952813 @default.
W4285333698 hasRelatedWork W2899084033 @default.
W4285333698 hasRelatedWork W2972094210 @default.
W4285333698 hasRelatedWork W3029116816 @default.
W4285333698 hasRelatedWork W3210687582 @default.
W4285333698 hasRelatedWork W4210506554 @default.
W4285333698 hasRelatedWork W4381715549 @default.
W4285333698 isParatext "false" @default.
W4285333698 isRetracted "false" @default.
W4285333698 workType "peer-review" @default.