Matches in SemOpenAlex for { <https://semopenalex.org/work/W3202053154> ?p ?o ?g. }
- W3202053154 abstract "Abstract High-throughput genomics of SARS-CoV-2 is essential to characterize virus evolution and to identify adaptations that affect pathogenicity or transmission. While single-nucleotide variations (SNVs) are commonly considered as driving virus adaptation, RNA recombination events that delete or insert nucleic acid sequences are also critical. Whole-genome targeted sequencing of SARS-CoV-2 is typically achieved using pairs of primers to generate cDNA amplicons suitable for next-generation sequencing (NGS). However, paired-primer approaches impose constraints on where primers can be designed and how many amplicons are synthesized, and require multiple PCR reactions with non-overlapping primer pools. This imparts sensitivity to underlying SNVs and fails to resolve RNA recombination junctions that are not flanked by primer pairs. To address these limitations, we have designed an approach called ‘Tiled-ClickSeq’, which uses hundreds of tiled-primers spaced evenly along the virus genome in a single reverse-transcription reaction. The other end of the cDNA amplicon is generated by azido-nucleotides that stochastically terminate cDNA synthesis, removing the need for a paired-primer. A sequencing adaptor containing a Unique Molecular Identifier (UMI) is appended to the cDNA fragment using click-chemistry and a PCR reaction generates a final NGS library. Tiled-ClickSeq provides complete genome coverage, including the 5’UTR, at high depth and specificity to the virus on both Illumina and Nanopore NGS platforms. Here, we analyze multiple SARS-CoV-2 isolates and clinical samples to simultaneously characterize minority variants, sub-genomic mRNAs (sgmRNAs), structural variants (SVs) and D-RNAs.
Tiled-ClickSeq therefore provides a convenient and robust platform for SARS-CoV-2 genomics that captures the full range of RNA species in a single, simple assay. Introduction Virus genomics and next-generation sequencing (NGS) are essential components of viral outbreak responses (Grubaugh et al., 2019b). Reconstruction of consensus genetic sequences is essential to identify adaptations correlated with changes in pathogenicity or transmission (Gussow et al., 2020). In addition to single nucleotide variations, studies of SARS-CoV-2 have identified numerous genomic structural variants (SVs) (Yi, 2020) that arise due to non-homologous RNA recombination. SVs typically comprise small insertions/deletions that nonetheless allow the variant genome to independently replicate and transmit. Numerous SVs have been described for CoVs including deletions of the accessory open-reading frames (aORFs) (Yc et al., 2020; Muth et al., 2018) and changes in spike protein observed in the B.1.1.7 (Alpha) and other variants of concern (Kemp et al., 2020). Adaptation of SARS-CoV-2 also occurs during passaging in cell-culture, such as small deletions that arise near the furin cleavage site of spike protein during amplification on Vero cells (Ogando et al., 2020). These deletions can alter the fitness and virulence of SARS-CoV-2 isolates and thus must be genetically characterized at the within-culture population level prior to passaged stock use in subsequent studies. Similar to SVs, non-homologous RNA recombination also gives rise to defective-RNAs (D-RNAs), also known as defective viral genomes (DVGs). D-RNAs have been observed in multiple studies of coronaviruses (CoVs), including mouse hepatitis virus (MHV) (Makino et al., 1984; Makino et al., 1985; Makino et al., 1988a; Makino et al., 1988b), bovine CoV (Chang et al., 1994), avian infectious bronchitis virus (IBV) (Penzes et al., 1995), and human CoV 229E (Viehweger et al., 2019; Banerjee et al., 2001; Joo et al., 1996; Kim et al., 1993).
We recently demonstrated that SARS-CoV-2 is >10-fold more recombinogenic in cell culture than other CoVs such as MERS (Gribble et al., 2021) and generates abundant D-RNAs containing RNA recombination junctions that most commonly flank U-rich RNA sequences. D-RNAs may change the fitness, disease outcomes, and vaccine effectiveness for SARS-CoV-2 similar to other respiratory pathogens such as influenza and RSV (Vignuzzi and Lopez, 2019). Together, these findings highlight the need to identify these RNA species and their impact on SARS-CoV-2 infection and pathogenesis. Whole genome sequencing can be achieved through a range of approaches including non-targeted (random) NGS of virus isolates amplified in cell culture or directly from patient samples. However, when input material is limited, low viral genome copy numbers necessitate a template-targeted approach followed by molecular amplification by PCR or iso-thermal amplification to generate sufficient nucleic acid for sequencing. Generally, these require knowledge of the virus genome and the design of pairs of primers that anneal to the target genome. Perhaps the most popular method for SARS-CoV-2 sequencing is the ‘ARTIC’ approach (Tyson et al., 2020), which can reliably identify SNVs and minority variants present in as little as 3% of genomes (Grubaugh et al., 2019a). However, the requirement for pairs of primers constrains where amplicons can be designed and imparts sensitivity to single nucleotide variants (SNVs). Multiple PCR reactions containing different pools of paired-primers must also be performed in order to obtain cDNA amplicons of the correct size and to prevent the interaction or mis-priming of PCR primers. Importantly, pairs of primers that do not flank RNA recombination junctions will be unable to detect unexpected or unpredicted RNA recombinant species. 
Finally, paired-primer approaches also necessitate the re-design and validation of alternative sets of primer-pairs for each specific NGS platform used (e.g. Illumina amplicons are 200–500 nts, Nanopore amplicons are ~2000–5000 nts). To address these limitations and optimize the ability of NGS to quantify all types of viral genetic variants, we have combined ‘ClickSeq’ with tiled-amplicon approaches. ClickSeq (Routh et al., 2015b; Jaworski and Routh, 2018) is a click-chemistry-based platform for NGS that prevents artifactual sequence chimeras in the output data (Gorzer et al., 2010). Using ClickSeq, the 3’end of an amplified cDNA segment is generated by the stochastic incorporation of terminating 3’ azido-nucleotides (AzNTPs) during reverse transcription. A downstream adaptor is ‘click-ligated’ onto the cDNA using copper-catalyzed azide-alkyne cycloaddition (CuAAC). Therefore, ‘Tiled-ClickSeq’ only requires one template-specific primer per cDNA amplicon. To achieve whole genome sequencing of a virus isolate or sample, multiple tiled primers are designed evenly along the virus genome. Only one pool of RT-primers is required, even when > 300 template-specific primers and their corresponding cDNA amplicons are generated in the same reaction. This simplifies the assay design, and importantly removes constraints imposed in paired-primer strategies (Itokawa et al., 2020). Furthermore, the same primer set can be used for both Illumina and Nanopore platforms even when requiring different cDNA amplicon sizes. The library construction allows for additional quality control features including the use of unique molecular identifiers (UMIs) in the ‘click-adaptor’ as well as the ability to identify each RT-primer that gives rise to specific cDNA amplicon when using paired-read NGS. 
Here, we utilize the Tiled-ClickSeq method to analyze multiple isolates of SARS-CoV-2 both from cell-culture and clinical specimens used in routine diagnostics for COVID19 and demonstrate that ‘Tiled-ClickSeq’ accurately reconstructs full-length viral genomes. The method also captures recombinant RNA species including sgmRNAs, SVs, and D-RNAs. Overall, Tiled-ClickSeq therefore provides a convenient and robust platform for full genetic characterization of viral isolates. Results Overview of sequencing strategy Most tiled approaches for complete viral genome sequencing from viral isolates require the design of pairs of primers that generate pre-defined overlapping amplicons in multiple pools (Figure 1A and B). However, this can prevent the detection of recombinant viral genomic materials such as sub-genomic mRNAs (sgmRNAs) or Defective-RNAs (D-RNAs). To overcome these challenges, we designed a template-directed tiled-primer approach to reverse transcribe segments of the SARS-CoV-2 genome based upon the ‘ClickSeq’ method for NGS library synthesis (Routh et al., 2015b). Instead of random-hexamer or oligo-dT primers as used in ClickSeq and Poly(A)-ClickSeq, respectively (Routh et al., 2017), we use multiple ‘tiled’ RT-primers designed at regular intervals along the viral genome (Figure 1C). In ‘Tiled-ClickSeq’, pooled primers initiate reverse transcription in a reaction that has been supplemented with 3’-azido-nucleotides (AzNTPs). This yields stochastically terminated 3’-azido-cDNA fragments, which can be click-ligated onto a hexynyl-functionalized Illumina i5 sequencing adaptor (Figure 1D). After click-ligation, the single-stranded triazole-linked cDNA is PCR-amplified using indexing p7 adaptors to fill in the ends of the NGS library, yielding the final library schema shown in Figure 1E. We designed the click-adaptor with an additional 12 random nucleotides at its 5’ end.
As each adaptor can only be ligated once onto each unique cDNA molecule, this provides a unique molecular identifier (UMI) (Jabara et al., 2011). Due to the stochastic termination of cDNA synthesis in the RT step, a random distribution of cDNA fragments is generated from each primer, giving rise to the hypothetical read coverage depicted in Figure 1F. The lengths of these fragments, and thus the obtained read coverage can be optimized to ensure overlapping read data from each amplicon by adjusting the ratio of AzNTPs to dNTPs in the RT reaction (Routh et al., 2015b). With this approach, we found that we could robustly make NGS libraries from as little as 8 ng of total cellular RNA with only 18 PCR cycles (Figure 1G). Final libraries are excised from agarose gels (300-600nt cDNA size), pooled, and are compatible with Illumina sequencing platforms. A computational pipeline was compiled into a batch script (Source data 2) depicted by the flow-chart in Figure 1H. Figure 1 Schematic of Tiled-ClickSeq and computational pipeline: (A) Schematic of SARS-CoV-2 genome with two examples of sub-genomic mRNAs. (B) Paired-primer approaches typically generate short amplicons flanked by upstream and downstream primers that are PCR amplified in non-overlapping pools. (C) Tiled-ClickSeq uses a single pool of primers at the reverse-transcription step with the upstream site generated by stochastic termination by azido-nucleotides. (D) 3’-Azido-blocked single-stranded cDNA fragments are ‘click-ligated’ using copper-catalyzed azide alkyne cycloaddition (CuAAC) to hexynyl functionalized Illumina i5 sequencing adaptors. Triazole-linked ssDNA is PCR amplified to generate a final cDNA library.
(E) The structure of the final cDNA is illustrated indicating the presence of the i5 and i7 adaptors, the 12 N unique molecular identifier (UMI), the expected location of the triazole linkage, and the origins of the cDNA in the reads including the tiled primer-derived DNA, which is captured using paired-end sequencing. (F) The hypothetical read coverage over a viral genome is indicated in red, yielding overlapping ‘saw-tooth’ patterns of sequencing coverage. Longer fragment lengths with more extensive overlapping can be obtained using decreased AzNTP:dNTP ratios. (G) Final cDNA libraries are analyzed and size-selected by gel electrophoresis (2 % agarose gel). Duplicates of libraries synthesized from 8, 80, and 800 ng of input SARS-CoV-2 RNA are shown. (H) Flowchart of the data processing and bioinformatic pipeline. Input data are in Blue, output data are in Green, scripts/processes are Purple. Validation with WA-1 strain To test this approach, we obtained 200 ng RNA from a SARS-CoV-2 isolate deposited at the World Reference Center for Emerging Viruses and Arboviruses (WRCEVA) at UTMB (Harcourt et al., 2020b) and performed Tiled-ClickSeq using CoV2 primer pool v1 and a 1:35 AzNTP:dNTP mix. NGS libraries were sequenced on an Illumina MiSeq (2 × 150 reads). Reads were quality processed using fastp (Chen et al., 2018) and mapped to the virus genome using bowtie2 (Langmead and Salzberg, 2012). A ‘saw-tooth’ pattern of read coverage over the genome was generated (Figure 2A, orange plot) with ‘teeth’ appearing as expected upstream of each tiled primer. Peaks of coverage for each ‘tooth’ ranged from ~13,000 x to ~100 x. Overall, we obtained genome coverage >25 X from nucleotide 3–29823 (50nts from the 3’ end of the genome). This depth is sufficient to reconstruct a consensus genome sequence that was found to be identical to that already deposited (MT020881) for this isolate (Harcourt et al., 2020a).
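The coverage criterion used above (a depth of at least 25 reads at each position) can be illustrated with a minimal sketch. It assumes mapped reads have already been reduced to 1-based inclusive (start, end) intervals; the function names and the toy intervals are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Iterable, List, Tuple


def depth_per_position(intervals: Iterable[Tuple[int, int]], genome_length: int) -> List[int]:
    """Return read depth at each 1-based genome position.

    Each interval is (start, end), 1-based and inclusive, for one mapped read.
    A difference array keeps the cost linear in reads plus genome length.
    """
    diff = [0] * (genome_length + 2)
    for start, end in intervals:
        start = max(start, 1)
        end = min(end, genome_length)
        if start > end:
            continue  # read lies entirely outside the genome
        diff[start] += 1
        diff[end + 1] -= 1
    depth = []
    running = 0
    for pos in range(1, genome_length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


def fraction_covered(depth: List[int], min_depth: int) -> float:
    """Fraction of positions whose depth is at least min_depth."""
    if not depth:
        return 0.0
    return sum(1 for d in depth if d >= min_depth) / len(depth)


# Toy example (hypothetical data): a 10-nt "genome" with three overlapping reads.
toy_depth = depth_per_position([(1, 6), (4, 10), (5, 8)], genome_length=10)
toy_fraction = fraction_covered(toy_depth, min_depth=2)
print(toy_depth, toy_fraction)
```

For real data the same two functions would be called with the genome length of the reference and min_depth=25; the interval extraction from an alignment file is left out here on purpose.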
Figure 2 Read coverage over the SARS-CoV-2 genome using Tiled-ClickSeq. (A) Read coverage obtained from Tiled-ClickSeq over the whole viral genome is depicted when sequencing using an Illumina MiSeq (orange) or on an Oxford Nanopore Technologies MinION device (blue). A ‘saw-tooth’ pattern of coverage is observed with ‘teeth’ upstream of tiled-primers, indicated at the bottom of the plot by short black lines. (B) Zoomed in read coverage of nts 1–2400 of the SARS-CoV-2 genome with coverage of Illumina MiSeq reads from five individual primers coloured to illustrate coverage from downstream amplicons overlapping the primer-binding sites of upstream tiled-primers (Blue: Read coverage from primer 1; Orange: coverage from primer 2; Green: coverage from primer 3; Red: coverage from primer 4; Purple: coverage from primer 5). When using paired-end sequencing, the ‘forward’/’R1’ read is derived from the click-adaptor and contains the UMI. The ‘reverse’/’R2’ read is derived directly from the tiled primer (see schematic in Figure 1E). We wrote a custom python3 script to split all the forward ‘R1’ reads into multiple individual FASTQ files based upon which primer generated each fragment. The mapping coverage obtained from five individual tiled-primers is shown in Figure 2B. The coverage for each primer (denoted by individual colours in Figure 2B) spans approximately 500–600 nts and extends 5’-wards from the tiled RT-primer. Read coverage from each primer overlaps the read coverage of the upstream primer. This allows for continuous gap-free read coverage over the viral genome which, importantly, allows a downstream cDNA amplicon to provide sequence information over and beyond an upstream primer. Additionally, we can determine the frequency with which each primer either successfully maps to the viral genome, mis-primes from the host RNA, or gives rise to adaptor-dimers or other sequencing artifacts.
This information can be used to identify primers that yield poor viral priming efficiency and therefore a more specific primer can be designed and substituted as needed. For nanopore sequencing, we also synthesized Tiled-ClickSeq libraries but using a 1:100 AzNTP:dNTP ratio to generate cDNA amplicons of increased lengths. We retained cDNA fragments > 600 nts, yielding a few nanograms of dsDNA. This library, though containing the Illumina adaptors, can nonetheless be used as input in the default Oxford Nanopore Technologies (ONT) Ligation-Sequencing protocol (LSK-109) that appends ONT adaptors directly onto the ends of A-tailed dsDNA fragments. We sequenced this library using an ONT MinION device and obtained 279,192 reads greater than 1kbp in length. These were mapped to the WA-1 viral genome using minimap2 yielding continuous genome coverage (Figure 2A, blue). A similar profile of read coverage to the Illumina data was observed, with peaks of coverage upstream of tiled-primer sites. The deeper dips in coverage were avoided however, due to the longer read lengths that give greater overlap between cDNA amplicons. Genome reconstruction of 12 isolates: ClickSeq, Tiled-ClickSeq, and Nanopore-Tiled-ClickSeq To validate the suitability of Tiled-ClickSeq for whole virus genome reconstruction, we obtained RNA extracted from 12 outgrowth samples of SARS-CoV-2 deposited at WRCEVA from nasopharyngeal swabs collected between March and April 2020. We synthesized 12 Tiled-ClickSeq libraries and 12 random-primed ClickSeq libraries in parallel. These were submitted for sequencing on a NextSeq (2 × 150) yielding ~2–5 M reads per sample (Table 1). Random-primed ClickSeq data were quality-filtered and adaptor trimmed using fastp (Chen et al., 2018) retaining only the forward R1 reads. Tiled-ClickSeq read data were processed and mapped following the scheme in Figure 1H. Table 1 Read counts and mapping rates for random-primed versus Tiled-ClickSeq approaches.
Sample | CT | ClickSeq reads | Virus mapped | % Viral Reads | Tiled v1 reads | Virus mapped | % Viral Reads
WRCEVA_00501 | 12.9 | 4,665,869 | 116,036 | 2.5% | 2,359,795 | 2,204,750 | 93.4%
WRCEVA_00502 | 12.9 | 4,989,513 | 118,260 | 2.4% | 1,962,581 | 1,820,925 | 92.8%
WRCEVA_00505 | 12.7 | 3,894,325 | 71,809 | 1.8% | 2,779,672 | 2,482,854 | 89.3%
WRCEVA_00506 | 12.5 | 4,979,989 | 108,532 | 2.2% | 2,395,750 | 2,148,256 | 89.7%
WRCEVA_00507 | 12.9 | 5,659,073 | 161,059 | 2.8% | 2,056,670 | 1,867,012 | 90.8%
WRCEVA_00508 | 16.8 | 3,987,009 | 91,452 | 2.3% | 1,787,418 | 1,433,005 | 80.2%
WRCEVA_00509 | 17.1 | 4,057,928 | 57,424 | 1.4% | 2,202,661 | 1,856,633 | 84.3%
WRCEVA_00510 | 16.2 | 5,328,829 | 65,281 | 1.2% | 2,040,332 | 1,601,544 | 78.5%
WRCEVA_00513 | 16.0 | 4,391,175 | 69,169 | 1.6% | 1,641,213 | 1,455,991 | 88.7%
WRCEVA_00514 | 12.9 | 4,340,084 | 84,211 | 1.9% | 2,089,241 | 1,902,748 | 91.1%
WRCEVA_00515 | 15.7 | 5,416,853 | 102,179 | 1.9% | 2,205,166 | 1,915,129 | 86.8%
WRCEVA_00516 | 17.4 | 4,290,929 | 61,017 | 1.4% | 1,988,939 | 1,715,448 | 86.2%
In the Tiled-ClickSeq data, after UMI deduplication, each isolate had an average coverage between 4500 and 7500 reads and a coverage of 25 reads in greater than 99.5% (29753/29903 nts) of the SARS-CoV-2 genome. Read coverage was also obtained covering the 5’UTR of each strain ( > 25 reads for all isolates from nucleotide three onwards (Figure 3A and B)). When using paired-primer approaches, the 5’UTR is ordinarily obscured by the 5’-most primer used in each pool (nts 30–54 for the ARTIC primer set depicted in Figure 3A). As the 5’ end is resolved here due to stochastic incorporation of a single AzNTP in a template-specific manner, the entirety of the viral genome can be resolved. We reconstructed reference genomes from mapped reads using pilon (Walker et al., 2014) requiring 25 x coverage for variant calling. In all cases, the reconstructed reference genomes were identical with or without controlling for PCR duplicates using the UMIs. We found 5–12 SNVs per viral genome (Source data 3), including the prevalent D614G (A23403G) spike adaptation, which enhances SARS-CoV-2 transmission (Plante et al., 2020), in 11 out of the 12 isolates (Figure 3C).
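The UMI deduplication mentioned above can be sketched as follows. This is only an illustration under stated assumptions: reads are treated as PCR duplicates when they share an identical 12-nt UMI and the same R1 mapping start, and the highest-quality read of each group is kept. The record type, field names and tie-breaking rule are hypothetical; the authors' batch script (Source data 2) may apply different criteria.

```python
from typing import Dict, List, NamedTuple, Tuple


class MappedRead(NamedTuple):
    name: str
    umi: str        # 12-nt random sequence taken from the start of R1 (assumed layout)
    start: int      # 1-based mapping start of R1 on the viral genome
    quality: float  # mean base quality, used only to pick a representative


def deduplicate_by_umi(reads: List[MappedRead]) -> List[MappedRead]:
    """Keep one read per (UMI, mapping start) pair.

    Assumption: reads sharing both values derive from the same cDNA molecule.
    No UMI error correction is attempted in this sketch.
    """
    best: Dict[Tuple[str, int], MappedRead] = {}
    for read in reads:
        key = (read.umi, read.start)
        kept = best.get(key)
        if kept is None or read.quality > kept.quality:
            best[key] = read
    return sorted(best.values(), key=lambda r: (r.start, r.umi))


# Hypothetical reads: r1 and r2 are duplicates; r3 and r4 are distinct molecules.
reads = [
    MappedRead("r1", "AAAACCCCGGGG", 100, 30.0),
    MappedRead("r2", "AAAACCCCGGGG", 100, 35.0),
    MappedRead("r3", "TTTTCCCCGGGG", 100, 32.0),
    MappedRead("r4", "AAAACCCCGGGG", 250, 31.0),
]
unique_reads = deduplicate_by_umi(reads)
print([r.name for r in unique_reads])
```

Keying on the mapping start as well as the UMI is a design choice of this sketch: with stochastic termination, two different cDNA molecules are unlikely to share both values by chance.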
Figure 3 Genome Reconstruction of 12 SARS-CoV-2 isolates deposited at the World Reference Center for Emerging Viruses and Arboviruses (WRCEVA). (A) Read coverage is depicted over the 5’ UTR of the SARS-CoV-2 genome for each isolate revealing capture of this region. The 5’-most primer from the ARTICv3 protocol at nts 30–54 is illustrated. (B) Snapshot of read data from Tiled-ClickSeq is depicted using the Tablet Sequencing Viewer from WRCEVA_000508 over the same region of the 5’UTR as A. (C) The most common single-nucleotide variants (SNVs) found in complete genome reconstructions from all 12 isolates are illustrated and colour-coded to depict the underlying viral protein. (D) Phylogenetic tree of 12 WRCEVA isolates with their corresponding clade indicated. Genome reconstruction was similarly performed using the random-primed ClickSeq data reads. Identical genomes to the Tiled data were obtained for 11 out of 12 isolates, with only one SNV difference in one sample (WRCEVA_000510: T168C). In this case, the read coverage was too low in the random-primed data for pilon to report an SNV. Nevertheless, visual inspection of the mapped data revealed that all nucleotides at this locus were indeed C’s, as reported for the Tiled-ClickSeq data. Phylogenetic tree reconstruction using NextStrain (Hadfield et al., 2018) placed 10 of the isolates in the A2a clade (Figure 3D) and two isolates (WRCEVA_00508, WRCEVA_00513) were Clade B/B1. We also retained cDNA fragments > 600 bps from the Tiled-ClickSeq libraries and sequenced these using an ONT MinION device. We used the ONT native barcoding kit to multiplex the 12 samples and the Ligation-Sequencing protocol (LSK-109) to generate final libraries. Reads were mapped with minimap2 (Li, 2016) yielding at least 100 x coverage over >99.6% of the genome for each isolate (Figure 2—figure supplement 1). Again, reference genomes were reconstructed from the mapped data using pilon (Source data 3).
With the exception of WRCEVA_000514 which contained a single additional SNV (C14220T), the reference genomes reconstructed from the nanopore data were identical to those generated from the Tiled-ClickSeq Illumina data. These data illustrate that Tiled-ClickSeq performs as well as random-primed methods either on Illumina or Nanopore platforms for whole genome reconstruction. To further validate our approaches, we used the well described ARTIC v3 protocol for amplicon sequencing of whole SARS-CoV-2 viral genomes (Tyson et al., 2020) using the same input RNA as above for Tiled-ClickSeq NGS library synthesis. In every case, the reported SNVs were identical between the ARTIC data and the Tiled-ClickSeq data. Read coverage over the viral genomes is illustrated in Figure 2—figure supplement 2. Minority variants Our initial primer design (v1) (Figure 4A, blue plots) successfully yielded coverage suitable for complete genome reconstruction. However, some regions still received low coverage with fewer than 100 deduplicated reads, preventing identification of minority variants in these regions. Therefore, we redesigned our primer scheme by adding an additional 326 primers (v2) previously reported (Guo et al., 2020) for tiled coronavirus sequencing (Source data 1) to make a pool comprising a total of 396 unique primers (v3). We re-sequenced the 12 WRCEVA isolates analyzed as described above plus an additional four that subsequently became available. An example of mapping coverage for isolate WRCEVA_000508 is illustrated in Figure 4A, where the coverage over the viral genome is more even with less extreme ranges of read depth. Figure 4 Additional tiled-primers improve read coverage and allow identification of minority variants. (A) Read coverage obtained from Tiled-ClickSeq over the whole viral genome is depicted using an Illumina MiSeq when using the original primers as in Figure 2 (v1 - blue) or with an additional 326 tiled-primers (v3 - pink).
Tiled-primers are indicated at the bottom of the plot by short blue (v1) or pink (v3) lines. (B) The rates of mismatching nucleotides found in mapped NGS reads are depicted across the SARS-CoV-2 genome for isolate WRCEVA_000508 prior to trimming the tiled primers from forward/‘R1’ reads and without PCR deduplication. (C) The rates of mismatching are also depicted after data quality processing to remove PCR duplicates and primer-derived nucleotides in the reads, revealing three minority variants in this sample with frequencies > 2%. Figure 4—source data 1 The frequency of all mapped nucleotides at each genome coordinate for each WRCEVA isolate is provided. The reference genome, nucleotide coordinate and expected reference nucleotide are provided. Total read coverage and the numbers of each non-reference nucleotide are also shown. Finally, the mismatch/error rate at each site is provided which reveals minority variants in each isolate. https://cdn.elifesciences.org/articles/68479/elife-68479-fig4-data1-v1.xlsx Using the R2 read, we can determine which primer gives rise to each R1 read and trim primer-derived nucleotides from the R1 read. This is an important quality control as it prevents the assignment (or failure thereof) of SNVs and/or the mapping of recombination events due to primer mis-priming. If reads are mapped without trimming away the primer-derived nucleotides found in the R1 read (as depicted in Figure 4B), we see numerous high frequency (2–50%) minority variants. The majority of these apparent minority variants overlap primer-target sites and are likely artefactual. Furthermore, the same high-frequency events are often seen across multiple independent samples. To control for this, we mapped reads after trimming away primer-derived nucleotides from the R1 reads as per our pipeline described above (schematic in Figure 1H).
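The per-site mismatch rate described for Figure 4—source data 1 can be sketched as the share of mapped nucleotides that differ from the reference, with sites above 2% flagged as candidate minority variants. The function names are hypothetical, and the example counts are our reading of the WRCEVA_000501 row of Table 2 (position 12,049, reference C); this is an illustration, not the authors' script.

```python
from typing import Dict


def mismatch_rate(counts: Dict[str, int], reference: str) -> float:
    """Fraction of mapped nucleotides at one site that differ from the reference."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no coverage, so no evidence of mismatches
    return (total - counts.get(reference, 0)) / total


def is_minority_variant(counts: Dict[str, int], reference: str, min_rate: float = 0.02) -> bool:
    """Flag a site whose mismatch rate exceeds min_rate (2% in the text above)."""
    return mismatch_rate(counts, reference) > min_rate


# Nucleotide counts as read from Table 2 for WRCEVA_000501, nt 12,049 (read depth 2,116).
counts = {"A": 0, "U": 95, "G": 1, "C": 2020}
rate = mismatch_rate(counts, reference="C")
print(f"{rate:.1%}", is_minority_variant(counts, "C"))
```

A production version would apply this only to deduplicated, primer-trimmed reads, since the text notes that untrimmed primer-derived nucleotides produce artefactual high-frequency mismatches.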
Finally, to control for PCR duplication events, we make use of the UMIs embedded in the click-adaptor. The final de-duplicated mapped, primer-trimmed reads (Figure 4C) provide a robust readout of minority variants in these isolates (Table 2 and Figure 4—source data 1). Across 10 WRCEVA isolates we found only 26 minority variants present at >2%, all of which were unique within this dataset. Six isolates reported no minority variants at all. Table 2 Minority variants and rates ( > 2%) found across 16 WRCEVA isolates.
Sample | Nt | Nuc | ReadDepth | A | U | G | C | VariantRate | Location | Result
WRCEVA_000501 | 12,049 | C | 2,116 | 0 | 95 | 1 | 2020 | 4.5% | ORF1ab | N3928K
WRCEVA_000502 | 10,207 | C | 2,240 | 0 | 118 | 0 | 2,122 | 5.3% | - | -
WRCEVA_000502 | 16,050 | U | 3,853 | 0 | 3,322 | 0 | 531 | 13.8% | - | -
WRCEVA_000502 | 17,489 | A | 4,597 | 4,433 | 162 | 1 | 1 | 3.6% | ORF1ab | E5742V
WRCEVA_000502 | 21,526 | A | 8,749 | 6,508 | 0 | 2,240 | 1 | 25.6% | ORF1ab | I7088V
WRCEVA_000503 | 14,220 | C | 1,638 | 1 | 463 | 0 | 1,174 | 28.3% | - | -
WRCEVA_000504 | 1,556 | A | 2,828 | 2,499 | 0 | 328 | 1 | 11.6% | ORF1ab | I431V
WRCEVA_000504 | 27,925 | C | 2,857 | 0 | 134 | 0 | 2,723 | 4.7% | ORF8 | T11I
WRCEVA_000507 | 19,515 | A | 2,393 | 2,295 | 1 | 97 | 0 | 4.1% | - | -
WRCEVA_000508 | 9,756 | G | 1,376 | 28 | 0 | 1,348 | 0 | 2.1% | ORF1ab | R3164H
WRCEVA_000508 | 26,056 | G | 2092 | 0 | 86 | 2006 | 0 | 4.1% | ORF3a | D222Y
WRCEVA_000508 | 27,556 | G | 2066 | 128 | 0 | 1938 | 0 | 6.2% | ORF7a | A55T
WRCEVA_000509 | 11,956 | C | 1962 | 0 | 199 | 0 | 1,763 | 10.1% | - | -
WRCEVA_000509 | 17,245 | C | 4,062 | 2 | 470 | 0 | 3,590 | 11.6% | ORF1ab | R5661C
WRCEVA_000509 | 18,005 | U | 5,408 | 1 | 4,949 | 458 | 0 | 8.5% | ORF1ab | L5915R
WRCEVA_000509 | 25,569 | U | 3,448 | 4 | 3,326 | 113 | 5 | 3.5% | - | -
WRCEVA_000509 | 27,919 | U | 839 | 0 | 809 | 0 | 30 | 3.6% | ORF8 | I9T
WRCEVA_000509 | 28,767 | C | 2011 | 0 | 109 | 0 | 1902 | 5.4% | N | T165I
WRCEVA_000511 | 3,003 | U | 2,880 | 79 | 2,787 | 113 (G+C fused) | | 2.7% | ORF1ab | V913E
WRCEVA_000511 | 10,738 | U | 4,580 | 0 | 4,440 | 0 | 140 | 3.1% | - | -
WRCEVA_000511 | 25,892 | U | 133 | 0 | 130 | 0 | 3 | 2.3% | ORF3a | I167T
WRCEVA_000511 | 28,001 | G | 1,414 | 1 | 29 | 1,384 | 0 | 2.1% | - | -
WRCEVA_000513 | 27,046 | C | 5,539 | 0 | 138 | 0 | 5,401 | 2.5% | M | T175M
WRCEVA_000514 | 11,603 | A | 5,405 | 5,075 | 0 | 330 | 0 | 6.1% | ORF1ab | M3780V
WRCEVA_000514 | 26,526 | G | 525 | 0 | 20 | 505 | 0 | 3.8% | M | A2S
RNA recombination: sgmRNAs, structural variants, and defective RNAs To characterize RNA recombination, we used our bespoke ViReMa pipeline (Routh and Johnson, 2014) to map RNA recombination events in NGS reads that correspond to either sgmRNAs, SVs, or D-RNAs.
ViReMa can detect agnostically a range of expected and unusual RNA recombination events including deletions, insertions, duplications, inversions as well as virus-to-host chimeric events and provides BED files containing the junction sites and frequencies of RNA recombination events. We mapped the Tiled-ClickSeq data to the corrected reference genome for each WRCEVA isolate using ViReMa. We also took total cellular RNA and RNA extracted from the supernatants of Vero cells transfected with RNA derived from an in vitro infectious clone of SARS-CoV-2 (icSARS-CoV-2) (Xie et al., 2020). These clone-derived RNAs contained either the WT SARS-CoV-2, or were engineered with a deletion near the furin cleavage site of the spike protein, which we recently demonstrated is a common adaption to Vero cells and which alters SARS-CoV-2 pathogenesis in mammalian models of infection (Johnson et al., 2021). The identities and frequencies of the 13 most abundant RNA recombination events are illustrated in Figure 5A. We found all the expected sgmRNAs previously annotated for SARS-CoV-2 (Kim et al., 2020) as well as non-canonical sgmRNAs. An overview of mapped data over the SARS-CoV-2 illustrating large recombination events (depicted by the blue horizontal lines) is provided in Figure 5—figure supplement 1. We found that sgmRNAs were highly enriched in the cellular fractions from expressed icSARS-CoV-2 isolates (comprising >95% of the total viral genetic materials) but were relatively depleted in the supernatant fraction. This reflects a strong restriction of the packaging of these RNA species into virions. In the icSARS-CoV-2 samples, Tiled-ClickSeq and ViReMa accurately reported the expected deletion (Δ23603^23616). 
Interestingly, we also identified small structural variants (Δ23583^23599) in seven of the WRCEVA isolates with a frequency of 2–50%, similar to reports of the selection of variants containing deletions at this site after in vitro passaging on Vero cells (Klimstra et al., 2020). We also found a novel SV in one isolate (WRCEVA_000504: Δ27619^27642) present in 3.5% of the reads resulting in an eight amino acid deletion in ORF7a. We additionally identified a small number of micro-indels (Table 3) in some isolates. Table 3 Micro-indels and rates ( > 2%) found across 16 WRCEVA isolates.
Sample | MicroInDel | Nucs | VariantRate | Location | Result
WRCEVA_000502 | Δ519^523 | UGGUU | 2.2% | ORF1ab | Frameshift
WRCEVA_000504 | Δ29686^29693 | CAGUGUGU | 3.5% | 3’UTR | -
WRCEVA_000505 | Δ519^523 | UGGUU | 2.9% | ORF1ab | Frameshift
WRCEVA_000506 | Δ519^523 | UGGUU | 3.8% | ORF1ab | Frameshift
WRCEVA_000509 | Δ1237^1239 | UCA | 2.9% | ORF1ab | ΔH325
WRCEVA_000510 | Δ686^694 | AAGUCAUUU | 5.1% | ORF1ab | ΔLSF141-143
WRCEVA_000511 | Δ519^523 | UGGUU | 3.7% | ORF1ab | Frameshift
WRCEVA_000511 | Δ10811^10813 | CUU | 3.1% | ORF1ab | ΔL3516
WRCEVA_000512 | Δ29750^29759 | GAUCGAGUG | 10.0% | 3’UTR | -
Figure 5 Tiled-ClickSeq identifies sub-genomic mRNAs, structural variant" @default.
- W3202053154 created "2021-10-11" @default.
- W3202053154 creator A5001816949 @default.
- W3202053154 creator A5005352504 @default.
- W3202053154 creator A5007324419 @default.
- W3202053154 creator A5010610081 @default.
- W3202053154 creator A5010708909 @default.
- W3202053154 creator A5013244749 @default.
- W3202053154 creator A5015324776 @default.
- W3202053154 creator A5022371052 @default.
- W3202053154 creator A5030629501 @default.
- W3202053154 creator A5038614700 @default.
- W3202053154 creator A5038943906 @default.
- W3202053154 creator A5042089203 @default.
- W3202053154 creator A5045799176 @default.
- W3202053154 creator A5049544135 @default.
- W3202053154 creator A5053203417 @default.
- W3202053154 creator A5056727404 @default.
- W3202053154 creator A5058377507 @default.
- W3202053154 creator A5059897573 @default.
- W3202053154 creator A5061796753 @default.
- W3202053154 creator A5063390927 @default.
- W3202053154 creator A5067977741 @default.
- W3202053154 creator A5080058462 @default.
- W3202053154 creator A5087548457 @default.
- W3202053154 creator A5088390884 @default.
- W3202053154 date "2021-09-03" @default.
- W3202053154 modified "2023-09-26" @default.
- W3202053154 title "Author response: Tiled-ClickSeq for targeted sequencing of complete coronavirus genomes with simultaneous capture of RNA recombination and minority variants" @default.
- W3202053154 doi "https://doi.org/10.7554/elife.68479.sa2" @default.
- W3202053154 hasPublicationYear "2021" @default.
- W3202053154 type Work @default.
- W3202053154 sameAs 3202053154 @default.
- W3202053154 citedByCount "0" @default.
- W3202053154 crossrefType "peer-review" @default.
- W3202053154 hasAuthorship W3202053154A5001816949 @default.
- W3202053154 hasAuthorship W3202053154A5005352504 @default.
- W3202053154 hasAuthorship W3202053154A5007324419 @default.
- W3202053154 hasAuthorship W3202053154A5010610081 @default.
- W3202053154 hasAuthorship W3202053154A5010708909 @default.
- W3202053154 hasAuthorship W3202053154A5013244749 @default.
- W3202053154 hasAuthorship W3202053154A5015324776 @default.
- W3202053154 hasAuthorship W3202053154A5022371052 @default.
- W3202053154 hasAuthorship W3202053154A5030629501 @default.
- W3202053154 hasAuthorship W3202053154A5038614700 @default.
- W3202053154 hasAuthorship W3202053154A5038943906 @default.
- W3202053154 hasAuthorship W3202053154A5042089203 @default.
- W3202053154 hasAuthorship W3202053154A5045799176 @default.
- W3202053154 hasAuthorship W3202053154A5049544135 @default.
- W3202053154 hasAuthorship W3202053154A5053203417 @default.
- W3202053154 hasAuthorship W3202053154A5056727404 @default.
- W3202053154 hasAuthorship W3202053154A5058377507 @default.
- W3202053154 hasAuthorship W3202053154A5059897573 @default.
- W3202053154 hasAuthorship W3202053154A5061796753 @default.
- W3202053154 hasAuthorship W3202053154A5063390927 @default.
- W3202053154 hasAuthorship W3202053154A5067977741 @default.
- W3202053154 hasAuthorship W3202053154A5080058462 @default.
- W3202053154 hasAuthorship W3202053154A5087548457 @default.
- W3202053154 hasAuthorship W3202053154A5088390884 @default.
- W3202053154 hasBestOaLocation W32020531541 @default.
- W3202053154 hasConcept C104317684 @default.
- W3202053154 hasConcept C141231307 @default.
- W3202053154 hasConcept C142724271 @default.
- W3202053154 hasConcept C156695909 @default.
- W3202053154 hasConcept C2777648638 @default.
- W3202053154 hasConcept C2779134260 @default.
- W3202053154 hasConcept C3008058167 @default.
- W3202053154 hasConcept C51679486 @default.
- W3202053154 hasConcept C524204448 @default.
- W3202053154 hasConcept C54355233 @default.
- W3202053154 hasConcept C552990157 @default.
- W3202053154 hasConcept C67705224 @default.
- W3202053154 hasConcept C70721500 @default.
- W3202053154 hasConcept C71924100 @default.
- W3202053154 hasConcept C86803240 @default.
- W3202053154 hasConceptScore W3202053154C104317684 @default.
- W3202053154 hasConceptScore W3202053154C141231307 @default.
- W3202053154 hasConceptScore W3202053154C142724271 @default.
- W3202053154 hasConceptScore W3202053154C156695909 @default.
- W3202053154 hasConceptScore W3202053154C2777648638 @default.
- W3202053154 hasConceptScore W3202053154C2779134260 @default.
- W3202053154 hasConceptScore W3202053154C3008058167 @default.
- W3202053154 hasConceptScore W3202053154C51679486 @default.
- W3202053154 hasConceptScore W3202053154C524204448 @default.
- W3202053154 hasConceptScore W3202053154C54355233 @default.
- W3202053154 hasConceptScore W3202053154C552990157 @default.
- W3202053154 hasConceptScore W3202053154C67705224 @default.
- W3202053154 hasConceptScore W3202053154C70721500 @default.
- W3202053154 hasConceptScore W3202053154C71924100 @default.
- W3202053154 hasConceptScore W3202053154C86803240 @default.
- W3202053154 hasLocation W32020531541 @default.
- W3202053154 hasOpenAccess W3202053154 @default.
- W3202053154 hasPrimaryLocation W32020531541 @default.
- W3202053154 hasRelatedWork W1508030956 @default.
- W3202053154 hasRelatedWork W1988509800 @default.
- W3202053154 hasRelatedWork W1991523530 @default.
- W3202053154 hasRelatedWork W2002128513 @default.
- W3202053154 hasRelatedWork W2020824267 @default.
- W3202053154 hasRelatedWork W2057739827 @default.
- W3202053154 hasRelatedWork W2075354549 @default.