Matches in SemOpenAlex for { <https://semopenalex.org/work/W2524656430> ?p ?o ?g. }
Showing items 1 to 64 of 64 with 100 items per page.
- W2524656430 endingPage "1666" @default.
- W2524656430 startingPage "1664" @default.
- W2524656430 abstract "The majority of genome assemblies to date fail to represent the true structure of native genomes. This lack of completeness is largely due to the inability to assemble the variable (often significant) fraction of nuclear genomes that is composed primarily of repeated sequences (with either a structural function such as satellite DNA and simple sequence repeats or “selfish DNA” such as high-copy transposable elements [TEs]), herein defined as the “dark side of the genome.” To address this problem, we developed a method to detect and quantify the dark side of the genome and used it to infer the genomic composition and dynamic evolution of the majority of native repeats and TEs present within several test eukaryotic genomes. Eukaryotic genomes range in size by about four orders of magnitude, with flowering plants having the widest variation (Fedoroff, 2012). Without taking into account whole-genome duplication and polyploidization events, it is well established that genome size is highly correlated with the content of TEs and repeated sequences (e.g., centromeric, satellite, ribosomal DNA) (Kidwell, 2002). TE activity and dynamics can have major effects on the host's genetic material by acting as mutagenic agents, as substrates for inducing changes in gene content and regulation, and as a general source of genetic variability (Lisch, 2013). 
Despite the documented importance of repetitive sequences and TEs in genome biology, the majority of publicly available eukaryotic genome assemblies today lack high-quality and comprehensive representations of these sequences. In some extreme cases, significant portions of a genome's repeat fraction are barely present in its draft assembly. This results in biased analyses of genome composition, based solely on the assembled portion of a given genome. The goal of our study was to develop a method to discover (with the least bias conceivable) the total repetitive/TE sequence content of native genomes (including the unexplored dark side), and to compare these results with the TE/repeat content of a corresponding set of genome assemblies. To accomplish these goals, we aligned sets of either unassembled or genome-derived single-sequence reads (i.e., a low-coverage short-read genome skim) to highly curated repeat libraries and tallied the hits to each repeat/TE category. Of note, since the comprehensiveness and classification accuracy of a repeat library are crucial in identifying and quantifying all of the genomic elements, we developed repeat libraries using orthogonal approaches (both structure- and homology-based methods; see Copetti et al., 2015 and this work) on all species surveyed. A more detailed description of our methods, datasets, and software adopted is outlined in Supplemental Figure 1 and Supplemental Information. 
The serial application of our method to a set of 16 heterogeneous assemblies (Supplemental Table 1) led to three key observations: (1) although variable, the amount of repeats and TEs in a given assembly (A) was consistently lower than that found in the corresponding genome (G) (Figure 1A, Supplemental Figure 2, and Supplemental Table 1); (2) genome assemblies were identified where the repeat and TE differences between A and G were negligible, or were statistically significant (i.e., the repeat composition of A was significantly under-represented with respect to G, Supplemental Table 2); and (3) these differences were associated with the assembly strategy rather than with the size of the genome. The first finding was expected, as genome assemblers often fail to unambiguously place repeats and discard them from the assembly. The latter two observations should prompt investigators to consider not only the assembly, but also to include the dark side of the genome when describing a genome assembly as a true representation of a native genome. For example, our analyses of the maize (genome size [GS] ∼2.7 Gb) and barley (GS ∼5.4 Gb) genome assemblies demonstrated that the assumption that all large genome assemblies are depleted in repeats is not supported if a local assembly strategy is applied to assemble a genome (such as BAC-by-BAC rather than a whole-genome shotgun). Plotting the log values of G and A abundances provided another view as to how to quantify the completeness of an assembly. When the majority of repeat/TE types are assembled accurately, the quantity of G and A will be the same and will graph along the bisector (Figure 1B and Supplemental Figure 3A). When repeat/TE sequences present in G are not assembled in A, the data will scatter (toward higher G values), thereby decreasing the coefficient of determination, which also affects the line equation (Supplemental Figure 3B and 3C). 
In addition to reaching the conclusion drawn by Ross-Ibarra’s group (Figure 1A in Tenaillon et al., 2011), our data suggest that R² is not the only metric to consider when measuring differences among repeat abundances. Our results on 16 species systematically confirmed (see Figure 1B and Supplemental Figure 3A for some examples) how the content imbalance in G and A affects the line equation, and that all components of the regression must be taken into account when describing the features of a given genome. By measuring similarity among reads, our method could also detect the presence of different quantities of repeats and TEs across species, and provided information on the amount of recently duplicated sequences in a genome and their representation in an assembly. For example, in all species examined, footprints of recent and possibly ongoing TE activity could be observed in the large number of hits with low sequence divergence (left panels of Figure 1C and Supplemental Figure 4). The different profiles obtained from assembly-derived reads (right panels of Figure 1C and Supplemental Figure 4) revealed that most of these recently duplicated sequences are not present in their corresponding assemblies, a known flaw of assembly algorithms that our analyses were able to demonstrate in a practical example. By comparing differential repeat/TE abundance in pairs of closely related Oryza genomes, our method was also able to detect instances of significant over- or under-representation. Taken together, the overall patterns in the resulting matrix were consistent with the phylogeny of the Oryza genus (Figure 1D and Supplemental Figure 5), enabling us to detect signatures of past repeat/TE bursts and sequence removal. 
Moreover, each case showed an evolutionary pattern that was independent of both the host genome and other repeat/TE class evolution. In conclusion, using low-coverage SGS reads as a proxy for genome composition, we developed an “assembly-independent” method to quantify repetitive sequences and TEs in native genomes. Previous studies (Tenaillon et al., 2011; Sveinsson et al., 2013) attempted to describe genome features by adopting a similar principle, but in our opinion their results are affected by the incompleteness of the repeat libraries used in terms of both the species represented and the lack of non-assembled repeats. With the additions and modifications implemented here, we demonstrated that our approach can detect and quantify repeats and TEs in a more comprehensive way. Our analyses demonstrated that by using a standard set of bioinformatics tools, coupled with highly curated and comprehensive repeat libraries, the native repeat and TE content of a given genome can be easily measured in a routine fashion. We also demonstrated how the genome assembly strategy, and not the genome size per se, has a major impact on the repeat content present in a given genome assembly. Lastly, given that repeats are a key component of eukaryotic genomes, we emphasize how important it is to specify how inclusive a repeat analysis is, and especially to distinguish whether surveyed repeats represent the content of a whole genome or an assembly." @default.
- W2524656430 created "2016-10-07" @default.
- W2524656430 creator A5006286921 @default.
- W2524656430 creator A5072198160 @default.
- W2524656430 date "2016-12-01" @default.
- W2524656430 modified "2023-09-26" @default.
- W2524656430 title "The Dark Side of the Genome: Revealing the Native Transposable Element/Repeat Content of Eukaryotic Genomes" @default.
- W2524656430 cites W1597884192 @default.
- W2524656430 cites W1630811241 @default.
- W2524656430 cites W1976876832 @default.
- W2524656430 cites W1999078569 @default.
- W2524656430 cites W2064155145 @default.
- W2524656430 cites W2097140040 @default.
- W2524656430 cites W2127266436 @default.
- W2524656430 doi "https://doi.org/10.1016/j.molp.2016.09.006" @default.
- W2524656430 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/27693675" @default.
- W2524656430 hasPublicationYear "2016" @default.
- W2524656430 type Work @default.
- W2524656430 sameAs 2524656430 @default.
- W2524656430 citedByCount "4" @default.
- W2524656430 countsByYear W25246564302017 @default.
- W2524656430 countsByYear W25246564302020 @default.
- W2524656430 countsByYear W25246564302021 @default.
- W2524656430 crossrefType "journal-article" @default.
- W2524656430 hasAuthorship W2524656430A5006286921 @default.
- W2524656430 hasAuthorship W2524656430A5072198160 @default.
- W2524656430 hasBestOaLocation W25246564301 @default.
- W2524656430 hasConcept C104317684 @default.
- W2524656430 hasConcept C141231307 @default.
- W2524656430 hasConcept C4918238 @default.
- W2524656430 hasConcept C54355233 @default.
- W2524656430 hasConcept C70721500 @default.
- W2524656430 hasConcept C78458016 @default.
- W2524656430 hasConcept C86803240 @default.
- W2524656430 hasConceptScore W2524656430C104317684 @default.
- W2524656430 hasConceptScore W2524656430C141231307 @default.
- W2524656430 hasConceptScore W2524656430C4918238 @default.
- W2524656430 hasConceptScore W2524656430C54355233 @default.
- W2524656430 hasConceptScore W2524656430C70721500 @default.
- W2524656430 hasConceptScore W2524656430C78458016 @default.
- W2524656430 hasConceptScore W2524656430C86803240 @default.
- W2524656430 hasIssue "12" @default.
- W2524656430 hasLocation W25246564301 @default.
- W2524656430 hasLocation W25246564302 @default.
- W2524656430 hasLocation W25246564303 @default.
- W2524656430 hasOpenAccess W2524656430 @default.
- W2524656430 hasPrimaryLocation W25246564301 @default.
- W2524656430 hasRelatedWork W1552803813 @default.
- W2524656430 hasRelatedWork W1981218481 @default.
- W2524656430 hasRelatedWork W1994346163 @default.
- W2524656430 hasRelatedWork W1998681124 @default.
- W2524656430 hasRelatedWork W2027815767 @default.
- W2524656430 hasRelatedWork W2029970011 @default.
- W2524656430 hasRelatedWork W2606371917 @default.
- W2524656430 hasRelatedWork W2974873635 @default.
- W2524656430 hasRelatedWork W3010826188 @default.
- W2524656430 hasRelatedWork W4283070870 @default.
- W2524656430 hasVolume "9" @default.
- W2524656430 isParatext "false" @default.
- W2524656430 isRetracted "false" @default.
- W2524656430 magId "2524656430" @default.
- W2524656430 workType "article" @default.