Matches in SemOpenAlex for { <https://semopenalex.org/work/W3104642761> ?p ?o ?g. }
- W3104642761 abstract "In mammals, genomic regions with enhancer activity turn over rapidly; in contrast, gene expression patterns and transcription factor binding preferences are largely conserved. Based on this conservation, we hypothesized that enhancers active in different mammals would exhibit conserved sequence patterns in spite of their different genomic locations. We tested this hypothesis by quantifying the conservation of sequence patterns underlying histone-mark-defined enhancers across six diverse mammals in two machine learning frameworks. We first trained support vector machine (SVM) classifiers based on the frequency spectrum of short DNA sequence patterns. These classifiers accurately identified many adult liver, developing limb, and developing brain enhancers in each species. Then, we applied these classifiers across species and found that classifiers trained in one species and tested in another performed nearly as well as classifiers trained and tested on the same species. This indicates that the short sequence patterns predictive of enhancers are largely conserved. We also observed similar cross-species conservation when applying the models to human and mouse enhancers validated in transgenic assays. The sequence patterns most predictive of enhancers in each species matched the binding motifs for a common set of TFs enriched for expression in relevant tissues, supporting the biological relevance of the learned features. To test the conservation of more complex sequence patterns, we trained convolutional neural networks (CNNs) on enhancer sequences in each species. The CNNs demonstrated better performance overall, but worse cross-species generalization than SVMs, suggesting the importance of combinatorial interactions between motifs, but less conservation of these more complex sequence patterns. Thus, despite the rapid change of active enhancer locations between mammals, cross-species enhancer prediction is often possible.
Furthermore, short sequence patterns encoding enhancer activity have been maintained across more than 180 million years of mammalian evolution, with evolutionary change in more complex sequence patterns. Author summary: Alterations in gene expression levels are a driving force of both speciation and complex disease; therefore, it is of great importance to understand the mechanisms underlying the evolution and function of gene regulatory DNA sequences. Recent studies have revealed that while gene expression patterns and transcription factor binding preferences are broadly conserved across diverse animals, there is extensive turnover in distal gene regulatory regions, called enhancers, between closely related species. We investigate this apparent incongruence by analyzing genome-wide enhancer datasets from six diverse mammalian species. We trained two machine-learning classifiers, a k-mer spectrum support vector machine (SVM) and a convolutional neural network (CNN), to distinguish enhancers from the genomic background. The k-mer spectrum SVM models the occurrences of short sequence patterns, while the CNN models both the short sequence patterns and their combinations. Both the SVM and CNN enhancer prediction models trained in one species are able to predict enhancers in the same cellular context in other species. However, while the CNNs performed better at predicting enhancers within each species, they generalized less well across species than the SVMs. This suggests that the short sequence properties encoding regulatory activity are remarkably conserved across more than 180 million years of mammalian evolution, with more evolutionary turnover in the more complex combinations of these conserved short sequence motifs." @default.
- W3104642761 created "2020-11-23" @default.
- W3104642761 creator A5059301033 @default.
- W3104642761 creator A5068905398 @default.
- W3104642761 creator A5071036018 @default.
- W3104642761 date "2017-02-21" @default.
- W3104642761 modified "2023-09-27" @default.
- W3104642761 title "Deep learning reveals evolutionary conservation and divergence of sequence properties underlying gene regulatory enhancers across mammals" @default.
- W3104642761 cites W1019830208 @default.
- W3104642761 cites W1437335841 @default.
- W3104642761 cites W1490161904 @default.
- W3104642761 cites W1847131992 @default.
- W3104642761 cites W1885076136 @default.
- W3104642761 cites W1964614067 @default.
- W3104642761 cites W1964733653 @default.
- W3104642761 cites W1976644679 @default.
- W3104642761 cites W1981030303 @default.
- W3104642761 cites W1983238320 @default.
- W3104642761 cites W1983275320 @default.
- W3104642761 cites W1987320425 @default.
- W3104642761 cites W1988581590 @default.
- W3104642761 cites W1991036278 @default.
- W3104642761 cites W1991151742 @default.
- W3104642761 cites W1998306196 @default.
- W3104642761 cites W2016015848 @default.
- W3104642761 cites W2020816856 @default.
- W3104642761 cites W2022736304 @default.
- W3104642761 cites W2031333605 @default.
- W3104642761 cites W2033201031 @default.
- W3104642761 cites W2038584294 @default.
- W3104642761 cites W2041171293 @default.
- W3104642761 cites W2046842005 @default.
- W3104642761 cites W2053433605 @default.
- W3104642761 cites W2062819328 @default.
- W3104642761 cites W2065103652 @default.
- W3104642761 cites W2067103105 @default.
- W3104642761 cites W2067956561 @default.
- W3104642761 cites W2076154138 @default.
- W3104642761 cites W2078059415 @default.
- W3104642761 cites W2079684286 @default.
- W3104642761 cites W2092756750 @default.
- W3104642761 cites W2095005830 @default.
- W3104642761 cites W2095150428 @default.
- W3104642761 cites W2095903180 @default.
- W3104642761 cites W2098412979 @default.
- W3104642761 cites W2103017472 @default.
- W3104642761 cites W2103777723 @default.
- W3104642761 cites W2115717891 @default.
- W3104642761 cites W2115870954 @default.
- W3104642761 cites W2118247745 @default.
- W3104642761 cites W2119336981 @default.
- W3104642761 cites W2119867435 @default.
- W3104642761 cites W2125125731 @default.
- W3104642761 cites W2128701949 @default.
- W3104642761 cites W2135284568 @default.
- W3104642761 cites W2144700381 @default.
- W3104642761 cites W2145010689 @default.
- W3104642761 cites W2147166867 @default.
- W3104642761 cites W2148498486 @default.
- W3104642761 cites W2149744269 @default.
- W3104642761 cites W2155496026 @default.
- W3104642761 cites W2157998140 @default.
- W3104642761 cites W2161093157 @default.
- W3104642761 cites W2161350677 @default.
- W3104642761 cites W2165649312 @default.
- W3104642761 cites W2172146316 @default.
- W3104642761 cites W2176034226 @default.
- W3104642761 cites W2183523328 @default.
- W3104642761 cites W2198606573 @default.
- W3104642761 cites W2212528563 @default.
- W3104642761 cites W2259938310 @default.
- W3104642761 cites W2336509392 @default.
- W3104642761 cites W2345512687 @default.
- W3104642761 cites W2574144134 @default.
- W3104642761 cites W2591130492 @default.
- W3104642761 cites W2626191279 @default.
- W3104642761 cites W2745970610 @default.
- W3104642761 cites W2749620102 @default.
- W3104642761 cites W4210767115 @default.
- W3104642761 cites W4293767581 @default.
- W3104642761 doi "https://doi.org/10.1101/110676" @default.
- W3104642761 hasPublicationYear "2017" @default.
- W3104642761 type Work @default.
- W3104642761 sameAs 3104642761 @default.
- W3104642761 citedByCount "2" @default.
- W3104642761 countsByYear W31046427612017 @default.
- W3104642761 countsByYear W31046427612020 @default.
- W3104642761 crossrefType "posted-content" @default.
- W3104642761 hasAuthorship W3104642761A5059301033 @default.
- W3104642761 hasAuthorship W3104642761A5068905398 @default.
- W3104642761 hasAuthorship W3104642761A5071036018 @default.
- W3104642761 hasBestOaLocation W31046427611 @default.
- W3104642761 hasConcept C104317684 @default.
- W3104642761 hasConcept C111936080 @default.
- W3104642761 hasConcept C150194340 @default.
- W3104642761 hasConcept C167625842 @default.
- W3104642761 hasConcept C199216141 @default.
- W3104642761 hasConcept C21592294 @default.
- W3104642761 hasConcept C54355233 @default.
- W3104642761 hasConcept C70721500 @default.
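The abstract's SVM represents each sequence by the frequency spectrum of short DNA words (k-mers). Below is a minimal sketch of that kind of feature map and the corresponding spectrum kernel. It assumes overlapping k-mers counted on the forward strand only, with windows containing non-ACGT symbols skipped. The record does not state the value of k, the exact kernel variant, or any reverse-complement handling the authors used, and the function names here are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on the forward strand of seq (case-insensitive)."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing N or any other non-ACGT symbol.
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1
    return counts

def spectrum_vector(seq: str, k: int) -> list[int]:
    """Dense feature vector: one count per possible k-mer, in lexicographic ACGT order."""
    counts = kmer_spectrum(seq, k)
    # Indexing a Counter with a missing key returns 0 without inserting it.
    return [counts["".join(p)] for p in product(ALPHABET, repeat=k)]

def spectrum_kernel(a: str, b: str, k: int) -> int:
    """Spectrum kernel: inner product of the two sequences' k-mer count vectors."""
    ca, cb = kmer_spectrum(a, k), kmer_spectrum(b, k)
    return sum(n * cb[kmer] for kmer, n in ca.items())

if __name__ == "__main__":
    print(kmer_spectrum("ACGTAC", 2))
    print(spectrum_kernel("ACGTAC", "ACGTAC", 2))  # 7
```

Because these features depend only on sequence content and not on genomic coordinates, a kernel SVM fit on one species' sequences can in principle be applied unchanged to another species' sequences. The cross-species evaluation described in the abstract relies on this property.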