Matches in SemOpenAlex for { <https://semopenalex.org/work/W2509668036> ?p ?o ?g. }
- W2509668036 endingPage "286.e4" @default.
- W2509668036 startingPage "278" @default.
- W2509668036 abstract "• Considering DNA shape features improved the prediction of TF binding in vivo
• DNA shape at flanking regions of binding sites refined the prediction of TF binding
• Larger improvements were observed for the E2F and MADS-domain TF families
• Propeller twist at specific nucleotide positions of the MADS-box contributed most
Interactions of transcription factors (TFs) with DNA comprise a complex interplay between base-specific amino acid contacts and readout of DNA structure. Recent studies have highlighted the complementarity of DNA sequence and shape in modeling TF binding in vitro. Here, we have provided a comprehensive evaluation of in vivo datasets to assess the predictive power obtained by augmenting various DNA sequence-based models of TF binding sites (TFBSs) with DNA shape features (helix twist, minor groove width, propeller twist, and roll). Results from 400 human ChIP-seq datasets for 76 TFs show that combining DNA shape features with position-specific scoring matrix (PSSM) scores improves TFBS predictions. Improvement has also been observed using TF flexible models and a machine-learning approach using a binary encoding of nucleotides in lieu of PSSMs. Incorporating DNA shape information is most beneficial for E2F and MADS-domain TF families. Our findings indicate that incorporating DNA sequence and shape information benefits the modeling of TF binding under complex in vivo conditions.
One of many mechanisms that control gene expression, transcriptional regulation involves transcription factors (TFs) as key proteins (Jacob and Monod, 1961; Ptashne and Gann, 1997). Most TFs are sequence-specific DNA binding proteins that recognize specific genome positions through a complex interplay between nucleotide-amino-acid contacts (base readout) and readout of DNA structure (shape readout) (Slattery et al., 2014). Deciphering how TFs identify and bind specific target sequences, the TF binding sites (TFBSs), is a key challenge in understanding transcriptional gene regulation (Dror et al., 2016; Wasserman and Sandelin, 2004; Zambelli et al., 2013). TFBSs are short and often degenerate sequence motifs. These characteristics make it computationally difficult to model and predict TFBSs at the genomic scale (Badis et al., 2009). Moving beyond initial consensus sequence methods, the classical computational model to describe TFBSs is the position-specific scoring matrix (PSSM), which uses an additive method to summarize frequencies of every nucleotide at each position of the TFBS (Stormo, 2013). These second-generation models, however, do not capture position interdependencies or variable spacing. Therefore, several experimental assays have been designed to unravel characteristics of TF-DNA interactions at the large scale. In vitro high-throughput (HT) binding assays, such as protein binding microarrays (PBMs) (Berger et al., 2006), HT SELEX (Jolma et al., 2010; Zhao et al., 2009), and SELEX-seq (Slattery et al., 2011), expose DNA sequences selected by TFs and reveal their binding preferences. Chromatin immunoprecipitation sequencing (ChIP-seq) represents the in vivo counterpart of these in vitro assays, allowing for the identification of DNA regions bound by a targeted TF at the genomic scale (Johnson et al., 2007). Large-scale data derived from HT experiments highlight higher-order positional interaction features of TFBSs that cannot be captured by classical PSSMs, even though the methods based on these traditional models perform quite well (Weirauch et al., 2013). Recently, computational advances have used experimental assays to construct sophisticated models that capture a broad range of TFBS representations.
For instance, PSSMs have been extended to dinucleotides to capture interrelationships within TFBSs (Siddharthan, 2010). Using PBM data, binding energy models include energy parameters to describe contributions of dinucleotides to binding affinity (Zhao et al., 2012). These models describe TF-DNA binding specificity well in cases in which PSSMs have performed insufficiently. Utilizing ChIP-seq data, we developed the TF flexible model (TFFM) framework to improve in vivo prediction of TFBSs (Mathelier and Wasserman, 2013). TFFMs capture interdependencies of successive nucleotides within TFBSs and the flexible length of TFBSs within a single hidden Markov model framework. The above-mentioned third-generation methods enable TFBS prediction by representing sequence properties. A parallel approach utilizes the 3D DNA structure, or DNA shape, to capture, at least in part, the interdependencies between nucleotide positions within TFBSs (Gordân et al., 2013; Tsai et al., 2015; Yang and Ramsey, 2015; Zhou et al., 2015). Large-scale DNA structural information can be computed by the DNAshape method (Zhou et al., 2013), which computes four DNA shape features: helix twist (HelT), minor groove width (MGW), propeller twist (ProT), and Roll. Recent studies have demonstrated the complementary role of DNA sequence and shape information in determining protein-DNA binding specificity in vitro (Joshi et al., 2007; Rohs et al., 2009; Slattery et al., 2011). For example, the binding specificity of Hox proteins was analyzed using SELEX-seq data to show the direct role of DNA shape features in protein-DNA readout (Abe et al., 2015). Using PBM and SELEX-seq data, we showed that complementing DNA sequence with shape information enhanced the prediction of TF binding affinities (Zhou et al., 2015). DNA shape information at regions flanking core binding sites was highly predictive of differential binding derived from BunDLE-seq assays (Levo et al., 2015). While previous works have demonstrated that models combining DNA sequence and shape improve quantitative models of TF binding in vitro, we have addressed here three key questions: (1) Do more complex in vivo protein-DNA interactions exhibit similar properties? (2) When DNA shape properties are integrated with sequence-based TFBS prediction methods, do we observe an improvement in performance? (3) Do specific TF families benefit more than others from the integration of DNA shape features in TF binding models?
Here, we have capitalized on the availability of DNA shape information extracted from GBshape (Chiu et al., 2015), our genome browser database of DNA shape features computed from our DNAshape prediction tool (Zhou et al., 2013), at TF-bound regions derived from ChIP-seq experiments to address the three aforementioned questions. To assess the effects of including DNA structural information in predictions of TFBSs in ChIP-seq datasets, we developed a computational framework combining DNA sequence and shape information to model and predict TFBSs. The availability of numerous ChIP-seq regions enables the application of a discriminative supervised machine learning approach (Libbrecht and Noble, 2015). Specifically, a DNA sequence that is considered as a potential TFBS was represented by a (feature) vector that combined 1 to 4n features that encode sequence information and 8n features that capture DNA shape information, where n is DNA sequence length. We encoded DNA sequence information of the putative TFBS by using either the PSSM or TFFM score computed from the sequence, or a binary encoding using 4 bits per nucleotide (Zhou et al., 2015). DNA shape-related features are the predicted values of HelT, MGW, ProT, and Roll at each position of the TFBS, extracted from GBshape (Chiu et al., 2015). The vector was further augmented with four second-order shape features that capture structural dependencies at adjacent nucleotide positions (Zhou et al., 2015) (Figure 1). Assuming that each ChIP-seq region contains a TFBS, we constructed a feature vector for the best hit per ChIP-seq peak and background region predicted by a TF binding profile (PSSM or TFFM) to train a classifier. To discriminate between TF bound (ChIP-seq) and unbound (background) regions, we used a gradient boosting classifier, which is an ensemble machine learning classifier that combines multiple weak learners to improve predictive power (Friedman et al., 2001). The gradient boosting classifier was based on decision trees that, given an input feature vector, output the probability of the feature vector to be associated with a ChIP-seq peak or a background region. This approach naturally handles heterogeneous features (e.g., DNA sequence and shape information), is robust to outliers, and is able to manage irrelevant input, such as noise from ChIP-seq experiments (Friedman et al., 2001). Classifiers combining PSSM score, TFFM score, or 4-bits nucleotide encoding with DNA shape features are referred to as PSSM + DNA shape, TFFM + DNA shape, or 4-bits + DNA shape classifiers, respectively. Open-source Python software for generating and using these classifiers is provided at https://github.com/amathelier/DNAshapedTFBS.
We compiled a set of 400 uniformly processed human ENCODE ChIP-seq datasets for which a JASPAR TF binding profile (Mathelier et al., 2014) was available for the corresponding immunoprecipitated (ChIPed) TF (Data S1). These datasets, covering 76 TFs, were used to compare the predictive powers of three computational models that consider DNA sequence information alone with their DNA shape-augmented classifiers. The first two DNA sequence-based models are PSSMs and TFFMs, which are widely used to score TFBSs in ChIP-seq datasets. The third model, the 4-bits classifier, is a discriminative model that uses a binary encoding of DNA sequence information (Zhou et al., 2015). Here, the predictive power of a model refers to its ability to discriminate ChIP-seq regions (defined as the 50-bp region surrounding each side of the ChIP-seq peak maximum) from matched background sequences. The 50-bp regions were selected because they are enriched for TFBSs (Worsley Hunt et al., 2014; Wilbanks and Facciotti, 2010). To avoid sequence composition biases, we selected each set of background sequences to match either the G+C (%GC) content or dinucleotide composition of the ChIP-seq regions. Unless otherwise indicated, background sequences matching the %GC content of ChIP-seq regions were used in the following results. Predictive powers of PSSM scores and PSSM + DNA shape classifiers were assessed through 10-fold cross-validation (CV). We optimized the PSSMs derived from JASPAR TF binding profiles with the perceptron algorithm using the DiMO tool (Patel and Stormo, 2014) on the constructed foreground and background training sets (range: 495–83,123, median: 15,171, mean: 21,098, SD: 17,220 sequences). Parameters of PSSM + DNA shape classifiers were learned from the same training sets. Vectors used by the classifiers for a ChIP-seq region correspond to the combination of the best PSSM score in the region and the 8n DNA shape feature values computed for this hit. To assess predictive power, we varied the threshold for scores to compute the recall (sensitivity), specificity, and precision values. Areas under the precision and recall curve (AUPRC) and the receiver-operating characteristic curve (AUROC) were computed for each model on each ChIP-seq dataset to evaluate predictive power. Unless otherwise noted, we provide the AUPRC values and the p values for significance calculated by a Wilcoxon signed-rank test.
By comparing AUPRC values derived from the PSSM scores or PSSM + DNA shape classifiers, we found that shape-augmented classifiers performed better for all 400 ChIP-seq datasets (p = 2.7 × 10^-67; Figure 2A). By considering the median AUPRC values per TF over all ChIP-seq datasets associated with the TF, we observed consistent improvement for all TFs when DNA shape features were incorporated (p = 3.6 × 10^-14; Figure 2B). We computed the difference of discriminative power between the two models (Figure 2C) to assess the improvement obtained by using the PSSM + DNA shape classifiers. Using the same analyses, we found that the predictive power of the TFFM + DNA shape classifiers is better than that of the TFFMs for 396/400 ChIP-seq datasets (p = 4.4 × 10^-67; Data S2). Classifiers performed strictly better than TFFMs for all TFs when we considered the median AUPRC values per TF (p = 3.6 × 10^-14; Data S2). Finally, we compared the 4-bits and 4-bits + DNA shape classifiers, which were trained and tested on sequences of the highest-scoring hit per ChIP-seq region derived from the PSSMs. DNA shape-augmented classifiers performed consistently better than 4-bits classifiers for 365/400 ChIP-seq datasets (p = 2.7 × 10^-57) and 70/76 TFs (p = 1.3 × 10^-12) when considering the median AUPRC values (Data S2). We confirmed the improvement in discriminative power of the models incorporating DNA shape features by considering background sequences matching the dinucleotide composition of ChIP-seq regions (Data S3) and TF-bound regions recurrently found in multiple ChIP-seq datasets for the same TF (Data S4). The relative improvement obtained when incorporating DNA shape information varied depending on the baseline DNA sequence-based approach. Unsurprisingly, the 4-bits + DNA shape classifiers exhibited a smaller improvement over the 4-bits classifiers compared to the shape-based improvements obtained with PSSMs and TFFMs.
The higher baseline performance of the 4-bits method is consistent with the superiority of discriminative over generative models to distinguish bound from unbound regions in ChIP-seq (Libbrecht and Noble, 2015) (Figure 3A; Data S5). Nonetheless, PSSM + DNA shape classifiers performed consistently better than 4-bits + DNA shape classifiers for 344/400 datasets (p = 7.7 × 10^-43; Figure 3B) and 64/76 TFs (p = 1.0 × 10^-8; Data S5). Although 4-bits classifiers outperformed PSSM scores, the higher discriminative power of PSSM + DNA shape compared to 4-bits + DNA shape classifiers reinforces the capacity of DNA shape features to improve TFBS predictions in ChIP-seq datasets. Importantly, the combination of sequence information (captured by PSSMs, TFFMs, or 4-bits classifiers) with DNA shape properties performed better than generative (PSSM and TFFM) and discriminative (4-bits classifier) approaches modeling DNA sequence, indicating that DNA shape provides additional information. Although the utility of DNA shape to predict TFBSs was reported before (Abe et al., 2015; Yang and Ramsey, 2015; Yang et al., 2014; Zhou et al., 2015), we provide evidence, from an extensive collection of 400 human in vivo datasets for 76 TFs, that this observation is generalizable and relevant to noisy environments and data (Fan and Struhl, 2009; Worsley Hunt et al., 2014; Worsley Hunt and Wasserman, 2014; Jain et al., 2015; Park et al., 2013; Teytelman et al., 2013).
Sequences immediately flanking TFBSs have been previously shown to contribute to TF binding specificity (Gordân et al., 2013), which is determined, in part, by DNA shape outside the core binding sites (Afek et al., 2014; Barozzi et al., 2014; Dror et al., 2015). We extended our DNA shape-augmented models to consider eight DNA shape features at 15-bp-long regions 5’ and 3’ of the TFBSs, as in Barozzi et al. (2014). Augmenting DNA shape-based classifiers with additional DNA shape information from flanking sequences has improved the discriminatory power of classifiers trained using 10-fold CV for 378 (∼94%), 373 (∼93%), and 375 (∼94%) datasets compared to PSSM + DNA shape, TFFM + DNA shape, and 4-bits + DNA shape classifiers, respectively (Figure 4; Data S6).
Our findings agree with results from in vitro studies of the role of flanking regions in TF-DNA binding (Dror et al., 2016; Gordân et al., 2013; Levo et al., 2015" @default.
- W2509668036 created "2016-09-16" @default.
- W2509668036 creator A5019996279 @default.
- W2509668036 creator A5026664289 @default.
- W2509668036 creator A5029522686 @default.
- W2509668036 creator A5041587003 @default.
- W2509668036 creator A5081578390 @default.
- W2509668036 creator A5087735169 @default.
- W2509668036 date "2016-09-01" @default.
- W2509668036 modified "2023-10-16" @default.
- W2509668036 title "DNA Shape Features Improve Transcription Factor Binding Site Predictions In Vivo" @default.
- W2509668036 cites W1505191356 @default.
- W2509668036 cites W1515749105 @default.
- W2509668036 cites W1610193264 @default.
- W2509668036 cites W1964733653 @default.
- W2509668036 cites W1987122345 @default.
- W2509668036 cites W1990893875 @default.
- W2509668036 cites W1992010506 @default.
- W2509668036 cites W1994122295 @default.
- W2509668036 cites W1995841566 @default.
- W2509668036 cites W2012114217 @default.
- W2509668036 cites W2012627860 @default.
- W2509668036 cites W2013414181 @default.
- W2509668036 cites W2022736304 @default.
- W2509668036 cites W2029201460 @default.
- W2509668036 cites W2032619610 @default.
- W2509668036 cites W2042392223 @default.
- W2509668036 cites W2044164623 @default.
- W2509668036 cites W2061939373 @default.
- W2509668036 cites W2064604026 @default.
- W2509668036 cites W2065192733 @default.
- W2509668036 cites W2069520341 @default.
- W2509668036 cites W2071962674 @default.
- W2509668036 cites W2080466240 @default.
- W2509668036 cites W2085055961 @default.
- W2509668036 cites W2087058596 @default.
- W2509668036 cites W2087400406 @default.
- W2509668036 cites W2087950896 @default.
- W2509668036 cites W2102473929 @default.
- W2509668036 cites W2102619694 @default.
- W2509668036 cites W2105389823 @default.
- W2509668036 cites W2110738405 @default.
- W2509668036 cites W2111826500 @default.
- W2509668036 cites W2114850508 @default.
- W2509668036 cites W2122429294 @default.
- W2509668036 cites W2125235344 @default.
- W2509668036 cites W2127572393 @default.
- W2509668036 cites W2129795133 @default.
- W2509668036 cites W2136124952 @default.
- W2509668036 cites W2143172590 @default.
- W2509668036 cites W2146890600 @default.
- W2509668036 cites W2148853260 @default.
- W2509668036 cites W2152121035 @default.
- W2509668036 cites W2152264745 @default.
- W2509668036 cites W2155925461 @default.
- W2509668036 cites W2159324718 @default.
- W2509668036 cites W2159942968 @default.
- W2509668036 cites W2164235180 @default.
- W2509668036 cites W2166755632 @default.
- W2509668036 cites W2167296922 @default.
- W2509668036 cites W2171564701 @default.
- W2509668036 cites W2172368992 @default.
- W2509668036 cites W2259938310 @default.
- W2509668036 cites W2268267797 @default.
- W2509668036 cites W2338876157 @default.
- W2509668036 cites W2401121789 @default.
- W2509668036 cites W4230266413 @default.
- W2509668036 doi "https://doi.org/10.1016/j.cels.2016.07.001" @default.
- W2509668036 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/5042832" @default.
- W2509668036 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/27546793" @default.
- W2509668036 hasPublicationYear "2016" @default.
- W2509668036 type Work @default.
- W2509668036 sameAs 2509668036 @default.
- W2509668036 citedByCount "111" @default.
- W2509668036 countsByYear W25096680362016 @default.
- W2509668036 countsByYear W25096680362017 @default.
- W2509668036 countsByYear W25096680362018 @default.
- W2509668036 countsByYear W25096680362019 @default.
- W2509668036 countsByYear W25096680362020 @default.
- W2509668036 countsByYear W25096680362021 @default.
- W2509668036 countsByYear W25096680362022 @default.
- W2509668036 countsByYear W25096680362023 @default.
- W2509668036 crossrefType "journal-article" @default.
- W2509668036 hasAuthorship W2509668036A5019996279 @default.
- W2509668036 hasAuthorship W2509668036A5026664289 @default.
- W2509668036 hasAuthorship W2509668036A5029522686 @default.
- W2509668036 hasAuthorship W2509668036A5041587003 @default.
- W2509668036 hasAuthorship W2509668036A5081578390 @default.
- W2509668036 hasAuthorship W2509668036A5087735169 @default.
- W2509668036 hasBestOaLocation W25096680361 @default.
- W2509668036 hasConcept C101762097 @default.
- W2509668036 hasConcept C104317684 @default.
- W2509668036 hasConcept C107824862 @default.
- W2509668036 hasConcept C150194340 @default.
- W2509668036 hasConcept C207001950 @default.
- W2509668036 hasConcept C3662595 @default.
- W2509668036 hasConcept C54355233 @default.
- W2509668036 hasConcept C552990157 @default.