Matches in SemOpenAlex for { <https://semopenalex.org/work/W29233750> ?p ?o ?g. }
- W29233750 abstract "Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated and moved toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereby enhance the reliability of the inferred transcriptional regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation concerned the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. 
Our study of E. coli σ70 promoters found support at the 0.1 significance level for our hypothesis that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to σ70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E. coli transcription, we discovered a number of potentially useful features, some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption that promoter homology correlates positively with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests performed for E. coli σ70 promoters returned a p-value of 0.072, which at the 0.1 significance level suggested support for our (alternative) hypothesis, although this trend may only be present for promoters where the corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction using SVM and kernel methods, principally the spectrum kernel. Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter prediction [59], and we have here successfully applied the technique to refining TFBS predictions. 
The advantages provided by the SVM classifier were best seen in 'moderately'-conserved transcription factor binding sites, as represented by our E. coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1%, but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific differences, especially between pathogenic and non-pathogenic strains. Such differences were made clear through interactive visualisations using the TRNDiff software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled 'regulatory trees', inspired by the phylogenetic tree concept. Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. 
While common phylogenetic trees convey information regarding changes in gene repertoire, which we might regard as analogous to 'hardware', the regulatory tree informs us of changes in regulatory circuitry, in some respects analogous to 'software'. In this context, we explored the 'pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks is a more comprehensive survey of the relationships and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes, the 'core-regulatory-set', and interactions found only in a subset of the genomes explored, the 'sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, with this core set potentially identifying basic regulatory processes essential for survival. Species-level differences are seen at the sub-regulatory-set level; for example, the known virulence factors YbtA and PchR were found in Y. pestis and P. aeruginosa respectively, but were not present in either E. coli or B. subtilis. Such factors, and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogen-specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. 
We identified a set of promising feature attributes, demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks." @default.
- W29233750 created "2016-06-24" @default.
- W29233750 creator A5081504592 @default.
- W29233750 date "2012-01-01" @default.
- W29233750 modified "2023-09-24" @default.
- W29233750 title "Prediction of transcriptional regulatory interactions in bacteria : a comparative genomics approach" @default.
- W29233750 cites W1263422348 @default.
- W29233750 cites W1491459594 @default.
- W29233750 cites W1498410508 @default.
- W29233750 cites W1510073064 @default.
- W29233750 cites W1511789414 @default.
- W29233750 cites W1515928236 @default.
- W29233750 cites W1519815462 @default.
- W29233750 cites W1543459401 @default.
- W29233750 cites W1576520375 @default.
- W29233750 cites W1583996334 @default.
- W29233750 cites W1592870802 @default.
- W29233750 cites W1605121899 @default.
- W29233750 cites W1645874374 @default.
- W29233750 cites W1742988005 @default.
- W29233750 cites W1797459856 @default.
- W29233750 cites W1838422459 @default.
- W29233750 cites W1898581989 @default.
- W29233750 cites W1931298889 @default.
- W29233750 cites W1965399445 @default.
- W29233750 cites W1973670631 @default.
- W29233750 cites W1979136055 @default.
- W29233750 cites W1981509058 @default.
- W29233750 cites W1987122345 @default.
- W29233750 cites W1988037580 @default.
- W29233750 cites W1989921625 @default.
- W29233750 cites W1992855765 @default.
- W29233750 cites W1994095434 @default.
- W29233750 cites W1999309464 @default.
- W29233750 cites W2004419591 @default.
- W29233750 cites W2005129098 @default.
- W29233750 cites W2005486255 @default.
- W29233750 cites W2005642805 @default.
- W29233750 cites W2009659277 @default.
- W29233750 cites W2010460143 @default.
- W29233750 cites W2011869597 @default.
- W29233750 cites W2014723793 @default.
- W29233750 cites W2014869951 @default.
- W29233750 cites W2018527742 @default.
- W29233750 cites W2018934112 @default.
- W29233750 cites W2021497905 @default.
- W29233750 cites W2023098599 @default.
- W29233750 cites W2025853251 @default.
- W29233750 cites W2026257220 @default.
- W29233750 cites W2027662098 @default.
- W29233750 cites W2031121939 @default.
- W29233750 cites W2035564383 @default.
- W29233750 cites W2035776702 @default.
- W29233750 cites W203631273 @default.
- W29233750 cites W2037187863 @default.
- W29233750 cites W2039290359 @default.
- W29233750 cites W2039700732 @default.
- W29233750 cites W2041726559 @default.
- W29233750 cites W2041877620 @default.
- W29233750 cites W2047033234 @default.
- W29233750 cites W2047718759 @default.
- W29233750 cites W2050207914 @default.
- W29233750 cites W2055615325 @default.
- W29233750 cites W2057791956 @default.
- W29233750 cites W2057854870 @default.
- W29233750 cites W2059041636 @default.
- W29233750 cites W2060842569 @default.
- W29233750 cites W2061425021 @default.
- W29233750 cites W2062402656 @default.
- W29233750 cites W2066718232 @default.
- W29233750 cites W2067474031 @default.
- W29233750 cites W2070672503 @default.
- W29233750 cites W2070974192 @default.
- W29233750 cites W2071424950 @default.
- W29233750 cites W2071753700 @default.
- W29233750 cites W2074703985 @default.
- W29233750 cites W2078034546 @default.
- W29233750 cites W2079327011 @default.
- W29233750 cites W2080178217 @default.
- W29233750 cites W2081501570 @default.
- W29233750 cites W2082835279 @default.
- W29233750 cites W2084909105 @default.
- W29233750 cites W2087887622 @default.
- W29233750 cites W2088851040 @default.
- W29233750 cites W2097382368 @default.
- W29233750 cites W2099035403 @default.
- W29233750 cites W2100332892 @default.
- W29233750 cites W2103447044 @default.
- W29233750 cites W2105222452 @default.
- W29233750 cites W2105784079 @default.
- W29233750 cites W2109344242 @default.
- W29233750 cites W2110699703 @default.
- W29233750 cites W2111473801 @default.
- W29233750 cites W2111507859 @default.
- W29233750 cites W2111538555 @default.
- W29233750 cites W2115979107 @default.
- W29233750 cites W2116423958 @default.
- W29233750 cites W2117630412 @default.
- W29233750 cites W2119821739 @default.
- W29233750 cites W2120248467 @default.