Matches in SemOpenAlex for { <https://semopenalex.org/work/W1761901537> ?p ?o ?g. }
- W1761901537 abstract "One of the most important and challenging 'knowledge extraction' tasks in bioinformatics is the reverse engineering of gene regulatory networks (GRNs) from DNA microarray gene expression data. Indeed, as a result of the development of high-throughput data-collection techniques, biology is experiencing a data flood phenomenon that pushes biologists toward a new view of biology, systems biology, that aims at system-level understanding of biological systems. Unfortunately, even for small model organisms such as the yeast Saccharomyces cerevisiae, the number p of genes is much larger than the number n of expression data samples. The dimensionality issue induced by this 'small n, large p' data setting renders standard statistical learning methods inadequate. Restricting the complexity of the models makes it possible to deal with this serious impediment. Indeed, by introducing (a priori undesirable) bias in the model selection procedure, one reduces the variance of the selected model, thereby increasing its accuracy. Gaussian graphical models (GGMs) have proven to be a very powerful formalism to infer GRNs from expression data. Unfortunately, standard GGM selection techniques cannot be used in the 'small n, large p' data setting. One way to overcome this issue is to resort to regularization. In particular, shrinkage estimators of the covariance matrix (required to infer GGMs) have proven to be very effective. Our first contribution consists in a new shrinkage estimator that improves upon existing ones through the use of a Monte Carlo (parametric bootstrap) procedure. Another approach to GGM selection in the 'small n, large p' data setting consists in reverse engineering limited-order partial correlation graphs (q-partial correlation graphs) to approximate GGMs.
Our second contribution consists in an inference algorithm, the q-nested procedure, that builds a sequence of nested q-partial correlation graphs to take advantage of the smaller order graphs' topology to infer higher order graphs. This allows us to significantly speed up the inference of such graphs and to avoid problems related to multiple testing. Consequently, we are able to consider higher order graphs, thereby increasing the accuracy of the inferred graphs. Another important challenge in bioinformatics is the prediction of gene function. An example of such a prediction task is the identification of genes that are targets of the nitrogen catabolite repression (NCR) selection mechanism in the yeast Saccharomyces cerevisiae. The study of model organisms such as Saccharomyces cerevisiae is indispensable for the understanding of more complex organisms. Our third contribution consists in extending the standard two-class classification approach by enriching the set of variables and comparing several feature selection techniques and classification algorithms. Finally, our fourth contribution formulates the prediction of NCR target genes as a network inference task. We use GGM selection to infer multivariate dependencies between genes, and, starting from a set of genes known to be sensitive to NCR, we classify the remaining genes. We hence avoid problems related to the choice of a negative training set and take advantage of the robustness of GGM selection techniques in the 'small n, large p' data setting." @default.
- W1761901537 created "2016-06-24" @default.
- W1761901537 creator A5050279993 @default.
- W1761901537 creator A5072869049 @default.
- W1761901537 date "2009-07-02" @default.
- W1761901537 modified "2023-09-26" @default.
- W1761901537 title "Gaussian graphical model selection for gene regulatory network reverse engineering and function prediction" @default.
- W1761901537 cites W103930971 @default.
- W1761901537 cites W123451444 @default.
- W1761901537 cites W127548258 @default.
- W1761901537 cites W135270437 @default.
- W1761901537 cites W1481919380 @default.
- W1761901537 cites W1485715719 @default.
- W1761901537 cites W1487784090 @default.
- W1761901537 cites W1491459594 @default.
- W1761901537 cites W1491718027 @default.
- W1761901537 cites W1494413412 @default.
- W1761901537 cites W1511926043 @default.
- W1761901537 cites W1513861746 @default.
- W1761901537 cites W1513869833 @default.
- W1761901537 cites W1516866308 @default.
- W1761901537 cites W1521708890 @default.
- W1761901537 cites W1524326598 @default.
- W1761901537 cites W1524761913 @default.
- W1761901537 cites W1526097585 @default.
- W1761901537 cites W1528046055 @default.
- W1761901537 cites W1530964327 @default.
- W1761901537 cites W1542184596 @default.
- W1761901537 cites W1542652324 @default.
- W1761901537 cites W1544923801 @default.
- W1761901537 cites W1545459885 @default.
- W1761901537 cites W1549935796 @default.
- W1761901537 cites W1551066950 @default.
- W1761901537 cites W1551490917 @default.
- W1761901537 cites W1554944419 @default.
- W1761901537 cites W1560107318 @default.
- W1761901537 cites W1564947197 @default.
- W1761901537 cites W1571975558 @default.
- W1761901537 cites W1573602524 @default.
- W1761901537 cites W1584444527 @default.
- W1761901537 cites W1588787385 @default.
- W1761901537 cites W1598266570 @default.
- W1761901537 cites W16136897 @default.
- W1761901537 cites W1619226191 @default.
- W1761901537 cites W1629653960 @default.
- W1761901537 cites W1660272337 @default.
- W1761901537 cites W1663973292 @default.
- W1761901537 cites W1746680969 @default.
- W1761901537 cites W1766594731 @default.
- W1761901537 cites W1769824028 @default.
- W1761901537 cites W1774711529 @default.
- W1761901537 cites W178619957 @default.
- W1761901537 cites W1829443747 @default.
- W1761901537 cites W1840338487 @default.
- W1761901537 cites W1922017469 @default.
- W1761901537 cites W1925571297 @default.
- W1761901537 cites W1957654651 @default.
- W1761901537 cites W1963861745 @default.
- W1761901537 cites W1964356176 @default.
- W1761901537 cites W1966311386 @default.
- W1761901537 cites W1966626540 @default.
- W1761901537 cites W1967030981 @default.
- W1761901537 cites W1971224531 @default.
- W1761901537 cites W1971266246 @default.
- W1761901537 cites W1971672021 @default.
- W1761901537 cites W1976526581 @default.
- W1761901537 cites W1977339054 @default.
- W1761901537 cites W1977675996 @default.
- W1761901537 cites W1983916623 @default.
- W1761901537 cites W1985094434 @default.
- W1761901537 cites W1988047583 @default.
- W1761901537 cites W1989284002 @default.
- W1761901537 cites W1989373272 @default.
- W1761901537 cites W1989885743 @default.
- W1761901537 cites W1990512452 @default.
- W1761901537 cites W1992452843 @default.
- W1761901537 cites W1995945562 @default.
- W1761901537 cites W1999512674 @default.
- W1761901537 cites W19996625 @default.
- W1761901537 cites W2005486846 @default.
- W1761901537 cites W2007471837 @default.
- W1761901537 cites W2008107402 @default.
- W1761901537 cites W2010288201 @default.
- W1761901537 cites W2017337590 @default.
- W1761901537 cites W2020925091 @default.
- W1761901537 cites W2021850405 @default.
- W1761901537 cites W2022211548 @default.
- W1761901537 cites W2024083363 @default.
- W1761901537 cites W2024105897 @default.
- W1761901537 cites W2029826056 @default.
- W1761901537 cites W2034162523 @default.
- W1761901537 cites W2034562813 @default.
- W1761901537 cites W2035184080 @default.
- W1761901537 cites W2036338631 @default.
- W1761901537 cites W2039408393 @default.
- W1761901537 cites W2040870580 @default.
- W1761901537 cites W2040884411 @default.
- W1761901537 cites W2044600950 @default.
- W1761901537 cites W2045638068 @default.
- W1761901537 cites W2046074668 @default.