Matches in SemOpenAlex for { <https://semopenalex.org/work/W2115465732> ?p ?o ?g. }
- W2115465732 abstract "The prediction of protein-protein interactions (PPI) has recently emerged as an important problem in the fields of bioinformatics and systems biology, because most essential cellular processes are mediated by these kinds of interactions. In this thesis we focus on the prediction of co-complex interactions, where the objective is to identify and characterize protein pairs which are members of the same protein complex. Although high-throughput methods for the direct identification of PPI have been developed in recent years, it has been demonstrated that the data obtained by these methods are often incomplete and suffer from high false-positive and false-negative rates. In order to deal with this technology-driven problem, several machine learning techniques have been employed in the past to improve the accuracy and reliability of predicted protein interacting pairs, demonstrating that the combined use of direct and indirect biological insights can improve the quality of predictive PPI models. This task has been commonly viewed as a binary classification problem. However, the nature of the data creates two major problems. The first is class imbalance: the number of positive examples (pairs of proteins which really interact) is much smaller than the number of negative ones. The second is that the selection of negative examples is based on unreliable assumptions, which could introduce bias into the classification results. The first part of this dissertation addresses these drawbacks by exploring the use of one-class classification (OCC) methods for the task of PPI prediction. OCC methods utilize examples of just one class to generate a predictive model which is consequently independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We designed and carried out a performance evaluation study of several OCC methods for this task.
We also undertook a comparative performance evaluation with several conventional learning techniques. Furthermore, we pay attention to a new potential drawback which appears to affect the performance of PPI prediction. This is associated with the composition of the positive gold standard set, which contains a high proportion of examples associated with interactions of ribosomal proteins. We demonstrate that this situation indeed biases the classification task, resulting in an over-optimistic performance result. The prediction of non-ribosomal PPI is a much more difficult task. We investigate some strategies to improve the performance of this subtask, integrating new kinds of data as well as combining diverse classification models generated from different sets of data. We also undertook a preliminary validation study of the new PPI predicted using OCC methods. To achieve this, we focus on three main aspects: searching the literature for biological evidence that supports the new predictions; analysing the properties of the predicted PPI networks; and identifying highly interconnected groups of proteins which can be associated with new protein complexes. Finally, this thesis explores a slightly different area, related to the prediction of PPI types. This is associated with the classification of PPI structures (complexes) contained in the Protein Data Bank (PDB) database according to their function and binding affinity. Considering the relatively small number of crystallized protein complexes available, it is not possible at the moment to link these results with the ones obtained previously for the prediction of PPI complexes. However, this could be possible in the near future when more PPI structures become available." @default.
- W2115465732 created "2016-06-24" @default.
- W2115465732 creator A5091078408 @default.
- W2115465732 date "2010-01-01" @default.
- W2115465732 modified "2023-09-27" @default.
- W2115465732 title "Machine learning for the prediction of protein-protein interactions" @default.
- W2115465732 cites W134111903 @default.
- W2115465732 cites W1480376833 @default.
- W2115465732 cites W1487588218 @default.
- W2115465732 cites W1488833649 @default.
- W2115465732 cites W1491993603 @default.
- W2115465732 cites W1495061682 @default.
- W2115465732 cites W1498077316 @default.
- W2115465732 cites W1498183065 @default.
- W2115465732 cites W1506285740 @default.
- W2115465732 cites W1516193414 @default.
- W2115465732 cites W1534477342 @default.
- W2115465732 cites W1554663460 @default.
- W2115465732 cites W1554672939 @default.
- W2115465732 cites W1554944419 @default.
- W2115465732 cites W1570448133 @default.
- W2115465732 cites W1571895549 @default.
- W2115465732 cites W1588496946 @default.
- W2115465732 cites W1589195347 @default.
- W2115465732 cites W1605688901 @default.
- W2115465732 cites W1663973292 @default.
- W2115465732 cites W1783384641 @default.
- W2115465732 cites W1792865908 @default.
- W2115465732 cites W1926568554 @default.
- W2115465732 cites W1967408543 @default.
- W2115465732 cites W1970088130 @default.
- W2115465732 cites W1970350442 @default.
- W2115465732 cites W1973644476 @default.
- W2115465732 cites W1987749411 @default.
- W2115465732 cites W1992018127 @default.
- W2115465732 cites W1994803330 @default.
- W2115465732 cites W1995945562 @default.
- W2115465732 cites W1997803213 @default.
- W2115465732 cites W2001457899 @default.
- W2115465732 cites W2008708467 @default.
- W2115465732 cites W2008896880 @default.
- W2115465732 cites W2010068428 @default.
- W2115465732 cites W2018045523 @default.
- W2115465732 cites W2018049970 @default.
- W2115465732 cites W2031772330 @default.
- W2115465732 cites W2037036397 @default.
- W2115465732 cites W2037433020 @default.
- W2115465732 cites W2042614373 @default.
- W2115465732 cites W2043329626 @default.
- W2115465732 cites W2043699100 @default.
- W2115465732 cites W2045131140 @default.
- W2115465732 cites W2046618236 @default.
- W2115465732 cites W2047693963 @default.
- W2115465732 cites W2050721857 @default.
- W2115465732 cites W2053724458 @default.
- W2115465732 cites W2053906518 @default.
- W2115465732 cites W2060861141 @default.
- W2115465732 cites W2063007776 @default.
- W2115465732 cites W2065304353 @default.
- W2115465732 cites W2067571661 @default.
- W2115465732 cites W2071209325 @default.
- W2115465732 cites W2080182143 @default.
- W2115465732 cites W2081931663 @default.
- W2115465732 cites W2084619201 @default.
- W2115465732 cites W2087430029 @default.
- W2115465732 cites W2088216962 @default.
- W2115465732 cites W2094002366 @default.
- W2115465732 cites W2094148990 @default.
- W2115465732 cites W2096451472 @default.
- W2115465732 cites W2096495474 @default.
- W2115465732 cites W2097697746 @default.
- W2115465732 cites W2097698606 @default.
- W2115465732 cites W2100585269 @default.
- W2115465732 cites W2100827857 @default.
- W2115465732 cites W2103017472 @default.
- W2115465732 cites W2103538877 @default.
- W2115465732 cites W2103729016 @default.
- W2115465732 cites W2104315543 @default.
- W2115465732 cites W2105099387 @default.
- W2115465732 cites W2107340752 @default.
- W2115465732 cites W2108066538 @default.
- W2115465732 cites W2109597229 @default.
- W2115465732 cites W2110587743 @default.
- W2115465732 cites W2110625774 @default.
- W2115465732 cites W2111823238 @default.
- W2115465732 cites W2113019448 @default.
- W2115465732 cites W2113242816 @default.
- W2115465732 cites W2113647549 @default.
- W2115465732 cites W2113654464 @default.
- W2115465732 cites W2115629999 @default.
- W2115465732 cites W2117412805 @default.
- W2115465732 cites W2117569775 @default.
- W2115465732 cites W2120423098 @default.
- W2115465732 cites W2122892819 @default.
- W2115465732 cites W2122964432 @default.
- W2115465732 cites W2123280311 @default.
- W2115465732 cites W2123500593 @default.
- W2115465732 cites W2124868070 @default.
- W2115465732 cites W2125055259 @default.
- W2115465732 cites W2125179362 @default.