Matches in SemOpenAlex for { <https://semopenalex.org/work/W2117863384> ?p ?o ?g. }
- W2117863384 endingPage "1914" @default.
- W2117863384 startingPage "1905" @default.
- W2117863384 abstract "Gene expression profiling technologies can generally produce mRNA abundance data for all genes in a genome. A dearth of proteomic data persists because identification range and sensitivity of proteomic measurements lag behind those of transcriptomic measurements. Using partial proteomic data, it is likely that integrative transcriptomic and proteomic analysis may introduce significant bias. Developing methodologies to accurately estimate missing proteomic data will allow better integration of transcriptomic and proteomic datasets and provide deeper insight into metabolic mechanisms underlying complex biological systems. In this study, we present a non-linear data-driven model to predict abundance for undetected proteins using two independent datasets of cognate transcriptomic and proteomic data collected from Desulfovibrio vulgaris. We use stochastic gradient boosted trees (GBT) to uncover possible non-linear relationships between transcriptomic and proteomic data, and to predict protein abundance for the proteins not experimentally detected based on relevant predictors such as mRNA abundance, cellular role, molecular weight, sequence length, protein length, guanine-cytosine (GC) content and triple codon counts. Initially, we constructed a GBT model using all possible variables to assess their relative importance and characterize the behavior of the predictive model. A strong plateau effect in the regions of high mRNA values and sparse data occurred in this model. Hence, we removed genes in those areas based on thresholds estimated from the partial dependency plots where this behavior was captured. At this stage, only the strongest predictors of protein abundance were retained to reduce the complexity of the GBT model. After removing genes in the plateau region, mRNA abundance, main cellular functional categories and few triple codon counts emerged as the top-ranked predictors of protein abundance.
We then created a new tuned GBT model using the five most significant predictors. The construction of our non-linear model consists of a set of serial regression trees models with implicit strength in variable selection. The model provides variable relative importance measures using as a criterion mean square error. The results showed that coefficients of determination for our nonlinear models ranged from 0.393 to 0.582 in both datasets, providing better results than linear regression used in the past. We evaluated the validity of this non-linear model using biological information of operons, regulons and pathways, and the results demonstrated that the coefficients of variation of estimated protein abundance values within operons, regulons or pathways are indeed smaller than those for random groups of proteins. Supplementary data are available at Bioinformatics online." @default.
- W2117863384 created "2016-06-24" @default.
- W2117863384 creator A5002544391 @default.
- W2117863384 creator A5007118532 @default.
- W2117863384 creator A5035916673 @default.
- W2117863384 creator A5070564774 @default.
- W2117863384 creator A5088656458 @default.
- W2117863384 date "2009-05-15" @default.
- W2117863384 modified "2023-10-10" @default.
- W2117863384 title "Integrative analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: a non-linear model to predict abundance of undetected proteins" @default.
- W2117863384 cites W1519199376 @default.
- W2117863384 cites W1646666744 @default.
- W2117863384 cites W1678356000 @default.
- W2117863384 cites W1805258884 @default.
- W2117863384 cites W1976317980 @default.
- W2117863384 cites W1991537543 @default.
- W2117863384 cites W1994133913 @default.
- W2117863384 cites W1994662308 @default.
- W2117863384 cites W2008883486 @default.
- W2117863384 cites W2013378651 @default.
- W2117863384 cites W2014481298 @default.
- W2117863384 cites W2045718560 @default.
- W2117863384 cites W2067642105 @default.
- W2117863384 cites W2070493638 @default.
- W2117863384 cites W2079221894 @default.
- W2117863384 cites W2082516237 @default.
- W2117863384 cites W2089492568 @default.
- W2117863384 cites W2093772131 @default.
- W2117863384 cites W2100762672 @default.
- W2117863384 cites W2101345690 @default.
- W2117863384 cites W2117212863 @default.
- W2117863384 cites W2120055794 @default.
- W2117863384 cites W2124681451 @default.
- W2117863384 cites W2135695572 @default.
- W2117863384 cites W2144342921 @default.
- W2117863384 cites W2149856760 @default.
- W2117863384 cites W2152353895 @default.
- W2117863384 cites W2161723277 @default.
- W2117863384 cites W2164954378 @default.
- W2117863384 cites W2168846334 @default.
- W2117863384 cites W2170076251 @default.
- W2117863384 cites W2170949754 @default.
- W2117863384 cites W2173011457 @default.
- W2117863384 cites W2177299793 @default.
- W2117863384 cites W3104595043 @default.
- W2117863384 doi "https://doi.org/10.1093/bioinformatics/btp325" @default.
- W2117863384 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/2712339" @default.
- W2117863384 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/19447782" @default.
- W2117863384 hasPublicationYear "2009" @default.
- W2117863384 type Work @default.
- W2117863384 sameAs 2117863384 @default.
- W2117863384 citedByCount "33" @default.
- W2117863384 countsByYear W21178633842012 @default.
- W2117863384 countsByYear W21178633842013 @default.
- W2117863384 countsByYear W21178633842014 @default.
- W2117863384 countsByYear W21178633842016 @default.
- W2117863384 countsByYear W21178633842017 @default.
- W2117863384 countsByYear W21178633842019 @default.
- W2117863384 countsByYear W21178633842020 @default.
- W2117863384 countsByYear W21178633842021 @default.
- W2117863384 countsByYear W21178633842022 @default.
- W2117863384 countsByYear W21178633842023 @default.
- W2117863384 crossrefType "journal-article" @default.
- W2117863384 hasAuthorship W2117863384A5002544391 @default.
- W2117863384 hasAuthorship W2117863384A5007118532 @default.
- W2117863384 hasAuthorship W2117863384A5035916673 @default.
- W2117863384 hasAuthorship W2117863384A5070564774 @default.
- W2117863384 hasAuthorship W2117863384A5088656458 @default.
- W2117863384 hasBestOaLocation W21178633841 @default.
- W2117863384 hasConcept C104317684 @default.
- W2117863384 hasConcept C104397665 @default.
- W2117863384 hasConcept C150194340 @default.
- W2117863384 hasConcept C162317418 @default.
- W2117863384 hasConcept C18431079 @default.
- W2117863384 hasConcept C46111723 @default.
- W2117863384 hasConcept C54355233 @default.
- W2117863384 hasConcept C70721500 @default.
- W2117863384 hasConcept C86803240 @default.
- W2117863384 hasConceptScore W2117863384C104317684 @default.
- W2117863384 hasConceptScore W2117863384C104397665 @default.
- W2117863384 hasConceptScore W2117863384C150194340 @default.
- W2117863384 hasConceptScore W2117863384C162317418 @default.
- W2117863384 hasConceptScore W2117863384C18431079 @default.
- W2117863384 hasConceptScore W2117863384C46111723 @default.
- W2117863384 hasConceptScore W2117863384C54355233 @default.
- W2117863384 hasConceptScore W2117863384C70721500 @default.
- W2117863384 hasConceptScore W2117863384C86803240 @default.
- W2117863384 hasIssue "15" @default.
- W2117863384 hasLocation W21178633841 @default.
- W2117863384 hasLocation W21178633842 @default.
- W2117863384 hasLocation W21178633843 @default.
- W2117863384 hasLocation W21178633844 @default.
- W2117863384 hasOpenAccess W2117863384 @default.
- W2117863384 hasPrimaryLocation W21178633841 @default.
- W2117863384 hasRelatedWork W1881709772 @default.
- W2117863384 hasRelatedWork W1971124177 @default.
- W2117863384 hasRelatedWork W2002649322 @default.
- W2117863384 hasRelatedWork W2045448027 @default.