Matches in SemOpenAlex for { <https://semopenalex.org/work/W2949457892> ?p ?o ?g. }
- W2949457892 startingPage "671263" @default.
- W2949457892 abstract "Motivation: Each organism contains genes with no protein homolog in other species (“orphan genes”). Some of these have arisen de novo from non-genic material, while others may be the result of ultra-rapid mutation of existing genes. The challenges of identifying orphan genes and predicting their functions are immense, resulting in under-appreciation of their importance. The yeast genome expresses thousands of transcripts that are not annotated as genes, many of which contain ORFs that are translated. Here, we apply computational approaches to recycle and re-evaluate massive raw public RNA-Seq data to identify those ORFs that are the best candidates to represent orphan genes. Results: We created a pooled, aggregated RNA-Seq dataset from the raw reads and metadata of over 3,400 RNA-Seq samples from 172 studies in the NCBI Sequence Read Archive (SRA) database (Leinonen et al., 2011), and realigned these reads to a transcriptome consisting of the genes annotated in the Saccharomyces Genome Database (SGD; Cherry et al., 1998) and 29,354 unannotated ORFs of the Saccharomyces cerevisiae genome. Phylostratigraphy analysis of the predicted proteins from the 29,354 non-annotated open reading frames (ORFs) in the S. cerevisiae genome inferred that 15,806 are orphans (“orphan-ORFs”), 11,942 are genus-specific, and 1,606 are more highly conserved. These RNA-Seq data reveal over 150 transcripts containing orphan-encoding ORFs with mean levels of expression across all samples comparable to half of annotated non-orphan genes. Most orphan-encoding ORFs are highly expressed only under limited conditions. We built a co-expression matrix from the transcription dataset, and optimized partitioning by Markov Chain Clustering (MCL). Based on GO enrichment analysis, the MCL clustering result is significantly different from random clusters, indicating its biological significance. Over 3,000 significant GO terms (p-value 0.8) to annotated genes. 
For example, cluster 112 is composed of seripauperin genes, and smORF247301 is correlated with YPL223C (Pearson correlation of 0.95). We provide the results of the optimized aggregate-data analysis in a tool that can be used for powerful statistical analysis and visualization of specific transcripts under user-selected conditions. This approach maximizes a user's ability to view potential interactions across experimental perturbations, and provides a rich context for experimental biologists to make novel, experimentally testable hypotheses as to potential functions of as-yet-unannotated transcripts. Contact: evewurtele@gmail.com" @default.
- W2949457892 created "2019-06-27" @default.
- W2949457892 creator A5011162092 @default.
- W2949457892 creator A5012677271 @default.
- W2949457892 creator A5020167764 @default.
- W2949457892 creator A5085417026 @default.
- W2949457892 date "2019-06-21" @default.
- W2949457892 modified "2023-09-27" @default.
- W2949457892 title "Recycling RNA-Seq Data to Identify Candidate Orphan Genes for Experimental Analysis" @default.
- W2949457892 cites W1561682452 @default.
- W2949457892 cites W1570599752 @default.
- W2949457892 cites W1907486465 @default.
- W2949457892 cites W1963604994 @default.
- W2949457892 cites W1965346152 @default.
- W2949457892 cites W1968185051 @default.
- W2949457892 cites W1973133163 @default.
- W2949457892 cites W1996423252 @default.
- W2949457892 cites W2000629769 @default.
- W2949457892 cites W2003861150 @default.
- W2949457892 cites W2009415656 @default.
- W2949457892 cites W2010074168 @default.
- W2949457892 cites W2015726760 @default.
- W2949457892 cites W2035618305 @default.
- W2949457892 cites W2040415319 @default.
- W2949457892 cites W2046085083 @default.
- W2949457892 cites W2047405351 @default.
- W2949457892 cites W2051418786 @default.
- W2949457892 cites W2057844822 @default.
- W2949457892 cites W2059223767 @default.
- W2949457892 cites W2060696386 @default.
- W2949457892 cites W2078266900 @default.
- W2949457892 cites W2079588819 @default.
- W2949457892 cites W2083711705 @default.
- W2949457892 cites W2085567058 @default.
- W2949457892 cites W2087923822 @default.
- W2949457892 cites W2088201027 @default.
- W2949457892 cites W2092207442 @default.
- W2949457892 cites W2092987085 @default.
- W2949457892 cites W2094238140 @default.
- W2949457892 cites W2102278945 @default.
- W2949457892 cites W2102619694 @default.
- W2949457892 cites W2103453943 @default.
- W2949457892 cites W2108331169 @default.
- W2949457892 cites W2108552532 @default.
- W2949457892 cites W2109636763 @default.
- W2949457892 cites W2114104545 @default.
- W2949457892 cites W2116041602 @default.
- W2949457892 cites W2121350645 @default.
- W2949457892 cites W2121564430 @default.
- W2949457892 cites W2122064342 @default.
- W2949457892 cites W2122406433 @default.
- W2949457892 cites W2122708083 @default.
- W2949457892 cites W2123879569 @default.
- W2949457892 cites W2127673162 @default.
- W2949457892 cites W2128643990 @default.
- W2949457892 cites W2129883957 @default.
- W2949457892 cites W2131933020 @default.
- W2949457892 cites W2144909274 @default.
- W2949457892 cites W2146729343 @default.
- W2949457892 cites W2150926065 @default.
- W2949457892 cites W2151936673 @default.
- W2949457892 cites W2162891889 @default.
- W2949457892 cites W2165909549 @default.
- W2949457892 cites W2166336766 @default.
- W2949457892 cites W2168292723 @default.
- W2949457892 cites W2169777893 @default.
- W2949457892 cites W2189876713 @default.
- W2949457892 cites W2205343347 @default.
- W2949457892 cites W2216234219 @default.
- W2949457892 cites W2259938310 @default.
- W2949457892 cites W2270221062 @default.
- W2949457892 cites W2296941811 @default.
- W2949457892 cites W2410574804 @default.
- W2949457892 cites W2416395093 @default.
- W2949457892 cites W2479687413 @default.
- W2949457892 cites W2534128196 @default.
- W2949457892 cites W2596044508 @default.
- W2949457892 cites W2601961903 @default.
- W2949457892 cites W2605611918 @default.
- W2949457892 cites W2609769603 @default.
- W2949457892 cites W2662175080 @default.
- W2949457892 cites W2739060139 @default.
- W2949457892 cites W2750286977 @default.
- W2949457892 cites W2753149515 @default.
- W2949457892 cites W2765725484 @default.
- W2949457892 cites W2767316502 @default.
- W2949457892 cites W2770089238 @default.
- W2949457892 cites W2774045279 @default.
- W2949457892 cites W2793222368 @default.
- W2949457892 cites W2807075354 @default.
- W2949457892 cites W2886714415 @default.
- W2949457892 cites W2890489399 @default.
- W2949457892 cites W2903241977 @default.
- W2949457892 cites W2907129715 @default.
- W2949457892 cites W2917576966 @default.
- W2949457892 cites W2921128410 @default.
- W2949457892 cites W2945066961 @default.
- W2949457892 cites W2945570991 @default.
- W2949457892 cites W2953269056 @default.