Matches in SemOpenAlex for { <https://semopenalex.org/work/W2951232102> ?p ?o ?g. }
- W2951232102 endingPage "411.e8" @default.
- W2951232102 startingPage "395" @default.
- W2951232102 abstract "• Latent spaces provide greater insight into biological systems than marker genes alone • scCoGAPS learns biologically meaningful latent spaces from sparse scRNA-seq data • Transfer learning (TL) enables discovery across experimental systems and species • projectR is a TL framework to rapidly explore latent spaces across independent datasets Analysis of gene expression in single cells allows for decomposition of cellular states as low-dimensional latent spaces. However, the interpretation and validation of these spaces remain a challenge. Here, we present scCoGAPS, which defines latent spaces from a source single-cell RNA-sequencing (scRNA-seq) dataset, and projectR, which evaluates these latent spaces in independent target datasets via transfer learning. Application to scRNA-seq data from the developing mouse retina reveals intrinsic relationships across biological contexts and assays while avoiding batch effects and other technical features. We compare the dimensions learned in this source dataset to adult mouse retina, a time course of human retinal development, select scRNA-seq datasets from developing brain, chromatin accessibility data, and a murine cell-type atlas to identify shared biological features. These tools lay the groundwork for exploratory analysis of scRNA-seq data via latent space representations, enabling a shift in how we compare and identify cells beyond reliance on marker genes or ensemble molecular identity. 
The identity of an individual cell is determined by the combinatorial effects of diverse biological processes. Dimension reduction techniques deconvolve gene expression data into discrete latent spaces, which may correspond to biological and technical influences on the transcriptome (Brunet et al., 2004; Cleary et al., 2017; Kossenkov et al., 2007; Stein-O’Brien et al., 2018; Wagner et al., 2016; Zhu et al., 2017). Latent space techniques are frequently used in the context of novel biological discovery from high-dimensional genomics datasets. Discovery requires evaluation of both the accuracy of the learned latent space and interpretation of biological processes from the low dimensional representation. 
Both of these tasks are challenging, if not entirely ineffective, using standard analytical methods, requiring biological validation to provide a gold standard (Cleary et al., 2017; Kiselev et al., 2019; Stein-O’Brien et al., 2018). However, in many applications, such a gold standard does not exist. Nonetheless, multiple datasets and measurement assays of the same biological system should reflect a similar set of biological processes. 
Furthermore, subsets of cellular features may be preserved across experimental systems from related biological contexts. These properties can be utilized to improve selection, analysis, and interpretation of diverse biological systems by leveraging information learned from different data sources. Specifically, we propose that establishing the biological relevance of latent spaces requires a 3-fold approach to (1) learn gene-expression signatures associated with biological processes, (2) demonstrate their association with specific cellular features in the dataset from which they are inferred, and (3) test their robustness across related but diverse biological contexts. These latent spaces are best learned from single-cell measures instead of bulk measurements, where learned latent spaces may reflect confounded features across cell types and states. The first two steps of this process are prevalent across single-cell RNA-sequencing (scRNA-seq) analyses, but the second often relies on heuristic analysis and expert curation (Zappia et al., 2018). Transfer-learning approaches can be used to perform the last two steps, thereby enabling in silico validation, interpretation, and exploration across diverse types of modern high-throughput biological data. The machine-learning subdomain of transfer learning exploits the fact that if two datasets share common latent spaces, a feature mapping between the two can identify and characterize relationships between the data defined by individual latent spaces (Pan et al., 2008). 
In this framework, one dataset is the source in which the latent space representation is learned, and another is the target that is mapped into the latent spaces learned in the source. The distribution, domain, or feature space of the source and target data may differ (Pan et al., 2008; Torrey and Shavlik, 2009). Thus, transfer-learning techniques are ideally suited to assess shared latent spaces from one or more sources. Once the robustness of a biological process is established across systems, these approaches can also be applied to use these learned latent spaces to enable exploration of process use across data platforms, modalities, and studies. The established conservation of specific biological processes across systems, such as common developmental pathways across tissues or organisms, can be further leveraged to enable cross-study validation. In this case, the low-dimensional patterns learned from latent space techniques will be shared in samples with biologically meaningful relationships between datasets, while dataset-specific factors and technical artifacts across datasets will not. The challenge then arises in providing a computational tool to enable this in silico validation. We have adapted a transfer-learning approach for high-throughput genomic data analysis with two new methods, scCoGAPS and projectR. These tools provide a framework enabling the identification, evaluation, and exploration of latent-space features in both source and target datasets. 
To demonstrate this workflow across a variety of contexts, we apply these tools to a time course scRNA-seq dataset from murine retina development and demonstrate recovery of meaningful representations of biological features within individual latent spaces. Application of scCoGAPS identified gene-expression signatures of discrete cell types and biological processes associated with cell-cycle regulation, neurogenesis, and cell-fate specification. We empirically evaluate our transfer-learning approach across a diverse collection of single-cell datasets. In addition to performance assessment, these analyses also demonstrate a wide range of biological applications. We demonstrate how to classify learned cell types in a previously published adult retina scRNA-seq dataset via projectR projection (Macosko et al., 2015). We further illustrate how transfer learning can be used to extract meaningful biological insights across experimental modalities and species by projecting a bulk RNA sequencing (RNA-seq) human retinal development time course (Hoshino et al., 2017) and a mouse bulk Assay for Transposase-Accessible Chromatin for Sequencing (ATAC-Seq) dataset into the learned latent spaces from a developing mouse retina scRNA-seq dataset. 
To highlight the ability of projected patterns to recover related biological processes and cell types across developmentally related systems, we compare pattern usage between the developing mouse retina and two independent datasets derived from the developing cortex (Nowakowski et al., 2017; Zhong et al., 2018) and another from the developing mouse midbrain (La Manno et al., 2016). Finally, to examine the power of pattern exploration via transfer learning, we identify shared cellular features across a large collection of single cells from an atlas of mouse tissues (Tabula Muris Consortium et al., 2018). 
In aggregate, these analyses highlight the diversity of potential applications for transfer-learning approaches to rapidly identify and describe related components between a source dataset, in this case derived from the developing mouse retina, and a variety of independent data sources using learned latent spaces. Using a collection of latent spaces, learned from a dataset of single-cell gene expression estimates, we demonstrate the utility of a combined reduced dimensional representation and transfer-learning approach to identify shared cellular attributes and biological processes across diverse data types in a manner that avoids the complications of normalization or sample alignment. Our approach is able to annotate latent spaces and reveal novel parallels between different tissues, molecular features, and species. Our approach demonstrates that projectR can rapidly transfer annotations, classify cells, and identify the use of biological processes without a priori knowledge or annotation within the source dataset. While we focus this application on low dimensional factors learned with scCoGAPS, projectR generalizes as an exploratory analysis and biological interpretation method for other dimension reduction techniques that find latent spaces associated with continuous gene weights. ScCoGAPS is a non-negative matrix factorization (NMF) algorithm. NMF algorithms factor a data matrix into two related matrices containing gene weights, the Amplitude (A) matrix, and sample weights, the Pattern (P) matrix (Figure 1A). Each column of A or row of P defines a factor, and together, these sets of factors define the latent spaces amongst genes and samples, respectively. Each sample-level relationship in a row of the pattern matrix is referred to as a pattern and the corresponding gene weights as an amplitude. In NMF, the values of the elements in the A and P matrices are required to be greater than or equal to zero. 
This constraint simultaneously reflects the non-negative nature of gene expression data and enforces additivity of factors, generating solutions that are biologically intuitive (Lee and Seung, 1999). The concept of up- or down-regulation reflects a relative difference between two conditions that can be, and often is, described by comparing non-negative gene weights between patterns. Bayesian NMF techniques can embed biological and technical structure in the data in prior distributions on the A and P matrices (Kossenkov et al., 2007; Ochs and Fertig, 2012). To accomplish this for bulk data, we previously developed the Bayesian NMF Coordinated Gene Activity in Pattern Sets (CoGAPS) method (Fertig et al., 2010). CoGAPS uses an atomic prior (Sibisi and Skilling, 1996; Skilling and Sibisi, 1996) to model three biological constraints: non-negativity reflective of pleiotropy, sparsity reflective of parsimony, and smoothness reflective of gene co-regulation and smooth dynamic transitions. 
The atomic prior in CoGAPS is unique in enforcing a sample- and gene-specific sparsity constraint, which we term “adaptive sparsity.” In the atomic prior, each element of the A and P matrices is either zero or follows a gamma distribution. Adaptive sparsity is achieved by placing a Poisson prior on the discrete shape parameter in the gamma distribution for each matrix element (α^A_{i,j}, α^P_{i,j}) and a fixed-scale parameter for all matrix elements (λ_A and λ_P) in A and P, respectively. Smaller values of α_{i,j} result in smaller values of the corresponding matrix elements and vice versa for larger values. Thus, the sparsity constraint on values of latent factors will be relaxed in this model, constraining some matrix elements away from zero (Figure 1B). Adaptive sparsity can also model biological structure in the presence of the technical dropouts and true biological zeros in scRNA-seq. To accommodate the additional sparsity of scRNA-seq data, λ_A and λ_P are set as proportional to the mean of all non-zero values in the data. In contrast, λ_A and λ_P for bulk RNA-seq data are set using the means of the entire dataset. A normal prior on the data enables an empirical solution for the conditional distributions with this gamma prior, enabling efficient Gibbs sampling with this sparsity constraint (STAR Methods). This also models smoothness by grouping closely related dimensions near each other via move and exchange steps that shift a single exponential between adjacent matrix elements (Figure 1C). In practice, these steps retain the global Poisson prior on shape and the gamma prior on matrix elements while altering the shape parameters between adjacent matrix elements to model smoothness. 
Bayesian NMF algorithms such as CoGAPS have substantial computing costs that limit their application to the large datasets generated as tissue atlases with scRNA-seq data. As we describe in the STAR Methods, representing the gamma distribution as a sum of exponentials enables efficient Gibbs sampling. We couple this representation with new data structures for their storage and corresponding calculations that are more efficient than previous versions of CoGAPS and greatly reduce the computational cost for scRNA-seq analysis (Figure S1A). We can leverage our hypothesis that latent spaces learned from scRNA-seq data are reflective of relative gene use in biological processes to enhance the efficiency of Bayesian NMF methods. In this case, distinct subsets of cells sampled from the same condition will have similar factors in a latent space, similar to our previous observation of similar factors across distinct subsets of genes in bulk data (Stein-O’Brien et al., 2018). Inference with Bayesian NMF is parallelized for distinct subsets of cells in the input scRNA-seq data. We selected the ratio of cells in each set to enable inference of latent space factors in highly skewed distributions of samples as can occur with rare cell types. As a result, this approach is a semi-supervised method in which inference of gene weights in factors is unsupervised. Consensus factors are then created across the sets as described previously for random sets of genes (Stein-O’Brien et al., 2018). 
In addition to gaining efficiency, the factors estimated in parallel across subsets of cells can also be compared to enable cross-validation of the inferred latent spaces (Figure S1B). In our model, known and latent factors of a biological system can be used to compare independent, biologically related datasets. This comparison is made by defining a function from the factors in one dataset and projecting an independent, biologically related target dataset into a lower dimensional space that is common to both. Projection is defined as a mapping or transformation of points from one space to another, often a lower-dimensional space. Mathematically, this can be described as a function φ: ℝ^D → ℝ^d, φ(x) = y, with d ≤ D, x ∈ ℝ^D, and y ∈ ℝ^d. The innovation of projectR is the use of a mapping function defined from the latent spaces in a source dataset, which enables the transfer of associated cellular phenotypes, annotations, and other metadata to samples in the target dataset (Figure 2). We propose that projection of well-defined latent spaces should capture shared biology across independent datasets. In this study, we perform projection in the column space defined by the amplitude matrix from scCoGAPS (factors representing gene weights). This is accomplished by estimating the patterns P associated with the amplitude matrix by a generalized least-squares fit to the target data (Fertig et al., 2013a) (STAR Methods). We select this projection approach as a computationally efficient method. 
Moreover, the lack of the orthogonality constraint allows for greater application of the transfer-learning approach to non-orthogonal latent spaces, allowing for greater independence of factor projections. Assuming that a given dimension is associated with a specific cellular attribute in the target dataset, the magnitude of the value in this source dataset can indicate its presence within the target dataset. Inversely, if the cellular feature is not shared across the datasets, then projection of the target data into the given latent space will have no significant value. The significance of each projected pattern can be calculated using a Wald test for each sample:latent space interaction. Depending on the distribution or number of the projected sample weights, statistical comparisons between annotated groups can be performed to quantify the presence of these inferred processes in the target data. For example, the mean projected pattern weight between two groups can be compared using standard t tests or regression-based contrasts. Additionally, classifiers can be built using the projected pattern weights, and the predictive value of each pattern assessed globally. This information transfer enables rapid and highly scalable comparison of very different datasets through the lens of a projected latent space learned in a reference dataset. This analysis can leverage the massive amount of publicly available data and their associated metadata to annotate phenotypes in source data more efficiently. Further, the ability to evaluate whether the processes described by latent spaces are shared, despite significant overall differences in the original high dimensional datasets, can enable hypothesis generation and integrated analyses. The developing mammalian retina provides an ideal model system to evaluate the degree to which latent spaces reflect known developmental biology. 
Features such as discrete cell-type signatures, continuous state transitions, signaling pathway usage, developmental age, and sex should each be represented in independent latent spaces. An open question in retinal development is how progenitor cells can generate specific subtypes of neuronal and glial cell types during specific intervals during development, a phenomenon known as progenitor competence (Bassett and Wallace, 2012; Javed and Cayouette, 2017). In an effort to identify genes associated with changes in retinal progenitor cell (RPC) competence, we performed bulk RNA-seq analysis on replicate populations of fluorescence-activated cell sorting (FACS)-isolated RPCs and post-mitotic cells, which were isolated using the Chx10:GFP reporter (Rowan and Cepko, 2004) and assessed the fidelity of patterns learned in this bulk analysis across other experimental contexts. FACS-sorted Chx10:GFP+ RPCs and Chx10:GFP- post-mitotic retinal neurons (Rowan and Cepko, 2004) were collected from the developing mouse retina at three time points, embryonic day 14 (E14), embryonic day 18 (E18), and postnatal day 2 (P2), and subjected to standard bulk RNA sequencing (Zibetti et al., 2017). 
We applied our previous genome-wide GWCoGAPS pipeline for bulk RNA-Seq to the normalized FPKM gene expression estimates to identify a latent space consisting of 10 patterns of co-regulated genes (Stein-O’Brien et al., 2017). Dimensionality can be optimized by maximizing the robustness of patterns between dimensions (Moloshok et al., 2002). Moreover, hierarchies of cell types or subtypes can be resolved by comparing patterns across dimensions (Fertig et al., 2013a). Therefore, we applied GWCoGAPS to the bulk data using a range of dimensionalizations to identify patterns associated with specific biological features or cellular states. 
Final dimensionality was assessed by comparing factorizations of different dimensions using the ClutrFree (Bidaut and Ochs, 2004) algorithm (STAR Methods). Patterns were strongly correlated (r^2 > 0.7) between factorizations at different dimensions, indicating the overall robustness of the factors across dimensions (Figure S1C). For example, a pattern broadly associated with all retinal neurons at a lower dimensionality split into two patterns describing photoreceptors and inner retinal cells at a higher dimensionality, as assessed by correlation of cell-type specific marker-gene expression with individual patterns. We next evaluated whether patterns identified from bulk RNA-seq could describe discrete cell-type signatures obtained from a comprehensive scRNA-seq dataset conducted across retinal development (Clark et al., 2019). In this study, we isolated 120,804 individual cells from whole mouse retina at 10 developmental time points, ranging from embryonic day 11 (E11) to postnatal day 14 (P14). scRNA-seq gene expression profiles were obtained using the 10× Genomics Chromium platform (Clark et al., 2019Clark B. Stein-O’Brien G. Shiau F. Cann" @default.
- W2951232102 created "2019-06-27" @default.
- W2951232102 creator A5001916762 @default.
- W2951232102 creator A5005509338 @default.
- W2951232102 creator A5007149840 @default.
- W2951232102 creator A5027701541 @default.
- W2951232102 creator A5048435659 @default.
- W2951232102 creator A5051919465 @default.
- W2951232102 creator A5057519297 @default.
- W2951232102 creator A5066532319 @default.
- W2951232102 creator A5069366529 @default.
- W2951232102 creator A5077721872 @default.
- W2951232102 creator A5080758121 @default.
- W2951232102 creator A5084004045 @default.
- W2951232102 date "2019-05-01" @default.
- W2951232102 modified "2023-10-14" @default.
- W2951232102 title "Decomposing Cell Identity for Transfer Learning across Cellular Measurements, Platforms, Tissues, and Species" @default.
- W2951232102 cites W1964602001 @default.
- W2951232102 cites W1969415786 @default.
- W2951232102 cites W1974944966 @default.
- W2951232102 cites W1987368870 @default.
- W2951232102 cites W2001895044 @default.
- W2951232102 cites W2017844868 @default.
- W2951232102 cites W2027555661 @default.
- W2951232102 cites W2027753607 @default.
- W2951232102 cites W2033585891 @default.
- W2951232102 cites W2036099373 @default.
- W2951232102 cites W2039033307 @default.
- W2951232102 cites W2063721381 @default.
- W2951232102 cites W2065594360 @default.
- W2951232102 cites W2082253757 @default.
- W2951232102 cites W2101502746 @default.
- W2951232102 cites W2102212449 @default.
- W2951232102 cites W2102278945 @default.
- W2951232102 cites W2102619694 @default.
- W2951232102 cites W2103431543 @default.
- W2951232102 cites W2108234281 @default.
- W2951232102 cites W2117757143 @default.
- W2951232102 cites W2122268695 @default.
- W2951232102 cites W2130410032 @default.
- W2951232102 cites W2136787567 @default.
- W2951232102 cites W2138207763 @default.
- W2951232102 cites W2140117251 @default.
- W2951232102 cites W2145126338 @default.
- W2951232102 cites W2146512944 @default.
- W2951232102 cites W2157582398 @default.
- W2951232102 cites W2157710407 @default.
- W2951232102 cites W2166820820 @default.
- W2951232102 cites W2170551349 @default.
- W2951232102 cites W2297381334 @default.
- W2951232102 cites W2528543174 @default.
- W2951232102 cites W2555892463 @default.
- W2951232102 cites W2586464090 @default.
- W2951232102 cites W2612649098 @default.
- W2951232102 cites W2615767183 @default.
- W2951232102 cites W2622807556 @default.
- W2951232102 cites W2624831627 @default.
- W2951232102 cites W2746514073 @default.
- W2951232102 cites W2766959028 @default.
- W2951232102 cites W2768455369 @default.
- W2951232102 cites W2772624269 @default.
- W2951232102 cites W2775484513 @default.
- W2951232102 cites W2777506565 @default.
- W2951232102 cites W2796170779 @default.
- W2951232102 cites W2800392236 @default.
- W2951232102 cites W2800980964 @default.
- W2951232102 cites W2809365568 @default.
- W2951232102 cites W2894687190 @default.
- W2951232102 cites W2902652978 @default.
- W2951232102 cites W2907783748 @default.
- W2951232102 cites W2941003117 @default.
- W2951232102 cites W2946389085 @default.
- W2951232102 cites W2949829455 @default.
- W2951232102 cites W2950245999 @default.
- W2951232102 cites W2951381561 @default.
- W2951232102 cites W2951638683 @default.
- W2951232102 doi "https://doi.org/10.1016/j.cels.2019.04.004" @default.
- W2951232102 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/6588402" @default.
- W2951232102 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/33600760" @default.
- W2951232102 hasPublicationYear "2019" @default.
- W2951232102 type Work @default.
- W2951232102 sameAs 2951232102 @default.
- W2951232102 citedByCount "116" @default.
- W2951232102 countsByYear W29512321022019 @default.
- W2951232102 countsByYear W29512321022020 @default.
- W2951232102 countsByYear W29512321022021 @default.
- W2951232102 countsByYear W29512321022022 @default.
- W2951232102 countsByYear W29512321022023 @default.
- W2951232102 crossrefType "journal-article" @default.
- W2951232102 hasAuthorship W2951232102A5001916762 @default.
- W2951232102 hasAuthorship W2951232102A5005509338 @default.
- W2951232102 hasAuthorship W2951232102A5007149840 @default.
- W2951232102 hasAuthorship W2951232102A5027701541 @default.
- W2951232102 hasAuthorship W2951232102A5048435659 @default.
- W2951232102 hasAuthorship W2951232102A5051919465 @default.
- W2951232102 hasAuthorship W2951232102A5057519297 @default.
- W2951232102 hasAuthorship W2951232102A5066532319 @default.
- W2951232102 hasAuthorship W2951232102A5069366529 @default.