Matches in SemOpenAlex for { <https://semopenalex.org/work/W4311756026> ?p ?o ?g. }
- W4311756026 endingPage "100658" @default.
- W4311756026 startingPage "100658" @default.
- W4311756026 abstract "• Unsupervised MEGMA and supervised AggMapNet pipeline for host phenotype prediction
• Aggregating correlated microbes for signal amplification through manifold embedding
• Grouping-based multichannel operations boost the model performance significantly
• Identifying key microbial markers through global feature importance and saliency map
Machine learning methods have been practically employed for metagenomic phenotype prediction and biomarker discovery in clinical diagnostic investigations. Deep learning methods have also been explored as potential tools, but their practical applications are hindered by high dimensionality and low sample size. In this paper, we developed the unsupervised microbial embedding, grouping, and mapping algorithm (MEGMA) to enhance the downstream tasks of disease prediction and key biomarker recognition. Our study suggests that, through MEGMA unsupervised learning, structured multichannel and signal-amplified metagenomic feature maps can be constructed to enhance downstream supervised tasks of disease prediction and key biomarker recognition.
Metagenomic analysis has been explored for disease diagnosis and biomarker discovery. Low sample sizes, high dimensionality, and sparsity of metagenomic data challenge metagenomic investigations. Here, an unsupervised microbial embedding, grouping, and mapping algorithm (MEGMA) was developed to transform metagenomic data into individualized multichannel microbiome 2D representations by manifold learning and clustering of microbial profiles (e.g., composition, abundance, hierarchy, and taxonomy). These 2D representations enable enhanced disease prediction by established ConvNet-based AggMapNet models, outperforming the commonly used machine learning and deep learning models in metagenomic benchmark datasets. These 2D representations, combined with the AggMapNet explainable module, robustly identified more reliable and replicable disease-prediction microbes (biomarkers).
When the MEGMA-AggMapNet pipeline was employed for biomarker identification from 5 disease datasets, 84% of the identified biomarkers have been described in over 74 distinct works as important for these diseases. Moreover, the method also discovered highly consistent sets of biomarkers in cross-cohort colorectal cancer (CRC) patients and microbial shifts in different CRC stages.
Metagenomic analysis has been explored for non-invasive diagnosis and biomarker discovery [1: Cho I., Blaser M.J. The human microbiome: at the interface of health and disease. Nat. Rev. Genet. 2012; 13: 260-270. https://doi.org/10.1038/nrg3182; 2: Tjalsma H., Boleij A., Marchesi J.R., Dutilh B.E. A bacterial driver-passenger model for colorectal cancer: beyond the usual suspects. Nat. Rev. Microbiol. 2012; 10: 575-582. https://doi.org/10.1038/nrmicro2819]. Machine learning (ML) and deep learning (DL) methods facilitate metagenomics-based disease prediction and the discovery of consistent, replicable, and cross-cohort microbial biomarkers [3: Wirbel J., Pyl P.T., Kartal E., Zych K., Kashani A., Milanese A., Fleck J.S., Voigt A.Y., Palleja A., Ponnudurai R. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat. Med. 2019; 25: 679-689. https://doi.org/10.1038/s41591-019-0406-6; 4: Yachida S., Mizutani S., Shiroma H., Shiba S., Nakajima T., Sakamoto T., Watanabe H., Masuda K., Nishimoto Y., Kubo M., et al. Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer. Nat. Med. 2019; 25: 968-976. https://doi.org/10.1038/s41591-019-0458-7; 5: Pasolli E., Truong D.T., Malik F., Waldron L., Segata N. Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS Comput. Biol. 2016; 12: e1004977. https://doi.org/10.1371/journal.pcbi.1004977; 6: Oudah M., Henschel A. Taxonomy-aware feature engineering for microbiome classification. BMC Bioinf. 2018; 19: 227. https://doi.org/10.1186/s12859-018-2205-3; 7: Fioravanti D., Giarratano Y., Maggio V., Agostinelli C., Chierici M., Jurman G., Furlanello C. Phylogenetic convolutional neural networks in metagenomics. BMC Bioinf. 2018; 19: 49. https://doi.org/10.1186/s12859-018-2033-5; 8: Reiman D., Metwally A.A., Sun J., Dai Y. PopPhy-CNN: a phylogenetic tree embedded architecture for convolutional neural networks to predict host phenotype from metagenomic data. IEEE J. Biomed. Health Inf. 2020; 24: 2993-3001. https://doi.org/10.1109/JBHI.2020.2993761; 9: Reiman D., Metwally A.A., Sun J., Dai Y. Meta-signer: metagenomic signature identifier based on rank aggregation of features. bioRxiv. 2020; (Preprint at) https://doi.org/10.1101/2020.05.09.085993]. However, metagenomic data from individual clinical investigations typically have low sample sizes (dozens to hundreds of samples) [3; 4; 10: Li H. Microbiome, metagenomics, and high-dimensional compositional data analysis. Annu. Rev. Stat. Appl. 2015; 2: 73-94. https://doi.org/10.1146/annurev-statistics-010814-020351], high dimensionality (hundreds to thousands of microbes) [3; 4; 10], sparsity (sparsely distributed across taxonomic hierarchies), and high variations (biological and environmental) [11: Vujkovic-Cvijin I., Sklar J., Jiang L., Natarajan L., Knight R., Belkaid Y. Host variables confound gut microbiota studies of human disease. Nature. 2020; 587: 448-454. https://doi.org/10.1038/s41586-020-2881-9]. These problems expose statistical inference and learning outcomes to random chance and false discoveries [12: Teschendorff A.E. Avoiding common pitfalls in machine learning omic data science. Nat. Mater. 2019; 18: 422-427. https://doi.org/10.1038/s41563-018-0241-z] and mask the identification of genuine biomarkers [12; 13: Knights D., Parfrey L.W., Zaneveld J., Lozupone C., Knight R. Human-associated microbial signatures: examining their predictive value. Cell Host Microbe. 2011; 10: 292-296. https://doi.org/10.1016/j.chom.2011.09.003]. DL outcomes are difficult to interpret, particularly in microbiome-wide association studies [9; 14: Cammarota G., Ianiro G., Ahern A., Carbone C., Temko A., Claesson M.J., Gasbarrini A., Tortora G. Gut microbiome, big data and machine learning to promote precision medicine for cancer. Nat. Rev. Gastroenterol. Hepatol. 2020; 17: 635-648. https://doi.org/10.1038/s41575-020-0327-3]. Instead of end-to-end DL methods, ML methods with feature selection strategies have been practically used for metagenomic investigations of low sample sizes [3; 4; 15: Prifti E., Chevaleyre Y., Hanczar B., Belda E., Danchin A., Clement K., Zucker J.D. Interpretable and accurate prediction models for metagenomics data. GigaScience. 2020; 9: giaa010. https://doi.org/10.1093/gigascience/giaa010]. For example, “Meta-Signer” ranks the microbial features based on the aggregation of identified features from multiple ML models [9], while the novel “predomics” tool employs a genetic algorithm to find the best number of features for simple condition models, leading to better accuracy and interpretability than the previous state-of-the-art (SOTA) ML models while using fewer features [15]. However, microbiome data are complex, and ML methods with fewer selected features may be limited in the representation capability of the models and in learning complex patterns from the data [14]. New interpretable DL methods are needed for enhanced learning and interpretability of metagenomic data to complement existing ML and DL methods.
The widely used metagenomic ML methods include the least absolute shrinkage and selection operator (LASSO), ensemble tree-based random forest (RF), and support vector machines (SVMs) in combination with various feature selection techniques [3; 4; 5]. They learn unordered 1D vectors of taxa or microbe abundances. These tabular 1D vectors of high dimensionality and sparsity are not the most appropriate data form for efficient DL. The disease prediction capability of DL may be improved by converting metagenomic data into phylogenetically ordered representations based on taxa hierarchical trees [6]. Hence, appropriate metagenomic representation is important for enhanced learning. Two convolutional neural networks (ConvNets), Ph-CNN [7] and PopPhy-CNN [8; 9], have been developed with the abundances of the taxonomically ordered microbes and a 2D matrix of the embedded phylogenetic tree as input data, respectively. Moreover, to alleviate the over-fitting issue when conducting disease predictions by DL, a promising algorithm, Met2Img [16: Nguyen T.H., Prifti E., Chevaleyre Y., Sokolovska N., Zucker J.-D. Disease classification in metagenomics with 2d embeddings and deep learning. arXiv. 2018; (Preprint at) https://doi.org/10.48550/arXiv.1806.09046; 17: Nguyen T.H., Prifti E., Sokolovska N., Zucker J.-D. Disease Prediction Using Synthetic Image Representations of Metagenomic Data and Convolutional Neural Networks. IEEE, 2019: 1-6. https://doi.org/10.1109/RIVF.2019.8713670], has been introduced to exploit taxonomic (the so-called “Fill-up”) and manifold embeddings (MEs), such as t-distributed stochastic neighbor embedding (t-SNE), to transform abundance data into “synthetic images,” which enables the efficient exploration of ConvNets for disease classification based on the images created [17]. In the “synthetic images,” the abundances are binned to generate the color space based on a given color map type. Testing results have indicated that the integration of phylogenetic information alongside abundance data improves classification performance [16; 17].
While the Met2Img tool can generate color synthetic images, the corresponding color space simply duplicates the abundance information; the transformed images lack local-coherence characteristics, and the feature points (FPs) overlap when MEs such as t-SNE are used [16]. Nevertheless, highly efficient ConvNet models rely on the recognition of 2D local-coherence and multichannel characters of natural color image data [18: Shen W.X., Liu Y., Chen Y., Zeng X., Tan Y., Jiang Y.Y., Chen Y.Z. AggMapNet: enhanced and explainable low-sample omics deep learning with feature-aggregated multi-channel networks. Nucleic Acids Res. 2022; 50: e45. https://doi.org/10.1093/nar/gkac010]. Natural images are highly structured and low-noise data, where character-distinguishing features of the data are concentrated in local regions of images. The restructuring of metagenomic data into spatially correlated 2D color image-like data is still needed for efficient DL with ConvNets.
In this work, we developed an unsupervised metagenomic microbial embedding, grouping, and mapping algorithm (MEGMA) to transform tabular metagenomic data into spatially correlated color image-like 2D representations named 2D microbiomeprints (3D tensor data in the form of width, height, and channel). Each channel contains a group of microbes, marked with a different color (Figure 1A).
The MEs and position mappings were used to enhance the local connectivity and local coherence of the image-like 2D representations, while the grouping operations were used to generate the multichannel (i.e., the number of colors) characteristics of the 2D representations. Therefore, the final MEGMA 2D microbiomeprints are structured multichannel 3D feature maps (Fmaps) for enhanced performance in the subsequent learning tasks. For example, ConvNet-based AggMapNet (Figure 1B) DL models can be trained with MEGMA 2D microbiomeprints as inputs to learn the metagenomic data for disease prediction and biomarker discovery (Figure 1C).
Nineteen publicly available low sample size metagenomic datasets were used in this study, including a Disease-Set1 [8; 9] and a Disease-Set2 [16; 17] that are related to five diseases (cirrhosis, obesity, type 2 diabetes [T2D], inflammatory bowel disease [IBD], and colorectal cancer [CRC]), and two sets of recently published CRC gut metagenomic datasets, CRC-Nation [3] and CRC-Stage [4] (Table 1). All these datasets used in our study were directly obtained from the processed data and results in previous studies. These datasets cover different numbers of microbial species, and the metagenomic taxonomic profiles were generated by different pipelines, such as MetaPhlAn2 [19: Truong D.T., Franzosa E.A., Tickle T.L., Scholz M., Weingart G., Pasolli E., Tett A., Huttenhower C., Segata N. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods. 2015; 12: 902-903. https://doi.org/10.1038/nmeth.3589], mOTU2 [20: Milanese A., Mende D.R., Paoli L., Salazar G., Ruscheweyh H.J., Cuenca M., Hingamp P., Alves R., Costea P.I., Coelho L.P., et al. Microbial abundance, activity and population genomic profiling with mOTUs2. Nat. Commun. 2019; 10: 1014. https://doi.org/10.1038/s41467-019-08844-4], and the SILVA Living Tree Project (LTP) [21: Yilmaz P., Parfrey L.W., Yarza P., Gerken J., Pruesse E., Quast C., Schweer T., Peplies J., Ludwig W., Glöckner F.O. The SILVA and “all-species living tree project (LTP)” taxonomic frameworks. Nucleic Acids Res. 2014; 42: D643-D648. https://doi.org/10.1093/nar/gkt1209]. We evaluated whether the ME methods are better than the random uniform embedding (RUE) method in generating 2D microbiomeprints. Moreover, the performances of ConvNet-based AggMapNet models trained on multichannel 2D microbiomeprints were compared with those trained on single-channel grayscale 2D microbiomeprints to determine whether the former are superior to the latter. These operations enable AggMapNet models to outperform the commonly used ML and DL models in the metagenomic benchmark datasets of Disease-Set1 (Table S1). We also compared MEGMA with the existing method Met2Img [16; 17] to assess which image-like 2D representation generation algorithm performs better for disease prediction (Table S2).
Table 1. Summary of the human gut metagenomic datasets in this study
Data group | Reference | Dataset | No. of cases | No. of controls | No. of species
Disease-Set1 | PopPhy-CNN and Meta-Signer, Reiman et al. [8; 9] | cirrhosis | 114 | 118 | 542
Disease-Set1 | Reiman et al. [8; 9] | IBD | 25 | 85 | 443
Disease-Set1 | Reiman et al. [8; 9] | obesity | 164 | 89 | 465
Disease-Set1 | Reiman et al. [8; 9] | T2D | 223 | 217 | 606
Disease-Set1 | Reiman et al. [8; 9] | CRC | 48, 39 | 47 | 507
Disease-Set2 | Met2Img, Nguyen et al. [16; 17] | cirrhosis | 118 | 114 | 542
Disease-Set2 | Nguyen et al. [16; 17] | IBD | 25 | 85 | 443
Disease-Set2 | Nguyen et al. [16; 17] | obesity | 164 | 89 | 465
Disease-Set2 | Nguyen et al. [16; 17] | T2D | 170 | 174 | 572
Disease-Set2 | Nguyen et al. [16; 17] | CRC | 48 | 73 | 503
Cross-nation sets of CRC-Nation | Wirbel et al. [3] | AUS | 46 | 63 | 849
Cross-nation sets of CRC-Nation | Wirbel et al. [3] | CHN | 74 | 54 | 849
Cross-nation sets of CRC-Nation | Wirbel et al. [3] | DEU | 60 | 60 | 849
Cross-nation sets of CRC-Nation | Wirbel et al. [3] | FRA | 53 | 61 | 849
Cross-nation sets of CRC-Nation | Wirbel et al. [3] | USA | 52 | 52 | 849
Disease-stage sets of CRC-Stage | Yachida et al. [4] | MP | 40 | 127 | 7,278
Disease-stage sets of CRC-Stage | Yachida et al. [4] | S0 | 27 | 127 | 7,278
Disease-stage sets of CRC-Stage | Yachida et al. [4] | SI/II | 69 | 127 | 7,278
Disease-stage sets of CRC-Stage | Yachida et al. [4] | SIII/IV | 54 | 127 | 7,278
We further show that the AggMapNet explainable module in the analysis of the 2D microbiomeprints led to the identification of the important microbes (IMs), consistent with literature-reported biomarkers and biological mechanisms. A saliency map [22: Simonyan K., Vedaldi A., Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv. 2013; (Preprint at) https://doi.org/10.48550/arXiv.1312.6034] is used to reflect the degree of importance of a feature (i.e., a microbe) in the input 2D microbiomeprints. In identifying biomarkers for Disease-Set1, 84 of the 100 identified IMs, which include the top 20 species for each of cirrhosis, IBD, T2D, obesity, and CRC, are consistent with disease-relevance findings in 74 distinct literature reports. In identifying consistent and replicable microbial signatures across cohorts of five nations in the early detection of CRC [3], the global feature importance (GFI) of AggMapNet is better than the commonly used marker-identifying methods, such as generalized fold change (FC) [3], two-sided Wilcoxon rank-sum test (WRST) p value with Benjamini-Hochberg false-discovery rate (FDR) correction (q value) [3], LASSO model coefficient, and RF feature importance (FI). The AggMapNet GFI-based saliency map can also detect the microbial shifts in different stages of CRC. In conclusion, we show that the use of 2D microbiomeprints as metagenomic representations can significantly enhance downstream tasks of DL-based disease prediction and discovery of key signatures, and an interpretable DL-based metagenomic learning MEGMA-AggMapNet-GFI pipeline (released in aggmap: https://pypi.org/project/aggmap/1.1.7/) with good performance has been developed for disease prediction and biomarker discovery.
MEGMA was developed to transform high-dimensional and sparse metagenomic data from the tabulated 1D vector forms into color image-like multichannel 2D microbiomeprints. Each 2D microbiomeprint represents a microbial abundance 2D imprint of an individual sample. Natural color images are highly structured and low-noise data with two important characteristics, namely local coherence and multiple channels (e.g., RGB channels). The ME and metagenomic/taxonomic grouping (MG/TG) of microbes were particularly designed to construct the local coherence and multichannel (i.e., the number of the colors or groups) characters of the 2D microbio" @default.
- W4311756026 created "2022-12-28" @default.
- W4311756026 creator A5008597035 @default.
- W4311756026 creator A5014418198 @default.
- W4311756026 creator A5059978034 @default.
- W4311756026 creator A5077700312 @default.
- W4311756026 date "2023-01-01" @default.
- W4311756026 modified "2023-10-14" @default.
- W4311756026 title "Enhanced metagenomic deep learning for disease prediction and consistent signature recognition by restructured microbiome 2D representations" @default.
- W4311756026 cites W1703384511 @default.
- W4311756026 cites W1964027278 @default.
- W4311756026 cites W1974809348 @default.
- W4311756026 cites W1978709058 @default.
- W4311756026 cites W1993784588 @default.
- W4311756026 cites W2001141328 @default.
- W4311756026 cites W2002061256 @default.
- W4311756026 cites W2004549986 @default.
- W4311756026 cites W2023615726 @default.
- W4311756026 cites W2053186076 @default.
- W4311756026 cites W2071841602 @default.
- W4311756026 cites W2095416491 @default.
- W4311756026 cites W2101565190 @default.
- W4311756026 cites W2103296919 @default.
- W4311756026 cites W2121797177 @default.
- W4311756026 cites W2122131141 @default.
- W4311756026 cites W2125826054 @default.
- W4311756026 cites W2130725058 @default.
- W4311756026 cites W2139688603 @default.
- W4311756026 cites W2147637673 @default.
- W4311756026 cites W2163903604 @default.
- W4311756026 cites W2328023807 @default.
- W4311756026 cites W2473355215 @default.
- W4311756026 cites W2537258608 @default.
- W4311756026 cites W2768799378 @default.
- W4311756026 cites W2801983100 @default.
- W4311756026 cites W2808466192 @default.
- W4311756026 cites W2892221324 @default.
- W4311756026 cites W2901794234 @default.
- W4311756026 cites W2920716817 @default.
- W4311756026 cites W2927453907 @default.
- W4311756026 cites W2938740358 @default.
- W4311756026 cites W2949004447 @default.
- W4311756026 cites W2963776453 @default.
- W4311756026 cites W3011527816 @default.
- W4311756026 cites W3024771599 @default.
- W4311756026 cites W3035449334 @default.
- W4311756026 cites W3040384889 @default.
- W4311756026 cites W3040925197 @default.
- W4311756026 cites W3049718907 @default.
- W4311756026 cites W3097338697 @default.
- W4311756026 cites W3115276438 @default.
- W4311756026 cites W3126791156 @default.
- W4311756026 cites W3127452014 @default.
- W4311756026 cites W3128828446 @default.
- W4311756026 cites W3134146005 @default.
- W4311756026 cites W3135299124 @default.
- W4311756026 cites W4206055961 @default.
- W4311756026 doi "https://doi.org/10.1016/j.patter.2022.100658" @default.
- W4311756026 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/36699735" @default.
- W4311756026 hasPublicationYear "2023" @default.
- W4311756026 type Work @default.
- W4311756026 citedByCount "2" @default.
- W4311756026 countsByYear W43117560262023 @default.
- W4311756026 crossrefType "journal-article" @default.
- W4311756026 hasAuthorship W4311756026A5008597035 @default.
- W4311756026 hasAuthorship W4311756026A5014418198 @default.
- W4311756026 hasAuthorship W4311756026A5059978034 @default.
- W4311756026 hasAuthorship W4311756026A5077700312 @default.
- W4311756026 hasBestOaLocation W43117560261 @default.
- W4311756026 hasConcept C104317684 @default.
- W4311756026 hasConcept C142724271 @default.
- W4311756026 hasConcept C143121216 @default.
- W4311756026 hasConcept C15151743 @default.
- W4311756026 hasConcept C154945302 @default.
- W4311756026 hasConcept C2522767166 @default.
- W4311756026 hasConcept C2524010 @default.
- W4311756026 hasConcept C2779134260 @default.
- W4311756026 hasConcept C2779696439 @default.
- W4311756026 hasConcept C33923547 @default.
- W4311756026 hasConcept C41008148 @default.
- W4311756026 hasConcept C54355233 @default.
- W4311756026 hasConcept C60644358 @default.
- W4311756026 hasConcept C70721500 @default.
- W4311756026 hasConcept C71924100 @default.
- W4311756026 hasConcept C78458016 @default.
- W4311756026 hasConcept C86803240 @default.
- W4311756026 hasConceptScore W4311756026C104317684 @default.
- W4311756026 hasConceptScore W4311756026C142724271 @default.
- W4311756026 hasConceptScore W4311756026C143121216 @default.
- W4311756026 hasConceptScore W4311756026C15151743 @default.
- W4311756026 hasConceptScore W4311756026C154945302 @default.
- W4311756026 hasConceptScore W4311756026C2522767166 @default.
- W4311756026 hasConceptScore W4311756026C2524010 @default.
- W4311756026 hasConceptScore W4311756026C2779134260 @default.
- W4311756026 hasConceptScore W4311756026C2779696439 @default.
- W4311756026 hasConceptScore W4311756026C33923547 @default.
- W4311756026 hasConceptScore W4311756026C41008148 @default.
- W4311756026 hasConceptScore W4311756026C54355233 @default.