Matches in SemOpenAlex for { <https://semopenalex.org/work/W2991266812> ?p ?o ?g. }
- W2991266812 endingPage "303" @default.
- W2991266812 startingPage "293" @default.
- W2991266812 abstract "Pseudouridine (Ψ) is the most abundant RNA modification and has been found in many kinds of RNAs, including snRNA, rRNA, tRNA, mRNA, and snoRNA. Thus, Ψ sites play a significant role in basic research and drug development. Although some experimental techniques have been developed to identify Ψ sites, they are expensive and time consuming, especially in the post-genomic era with the explosive growth of known RNA sequences. Thus, highly accurate computational methods are urgently required to quickly detect the Ψ sites on uncharacterized RNA sequences. Several predictors have been proposed using multifarious features, but their evaluated performances are still unsatisfactory. In this study, we first identified Ψ sites for H. sapiens, S. cerevisiae, and M. musculus using the sequence features from the bi-profile Bayes (BPB) method based on the random forest (RF) and support vector machine (SVM) algorithms, where the performances were evaluated using 5-fold cross-validation and independent tests. It was found that the SVM-based accuracies were 3.55% and 5.09% lower than the iPseU-CUU predictor for the H_990 and S_628 datasets, respectively. Almost the same-level results were obtained for M_944 and an independent H_200 dataset, even showing a 5.0% improvement for S_200. Then, three different kinds of features, including basic Kmer, general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-General), and nucleotide chemical property (NCP) and nucleotide density (ND) from the iRNA-PseU method, were combined with BPB to show their comprehensive performances, where the effective features are selected by the max-relevance-max-distance (MRMD) method. The best evaluated accuracies of the combined features for the S_628 and M_944 datasets were achieved at 70.54% and 72.45%, which were 2.39% and 0.65% higher than iPseU-CUU. For the S_200 dataset, the accuracy also improved by 8 percentage points, from 69% to 77%. However, there was no obvious improvement for H.
sapiens, which was evaluated as approximately 63.23% and 72.0% for the H_990 and H_200 datasets, respectively. The overall performances for Ψ identification using BPB features as well as the combined features were not obviously improved. Although some kinds of feature extraction methods based on the RNA sequence information have been applied to construct the predictors in previous studies, the corresponding accuracies are generally in the range of 60%–70%. Thus, researchers need to reconsider whether there is any sequence feature in the RNA Ψ modification prediction problem.
Pseudouridine (Ψ) is the most prevalent post-transcriptional modification, and it has been widely found in a series of biological and cellular processes [1,2]. Recent studies have demonstrated that Ψ sites exist in many kinds of RNAs, such as small nuclear RNA (snRNA), rRNA, tRNA, mRNA, and small nucleolar RNA (snoRNA) [3–11]. Thus, the Ψ site plays a crucial role in biological research and drug development. More specifically, Ψ is an isomer of uridine catalyzed by the Ψ synthase (PUS) that removes the uridine residue’s base from its sugar, followed by “rotating” it 180° along the N3-C6 axis, and subsequently reattaches the base’s 5-carbon to the 1’-carbon of the sugar [12]. Although there are several experimental methods based on the high-throughput techniques that have been developed to recognize the Ψ modifications, they are both costly and time consuming [13–17]. In addition, researchers are facing an explosive increase of RNA data in the post-genomic age [18–30]. Therefore, intelligent computational approaches are highly desirable to predict Ψ sites on RNA sequences. To the best of our knowledge, six predictors have been reported to identify Ψ sites.
Specifically, Panwar and Raghava [31] first proposed the tRNAmod model to predict Ψ sites in tRNA. Li et al. [32] then developed the PPUS method based on the support vector machine (SVM) to identify PUS-specific Ψ sites. Later, Chen et al. [33] provided the iRNA-PseU predictor, and He et al. [34] introduced the PseUI predictor, both of which are based on the SVM classifier. In addition, Tahir et al. [35] built the iPseU-CUU model based on the convolutional neural network (CNN). Most recently, Liu et al. [36] proposed an eXtreme Gradient Boosting (xgboost)-based method (XG-PseU). It should be noted that the same datasets, built by Chen et al. [33], were applied in the three studies (iRNA-PseU, PseUI, and iPseU-CUU) to build the predictors, including the benchmark training datasets (H_990, S_628, and M_944) and the independent testing datasets (H_200 and S_200). Here, H, S, and M represent the RNA samples for H. sapiens, S. cerevisiae, and M. musculus, while 990, 628, 944, and 200 indicate the corresponding sample numbers in each dataset. Thus, we used the datasets mentioned earlier in this article for convenient comparisons. The performances of the four predictors (iRNA-PseU, PseUI, iPseU-CUU, and XG-PseU) are listed in Table 1, where the XG-PseU results for independent datasets were obtained by the web server at http://www.bioml.cn. The jackknife test, 5-fold cross-validation, and 10-fold cross-validation are used for the iRNA-PseU, PseUI/iPseU-CUU, and XG-PseU models, respectively. It can be seen that their overall performances have gradually improved. Taking H_990 as an example, the accuracy has improved by 6.28 percentage points, from 60.40% (iRNA-PseU) to 64.24% (PseUI) and then to 66.68% (iPseU-CUU).
However, it must be noted that these predictive accuracies are still unsatisfactory.
Table 1. Results of the Proposed iRNA-PseU, PseUI, iPseU-CUU, and XG-PseU Predictors for Training Datasets H_990, S_628, and M_944 and Testing Datasets H_200 and S_200
Predictor | Training dataset | Acc (%) | MCC | Sn (%) | Sp (%) | Testing dataset | Acc (%) | MCC | Sn (%) | Sp (%)
iRNA-PseU(a) | H_990 | 60.4 | 0.21 | 61.01 | 59.8 | H_200 | 65.00 | 0.30 | 60.00 | 70.00
PseUI(b) | H_990 | 64.24 | 0.28 | 64.85 | 63.64 | H_200 | 65.50 | 0.31 | 63.00 | 68.00
iPseU-CUU(c) | H_990 | 66.68 | 0.34 | 65.00 | 68.78 | H_200 | 69.00 | 0.40 | 77.72 | 60.81
XG-PseU(d) | H_990 | 65.44 | 0.31 | 63.64 | 67.24 | H_200 | 67.00 | 0.34 | 67.00 | 67.00
iRNA-PseU(a) | S_628 | 64.49 | 0.29 | 64.65 | 64.33 | S_200 | 73.00 | 0.46 | 81.00 | 65.00
PseUI(b) | S_628 | 66.56 | 0.33 | 62.1 | 71.02 | S_200 | 68.50 | 0.37 | 72.00 | 65.00
iPseU-CUU(c) | S_628 | 68.15 | 0.37 | 66.36 | 70.45 | S_200 | 73.50 | 0.47 | 68.76 | 77.82
XG-PseU(d) | S_628 | 68.15 | 0.37 | 66.84 | 69.45 | S_200 | 71.00 | 0.42 | 75.00 | 67.00
iRNA-PseU(a) | M_944 | 69.07 | 0.38 | 73.31 | 64.83 | - | - | - | - | -
PseUI(b) | M_944 | 70.44 | 0.41 | 74.58 | 66.31 | - | - | - | - | -
iPseU-CUU(c) | M_944 | 71.81 | 0.44 | 74.49 | 69.11 | - | - | - | - | -
XG-PseU(d) | M_944 | 72.03 | 0.45 | 76.48 | 67.57 | - | - | - | - | -
(a) The predictor developed by Chen et al. [33]
(b) The predictor proposed by He et al. [34]
(c) The predictor constructed by Tahir et al. [35]
(d) The predictor constructed by Liu et al. [36]
As a crucial step toward building a machine-learning-based predictor, feature extraction becomes a particularly important process. Several sequence representation methods have been used in previous works to obtain feature vectors. For example, a hybrid approach of the binary profile of patterns (BPP) and structural information is applied in the tRNAmod [31]. In addition, the PPUS model uses the nucleotides around Ψ as the features to identify [32]. For the successful iRNA-PseU method, dinucleotide chemical properties (DCP) and nucleotide density (ND) are incorporated for identification [33]. For the PseUI, the effective features are selected from five different feature extraction techniques using the sequential forward-feature-selection method, including nucleotide composition (NC), dinucleotide composition (DNC), pseudo-dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP), and position-specific dinucleotide propensity (PSDP) [37,38]. For the iPseU-CUU method, the features are obtained automatically by a CNN model based on a deep learning machine, which is widely used in bioinformatics [39–42]. Furthermore, two additional feature extraction techniques, n-gram and multivariate mutual information (MMI), are also applied for the machine learning approach by the SVM method, where they still give a low accuracy (Acc) [35]. For the newly reported XG-PseU predictor, six feature extraction techniques are used, namely, NC, DNC, trinucleotide composition (TNC), nucleotide chemical property (NCP), ND, and one-hot encode (one hot).
At the same time, machine-learning-based computational approaches show excellent performance in identifying many other types of RNA modifications, including N6-methyladenosine (m6A) [43–45], 5-methylcytosine (m5C) [46–53], N1-methyladenosine (m1A) [54–56], and so forth. The related kinds of computational models used for these purposes have been summarized in a review [57], in which the recently reported overall accuracies are basically above 90%.
In particular, the SVM-based iRNA(m6A)-PseDNC model demonstrates an Acc of 91.24% in 10-fold cross-validation for m6A identification for S. cerevisiae [43]. For the m5C site, the recently developed iRNA-m5C predictor by the Random Forest (RF) algorithm shows a jackknife test Acc up to 92.9% for H. sapiens [52]. For m1A, the SVM-based iRNA-3typeA method obtains a jackknife validation Acc of 99.13% on H. sapiens and 98.73% for M. musculus [56]. However, as mentioned earlier, the evaluated accuracies of Ψ site identification of different models are basically only 60%–70%, which leaves considerable room for improvement. We noticed that a predictor called “KELMPSP” reported a better performance, where the accuracies for the H_990, S_628, M_944, H_200, and S_200 datasets are up to 74.55%, 85.53%, 79.45%, 72.5%, and 76.00%, respectively [58]. In this method, the kernel extreme learning machine (KELM) algorithm is applied, where the final features are obtained by combining NCP, nucleotide concentrations, and position-specific mononucleotide, dinucleotide, and trinucleotide propensity characteristics.
However, the related web server at http://39.105.77.161:8890/KELMPSP is no longer available. In this paper, we first applied the bi-profile Bayes method (BPB) [59] to extract the RNA sequence features to identify the Ψ sites. Two algorithms, RF and SVM, were both used to construct the models, where the performances were evaluated by 5-fold cross-validation and independent tests. Then, we incorporated three different features with BPB to show their comprehensive performance, including basic Kmer (Kmer) [60], general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-General) generated from the web server Pse-in-One [61], and NCP with ND (NCP+ND). Also, high-quality features were selected using the MRMD [62] method to predict the Ψ sites. First, we extracted the RNA features using the BPB method fo" @default.
- W2991266812 created "2019-12-05" @default.
- W2991266812 creator A5011545312 @default.
- W2991266812 creator A5032162374 @default.
- W2991266812 creator A5033551204 @default.
- W2991266812 creator A5045511627 @default.
- W2991266812 creator A5062628676 @default.
- W2991266812 date "2020-03-01" @default.
- W2991266812 modified "2023-10-11" @default.
- W2991266812 title "Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem?" @default.
- W2991266812 cites W1253096047 @default.
- W2991266812 cites W1494484168 @default.
- W2991266812 cites W1500914927 @default.
- W2991266812 cites W1947267113 @default.
- W2991266812 cites W1988594017 @default.
- W2991266812 cites W1991244466 @default.
- W2991266812 cites W1992577925 @default.
- W2991266812 cites W2022965734 @default.
- W2991266812 cites W2048356293 @default.
- W2991266812 cites W2051808158 @default.
- W2991266812 cites W2065898700 @default.
- W2991266812 cites W2085928743 @default.
- W2991266812 cites W2094113855 @default.
- W2991266812 cites W2102354653 @default.
- W2991266812 cites W2103413006 @default.
- W2991266812 cites W2114024619 @default.
- W2991266812 cites W2120026469 @default.
- W2991266812 cites W2131614043 @default.
- W2991266812 cites W2132247880 @default.
- W2991266812 cites W2135893370 @default.
- W2991266812 cites W2139691165 @default.
- W2991266812 cites W2153635508 @default.
- W2991266812 cites W2159192177 @default.
- W2991266812 cites W2161890613 @default.
- W2991266812 cites W2167616678 @default.
- W2991266812 cites W2171696743 @default.
- W2991266812 cites W2177612949 @default.
- W2991266812 cites W2274610632 @default.
- W2991266812 cites W2302101231 @default.
- W2991266812 cites W2333144702 @default.
- W2991266812 cites W2427122612 @default.
- W2991266812 cites W2508414719 @default.
- W2991266812 cites W2513334320 @default.
- W2991266812 cites W2515160119 @default.
- W2991266812 cites W2524367868 @default.
- W2991266812 cites W2526285516 @default.
- W2991266812 cites W2549247408 @default.
- W2991266812 cites W2557904026 @default.
- W2991266812 cites W2575552627 @default.
- W2991266812 cites W2577954834 @default.
- W2991266812 cites W2579268832 @default.
- W2991266812 cites W2592644437 @default.
- W2991266812 cites W2593867025 @default.
- W2991266812 cites W2599457435 @default.
- W2991266812 cites W2607378088 @default.
- W2991266812 cites W2614935527 @default.
- W2991266812 cites W2747811776 @default.
- W2991266812 cites W2750547662 @default.
- W2991266812 cites W2752850911 @default.
- W2991266812 cites W2754289562 @default.
- W2991266812 cites W2757915849 @default.
- W2991266812 cites W2765541862 @default.
- W2991266812 cites W2768833223 @default.
- W2991266812 cites W2770191688 @default.
- W2991266812 cites W2774506992 @default.
- W2991266812 cites W2777287263 @default.
- W2991266812 cites W2780725426 @default.
- W2991266812 cites W2780936345 @default.
- W2991266812 cites W2782438044 @default.
- W2991266812 cites W2782565892 @default.
- W2991266812 cites W2792533056 @default.
- W2991266812 cites W2793278326 @default.
- W2991266812 cites W2794797435 @default.
- W2991266812 cites W2798106464 @default.
- W2991266812 cites W2799620190 @default.
- W2991266812 cites W2800245053 @default.
- W2991266812 cites W2801398392 @default.
- W2991266812 cites W2804672687 @default.
- W2991266812 cites W2807018623 @default.
- W2991266812 cites W2808487499 @default.
- W2991266812 cites W2811451736 @default.
- W2991266812 cites W2883534252 @default.
- W2991266812 cites W2883977408 @default.
- W2991266812 cites W2886373325 @default.
- W2991266812 cites W2888906255 @default.
- W2991266812 cites W2890517686 @default.
- W2991266812 cites W2893880892 @default.
- W2991266812 cites W2896365426 @default.
- W2991266812 cites W2896605526 @default.
- W2991266812 cites W2896636431 @default.
- W2991266812 cites W2899075300 @default.
- W2991266812 cites W2899288360 @default.
- W2991266812 cites W2900490197 @default.
- W2991266812 cites W2900694973 @default.
- W2991266812 cites W2901890703 @default.
- W2991266812 cites W2902565488 @default.
- W2991266812 cites W2902773418 @default.
- W2991266812 cites W2904828342 @default.
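The abstract text in this record compares predictors by accuracy (Acc), Matthews correlation coefficient (MCC), sensitivity (Sn), and specificity (Sp). As a minimal sketch of how those four numbers relate, the Python snippet below computes them from a confusion matrix using their standard textbook definitions. The function name `binary_metrics` is hypothetical, and the 100 positive / 100 negative split used for H_200 in the usage lines is an assumption made for illustration; the record itself only states that the dataset holds 200 samples.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return Sn, Sp, Acc and MCC for a binary confusion matrix.

    Standard definitions; each ratio falls back to 0.0 when its
    denominator is zero so that degenerate inputs do not raise.
    """
    total = tp + tn + fp + fn
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity (recall on negatives)
    acc = (tp + tn) / total if total else 0.0   # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Assumed balanced H_200 set (100 positives, 100 negatives), with counts
# chosen so that Sn = 60% and Sp = 70%.
metrics = binary_metrics(tp=60, tn=70, fp=30, fn=40)
print({name: round(value, 2) for name, value in metrics.items()})
```

Under that assumed split, Sn = 60% and Sp = 70% give Acc = 65% and an MCC that rounds to 0.30, which agrees with the iRNA-PseU row for H_200 quoted in the abstract text. Such a check is a quick way to confirm that a flattened table has been read correctly.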