Matches in SemOpenAlex for { <https://semopenalex.org/work/W2933350524> ?p ?o ?g. }
- W2933350524 endingPage "3071" @default.
- W2933350524 startingPage "3057" @default.
- W2933350524 abstract "Accurate identification of protein–DNA binding sites is significant for both understanding protein function and drug design. Machine-learning-based methods have been extensively used for the prediction of protein–DNA binding sites. However, the data imbalance problem, in which the number of nonbinding residues (negative-class samples) is far larger than that of binding residues (positive-class samples), seriously restricts the performance improvements of machine-learning-based predictors. In this work, we designed a two-stage imbalanced learning algorithm, called ensembled hyperplane-distance-based support vector machines (E-HDSVM), to improve the prediction performance of protein–DNA binding sites. The first stage of E-HDSVM designs a new iterative sampling algorithm, called hyperplane-distance-based under-sampling (HD-US), to extract multiple subsets from the original imbalanced data set, each of which is used to train a support vector machine (SVM). Unlike traditional sampling algorithms, HD-US selects samples by calculating the distances between the samples and the separating hyperplane of the SVM. The second stage of E-HDSVM proposes an enhanced AdaBoost (EAdaBoost) algorithm to ensemble multiple trained SVMs. As an enhanced version of the original AdaBoost algorithm, EAdaBoost overcomes the overfitting problem. Stringent cross-validation and independent tests on benchmark data sets demonstrated the superiority of E-HDSVM over several popular imbalanced learning algorithms. Based on the proposed E-HDSVM algorithm, we further implemented a sequence-based protein–DNA binding site predictor, called DNAPred, which is freely available at http://csbio.njust.edu.cn/bioinf/dnapred/ for academic use. 
The computational experimental results showed that our predictor achieved an average overall accuracy of 91.7% and a Matthews correlation coefficient of 0.395 on five benchmark data sets and outperformed several state-of-the-art sequence-based protein–DNA binding site predictors." @default.
- W2933350524 created "2019-04-11" @default.
- W2933350524 creator A5008960436 @default.
- W2933350524 creator A5018880478 @default.
- W2933350524 creator A5029914279 @default.
- W2933350524 creator A5052610760 @default.
- W2933350524 date "2019-04-03" @default.
- W2933350524 modified "2023-10-15" @default.
- W2933350524 title "DNAPred: Accurate Identification of DNA-Binding Sites from Protein Sequence by Ensembled Hyperplane-Distance-Based Support Vector Machines" @default.
- W2933350524 cites W1491553314 @default.
- W2933350524 cites W1925336368 @default.
- W2933350524 cites W1963631516 @default.
- W2933350524 cites W1964880253 @default.
- W2933350524 cites W1969370763 @default.
- W2933350524 cites W1971894358 @default.
- W2933350524 cites W1973356821 @default.
- W2933350524 cites W1974079881 @default.
- W2933350524 cites W1977628834 @default.
- W2933350524 cites W1997401305 @default.
- W2933350524 cites W1999318832 @default.
- W2933350524 cites W2005423065 @default.
- W2933350524 cites W2006020859 @default.
- W2933350524 cites W2018661561 @default.
- W2933350524 cites W2019858993 @default.
- W2933350524 cites W2019971393 @default.
- W2933350524 cites W2024631466 @default.
- W2933350524 cites W2032867948 @default.
- W2933350524 cites W2038670706 @default.
- W2933350524 cites W2043638247 @default.
- W2933350524 cites W2044628302 @default.
- W2933350524 cites W2053724458 @default.
- W2933350524 cites W2064675550 @default.
- W2933350524 cites W2089716128 @default.
- W2933350524 cites W2096986143 @default.
- W2933350524 cites W2099550922 @default.
- W2933350524 cites W2103525038 @default.
- W2933350524 cites W2108067237 @default.
- W2933350524 cites W2112832917 @default.
- W2933350524 cites W2116139800 @default.
- W2933350524 cites W2118978333 @default.
- W2933350524 cites W2119498311 @default.
- W2933350524 cites W2119948329 @default.
- W2933350524 cites W2122137509 @default.
- W2933350524 cites W2128533758 @default.
- W2933350524 cites W2128965734 @default.
- W2933350524 cites W2133312664 @default.
- W2933350524 cites W2137154596 @default.
- W2933350524 cites W2138769522 @default.
- W2933350524 cites W2144347309 @default.
- W2933350524 cites W2149997286 @default.
- W2933350524 cites W2152116196 @default.
- W2933350524 cites W2152365098 @default.
- W2933350524 cites W2152856869 @default.
- W2933350524 cites W2153153865 @default.
- W2933350524 cites W2153187042 @default.
- W2933350524 cites W2156125289 @default.
- W2933350524 cites W2156136208 @default.
- W2933350524 cites W2157122545 @default.
- W2933350524 cites W2158261944 @default.
- W2933350524 cites W2160800572 @default.
- W2933350524 cites W2161652323 @default.
- W2933350524 cites W2165745599 @default.
- W2933350524 cites W2166248605 @default.
- W2933350524 cites W2168012258 @default.
- W2933350524 cites W2169837652 @default.
- W2933350524 cites W2314115072 @default.
- W2933350524 cites W2325763597 @default.
- W2933350524 cites W2528774686 @default.
- W2933350524 cites W2531919408 @default.
- W2933350524 cites W2582176104 @default.
- W2933350524 cites W2766389776 @default.
- W2933350524 cites W2767813457 @default.
- W2933350524 cites W2792363117 @default.
- W2933350524 cites W2793168264 @default.
- W2933350524 cites W2800333284 @default.
- W2933350524 cites W2810225085 @default.
- W2933350524 cites W2890951984 @default.
- W2933350524 cites W2894836881 @default.
- W2933350524 cites W2909686100 @default.
- W2933350524 cites W4213149192 @default.
- W2933350524 cites W90286923 @default.
- W2933350524 doi "https://doi.org/10.1021/acs.jcim.8b00749" @default.
- W2933350524 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/30943723" @default.
- W2933350524 hasPublicationYear "2019" @default.
- W2933350524 type Work @default.
- W2933350524 sameAs 2933350524 @default.
- W2933350524 citedByCount "41" @default.
- W2933350524 countsByYear W29333505242019 @default.
- W2933350524 countsByYear W29333505242020 @default.
- W2933350524 countsByYear W29333505242021 @default.
- W2933350524 countsByYear W29333505242022 @default.
- W2933350524 countsByYear W29333505242023 @default.
- W2933350524 crossrefType "journal-article" @default.
- W2933350524 hasAuthorship W2933350524A5008960436 @default.
- W2933350524 hasAuthorship W2933350524A5018880478 @default.
- W2933350524 hasAuthorship W2933350524A5029914279 @default.
- W2933350524 hasAuthorship W2933350524A5052610760 @default.
- W2933350524 hasConcept C10010492 @default.