Matches in SemOpenAlex for { <https://semopenalex.org/work/W3195367207> ?p ?o ?g. }
- W3195367207 endingPage "51" @default.
- W3195367207 startingPage "44" @default.
- W3195367207 abstract "Accurate automatic annotation of protein function relies on both innovative models and robust datasets. Due to their importance in biological processes, the identification of DNA-binding proteins directly from protein sequence has been the focus of many studies. However, the datasets used to train and evaluate these methods have suffered from substantial flaws. We describe some of the weaknesses of the datasets used in previous DNA-binding protein literature and provide several new datasets addressing these problems. We suggest new evaluative benchmark tasks that more realistically assess real-world performance for protein annotation models. We propose a simple new model for the prediction of DNA-binding proteins and compare its performance on the improved datasets to two previously published models. In addition, we provide extensive tests showing how the best models predict across taxa. Our new gradient boosting model, which uses features derived from a published protein language model, outperforms the earlier models. Perhaps surprisingly, so does a baseline nearest neighbor model using BLAST percent identity. We evaluate the sensitivity of these models to perturbations of DNA-binding regions and control regions of protein sequences. The successful data-driven models learn to focus on DNA-binding regions. When predicting across taxa, the best models are highly accurate across species in the same kingdom and can provide some information when predicting across kingdoms. The data and results for this article can be found at https://doi.org/10.5281/zenodo.5153906. The code for this article can be found at https://doi.org/10.5281/zenodo.5153683. The code, data and results can also be found at https://github.com/AZaitzeff/tools_for_dna_binding_proteins." @default.
- W3195367207 created "2021-08-30" @default.
- W3195367207 creator A5014650825 @default.
- W3195367207 creator A5019205480 @default.
- W3195367207 creator A5038199555 @default.
- W3195367207 creator A5052322423 @default.
- W3195367207 creator A5065188919 @default.
- W3195367207 date "2021-08-20" @default.
- W3195367207 modified "2023-09-26" @default.
- W3195367207 title "Improved datasets and evaluation methods for the automatic prediction of DNA-binding proteins" @default.
- W3195367207 cites W1490286156 @default.
- W3195367207 cites W1659313070 @default.
- W3195367207 cites W1892469892 @default.
- W3195367207 cites W1992116297 @default.
- W3195367207 cites W2003893812 @default.
- W3195367207 cites W2010688088 @default.
- W3195367207 cites W2055043387 @default.
- W3195367207 cites W2070980386 @default.
- W3195367207 cites W2083563185 @default.
- W3195367207 cites W2102551551 @default.
- W3195367207 cites W2103017472 @default.
- W3195367207 cites W2144939151 @default.
- W3195367207 cites W2149851123 @default.
- W3195367207 cites W2173801226 @default.
- W3195367207 cites W2322691988 @default.
- W3195367207 cites W2337731955 @default.
- W3195367207 cites W2470414691 @default.
- W3195367207 cites W2557229948 @default.
- W3195367207 cites W2557383173 @default.
- W3195367207 cites W2748005921 @default.
- W3195367207 cites W2757522837 @default.
- W3195367207 cites W2766430481 @default.
- W3195367207 cites W2769306988 @default.
- W3195367207 cites W2771169143 @default.
- W3195367207 cites W2777094228 @default.
- W3195367207 cites W2804549231 @default.
- W3195367207 cites W2883467144 @default.
- W3195367207 cites W2890911678 @default.
- W3195367207 cites W2895810213 @default.
- W3195367207 cites W2900353268 @default.
- W3195367207 cites W2946492269 @default.
- W3195367207 cites W2950595506 @default.
- W3195367207 cites W2955151077 @default.
- W3195367207 cites W2957436444 @default.
- W3195367207 cites W2984726926 @default.
- W3195367207 cites W3020418214 @default.
- W3195367207 doi "https://doi.org/10.1093/bioinformatics/btab603" @default.
- W3195367207 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/34415301" @default.
- W3195367207 hasPublicationYear "2021" @default.
- W3195367207 type Work @default.
- W3195367207 sameAs 3195367207 @default.
- W3195367207 citedByCount "3" @default.
- W3195367207 countsByYear W31953672072022 @default.
- W3195367207 countsByYear W31953672072023 @default.
- W3195367207 crossrefType "journal-article" @default.
- W3195367207 hasAuthorship W3195367207A5014650825 @default.
- W3195367207 hasAuthorship W3195367207A5019205480 @default.
- W3195367207 hasAuthorship W3195367207A5038199555 @default.
- W3195367207 hasAuthorship W3195367207A5052322423 @default.
- W3195367207 hasAuthorship W3195367207A5065188919 @default.
- W3195367207 hasBestOaLocation W31953672072 @default.
- W3195367207 hasConcept C104317684 @default.
- W3195367207 hasConcept C111919701 @default.
- W3195367207 hasConcept C116834253 @default.
- W3195367207 hasConcept C119857082 @default.
- W3195367207 hasConcept C120665830 @default.
- W3195367207 hasConcept C121332964 @default.
- W3195367207 hasConcept C124101348 @default.
- W3195367207 hasConcept C13280743 @default.
- W3195367207 hasConcept C154945302 @default.
- W3195367207 hasConcept C185798385 @default.
- W3195367207 hasConcept C192209626 @default.
- W3195367207 hasConcept C205649164 @default.
- W3195367207 hasConcept C207060522 @default.
- W3195367207 hasConcept C2776321320 @default.
- W3195367207 hasConcept C2986374874 @default.
- W3195367207 hasConcept C41008148 @default.
- W3195367207 hasConcept C43126263 @default.
- W3195367207 hasConcept C46686674 @default.
- W3195367207 hasConcept C54355233 @default.
- W3195367207 hasConcept C59822182 @default.
- W3195367207 hasConcept C86803240 @default.
- W3195367207 hasConceptScore W3195367207C104317684 @default.
- W3195367207 hasConceptScore W3195367207C111919701 @default.
- W3195367207 hasConceptScore W3195367207C116834253 @default.
- W3195367207 hasConceptScore W3195367207C119857082 @default.
- W3195367207 hasConceptScore W3195367207C120665830 @default.
- W3195367207 hasConceptScore W3195367207C121332964 @default.
- W3195367207 hasConceptScore W3195367207C124101348 @default.
- W3195367207 hasConceptScore W3195367207C13280743 @default.
- W3195367207 hasConceptScore W3195367207C154945302 @default.
- W3195367207 hasConceptScore W3195367207C185798385 @default.
- W3195367207 hasConceptScore W3195367207C192209626 @default.
- W3195367207 hasConceptScore W3195367207C205649164 @default.
- W3195367207 hasConceptScore W3195367207C207060522 @default.
- W3195367207 hasConceptScore W3195367207C2776321320 @default.
- W3195367207 hasConceptScore W3195367207C2986374874 @default.
- W3195367207 hasConceptScore W3195367207C41008148 @default.