Matches in SemOpenAlex for { <https://semopenalex.org/work/W2890129985> ?p ?o ?g. }
- W2890129985 abstract "The ICTV develops, refines and maintains a universal virus taxonomy; Order is the highest taxon in the branching hierarchy of recognised viral taxa. Historically, ICTV (sub)committees have classified viruses on the basis of morphological characteristics and various other attributes. Today, virtually all new viral genomes are assembled from metagenomic datasets and are not linked directly to biological agents. Thus, placing a virus into a taxonomic scheme solely from primary genome structure is an increasingly important problem. Various simple descriptive statistics of a viral genome sequence have been used successfully for virus classification. Here, we use the NCBI's viral and viroid reference sequence collection (RefSeq) and a common experimental framework to compare the performance of different genome sequence-derived features and classifiers in the task of assigning a virus to one of seven ICTV Orders. The nucleotide-, word-, and compression-based features we consider include genome length, the k-mer Natural Vector (k = 1, ..., 6) and its derivatives, return time distribution, and general-purpose and DNA-specific compression ratios; the classifiers used are the k-NN and SVM. The combination of genome length and k-NN has the worst, yet still respectable, performance (mean error rate of 0.137); the best performance is achieved using 4-mer counts and SVM (mean error rate of 0.006). We investigate the main causes of misclassification, explore which viruses are more difficult to classify, and use the best performing combination to predict the Orders of 1,834 unclassified viruses. A subsequent version of RefSeq assigned Orders to 17 of these previously unlabelled viruses. Since 16 of our predictions match these assignments, our approach could aid virologists dealing with viruses that are known only from sequence data." @default.
- W2890129985 created "2018-09-27" @default.
- W2890129985 creator A5000488844 @default.
- W2890129985 creator A5013104996 @default.
- W2890129985 creator A5047532120 @default.
- W2890129985 date "2018-09-11" @default.
- W2890129985 modified "2023-09-27" @default.
- W2890129985 title "Virus genome sequence classification using features based on nucleotides, words and compression" @default.
- W2890129985 cites W1501531009 @default.
- W2890129985 cites W1563088657 @default.
- W2890129985 cites W18298531 @default.
- W2890129985 cites W1931027898 @default.
- W2890129985 cites W1971876518 @default.
- W2890129985 cites W1971985263 @default.
- W2890129985 cites W1985258161 @default.
- W2890129985 cites W1985657529 @default.
- W2890129985 cites W1990400309 @default.
- W2890129985 cites W1999866852 @default.
- W2890129985 cites W2002638840 @default.
- W2890129985 cites W2006737337 @default.
- W2890129985 cites W2013609071 @default.
- W2890129985 cites W2018140810 @default.
- W2890129985 cites W2060108852 @default.
- W2890129985 cites W2068881314 @default.
- W2890129985 cites W2076184339 @default.
- W2890129985 cites W2081447201 @default.
- W2890129985 cites W2087249882 @default.
- W2890129985 cites W2094890728 @default.
- W2890129985 cites W2100233488 @default.
- W2890129985 cites W2107745473 @default.
- W2890129985 cites W2112072385 @default.
- W2890129985 cites W2118978333 @default.
- W2890129985 cites W2119821739 @default.
- W2890129985 cites W2123845384 @default.
- W2890129985 cites W2125598538 @default.
- W2890129985 cites W2127774996 @default.
- W2890129985 cites W2132926880 @default.
- W2890129985 cites W2137015675 @default.
- W2890129985 cites W2143142146 @default.
- W2890129985 cites W2148603752 @default.
- W2890129985 cites W2148848603 @default.
- W2890129985 cites W2153635508 @default.
- W2890129985 cites W2157883091 @default.
- W2890129985 cites W2158678815 @default.
- W2890129985 cites W2161488606 @default.
- W2890129985 cites W2161984693 @default.
- W2890129985 cites W2171963266 @default.
- W2890129985 cites W2187089797 @default.
- W2890129985 cites W2259096502 @default.
- W2890129985 cites W2295653214 @default.
- W2890129985 cites W2318680981 @default.
- W2890129985 cites W2338384026 @default.
- W2890129985 cites W2508498860 @default.
- W2890129985 cites W2518183482 @default.
- W2890129985 cites W2531091319 @default.
- W2890129985 cites W2562468253 @default.
- W2890129985 cites W2777007467 @default.
- W2890129985 cites W2791796577 @default.
- W2890129985 cites W2890113837 @default.
- W2890129985 cites W2950577311 @default.
- W2890129985 cites W3088019604 @default.
- W2890129985 cites W806664756 @default.
- W2890129985 cites W2586417385 @default.
- W2890129985 hasPublicationYear "2018" @default.
- W2890129985 type Work @default.
- W2890129985 sameAs 2890129985 @default.
- W2890129985 citedByCount "2" @default.
- W2890129985 countsByYear W28901299852018 @default.
- W2890129985 countsByYear W28901299852019 @default.
- W2890129985 crossrefType "posted-content" @default.
- W2890129985 hasAuthorship W2890129985A5000488844 @default.
- W2890129985 hasAuthorship W2890129985A5013104996 @default.
- W2890129985 hasAuthorship W2890129985A5047532120 @default.
- W2890129985 hasConcept C104317684 @default.
- W2890129985 hasConcept C141231307 @default.
- W2890129985 hasConcept C151810110 @default.
- W2890129985 hasConcept C192953774 @default.
- W2890129985 hasConcept C20850961 @default.
- W2890129985 hasConcept C41008148 @default.
- W2890129985 hasConcept C54355233 @default.
- W2890129985 hasConcept C70721500 @default.
- W2890129985 hasConcept C86803240 @default.
- W2890129985 hasConceptScore W2890129985C104317684 @default.
- W2890129985 hasConceptScore W2890129985C141231307 @default.
- W2890129985 hasConceptScore W2890129985C151810110 @default.
- W2890129985 hasConceptScore W2890129985C192953774 @default.
- W2890129985 hasConceptScore W2890129985C20850961 @default.
- W2890129985 hasConceptScore W2890129985C41008148 @default.
- W2890129985 hasConceptScore W2890129985C54355233 @default.
- W2890129985 hasConceptScore W2890129985C70721500 @default.
- W2890129985 hasConceptScore W2890129985C86803240 @default.
- W2890129985 hasLocation W28901299851 @default.
- W2890129985 hasOpenAccess W2890129985 @default.
- W2890129985 hasPrimaryLocation W28901299851 @default.
- W2890129985 hasRelatedWork W1546989640 @default.
- W2890129985 hasRelatedWork W2037158686 @default.
- W2890129985 hasRelatedWork W206232798 @default.
- W2890129985 hasRelatedWork W2081651733 @default.
- W2890129985 hasRelatedWork W2112072385 @default.
- W2890129985 hasRelatedWork W2158242638 @default.
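
The abstract above reports that 4-mer counts gave the best performance (with SVM) and that k-NN was the other classifier compared. As a rough illustration of how a word-count feature feeds a classifier, the following is a minimal sketch, not the authors' pipeline: it uses only the Python standard library, swaps in a plain Euclidean nearest-neighbour vote rather than an SVM, and every name in it (`kmer_counts`, `normalise`, `knn_predict`, the labels `OrderA` and `OrderB`, and the toy sequences) is a hypothetical chosen for this example.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: words containing other symbols (e.g. N) are ignored


def kmer_counts(sequence, k=4):
    """Return a fixed-length vector (4**k entries) of k-mer counts."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all words in a fixed order so vectors are comparable.
    return [counts["".join(word)] for word in product(ALPHABET, repeat=k)]


def normalise(vector):
    """Scale counts to frequencies so genome length does not dominate."""
    total = sum(vector)
    return [v / total for v in vector] if total else list(vector)


def knn_predict(train, query, n_neighbours=1):
    """Majority vote among the nearest training vectors (Euclidean distance).

    train is a list of (vector, label) pairs; query is a vector.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:n_neighbours])
    return votes.most_common(1)[0][0]


# Toy, made-up sequences purely for illustration; not real viral genomes.
training = [
    (normalise(kmer_counts("ACGT" * 10)), "OrderA"),
    (normalise(kmer_counts("AATT" * 10)), "OrderB"),
]
query = normalise(kmer_counts("ACGTACGTACGTACGA"))
prediction = knn_predict(training, query)
print(prediction)
```

Normalising counts to frequencies is a design choice made here so that two sequences of different length remain comparable; whether the original study normalised its 4-mer counts is not stated in the abstract.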