Matches in SemOpenAlex for { <https://semopenalex.org/work/W2348256457> ?p ?o ?g. }
- W2348256457 abstract "Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labour-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences (“mislabels”) using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity / 91.7% precision) as well as correction (94.9% sensitivity / 89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa." @default.
- W2348256457 created "2016-06-24" @default.
- W2348256457 creator A5013412405 @default.
- W2348256457 creator A5025640090 @default.
- W2348256457 creator A5026199574 @default.
- W2348256457 creator A5084095365 @default.
- W2348256457 creator A5091470163 @default.
- W2348256457 date "2016-03-04" @default.
- W2348256457 modified "2023-10-02" @default.
- W2348256457 title "Phylogeny-aware Identification and Correction of Taxonomically Mislabeled Sequences" @default.
- W2348256457 cites W1516507208 @default.
- W2348256457 cites W1974592771 @default.
- W2348256457 cites W1993038297 @default.
- W2348256457 cites W1995946872 @default.
- W2348256457 cites W2001088260 @default.
- W2348256457 cites W2020302957 @default.
- W2348256457 cites W2034285706 @default.
- W2348256457 cites W2051161693 @default.
- W2348256457 cites W2061177996 @default.
- W2348256457 cites W2064218103 @default.
- W2348256457 cites W2068187483 @default.
- W2348256457 cites W2068687524 @default.
- W2348256457 cites W2072970694 @default.
- W2348256457 cites W2075623226 @default.
- W2348256457 cites W2077991901 @default.
- W2348256457 cites W2083208025 @default.
- W2348256457 cites W2088209062 @default.
- W2348256457 cites W2089620180 @default.
- W2348256457 cites W2096151547 @default.
- W2348256457 cites W2108718991 @default.
- W2348256457 cites W2109296318 @default.
- W2348256457 cites W2114392707 @default.
- W2348256457 cites W2124141867 @default.
- W2348256457 cites W2124351063 @default.
- W2348256457 cites W2136879569 @default.
- W2348256457 cites W2141052558 @default.
- W2348256457 cites W2150605939 @default.
- W2348256457 cites W2151350595 @default.
- W2348256457 cites W2151372451 @default.
- W2348256457 cites W2152751713 @default.
- W2348256457 cites W2152885278 @default.
- W2348256457 cites W2154026962 @default.
- W2348256457 cites W2154071138 @default.
- W2348256457 cites W2155806125 @default.
- W2348256457 cites W2157552136 @default.
- W2348256457 cites W2160947363 @default.
- W2348256457 cites W2161777741 @default.
- W2348256457 cites W2166865790 @default.
- W2348256457 cites W2168696662 @default.
- W2348256457 cites W2313542430 @default.
- W2348256457 doi "https://doi.org/10.1101/042200" @default.
- W2348256457 hasPublicationYear "2016" @default.
- W2348256457 type Work @default.
- W2348256457 sameAs 2348256457 @default.
- W2348256457 citedByCount "0" @default.
- W2348256457 crossrefType "posted-content" @default.
- W2348256457 hasAuthorship W2348256457A5013412405 @default.
- W2348256457 hasAuthorship W2348256457A5025640090 @default.
- W2348256457 hasAuthorship W2348256457A5026199574 @default.
- W2348256457 hasAuthorship W2348256457A5084095365 @default.
- W2348256457 hasAuthorship W2348256457A5091470163 @default.
- W2348256457 hasBestOaLocation W23482564571 @default.
- W2348256457 hasConcept C104317684 @default.
- W2348256457 hasConcept C116834253 @default.
- W2348256457 hasConcept C119857082 @default.
- W2348256457 hasConcept C124101348 @default.
- W2348256457 hasConcept C154945302 @default.
- W2348256457 hasConcept C18903297 @default.
- W2348256457 hasConcept C189592816 @default.
- W2348256457 hasConcept C193252679 @default.
- W2348256457 hasConcept C204321447 @default.
- W2348256457 hasConcept C23123220 @default.
- W2348256457 hasConcept C2776321320 @default.
- W2348256457 hasConcept C2778112365 @default.
- W2348256457 hasConcept C41008148 @default.
- W2348256457 hasConcept C54355233 @default.
- W2348256457 hasConcept C58642233 @default.
- W2348256457 hasConcept C71640776 @default.
- W2348256457 hasConcept C86803240 @default.
- W2348256457 hasConcept C90132467 @default.
- W2348256457 hasConceptScore W2348256457C104317684 @default.
- W2348256457 hasConceptScore W2348256457C116834253 @default.
- W2348256457 hasConceptScore W2348256457C119857082 @default.
- W2348256457 hasConceptScore W2348256457C124101348 @default.
- W2348256457 hasConceptScore W2348256457C154945302 @default.
- W2348256457 hasConceptScore W2348256457C18903297 @default.
- W2348256457 hasConceptScore W2348256457C189592816 @default.
- W2348256457 hasConceptScore W2348256457C193252679 @default.
- W2348256457 hasConceptScore W2348256457C204321447 @default.
- W2348256457 hasConceptScore W2348256457C23123220 @default.
- W2348256457 hasConceptScore W2348256457C2776321320 @default.
- W2348256457 hasConceptScore W2348256457C2778112365 @default.
- W2348256457 hasConceptScore W2348256457C41008148 @default.
- W2348256457 hasConceptScore W2348256457C54355233 @default.
- W2348256457 hasConceptScore W2348256457C58642233 @default.
- W2348256457 hasConceptScore W2348256457C71640776 @default.
- W2348256457 hasConceptScore W2348256457C86803240 @default.
- W2348256457 hasConceptScore W2348256457C90132467 @default.
- W2348256457 hasLocation W23482564571 @default.
- W2348256457 hasLocation W23482564572 @default.
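
The rows above all follow one shape: a leading "- ", a subject, a predicate, an object (either a quoted literal or a bare identifier), and a graph marker such as "@default." at the end. As a minimal sketch, assuming that shape holds for every row, the listing could be read into tuples as follows. The names `parse_listing` and `group_by_predicate` are hypothetical helpers introduced here for illustration and are not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Assumed row shape: "- <subject> <predicate> <object> @<graph>."
# The object may contain spaces (e.g. a quoted abstract), so it is matched greedily.
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$')


def parse_listing(text):
    """Return a list of (subject, predicate, object, graph) tuples."""
    triples = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            # Skip anything that is not a data row, such as the header line.
            continue
        subject, predicate, obj, graph = match.groups()
        # Drop the surrounding double quotes from literal values.
        if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
            obj = obj[1:-1]
        triples.append((subject, predicate, obj, graph))
    return triples


def group_by_predicate(triples):
    """Collect object values per predicate, preserving row order."""
    grouped = defaultdict(list)
    for _subject, predicate, obj, _graph in triples:
        grouped[predicate].append(obj)
    return dict(grouped)
```

Calling `group_by_predicate(parse_listing(text))["cites"]` on the listing would then give the cited work identifiers as a plain list, which is convenient for counting or for follow-up lookups.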