Matches in SemOpenAlex for { <https://semopenalex.org/work/W3211570503> ?p ?o ?g. }
- W3211570503 abstract "Background: In modern sequencing experiments, quickly and accurately identifying the sources of the reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species the read is from. In other settings, it is important to distinguish which reads are from the targeted sample and which are from potential contaminants. In both cases, identification of the correct source of a read enables further investigation of relevant reads, while minimizing wasted work. This task is particularly challenging for long reads, which can have a substantial error rate that obscures the origins of each read.
Results: Existing tools for the read classification problem are often alignment- or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm is used to generate a reduced representation (a “screen”) of potential source genomes for a query readset before reads are streamed in and compared against this screen. Using a query read’s similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited pre-processing, stores and works with only a subset of the input data, and is able to perform classification with a high degree of accuracy.
Conclusions: The sampling and sketching approaches investigated include uniform sampling, methods based on MinHash and its weighted and order variants, a minimizer-based technique, and a novel clustering-based sketching approach. We demonstrate the effectiveness of these techniques both in identifying the source microbial genomes for reads from a metagenomic long read sequencing experiment, and in distinguishing between long reads from organisms of interest and potential contaminant reads. We then compare these approaches to existing alignment-, index- and sketching-based tools for read classification, and demonstrate that such a method is a viable alternative for determining the source of query reads. Finally, we present a reference implementation of these approaches at https://github.com/arun96/sketching ." @default.
- W3211570503 created "2021-11-22" @default.
- W3211570503 creator A5058192453 @default.
- W3211570503 creator A5065014595 @default.
- W3211570503 date "2021-11-05" @default.
- W3211570503 modified "2023-09-27" @default.
- W3211570503 title "Sketching and sampling approaches for fast and accurate long read classification" @default.
- W3211570503 cites W1579534339 @default.
- W3211570503 cites W2003347102 @default.
- W3211570503 cites W2013887116 @default.
- W3211570503 cites W2037500862 @default.
- W3211570503 cites W2127175247 @default.
- W3211570503 cites W2144560237 @default.
- W3211570503 cites W2147783737 @default.
- W3211570503 cites W2159954944 @default.
- W3211570503 cites W2266239166 @default.
- W3211570503 cites W2525711135 @default.
- W3211570503 cites W2583363792 @default.
- W3211570503 cites W2621345043 @default.
- W3211570503 cites W2754579667 @default.
- W3211570503 cites W2758005814 @default.
- W3211570503 cites W2789843538 @default.
- W3211570503 cites W2795650864 @default.
- W3211570503 cites W2912990896 @default.
- W3211570503 cites W2937044610 @default.
- W3211570503 cites W2950150251 @default.
- W3211570503 cites W2950964375 @default.
- W3211570503 cites W2951160681 @default.
- W3211570503 cites W2951254987 @default.
- W3211570503 cites W2952609831 @default.
- W3211570503 cites W2953263404 @default.
- W3211570503 cites W2958388979 @default.
- W3211570503 cites W2961460076 @default.
- W3211570503 cites W2972805712 @default.
- W3211570503 cites W2987650093 @default.
- W3211570503 cites W2990618091 @default.
- W3211570503 cites W2992400060 @default.
- W3211570503 cites W3025757981 @default.
- W3211570503 cites W3097643970 @default.
- W3211570503 cites W3104209586 @default.
- W3211570503 cites W3108225425 @default.
- W3211570503 cites W3109735703 @default.
- W3211570503 cites W3163970933 @default.
- W3211570503 cites W3200242814 @default.
- W3211570503 cites W4225293672 @default.
- W3211570503 doi "https://doi.org/10.1101/2021.11.04.467374" @default.
- W3211570503 hasPublicationYear "2021" @default.
- W3211570503 type Work @default.
- W3211570503 sameAs 3211570503 @default.
- W3211570503 citedByCount "0" @default.
- W3211570503 crossrefType "posted-content" @default.
- W3211570503 hasAuthorship W3211570503A5058192453 @default.
- W3211570503 hasAuthorship W3211570503A5065014595 @default.
- W3211570503 hasBestOaLocation W32115705031 @default.
- W3211570503 hasConcept C106131492 @default.
- W3211570503 hasConcept C116834253 @default.
- W3211570503 hasConcept C119857082 @default.
- W3211570503 hasConcept C124101348 @default.
- W3211570503 hasConcept C140779682 @default.
- W3211570503 hasConcept C154945302 @default.
- W3211570503 hasConcept C162324750 @default.
- W3211570503 hasConcept C17744445 @default.
- W3211570503 hasConcept C185592680 @default.
- W3211570503 hasConcept C187736073 @default.
- W3211570503 hasConcept C198531522 @default.
- W3211570503 hasConcept C199539241 @default.
- W3211570503 hasConcept C23123220 @default.
- W3211570503 hasConcept C2776359362 @default.
- W3211570503 hasConcept C2780451532 @default.
- W3211570503 hasConcept C31972630 @default.
- W3211570503 hasConcept C41008148 @default.
- W3211570503 hasConcept C43617362 @default.
- W3211570503 hasConcept C59822182 @default.
- W3211570503 hasConcept C86803240 @default.
- W3211570503 hasConcept C94625758 @default.
- W3211570503 hasConceptScore W3211570503C106131492 @default.
- W3211570503 hasConceptScore W3211570503C116834253 @default.
- W3211570503 hasConceptScore W3211570503C119857082 @default.
- W3211570503 hasConceptScore W3211570503C124101348 @default.
- W3211570503 hasConceptScore W3211570503C140779682 @default.
- W3211570503 hasConceptScore W3211570503C154945302 @default.
- W3211570503 hasConceptScore W3211570503C162324750 @default.
- W3211570503 hasConceptScore W3211570503C17744445 @default.
- W3211570503 hasConceptScore W3211570503C185592680 @default.
- W3211570503 hasConceptScore W3211570503C187736073 @default.
- W3211570503 hasConceptScore W3211570503C198531522 @default.
- W3211570503 hasConceptScore W3211570503C199539241 @default.
- W3211570503 hasConceptScore W3211570503C23123220 @default.
- W3211570503 hasConceptScore W3211570503C2776359362 @default.
- W3211570503 hasConceptScore W3211570503C2780451532 @default.
- W3211570503 hasConceptScore W3211570503C31972630 @default.
- W3211570503 hasConceptScore W3211570503C41008148 @default.
- W3211570503 hasConceptScore W3211570503C43617362 @default.
- W3211570503 hasConceptScore W3211570503C59822182 @default.
- W3211570503 hasConceptScore W3211570503C86803240 @default.
- W3211570503 hasConceptScore W3211570503C94625758 @default.
- W3211570503 hasLocation W32115705031 @default.
- W3211570503 hasLocation W32115705032 @default.
- W3211570503 hasOpenAccess W3211570503 @default.
- W3211570503 hasPrimaryLocation W32115705031 @default.
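The abstract describes screen-based read classification. A reduced representation of each candidate source genome is built once, and each streamed read is then scored against it. What follows is a minimal Python sketch of one such variant. It assumes a bottom-s MinHash screen, meaning the s smallest k-mer hashes per genome, and it uses a plain count of shared hashes as the similarity score. The function names (`kmer_hash`, `kmer_hashes`, `bottom_s_sketch`, `build_screen`, `classify_read`) and the parameters `k` and `s` are hypothetical and chosen for illustration. They are not the API of the reference implementation at https://github.com/arun96/sketching, and the scoring rule is an assumption, not the authors' method.

```python
import hashlib
import heapq
from typing import Dict, Optional, Set


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def kmer_hashes(seq: str, k: int) -> Set[int]:
    """Hash every k-mer on the forward strand only; empty if len(seq) < k."""
    return {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def bottom_s_sketch(seq: str, k: int, s: int) -> Set[int]:
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes of seq."""
    return set(heapq.nsmallest(s, kmer_hashes(seq, k)))


def build_screen(genomes: Dict[str, str], k: int, s: int) -> Dict[str, Set[int]]:
    """Build the 'screen': one reduced representation per candidate source genome."""
    return {name: bottom_s_sketch(seq, k, s) for name, seq in genomes.items()}


def classify_read(read: str, screen: Dict[str, Set[int]], k: int) -> Optional[str]:
    """Predict the genome whose screen shares the most exact k-mer hashes with the read."""
    read_hashes = kmer_hashes(read, k)
    scores = {name: len(read_hashes & sketch) for name, sketch in screen.items()}
    best = max(scores, key=scores.get)
    # No shared hashes with any genome: leave the read unclassified.
    return best if scores[best] > 0 else None
```

With a fixed `s`, the screen for a longer genome covers a smaller fraction of that genome's k-mers. Raw hit counts are therefore biased toward shorter genomes. A practical version would likely normalize for this. It would also hash canonical k-mers to handle reverse-complement reads, and it would tolerate long-read errors, for example with a smaller `k` or a minimizer-based screen.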