Matches in SemOpenAlex for { <https://semopenalex.org/work/W2890895743> ?p ?o ?g. }
- W2890895743 abstract "Motivation: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, together with the proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching, and classification of microbiome samples in near real-time. Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can be used to efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality-sensitive hashing indexing scheme. Furthermore, we show that histosketches can be used to train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a Random Forest Classifier that could accurately predict whether the neonate had received antibiotic treatment (95% accuracy, precision 97%) and could subsequently be used to classify microbiome data streams in less than 12 seconds. We provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 seconds on a standard laptop using 4 cores, with the sketch occupying 3000 bytes of disk space. Availability: Our implementation (HULK) is written in Go and is available at: https://github.com/will-rowe/hulk (MIT License)" @default.
- W2890895743 created "2018-09-27" @default.
- W2890895743 creator A5012480083 @default.
- W2890895743 creator A5031731892 @default.
- W2890895743 creator A5037456386 @default.
- W2890895743 creator A5043161270 @default.
- W2890895743 creator A5046594988 @default.
- W2890895743 creator A5046811852 @default.
- W2890895743 creator A5048928273 @default.
- W2890895743 creator A5063471667 @default.
- W2890895743 creator A5078257079 @default.
- W2890895743 creator A5084050064 @default.
- W2890895743 date "2018-09-04" @default.
- W2890895743 modified "2023-09-26" @default.
- W2890895743 title "Streaming histogram sketching for rapid microbiome analytics" @default.
- W2890895743 cites W1505191356 @default.
- W2890895743 cites W1965092590 @default.
- W2890895743 cites W2093830129 @default.
- W2890895743 cites W2096986408 @default.
- W2890895743 cites W2102813808 @default.
- W2890895743 cites W2105024467 @default.
- W2890895743 cites W2115546424 @default.
- W2890895743 cites W2126907894 @default.
- W2890895743 cites W2128769815 @default.
- W2890895743 cites W2132139217 @default.
- W2890895743 cites W2140466960 @default.
- W2890895743 cites W2148781362 @default.
- W2890895743 cites W2254734290 @default.
- W2890895743 cites W2339602899 @default.
- W2890895743 cites W2519890620 @default.
- W2890895743 cites W2604248105 @default.
- W2890895743 cites W2610977365 @default.
- W2890895743 cites W2750898971 @default.
- W2890895743 cites W2765571384 @default.
- W2890895743 cites W2765745983 @default.
- W2890895743 cites W2766358903 @default.
- W2890895743 cites W2772632044 @default.
- W2890895743 cites W2773939681 @default.
- W2890895743 cites W2789347344 @default.
- W2890895743 cites W2791093166 @default.
- W2890895743 cites W2794262303 @default.
- W2890895743 cites W2800234325 @default.
- W2890895743 cites W2950150251 @default.
- W2890895743 cites W4249759701 @default.
- W2890895743 cites W905687799 @default.
- W2890895743 doi "https://doi.org/10.1101/408070" @default.
- W2890895743 hasPublicationYear "2018" @default.
- W2890895743 type Work @default.
- W2890895743 sameAs 2890895743 @default.
- W2890895743 citedByCount "0" @default.
- W2890895743 crossrefType "posted-content" @default.
- W2890895743 hasAuthorship W2890895743A5012480083 @default.
- W2890895743 hasAuthorship W2890895743A5031731892 @default.
- W2890895743 hasAuthorship W2890895743A5037456386 @default.
- W2890895743 hasAuthorship W2890895743A5043161270 @default.
- W2890895743 hasAuthorship W2890895743A5046594988 @default.
- W2890895743 hasAuthorship W2890895743A5046811852 @default.
- W2890895743 hasAuthorship W2890895743A5048928273 @default.
- W2890895743 hasAuthorship W2890895743A5063471667 @default.
- W2890895743 hasAuthorship W2890895743A5078257079 @default.
- W2890895743 hasAuthorship W2890895743A5084050064 @default.
- W2890895743 hasBestOaLocation W28908957431 @default.
- W2890895743 hasConcept C104317684 @default.
- W2890895743 hasConcept C115961682 @default.
- W2890895743 hasConcept C119857082 @default.
- W2890895743 hasConcept C124101348 @default.
- W2890895743 hasConcept C143121216 @default.
- W2890895743 hasConcept C15151743 @default.
- W2890895743 hasConcept C153180895 @default.
- W2890895743 hasConcept C154945302 @default.
- W2890895743 hasConcept C169258074 @default.
- W2890895743 hasConcept C41008148 @default.
- W2890895743 hasConcept C48044578 @default.
- W2890895743 hasConcept C53533937 @default.
- W2890895743 hasConcept C55493867 @default.
- W2890895743 hasConcept C60644358 @default.
- W2890895743 hasConcept C70721500 @default.
- W2890895743 hasConcept C77088390 @default.
- W2890895743 hasConcept C86803240 @default.
- W2890895743 hasConceptScore W2890895743C104317684 @default.
- W2890895743 hasConceptScore W2890895743C115961682 @default.
- W2890895743 hasConceptScore W2890895743C119857082 @default.
- W2890895743 hasConceptScore W2890895743C124101348 @default.
- W2890895743 hasConceptScore W2890895743C143121216 @default.
- W2890895743 hasConceptScore W2890895743C15151743 @default.
- W2890895743 hasConceptScore W2890895743C153180895 @default.
- W2890895743 hasConceptScore W2890895743C154945302 @default.
- W2890895743 hasConceptScore W2890895743C169258074 @default.
- W2890895743 hasConceptScore W2890895743C41008148 @default.
- W2890895743 hasConceptScore W2890895743C48044578 @default.
- W2890895743 hasConceptScore W2890895743C53533937 @default.
- W2890895743 hasConceptScore W2890895743C55493867 @default.
- W2890895743 hasConceptScore W2890895743C60644358 @default.
- W2890895743 hasConceptScore W2890895743C70721500 @default.
- W2890895743 hasConceptScore W2890895743C77088390 @default.
- W2890895743 hasConceptScore W2890895743C86803240 @default.
- W2890895743 hasLocation W28908957431 @default.
- W2890895743 hasLocation W28908957432 @default.
- W2890895743 hasLocation W28908957433 @default.
- W2890895743 hasLocation W28908957434 @default.
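
The abstract above mentions pairwise Jaccard similarity estimation between sketches of k-mer spectra. As a rough illustration of that idea only, the sketch below uses plain MinHash over k-mer sets. This is a simpler, swapped-in technique and is not the histosketch algorithm or HULK's code. The function names, the k-mer length, the sketch size of 64 and the use of BLAKE2b as the hash are all assumptions made for this example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Hash a k-mer to a 64-bit integer; the seed selects the hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(kmer_set, num_hashes=64):
    """Keep the minimum hash value per seeded hash function.

    Assumes kmer_set is non-empty; min() raises ValueError otherwise.
    """
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def jaccard_exact(a, b):
    """Exact Jaccard similarity of two sets (assumes at least one is non-empty)."""
    return len(a & b) / len(a | b)


def jaccard_estimate(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching sketch slots."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)


if __name__ == "__main__":
    # Two made-up reads, used purely for illustration.
    a = kmers("ACGTACGTGGCATTGACC", 4)
    b = kmers("ACGTACGTGGCATTGTCC", 4)
    print("exact:   ", jaccard_exact(a, b))
    print("estimate:", jaccard_estimate(minhash_sketch(a), minhash_sketch(b)))
```

The estimate approaches the exact value as the number of hash functions grows, while each sketch stays a fixed size regardless of input length. That fixed size is the property that makes sketch comparison and indexing cheap.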