Matches in SemOpenAlex for { <https://semopenalex.org/work/W2570372911> ?p ?o ?g. }
- W2570372911 endingPage "1330" @default.
- W2570372911 startingPage "1324" @default.
- W2570372911 abstract "Motivation: Many bioinformatics algorithms are designed for the analysis of sequences of some uniform length, conventionally referred to as k-mers. These include de Bruijn graph assembly methods and sequence alignment tools. An efficient algorithm to enumerate the number of unique k-mers, or even better, to build a histogram of k-mer frequencies would be desirable for these tools and their downstream analysis pipelines. Among other applications, estimated frequencies can be used to predict genome sizes, measure sequencing error rates, and tune runtime parameters for analysis tools. However, calculating a k-mer histogram from large volumes of sequencing data is a challenging task. Results: Here, we present ntCard, a streaming algorithm for estimating the frequencies of k-mers in genomics datasets. At its core, ntCard uses the ntHash algorithm to efficiently compute hash values for streamed sequences. It then samples the calculated hash values to build a reduced representation multiplicity table describing the sample distribution. Finally, it uses a statistical model to reconstruct the population distribution from the sample distribution. We have compared the performance of ntCard and other cardinality estimation algorithms. We used three datasets of 480 GB, 500 GB and 2.4 TB in size, with the first two representing whole genome shotgun sequencing experiments on the human genome and the last one on the white spruce genome. Results show ntCard estimates k-mer coverage frequencies >15× faster than the state-of-the-art algorithms, using a similar amount of memory, and with higher accuracy rates. Thus, our benchmarks demonstrate ntCard as a potentially enabling technology for large-scale genomics applications. Availability and Implementation: ntCard is written in C++ and is released under the GPL license. It is freely available at https://github.com/bcgsc/ntCard. Supplementary information: Supplementary data are available at Bioinformatics online." @default.
- W2570372911 created "2017-01-13" @default.
- W2570372911 creator A5017539699 @default.
- W2570372911 creator A5037360052 @default.
- W2570372911 creator A5056009314 @default.
- W2570372911 date "2017-01-05" @default.
- W2570372911 modified "2023-10-12" @default.
- W2570372911 title "ntCard: a streaming algorithm for cardinality estimation in genomics data" @default.
- W2570372911 cites W1966822396 @default.
- W2570372911 cites W2025051251 @default.
- W2570372911 cites W2057253402 @default.
- W2570372911 cites W2065128082 @default.
- W2570372911 cites W2080234606 @default.
- W2570372911 cites W2080745194 @default.
- W2570372911 cites W2096128575 @default.
- W2570372911 cites W2104677379 @default.
- W2570372911 cites W2121530737 @default.
- W2570372911 cites W2125266506 @default.
- W2570372911 cites W2127768708 @default.
- W2570372911 cites W2132926880 @default.
- W2570372911 cites W2133531097 @default.
- W2570372911 cites W2142749416 @default.
- W2570372911 cites W2148425737 @default.
- W2570372911 cites W2160265768 @default.
- W2570372911 cites W2160969485 @default.
- W2570372911 cites W2161546116 @default.
- W2570372911 cites W2163584430 @default.
- W2570372911 cites W2168546919 @default.
- W2570372911 cites W2168645015 @default.
- W2570372911 cites W2422772473 @default.
- W2570372911 cites W2463091895 @default.
- W2570372911 cites W2515342656 @default.
- W2570372911 cites W2952870794 @default.
- W2570372911 cites W2952932047 @default.
- W2570372911 doi "https://doi.org/10.1093/bioinformatics/btw832" @default.
- W2570372911 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/5408799" @default.
- W2570372911 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/28453674" @default.
- W2570372911 hasPublicationYear "2017" @default.
- W2570372911 type Work @default.
- W2570372911 sameAs 2570372911 @default.
- W2570372911 citedByCount "42" @default.
- W2570372911 countsByYear W25703729112017 @default.
- W2570372911 countsByYear W25703729112018 @default.
- W2570372911 countsByYear W25703729112019 @default.
- W2570372911 countsByYear W25703729112020 @default.
- W2570372911 countsByYear W25703729112021 @default.
- W2570372911 countsByYear W25703729112022 @default.
- W2570372911 countsByYear W25703729112023 @default.
- W2570372911 crossrefType "journal-article" @default.
- W2570372911 hasAuthorship W2570372911A5017539699 @default.
- W2570372911 hasAuthorship W2570372911A5037360052 @default.
- W2570372911 hasAuthorship W2570372911A5056009314 @default.
- W2570372911 hasBestOaLocation W25703729111 @default.
- W2570372911 hasConcept C104317684 @default.
- W2570372911 hasConcept C111919701 @default.
- W2570372911 hasConcept C11413529 @default.
- W2570372911 hasConcept C115961682 @default.
- W2570372911 hasConcept C124101348 @default.
- W2570372911 hasConcept C141231307 @default.
- W2570372911 hasConcept C144024400 @default.
- W2570372911 hasConcept C149923435 @default.
- W2570372911 hasConcept C150194340 @default.
- W2570372911 hasConcept C154945302 @default.
- W2570372911 hasConcept C162317418 @default.
- W2570372911 hasConcept C189206191 @default.
- W2570372911 hasConcept C18949551 @default.
- W2570372911 hasConcept C2279292 @default.
- W2570372911 hasConcept C2908647359 @default.
- W2570372911 hasConcept C38652104 @default.
- W2570372911 hasConcept C41008148 @default.
- W2570372911 hasConcept C53533937 @default.
- W2570372911 hasConcept C55493867 @default.
- W2570372911 hasConcept C67388219 @default.
- W2570372911 hasConcept C74912251 @default.
- W2570372911 hasConcept C86803240 @default.
- W2570372911 hasConcept C87117476 @default.
- W2570372911 hasConcept C99138194 @default.
- W2570372911 hasConceptScore W2570372911C104317684 @default.
- W2570372911 hasConceptScore W2570372911C111919701 @default.
- W2570372911 hasConceptScore W2570372911C11413529 @default.
- W2570372911 hasConceptScore W2570372911C115961682 @default.
- W2570372911 hasConceptScore W2570372911C124101348 @default.
- W2570372911 hasConceptScore W2570372911C141231307 @default.
- W2570372911 hasConceptScore W2570372911C144024400 @default.
- W2570372911 hasConceptScore W2570372911C149923435 @default.
- W2570372911 hasConceptScore W2570372911C150194340 @default.
- W2570372911 hasConceptScore W2570372911C154945302 @default.
- W2570372911 hasConceptScore W2570372911C162317418 @default.
- W2570372911 hasConceptScore W2570372911C189206191 @default.
- W2570372911 hasConceptScore W2570372911C18949551 @default.
- W2570372911 hasConceptScore W2570372911C2279292 @default.
- W2570372911 hasConceptScore W2570372911C2908647359 @default.
- W2570372911 hasConceptScore W2570372911C38652104 @default.
- W2570372911 hasConceptScore W2570372911C41008148 @default.
- W2570372911 hasConceptScore W2570372911C53533937 @default.
- W2570372911 hasConceptScore W2570372911C55493867 @default.
- W2570372911 hasConceptScore W2570372911C67388219 @default.
- W2570372911 hasConceptScore W2570372911C74912251 @default.