Matches in SemOpenAlex for { <https://semopenalex.org/work/W4200364103> ?p ?o ?g. }
- W4200364103 abstract "Sequencing data is rapidly accumulating in public repositories. Making this resource accessible for interactive analysis at scale requires efficient approaches for its storage and indexing. There have recently been remarkable advances in building compressed representations of annotated (or colored) de Bruijn graphs for efficiently indexing k-mer sets. However, approaches for representing quantitative attributes such as gene expression or genome positions in a general manner have remained underexplored. In this work, we propose Counting de Bruijn graphs (Counting DBGs), a notion generalizing annotated de Bruijn graphs by supplementing each node-label relation with one or many attributes (e.g., a k-mer count or its positions). Counting DBGs index k-mer abundances from 2,652 human RNA-Seq samples in representations over 8-fold smaller than those of state-of-the-art bioinformatics tools, while also being faster to construct and query. Furthermore, Counting DBGs with positional annotations losslessly represent entire reads in indexes on average 27% smaller than the input compressed with gzip for human Illumina RNA-Seq and 57% smaller for PacBio HiFi sequencing of viral samples. A complete searchable index of all viral PacBio SMRT reads from NCBI’s SRA (152,884 samples, 875 Gbp) comprises only 178 GB. Finally, on the full RefSeq collection, we generate a lossless and fully queryable index that is 4.4-fold smaller than the MegaBLAST index. The techniques proposed in this work naturally complement existing methods and tools employing de Bruijn graphs and significantly broaden their applicability: from indexing k-mer counts and genome positions to implementing novel sequence alignment algorithms on top of highly compressed graph-based sequence indexes." @default.
- W4200364103 created "2021-12-31" @default.
- W4200364103 creator A5033347097 @default.
- W4200364103 creator A5035416263 @default.
- W4200364103 creator A5070168682 @default.
- W4200364103 creator A5085609478 @default.
- W4200364103 date "2021-11-11" @default.
- W4200364103 modified "2023-10-01" @default.
- W4200364103 title "Lossless Indexing with Counting de Bruijn Graphs" @default.
- W4200364103 cites W2010361633 @default.
- W4200364103 cites W2104846587 @default.
- W4200364103 cites W2105656684 @default.
- W4200364103 cites W2116041602 @default.
- W4200364103 cites W2116258248 @default.
- W4200364103 cites W2128964206 @default.
- W4200364103 cites W2141978199 @default.
- W4200364103 cites W2157539385 @default.
- W4200364103 cites W2157714561 @default.
- W4200364103 cites W2167708455 @default.
- W4200364103 cites W2474973645 @default.
- W4200364103 cites W2583363792 @default.
- W4200364103 cites W2606715885 @default.
- W4200364103 cites W2759261668 @default.
- W4200364103 cites W2786984148 @default.
- W4200364103 cites W2789843538 @default.
- W4200364103 cites W2809649683 @default.
- W4200364103 cites W2884435343 @default.
- W4200364103 cites W2888300707 @default.
- W4200364103 cites W2905575949 @default.
- W4200364103 cites W2913847081 @default.
- W4200364103 cites W2950150251 @default.
- W4200364103 cites W2952379095 @default.
- W4200364103 cites W2953263404 @default.
- W4200364103 cites W2979522844 @default.
- W4200364103 cites W2998284004 @default.
- W4200364103 cites W3011785159 @default.
- W4200364103 cites W3042305844 @default.
- W4200364103 cites W3043153155 @default.
- W4200364103 cites W3046420150 @default.
- W4200364103 cites W3088400220 @default.
- W4200364103 cites W3090493712 @default.
- W4200364103 cites W3099878876 @default.
- W4200364103 cites W3113074526 @default.
- W4200364103 cites W3118443291 @default.
- W4200364103 cites W3126228859 @default.
- W4200364103 cites W3168724350 @default.
- W4200364103 cites W3176037658 @default.
- W4200364103 cites W3178792299 @default.
- W4200364103 doi "https://doi.org/10.1101/2021.11.09.467907" @default.
- W4200364103 hasPublicationYear "2021" @default.
- W4200364103 type Work @default.
- W4200364103 citedByCount "3" @default.
- W4200364103 countsByYear W42003641032022 @default.
- W4200364103 countsByYear W42003641032023 @default.
- W4200364103 crossrefType "posted-content" @default.
- W4200364103 hasAuthorship W4200364103A5033347097 @default.
- W4200364103 hasAuthorship W4200364103A5035416263 @default.
- W4200364103 hasAuthorship W4200364103A5070168682 @default.
- W4200364103 hasAuthorship W4200364103A5085609478 @default.
- W4200364103 hasBestOaLocation W42003641031 @default.
- W4200364103 hasConcept C104317684 @default.
- W4200364103 hasConcept C11413529 @default.
- W4200364103 hasConcept C114614502 @default.
- W4200364103 hasConcept C124101348 @default.
- W4200364103 hasConcept C136764020 @default.
- W4200364103 hasConcept C141231307 @default.
- W4200364103 hasConcept C151810110 @default.
- W4200364103 hasConcept C170320093 @default.
- W4200364103 hasConcept C20218877 @default.
- W4200364103 hasConcept C2279292 @default.
- W4200364103 hasConcept C23123220 @default.
- W4200364103 hasConcept C2777382242 @default.
- W4200364103 hasConcept C33923547 @default.
- W4200364103 hasConcept C41008148 @default.
- W4200364103 hasConcept C54355233 @default.
- W4200364103 hasConcept C70721500 @default.
- W4200364103 hasConcept C75165309 @default.
- W4200364103 hasConcept C78548338 @default.
- W4200364103 hasConcept C80444323 @default.
- W4200364103 hasConcept C81081738 @default.
- W4200364103 hasConcept C86803240 @default.
- W4200364103 hasConceptScore W4200364103C104317684 @default.
- W4200364103 hasConceptScore W4200364103C11413529 @default.
- W4200364103 hasConceptScore W4200364103C114614502 @default.
- W4200364103 hasConceptScore W4200364103C124101348 @default.
- W4200364103 hasConceptScore W4200364103C136764020 @default.
- W4200364103 hasConceptScore W4200364103C141231307 @default.
- W4200364103 hasConceptScore W4200364103C151810110 @default.
- W4200364103 hasConceptScore W4200364103C170320093 @default.
- W4200364103 hasConceptScore W4200364103C20218877 @default.
- W4200364103 hasConceptScore W4200364103C2279292 @default.
- W4200364103 hasConceptScore W4200364103C23123220 @default.
- W4200364103 hasConceptScore W4200364103C2777382242 @default.
- W4200364103 hasConceptScore W4200364103C33923547 @default.
- W4200364103 hasConceptScore W4200364103C41008148 @default.
- W4200364103 hasConceptScore W4200364103C54355233 @default.
- W4200364103 hasConceptScore W4200364103C70721500 @default.
- W4200364103 hasConceptScore W4200364103C75165309 @default.
- W4200364103 hasConceptScore W4200364103C78548338 @default.
- W4200364103 hasConceptScore W4200364103C80444323 @default.
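The abstract of this work defines a Counting DBG as an annotated de Bruijn graph in which each node-label relation carries one or more attributes, such as a k-mer count or its positions. The following is a minimal sketch of that idea only, assuming a plain uncompressed Python dictionary. The class name `CountingDBG` and its methods are hypothetical. The sketch ignores reverse complements (no canonical k-mers) and is not the authors' implementation or their compressed representation.

```python
from collections import defaultdict


class CountingDBG:
    """Toy (uncompressed, hypothetical) counting de Bruijn graph.

    Nodes are k-mers. Each node-label relation (k-mer, sample label) stores an
    attribute list: the positions at which the k-mer occurs in that labeled
    sequence. The k-mer count for a label is the length of that list.
    """

    def __init__(self, k: int):
        self.k = k
        # kmer -> label -> list of start positions in the labeled sequence
        self.annotations = defaultdict(lambda: defaultdict(list))

    def add_sequence(self, label: str, seq: str) -> None:
        # Insert every k-mer of seq, recording its position under this label.
        for i in range(len(seq) - self.k + 1):
            self.annotations[seq[i:i + self.k]][label].append(i)

    def labels(self, kmer: str) -> set:
        # Plain annotated-DBG query: which labels contain this k-mer?
        return set(self.annotations.get(kmer, {}))

    def counts(self, kmer: str) -> dict:
        # Quantitative attribute: per-label abundance of the k-mer.
        return {lab: len(pos) for lab, pos in self.annotations.get(kmer, {}).items()}

    def positions(self, kmer: str, label: str) -> list:
        # Positional attribute: where the k-mer occurs in one labeled sequence.
        return list(self.annotations.get(kmer, {}).get(label, []))

    def successors(self, kmer: str) -> list:
        # De Bruijn graph edges: k-mers overlapping this one by k-1 characters.
        return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in self.annotations]


g = CountingDBG(3)
g.add_sequence("sample1", "ACGTACG")
g.add_sequence("sample2", "TACGG")
print(g.counts("ACG"))                # {'sample1': 2, 'sample2': 1}
print(g.positions("ACG", "sample1"))  # [0, 4]
print(g.successors("ACG"))            # ['CGG', 'CGT']
```

Storing positions rather than bare counts keeps both attributes available from one structure, since the count is the number of recorded positions. The index described in the abstract instead uses compressed representations, which this dictionary-based sketch does not attempt to model.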