Matches in SemOpenAlex for { <https://semopenalex.org/work/W2899544675> ?p ?o ?g. }
- W2899544675 endingPage "2074" @default.
- W2899544675 startingPage "2066" @default.
- W2899544675 abstract "Motivation: Advanced high-throughput sequencing technologies have produced massive amounts of read data, and algorithms have been specially designed to reduce the size of these datasets for efficient storage and transmission. Reordering reads with regard to their positions in de novo assembled contigs or in explicit reference sequences has proven to be one of the most effective read compression approaches. As there is usually no good prior knowledge about the reference sequence, the current focus is on the novel construction of de novo assembled contigs. Results: We introduce a new de novo compression algorithm named minicom. This algorithm uses large k-minimizers to index the reads and groups those that share the same minimizer into subgroups. Within each subgroup, a contig is constructed. Then some pairs of the contigs derived from the subgroups are merged into longer contigs according to a (w, k)-minimizer-indexed suffix–prefix overlap similarity between two contigs. This merging process is repeated after the longer contigs are formed until no pair of contigs can be merged. We compare the performance of minicom with two reference-based methods and four de novo methods on 18 datasets (13 RNA-seq datasets and 5 whole genome sequencing datasets). In the compression of single-end reads, minicom obtained the smallest file size for 22 of 34 cases with significant improvement. In the compression of paired-end reads, minicom achieved 20–80% compression gain over the best state-of-the-art algorithm. Our method also achieved a 10% size reduction of compressed files in comparison with the best algorithm under the reads-order preserving mode. These results are mainly attributed to exploiting the redundancy of the repetitive substrings in the long contigs. Availability and implementation: https://github.com/yuansliu/minicom Supplementary information: Supplementary data are available at Bioinformatics online." @default.
- W2899544675 created "2018-11-16" @default.
- W2899544675 creator A5017861049 @default.
- W2899544675 creator A5018620403 @default.
- W2899544675 creator A5048391870 @default.
- W2899544675 creator A5049900685 @default.
- W2899544675 date "2018-11-08" @default.
- W2899544675 modified "2023-10-15" @default.
- W2899544675 title "Index suffix–prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression" @default.
- W2899544675 cites W1548134937 @default.
- W2899544675 cites W1931027898 @default.
- W2899544675 cites W2042947822 @default.
- W2899544675 cites W2069066547 @default.
- W2899544675 cites W2085522429 @default.
- W2899544675 cites W2092880969 @default.
- W2899544675 cites W2101247207 @default.
- W2899544675 cites W2103154918 @default.
- W2899544675 cites W2109153336 @default.
- W2899544675 cites W2111044311 @default.
- W2899544675 cites W2125557405 @default.
- W2899544675 cites W2131106408 @default.
- W2899544675 cites W2137892587 @default.
- W2899544675 cites W2144560237 @default.
- W2899544675 cites W2153707226 @default.
- W2899544675 cites W2159683766 @default.
- W2899544675 cites W2159906372 @default.
- W2899544675 cites W2166588423 @default.
- W2899544675 cites W2167943254 @default.
- W2899544675 cites W2194172909 @default.
- W2899544675 cites W2323180197 @default.
- W2899544675 cites W2396849069 @default.
- W2899544675 cites W2466892528 @default.
- W2899544675 cites W2538355508 @default.
- W2899544675 cites W2635555833 @default.
- W2899544675 cites W2764165465 @default.
- W2899544675 cites W2800253090 @default.
- W2899544675 cites W2949279665 @default.
- W2899544675 cites W2951070849 @default.
- W2899544675 cites W2951822379 @default.
- W2899544675 doi "https://doi.org/10.1093/bioinformatics/bty936" @default.
- W2899544675 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/30407482" @default.
- W2899544675 hasPublicationYear "2018" @default.
- W2899544675 type Work @default.
- W2899544675 sameAs 2899544675 @default.
- W2899544675 citedByCount "25" @default.
- W2899544675 countsByYear W28995446752019 @default.
- W2899544675 countsByYear W28995446752020 @default.
- W2899544675 countsByYear W28995446752021 @default.
- W2899544675 countsByYear W28995446752022 @default.
- W2899544675 countsByYear W28995446752023 @default.
- W2899544675 crossrefType "journal-article" @default.
- W2899544675 hasAuthorship W2899544675A5017861049 @default.
- W2899544675 hasAuthorship W2899544675A5018620403 @default.
- W2899544675 hasAuthorship W2899544675A5048391870 @default.
- W2899544675 hasAuthorship W2899544675A5049900685 @default.
- W2899544675 hasBestOaLocation W28995446751 @default.
- W2899544675 hasConcept C104317684 @default.
- W2899544675 hasConcept C11413529 @default.
- W2899544675 hasConcept C138885662 @default.
- W2899544675 hasConcept C141231307 @default.
- W2899544675 hasConcept C141603448 @default.
- W2899544675 hasConcept C150194340 @default.
- W2899544675 hasConcept C159985019 @default.
- W2899544675 hasConcept C162317418 @default.
- W2899544675 hasConcept C162319229 @default.
- W2899544675 hasConcept C180016635 @default.
- W2899544675 hasConcept C18949551 @default.
- W2899544675 hasConcept C192562407 @default.
- W2899544675 hasConcept C192953774 @default.
- W2899544675 hasConcept C199360897 @default.
- W2899544675 hasConcept C2778112365 @default.
- W2899544675 hasConcept C2779259728 @default.
- W2899544675 hasConcept C2779804580 @default.
- W2899544675 hasConcept C2781166958 @default.
- W2899544675 hasConcept C41008148 @default.
- W2899544675 hasConcept C41895202 @default.
- W2899544675 hasConcept C51679486 @default.
- W2899544675 hasConcept C54355233 @default.
- W2899544675 hasConcept C552990157 @default.
- W2899544675 hasConcept C59582021 @default.
- W2899544675 hasConcept C78548338 @default.
- W2899544675 hasConcept C86803240 @default.
- W2899544675 hasConceptScore W2899544675C104317684 @default.
- W2899544675 hasConceptScore W2899544675C11413529 @default.
- W2899544675 hasConceptScore W2899544675C138885662 @default.
- W2899544675 hasConceptScore W2899544675C141231307 @default.
- W2899544675 hasConceptScore W2899544675C141603448 @default.
- W2899544675 hasConceptScore W2899544675C150194340 @default.
- W2899544675 hasConceptScore W2899544675C159985019 @default.
- W2899544675 hasConceptScore W2899544675C162317418 @default.
- W2899544675 hasConceptScore W2899544675C162319229 @default.
- W2899544675 hasConceptScore W2899544675C180016635 @default.
- W2899544675 hasConceptScore W2899544675C18949551 @default.
- W2899544675 hasConceptScore W2899544675C192562407 @default.
- W2899544675 hasConceptScore W2899544675C192953774 @default.
- W2899544675 hasConceptScore W2899544675C199360897 @default.
- W2899544675 hasConceptScore W2899544675C2778112365 @default.
- W2899544675 hasConceptScore W2899544675C2779259728 @default.