Matches in SemOpenAlex for { <https://semopenalex.org/work/W2890941206> ?p ?o ?g. }
Showing items 1 to 93 of 93 with 100 items per page.
- W2890941206 abstract "With the advancement of high-throughput sequencing technologies, the amount of available sequencing data is growing at a pace that has now begun to greatly challenge the data processing and storage capacities of modern computer systems. Removing redundancy from such data by clustering could be crucial for reducing memory, disk space and running time consumption. In addition, clustering also performs well at reducing dataset noise in some analysis applications. In this study, we propose a high-performance short sequence classification algorithm (HSC) for next generation sequencing (NGS) data based on an efficient hash function and text similarity. First, HSC converts all reads into k-mers, then it forms a unique k-mer set by merging the duplicated and reverse complementary elements. Second, all unique k-mers are stored in a hash table, where the k-mer string is stored in the key field, and the IDs of the reads containing the k-mer are stored in the value field. Third, each hash unit is transformed into a short text consisting of reads. Fourth, texts that satisfy the similarity threshold are combined into a long text; the merge operation is executed iteratively until there is no text that satisfies the merge condition. Finally, the long text is transformed into a cluster consisting of reads. We tested HSC using five real datasets. The experimental results showed that HSC clusters 100 million short reads within 2 hours, and it has excellent performance in reducing memory consumption. Compared to existing methods, HSC is much faster and can easily handle tens of millions of sequences. In addition, when HSC is used as a preprocessing tool to produce assembly data, the memory and time consumption of the assembler is greatly reduced. It can help the assembler to achieve better assemblies in terms of N50, NA50 and genome fraction." @default.
- W2890941206 created "2018-09-27" @default.
- W2890941206 creator A5037871205 @default.
- W2890941206 creator A5039161616 @default.
- W2890941206 creator A5047265119 @default.
- W2890941206 creator A5061045747 @default.
- W2890941206 creator A5088826157 @default.
- W2890941206 date "2018-01-01" @default.
- W2890941206 modified "2023-09-23" @default.
- W2890941206 title "An efficient classification algorithm for NGS data based on text similarity" @default.
- W2890941206 cites W1994105594 @default.
- W2890941206 cites W2022986961 @default.
- W2890941206 cites W2025153546 @default.
- W2890941206 cites W2057253402 @default.
- W2890941206 cites W2096128575 @default.
- W2890941206 cites W2097606916 @default.
- W2890941206 cites W2103187775 @default.
- W2890941206 cites W2107772251 @default.
- W2890941206 cites W2115214414 @default.
- W2890941206 cites W2124351063 @default.
- W2890941206 cites W2125266506 @default.
- W2890941206 cites W2129933858 @default.
- W2890941206 cites W2133956160 @default.
- W2890941206 cites W2145252566 @default.
- W2890941206 cites W2145853890 @default.
- W2890941206 cites W2149616469 @default.
- W2890941206 cites W2156125289 @default.
- W2890941206 cites W2160681728 @default.
- W2890941206 cites W2170551349 @default.
- W2890941206 cites W2237301047 @default.
- W2890941206 cites W2534101049 @default.
- W2890941206 cites W2537565528 @default.
- W2890941206 cites W2540579902 @default.
- W2890941206 cites W2564293829 @default.
- W2890941206 cites W2762557867 @default.
- W2890941206 doi "https://doi.org/10.1017/s0016672318000058" @default.
- W2890941206 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/6865153" @default.
- W2890941206 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/30221607" @default.
- W2890941206 hasPublicationYear "2018" @default.
- W2890941206 type Work @default.
- W2890941206 sameAs 2890941206 @default.
- W2890941206 citedByCount "1" @default.
- W2890941206 countsByYear W28909412062020 @default.
- W2890941206 crossrefType "journal-article" @default.
- W2890941206 hasAuthorship W2890941206A5037871205 @default.
- W2890941206 hasAuthorship W2890941206A5039161616 @default.
- W2890941206 hasAuthorship W2890941206A5047265119 @default.
- W2890941206 hasAuthorship W2890941206A5061045747 @default.
- W2890941206 hasAuthorship W2890941206A5088826157 @default.
- W2890941206 hasBestOaLocation W28909412061 @default.
- W2890941206 hasConcept C111919701 @default.
- W2890941206 hasConcept C11413529 @default.
- W2890941206 hasConcept C124101348 @default.
- W2890941206 hasConcept C152124472 @default.
- W2890941206 hasConcept C154945302 @default.
- W2890941206 hasConcept C173608175 @default.
- W2890941206 hasConcept C197129107 @default.
- W2890941206 hasConcept C38652104 @default.
- W2890941206 hasConcept C41008148 @default.
- W2890941206 hasConcept C73555534 @default.
- W2890941206 hasConcept C99138194 @default.
- W2890941206 hasConceptScore W2890941206C111919701 @default.
- W2890941206 hasConceptScore W2890941206C11413529 @default.
- W2890941206 hasConceptScore W2890941206C124101348 @default.
- W2890941206 hasConceptScore W2890941206C152124472 @default.
- W2890941206 hasConceptScore W2890941206C154945302 @default.
- W2890941206 hasConceptScore W2890941206C173608175 @default.
- W2890941206 hasConceptScore W2890941206C197129107 @default.
- W2890941206 hasConceptScore W2890941206C38652104 @default.
- W2890941206 hasConceptScore W2890941206C41008148 @default.
- W2890941206 hasConceptScore W2890941206C73555534 @default.
- W2890941206 hasConceptScore W2890941206C99138194 @default.
- W2890941206 hasLocation W28909412061 @default.
- W2890941206 hasLocation W28909412062 @default.
- W2890941206 hasLocation W28909412063 @default.
- W2890941206 hasLocation W28909412064 @default.
- W2890941206 hasOpenAccess W2890941206 @default.
- W2890941206 hasPrimaryLocation W28909412061 @default.
- W2890941206 hasRelatedWork W2080529643 @default.
- W2890941206 hasRelatedWork W2150276710 @default.
- W2890941206 hasRelatedWork W2158198137 @default.
- W2890941206 hasRelatedWork W2369673098 @default.
- W2890941206 hasRelatedWork W2386315983 @default.
- W2890941206 hasRelatedWork W2508834110 @default.
- W2890941206 hasRelatedWork W2604316488 @default.
- W2890941206 hasRelatedWork W3010239717 @default.
- W2890941206 hasRelatedWork W3041052722 @default.
- W2890941206 hasRelatedWork W3142187282 @default.
- W2890941206 hasVolume "100" @default.
- W2890941206 isParatext "false" @default.
- W2890941206 isRetracted "false" @default.
- W2890941206 magId "2890941206" @default.
- W2890941206 workType "article" @default.
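
The five steps named in the abstract of W2890941206 (k-mer extraction, merging of duplicate and reverse-complementary k-mers, a hash table from k-mer to read IDs, iterative merging of similar "texts", and conversion to clusters) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state which text-similarity measure HSC uses, so Jaccard similarity over read-ID sets is an assumption here, and all function names (canonical_kmer, build_kmer_table, merge_texts, cluster_reads) are hypothetical. The merge loop is quadratic and only meant for toy inputs.

```python
from collections import defaultdict

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmer(kmer):
    """Return the smaller of a k-mer and its reverse complement,
    so that both strands map to one key."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def build_kmer_table(reads, k):
    """Hash table: canonical k-mer (key) -> set of read IDs containing it (value)."""
    table = defaultdict(set)
    for read_id, seq in enumerate(reads):
        for i in range(len(seq) - k + 1):
            table[canonical_kmer(seq[i:i + k])].add(read_id)
    return table


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets (assumed similarity measure)."""
    return len(a & b) / len(a | b)


def merge_texts(texts, threshold):
    """Repeatedly merge any two read-ID sets whose similarity meets the
    threshold, until no pair satisfies the merge condition."""
    texts = [set(t) for t in texts]
    merged = True
    while merged:
        merged = False
        for i in range(len(texts)):
            for j in range(i + 1, len(texts)):
                if jaccard(texts[i], texts[j]) >= threshold:
                    texts[i] |= texts[j]
                    del texts[j]
                    merged = True
                    break
            if merged:
                break
    return texts


def cluster_reads(reads, k, threshold):
    """Each hash unit becomes a short 'text' of read IDs; merged texts are clusters."""
    table = build_kmer_table(reads, k)
    return merge_texts(table.values(), threshold)


if __name__ == "__main__":
    # "GTACGT" is the reverse complement of "ACGTAC"; "TTTTTT" is unrelated.
    toy_reads = ["ACGTAC", "GTACGT", "TTTTTT"]
    print(cluster_reads(toy_reads, k=3, threshold=0.5))
```

In the toy input, the first two reads are reverse complements of each other, so they share every canonical k-mer and fall into one cluster, while the third read stays on its own.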