Matches in SemOpenAlex for { <https://semopenalex.org/work/W2283787433> ?p ?o ?g. }
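The pattern in the heading can be written out as a full SPARQL query. The following is a minimal sketch, not the exact query behind this listing. The endpoint URL and the helper names `build_query` and `build_request_url` are assumptions introduced here for illustration. Rows marked `@default` may live in the default graph, which a `GRAPH ?g` clause does not match on every store, so the sketch also offers a plain triple-pattern variant. Nothing is sent over the network; the code only builds strings.

```python
from urllib.parse import urlencode

# Assumed endpoint URL; it is not stated in the listing itself.
SPARQL_ENDPOINT = "https://semopenalex.org/sparql"
WORK_IRI = "https://semopenalex.org/work/W2283787433"


def build_query(subject_iri: str, with_graph: bool = True) -> str:
    """Return a SELECT query for every predicate and object of one subject."""
    triple = f"<{subject_iri}> ?p ?o ."
    if with_graph:
        # GRAPH ?g binds ?g to the named graph that holds each triple.
        return f"SELECT ?p ?o ?g WHERE {{ GRAPH ?g {{ {triple} }} }}"
    # Plain pattern: evaluated against the default graph only.
    return f"SELECT ?p ?o WHERE {{ {triple} }}"


def build_request_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode the query as a GET URL using the standard 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    print(build_query(WORK_IRI))
    print(build_request_url(build_query(WORK_IRI, with_graph=False)))
```

Which variant returns the rows listed here depends on how the store exposes its default graph, so both are worth trying against a real endpoint.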
- W2283787433 abstract "Metagenomics is a genomics research discipline devoted to the study of microbial communities in environmental samples and human and animal organs and tissues. Sequenced metagenomic samples usually comprise reads from a large number of different bacterial communities and hence tend to result in large file sizes, typically between 1 and 10 GB. This leads to challenges in analyzing, transferring and storing metagenomic data. In order to overcome these data processing issues, we introduce MetaCRAM, the first de novo, parallelized software suite specialized for FASTA and FASTQ format metagenomic read processing and lossless compression. MetaCRAM integrates algorithms for taxonomy identification and assembly, and introduces parallel execution methods; furthermore, it enables genome reference selection and CRAM-based compression. MetaCRAM also uses novel reference-based compression methods designed through extensive studies of integer compression techniques and through fitting of empirical distributions of metagenomic read-reference positions. MetaCRAM is a lossless method compatible with standard CRAM formats, and it allows for fast selection of relevant files in the compressed domain via maintenance of taxonomy information. The performance of MetaCRAM as a stand-alone compression platform was evaluated on various metagenomic samples from the NCBI Sequence Read Archive, suggesting 2- to 4-fold compression ratio improvements compared to gzip. On average, the compressed file sizes were 2 to 13 percent of the original raw metagenomic file sizes. We described the first architecture for reference-based, lossless compression of metagenomic data. The compression scheme proposed offers significantly improved compression ratios as compared to off-the-shelf methods such as zip programs. Furthermore, it enables running different components in parallel and it provides the user with taxonomic and assembly information generated during execution of the compression pipeline. The MetaCRAM software is freely available at http://web.engr.illinois.edu/~mkim158/metacram.html . The website also contains a README file and other relevant instructions for running the code. Note that to run the code one needs a minimum of 16 GB of RAM. In addition, virtual box is set up on a 4 GB RAM machine for users to run a simple demonstration." @default.
- W2283787433 created "2016-06-24" @default.
- W2283787433 creator A5000295358 @default.
- W2283787433 creator A5011621481 @default.
- W2283787433 creator A5049656392 @default.
- W2283787433 creator A5067089715 @default.
- W2283787433 creator A5084947882 @default.
- W2283787433 creator A5086800976 @default.
- W2283787433 date "2016-02-19" @default.
- W2283787433 modified "2023-10-09" @default.
- W2283787433 title "MetaCRAM: an integrated pipeline for metagenomic taxonomy identification and compression" @default.
- W2283787433 cites W1965359119 @default.
- W2283787433 cites W1966711026 @default.
- W2283787433 cites W2010002522 @default.
- W2283787433 cites W2020255088 @default.
- W2283787433 cites W2032046925 @default.
- W2283787433 cites W2033563072 @default.
- W2283787433 cites W2048818637 @default.
- W2283787433 cites W2051929999 @default.
- W2283787433 cites W2055043387 @default.
- W2283787433 cites W2060108852 @default.
- W2283787433 cites W2093830129 @default.
- W2283787433 cites W2102276592 @default.
- W2283787433 cites W2108234281 @default.
- W2283787433 cites W2110300022 @default.
- W2283787433 cites W2111044311 @default.
- W2283787433 cites W2115613939 @default.
- W2283787433 cites W2116041602 @default.
- W2283787433 cites W2116895571 @default.
- W2283787433 cites W2117953160 @default.
- W2283787433 cites W2128777897 @default.
- W2283787433 cites W2131088968 @default.
- W2283787433 cites W2137689845 @default.
- W2283787433 cites W2141920662 @default.
- W2283787433 cites W2145166062 @default.
- W2283787433 cites W2147492358 @default.
- W2283787433 cites W2151450114 @default.
- W2283787433 cites W2158678815 @default.
- W2283787433 cites W2159084616 @default.
- W2283787433 cites W2159954944 @default.
- W2283787433 cites W2160969485 @default.
- W2283787433 cites W2166588423 @default.
- W2283787433 cites W2167943254 @default.
- W2283787433 cites W2168318733 @default.
- W2283787433 cites W2170551349 @default.
- W2283787433 cites W2171571559 @default.
- W2283787433 cites W4233928114 @default.
- W2283787433 doi "https://doi.org/10.1186/s12859-016-0932-x" @default.
- W2283787433 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/4759986" @default.
- W2283787433 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/26895947" @default.
- W2283787433 hasPublicationYear "2016" @default.
- W2283787433 type Work @default.
- W2283787433 sameAs 2283787433 @default.
- W2283787433 citedByCount "12" @default.
- W2283787433 countsByYear W22837874332016 @default.
- W2283787433 countsByYear W22837874332017 @default.
- W2283787433 countsByYear W22837874332018 @default.
- W2283787433 countsByYear W22837874332019 @default.
- W2283787433 countsByYear W22837874332020 @default.
- W2283787433 countsByYear W22837874332021 @default.
- W2283787433 countsByYear W22837874332023 @default.
- W2283787433 crossrefType "journal-article" @default.
- W2283787433 hasAuthorship W2283787433A5000295358 @default.
- W2283787433 hasAuthorship W2283787433A5011621481 @default.
- W2283787433 hasAuthorship W2283787433A5049656392 @default.
- W2283787433 hasAuthorship W2283787433A5067089715 @default.
- W2283787433 hasAuthorship W2283787433A5084947882 @default.
- W2283787433 hasAuthorship W2283787433A5086800976 @default.
- W2283787433 hasBestOaLocation W22837874331 @default.
- W2283787433 hasConcept C104317684 @default.
- W2283787433 hasConcept C111919701 @default.
- W2283787433 hasConcept C116834253 @default.
- W2283787433 hasConcept C124101348 @default.
- W2283787433 hasConcept C141231307 @default.
- W2283787433 hasConcept C15151743 @default.
- W2283787433 hasConcept C154945302 @default.
- W2283787433 hasConcept C18903297 @default.
- W2283787433 hasConcept C192953774 @default.
- W2283787433 hasConcept C2777904410 @default.
- W2283787433 hasConcept C41008148 @default.
- W2283787433 hasConcept C43521106 @default.
- W2283787433 hasConcept C55493867 @default.
- W2283787433 hasConcept C77088390 @default.
- W2283787433 hasConcept C78548338 @default.
- W2283787433 hasConcept C81081738 @default.
- W2283787433 hasConcept C86803240 @default.
- W2283787433 hasConcept C97250363 @default.
- W2283787433 hasConceptScore W2283787433C104317684 @default.
- W2283787433 hasConceptScore W2283787433C111919701 @default.
- W2283787433 hasConceptScore W2283787433C116834253 @default.
- W2283787433 hasConceptScore W2283787433C124101348 @default.
- W2283787433 hasConceptScore W2283787433C141231307 @default.
- W2283787433 hasConceptScore W2283787433C15151743 @default.
- W2283787433 hasConceptScore W2283787433C154945302 @default.
- W2283787433 hasConceptScore W2283787433C18903297 @default.
- W2283787433 hasConceptScore W2283787433C192953774 @default.
- W2283787433 hasConceptScore W2283787433C2777904410 @default.
- W2283787433 hasConceptScore W2283787433C41008148 @default.
- W2283787433 hasConceptScore W2283787433C43521106 @default.
- W2283787433 hasConceptScore W2283787433C55493867 @default.
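Each row above follows the shape `- subject predicate object @graph.`, so the listing can be read back into structured records. The following is a minimal sketch, assuming that shape holds for every row. The names `Row`, `ROW_PATTERN` and `parse_row` are hypothetical helpers, not part of any SemOpenAlex tooling.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str


# "- <subject> <predicate> <object> @<graph>." ; the object may contain spaces.
ROW_PATTERN = re.compile(r"^- (\S+) (\S+) (.+) @(\S+)\.$", re.DOTALL)


def parse_row(line: str) -> Optional[Row]:
    """Parse one listing row, or return None if the line is not a row."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Strip the quotes around literal values; bare identifiers stay unchanged.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return Row(subject, predicate, obj, graph)


if __name__ == "__main__":
    sample = [
        '- W2283787433 created "2016-06-24" @default.',
        "- W2283787433 cites W1965359119 @default.",
        "- W2283787433 cites W1966711026 @default.",
    ]
    rows = [row for row in map(parse_row, sample) if row is not None]
    cited = [row.obj for row in rows if row.predicate == "cites"]
    print(len(cited), "cited works:", cited)
```

A literal that spans more than one line, such as a long abstract, would need to be joined into a single string before it is passed to `parse_row`.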