Matches in SemOpenAlex for { <https://semopenalex.org/work/W337754377> ?p ?o ?g. }
- W337754377 abstract "With high-throughput DNA sequencing costs dropping below $1000 for human genomes, data storage, retrieval and analysis are the major bottlenecks in biological studies. To address the large-data challenges in genomics, this thesis advocates: 1) a highly efficient read-level compression of the data, achieved through reference-based compression by a tool called SLIMGENE, and 2) a clean separation between evidence collection and inference in variant calling, achieved through our Genome Query Language (GQL), which allows for the rapid collection of the evidence needed for calling variants. The first contribution, SLIMGENE, introduces a set of domain-specific lossless compression schemes that achieve over 40x compression of the ASCII representation of short reads, outperforming bzip2 by over 6x. Including quality values, we show 5x compression using less running time than bzip2. Secondly, given the discrepancy between the compression factors obtained with and without quality values, we initiate the study of lossy transformations of the quality values. Specifically, we show that a lossy quality value quantization results in 14x compression but has minimal impact on downstream applications, such as SNP calling, that use quality values. The second contribution, GQL, introduces a novel framework for querying large genomic datasets. We provide a number of cases to showcase the use of GQL for complex evidence collection, such as the evidence for large structural variations. Specifically, typical GQL queries can be written in 5-10 lines of code and search large datasets (100GB) in only a few minutes on a cheap desktop computer. We show that GQL is faster and more concise than writing equivalent queries in existing frameworks such as GATK. We show that existing callers can be sped up by an order of magnitude by using GQL to retrieve evidence. We also show how GQL output can be visualized using the UCSC browser." @default.
- W337754377 created "2016-06-24" @default.
- W337754377 creator A5088317076 @default.
- W337754377 date "2013-01-01" @default.
- W337754377 modified "2023-09-23" @default.
- W337754377 title "Compressing and Querying the Human Genome" @default.
- W337754377 cites W1499849192 @default.
- W337754377 cites W1604187205 @default.
- W337754377 cites W1921457844 @default.
- W337754377 cites W1973589876 @default.
- W337754377 cites W2012016911 @default.
- W337754377 cites W2022406334 @default.
- W337754377 cites W2032247293 @default.
- W337754377 cites W2039795745 @default.
- W337754377 cites W2041069220 @default.
- W337754377 cites W2051929999 @default.
- W337754377 cites W2057807842 @default.
- W337754377 cites W2065817684 @default.
- W337754377 cites W2084748297 @default.
- W337754377 cites W2085755265 @default.
- W337754377 cites W2089251117 @default.
- W337754377 cites W2096283457 @default.
- W337754377 cites W2096941210 @default.
- W337754377 cites W2099821697 @default.
- W337754377 cites W2102619694 @default.
- W337754377 cites W2107688500 @default.
- W337754377 cites W2108234281 @default.
- W337754377 cites W2110553099 @default.
- W337754377 cites W2112509895 @default.
- W337754377 cites W2116753165 @default.
- W337754377 cites W2119180969 @default.
- W337754377 cites W2122962290 @default.
- W337754377 cites W2124785215 @default.
- W337754377 cites W2125598538 @default.
- W337754377 cites W2128016314 @default.
- W337754377 cites W2133112449 @default.
- W337754377 cites W2137422676 @default.
- W337754377 cites W2142071150 @default.
- W337754377 cites W2146405370 @default.
- W337754377 cites W2147492358 @default.
- W337754377 cites W2150111967 @default.
- W337754377 cites W2152476952 @default.
- W337754377 cites W2159084616 @default.
- W337754377 cites W2163938152 @default.
- W337754377 cites W2164086255 @default.
- W337754377 cites W2168133698 @default.
- W337754377 cites W2169150110 @default.
- W337754377 cites W2169818249 @default.
- W337754377 cites W2171777347 @default.
- W337754377 cites W2334091406 @default.
- W337754377 cites W2347040558 @default.
- W337754377 cites W2753710282 @default.
- W337754377 cites W2911978475 @default.
- W337754377 hasPublicationYear "2013" @default.
- W337754377 type Work @default.
- W337754377 sameAs 337754377 @default.
- W337754377 citedByCount "0" @default.
- W337754377 crossrefType "journal-article" @default.
- W337754377 hasAuthorship W337754377A5088317076 @default.
- W337754377 hasConcept C104317684 @default.
- W337754377 hasConcept C11413529 @default.
- W337754377 hasConcept C124101348 @default.
- W337754377 hasConcept C141231307 @default.
- W337754377 hasConcept C154945302 @default.
- W337754377 hasConcept C165021410 @default.
- W337754377 hasConcept C192953774 @default.
- W337754377 hasConcept C41008148 @default.
- W337754377 hasConcept C55493867 @default.
- W337754377 hasConcept C78548338 @default.
- W337754377 hasConcept C81081738 @default.
- W337754377 hasConcept C86803240 @default.
- W337754377 hasConceptScore W337754377C104317684 @default.
- W337754377 hasConceptScore W337754377C11413529 @default.
- W337754377 hasConceptScore W337754377C124101348 @default.
- W337754377 hasConceptScore W337754377C141231307 @default.
- W337754377 hasConceptScore W337754377C154945302 @default.
- W337754377 hasConceptScore W337754377C165021410 @default.
- W337754377 hasConceptScore W337754377C192953774 @default.
- W337754377 hasConceptScore W337754377C41008148 @default.
- W337754377 hasConceptScore W337754377C55493867 @default.
- W337754377 hasConceptScore W337754377C78548338 @default.
- W337754377 hasConceptScore W337754377C81081738 @default.
- W337754377 hasConceptScore W337754377C86803240 @default.
- W337754377 hasLocation W3377543771 @default.
- W337754377 hasOpenAccess W337754377 @default.
- W337754377 hasPrimaryLocation W3377543771 @default.
- W337754377 hasRelatedWork W1903041825 @default.
- W337754377 hasRelatedWork W2015521574 @default.
- W337754377 hasRelatedWork W2089873018 @default.
- W337754377 hasRelatedWork W2100076391 @default.
- W337754377 hasRelatedWork W2111044311 @default.
- W337754377 hasRelatedWork W2112173167 @default.
- W337754377 hasRelatedWork W2123010292 @default.
- W337754377 hasRelatedWork W2130419122 @default.
- W337754377 hasRelatedWork W2131106408 @default.
- W337754377 hasRelatedWork W2142434250 @default.
- W337754377 hasRelatedWork W2171057071 @default.
- W337754377 hasRelatedWork W2172512721 @default.
- W337754377 hasRelatedWork W2222050797 @default.
- W337754377 hasRelatedWork W2884502458 @default.