Matches in SemOpenAlex for { <https://semopenalex.org/work/W4365205323> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4365205323 abstract "Columnar storage is a core component of a modern data analytics system. Although many database management systems (DBMSs) have proprietary storage formats, most provide extensive support to open-source storage formats such as Parquet and ORC to facilitate cross-platform data sharing. But these formats were developed over a decade ago, in the early 2010s, for the Hadoop ecosystem. Since then, both the hardware and workload landscapes have changed. In this paper, we revisit the most widely adopted open-source columnar storage formats (Parquet and ORC) with a deep dive into their internals. We designed a benchmark to stress-test the formats' performance and space efficiency under different workload configurations. From our comprehensive evaluation of Parquet and ORC, we identify design decisions advantageous with modern hardware and real-world data distributions. These include using dictionary encoding by default, favoring decoding speed over compression ratio for integer encoding algorithms, making block compression optional, and embedding finer-grained auxiliary data structures. We also point out the inefficiencies in the format designs when handling common machine learning workloads and using GPUs for decoding. Our analysis identified important considerations that may guide future formats to better fit modern technology trends." @default.
- W4365205323 created "2023-04-13" @default.
- W4365205323 creator A5015113627 @default.
- W4365205323 creator A5025646560 @default.
- W4365205323 creator A5049165312 @default.
- W4365205323 creator A5063728326 @default.
- W4365205323 creator A5072310638 @default.
- W4365205323 creator A5079756305 @default.
- W4365205323 date "2023-04-11" @default.
- W4365205323 modified "2023-09-30" @default.
- W4365205323 title "An Empirical Evaluation of Columnar Storage Formats" @default.
- W4365205323 doi "https://doi.org/10.48550/arxiv.2304.05028" @default.
- W4365205323 hasPublicationYear "2023" @default.
- W4365205323 type Work @default.
- W4365205323 citedByCount "0" @default.
- W4365205323 crossrefType "posted-content" @default.
- W4365205323 hasAuthorship W4365205323A5015113627 @default.
- W4365205323 hasAuthorship W4365205323A5025646560 @default.
- W4365205323 hasAuthorship W4365205323A5049165312 @default.
- W4365205323 hasAuthorship W4365205323A5063728326 @default.
- W4365205323 hasAuthorship W4365205323A5072310638 @default.
- W4365205323 hasAuthorship W4365205323A5079756305 @default.
- W4365205323 hasBestOaLocation W43652053231 @default.
- W4365205323 hasConcept C111919701 @default.
- W4365205323 hasConcept C121332964 @default.
- W4365205323 hasConcept C125411270 @default.
- W4365205323 hasConcept C13280743 @default.
- W4365205323 hasConcept C154945302 @default.
- W4365205323 hasConcept C168167062 @default.
- W4365205323 hasConcept C185798385 @default.
- W4365205323 hasConcept C194739806 @default.
- W4365205323 hasConcept C205649164 @default.
- W4365205323 hasConcept C2524010 @default.
- W4365205323 hasConcept C2777210771 @default.
- W4365205323 hasConcept C2778476105 @default.
- W4365205323 hasConcept C2780945871 @default.
- W4365205323 hasConcept C33923547 @default.
- W4365205323 hasConcept C41008148 @default.
- W4365205323 hasConcept C57273362 @default.
- W4365205323 hasConcept C75684735 @default.
- W4365205323 hasConcept C76155785 @default.
- W4365205323 hasConcept C77088390 @default.
- W4365205323 hasConcept C97355855 @default.
- W4365205323 hasConceptScore W4365205323C111919701 @default.
- W4365205323 hasConceptScore W4365205323C121332964 @default.
- W4365205323 hasConceptScore W4365205323C125411270 @default.
- W4365205323 hasConceptScore W4365205323C13280743 @default.
- W4365205323 hasConceptScore W4365205323C154945302 @default.
- W4365205323 hasConceptScore W4365205323C168167062 @default.
- W4365205323 hasConceptScore W4365205323C185798385 @default.
- W4365205323 hasConceptScore W4365205323C194739806 @default.
- W4365205323 hasConceptScore W4365205323C205649164 @default.
- W4365205323 hasConceptScore W4365205323C2524010 @default.
- W4365205323 hasConceptScore W4365205323C2777210771 @default.
- W4365205323 hasConceptScore W4365205323C2778476105 @default.
- W4365205323 hasConceptScore W4365205323C2780945871 @default.
- W4365205323 hasConceptScore W4365205323C33923547 @default.
- W4365205323 hasConceptScore W4365205323C41008148 @default.
- W4365205323 hasConceptScore W4365205323C57273362 @default.
- W4365205323 hasConceptScore W4365205323C75684735 @default.
- W4365205323 hasConceptScore W4365205323C76155785 @default.
- W4365205323 hasConceptScore W4365205323C77088390 @default.
- W4365205323 hasConceptScore W4365205323C97355855 @default.
- W4365205323 hasLocation W43652053231 @default.
- W4365205323 hasOpenAccess W4365205323 @default.
- W4365205323 hasPrimaryLocation W43652053231 @default.
- W4365205323 hasRelatedWork W1980163258 @default.
- W4365205323 hasRelatedWork W2141712509 @default.
- W4365205323 hasRelatedWork W2317981192 @default.
- W4365205323 hasRelatedWork W2349008526 @default.
- W4365205323 hasRelatedWork W2371843261 @default.
- W4365205323 hasRelatedWork W2371887257 @default.
- W4365205323 hasRelatedWork W2384066639 @default.
- W4365205323 hasRelatedWork W2607929079 @default.
- W4365205323 hasRelatedWork W3113501250 @default.
- W4365205323 hasRelatedWork W4327782974 @default.
- W4365205323 isParatext "false" @default.
- W4365205323 isRetracted "false" @default.
- W4365205323 workType "article" @default.