Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313120197> ?p ?o ?g. }
Showing items 1 to 82 of 82 with 100 items per page.
- W4313120197 abstract "Current cluster scaled genomics data processing solutions rely on big data frameworks like Apache Spark, Hadoop and HDFS for data scheduling, processing and storage. These frameworks come with additional computation and memory overheads by default. It has been observed that scaling genomics dataset processing beyond 32 nodes is not efficient on such frameworks. To overcome the inefficiencies of big data frameworks for processing genomics data on clusters, we introduce a low-overhead and highly scalable solution on a SLURM based HPC batch system. This solution uses Apache Arrow as in-memory columnar data format to store genomics data efficiently and Arrow Flight as a network protocol to move and schedule this data across the HPC nodes with low communication overhead. As a use case, we use NGS short reads DNA sequencing data for pre-processing and variant calling applications. This solution outperforms existing Apache Spark based big data solutions in term of both computation time (2x) and lower communication overhead (more than 20-60% depending on cluster size). Our solution has similar performance to MPI-based HPC solutions, with the added advantage of easy programmability and transparent big data scalability. The whole solution is Python and shell script based, which makes it flexible to update and integrate alternative variant callers. Our solution is publicly available on GitHub at https://github.com/abs-tudelft/time-to-fly-high/tree/main/genomics" @default.
- W4313120197 created "2023-01-06" @default.
- W4313120197 creator A5016736335 @default.
- W4313120197 creator A5021955713 @default.
- W4313120197 creator A5062909392 @default.
- W4313120197 creator A5083798008 @default.
- W4313120197 date "2022-07-01" @default.
- W4313120197 modified "2023-09-27" @default.
- W4313120197 title "Communication-Efficient Cluster Scalable Genomics Data Processing Using Apache Arrow Flight" @default.
- W4313120197 cites W2159670728 @default.
- W4313120197 cites W2297835331 @default.
- W4313120197 cites W2398924165 @default.
- W4313120197 cites W2529838505 @default.
- W4313120197 cites W2745862296 @default.
- W4313120197 cites W2927075311 @default.
- W4313120197 cites W2952870794 @default.
- W4313120197 cites W3014336662 @default.
- W4313120197 cites W3028555753 @default.
- W4313120197 cites W3100908593 @default.
- W4313120197 doi "https://doi.org/10.1109/ispdc55340.2022.00028" @default.
- W4313120197 hasPublicationYear "2022" @default.
- W4313120197 type Work @default.
- W4313120197 citedByCount "0" @default.
- W4313120197 crossrefType "proceedings-article" @default.
- W4313120197 hasAuthorship W4313120197A5016736335 @default.
- W4313120197 hasAuthorship W4313120197A5021955713 @default.
- W4313120197 hasAuthorship W4313120197A5062909392 @default.
- W4313120197 hasAuthorship W4313120197A5083798008 @default.
- W4313120197 hasBestOaLocation W43131201972 @default.
- W4313120197 hasConcept C104317684 @default.
- W4313120197 hasConcept C111919701 @default.
- W4313120197 hasConcept C120314980 @default.
- W4313120197 hasConcept C138827492 @default.
- W4313120197 hasConcept C141231307 @default.
- W4313120197 hasConcept C173608175 @default.
- W4313120197 hasConcept C185592680 @default.
- W4313120197 hasConcept C189206191 @default.
- W4313120197 hasConcept C199360897 @default.
- W4313120197 hasConcept C2779960059 @default.
- W4313120197 hasConcept C2781215313 @default.
- W4313120197 hasConcept C29140674 @default.
- W4313120197 hasConcept C41008148 @default.
- W4313120197 hasConcept C48044578 @default.
- W4313120197 hasConcept C519991488 @default.
- W4313120197 hasConcept C55493867 @default.
- W4313120197 hasConcept C75684735 @default.
- W4313120197 hasConcept C83283714 @default.
- W4313120197 hasConceptScore W4313120197C104317684 @default.
- W4313120197 hasConceptScore W4313120197C111919701 @default.
- W4313120197 hasConceptScore W4313120197C120314980 @default.
- W4313120197 hasConceptScore W4313120197C138827492 @default.
- W4313120197 hasConceptScore W4313120197C141231307 @default.
- W4313120197 hasConceptScore W4313120197C173608175 @default.
- W4313120197 hasConceptScore W4313120197C185592680 @default.
- W4313120197 hasConceptScore W4313120197C189206191 @default.
- W4313120197 hasConceptScore W4313120197C199360897 @default.
- W4313120197 hasConceptScore W4313120197C2779960059 @default.
- W4313120197 hasConceptScore W4313120197C2781215313 @default.
- W4313120197 hasConceptScore W4313120197C29140674 @default.
- W4313120197 hasConceptScore W4313120197C41008148 @default.
- W4313120197 hasConceptScore W4313120197C48044578 @default.
- W4313120197 hasConceptScore W4313120197C519991488 @default.
- W4313120197 hasConceptScore W4313120197C55493867 @default.
- W4313120197 hasConceptScore W4313120197C75684735 @default.
- W4313120197 hasConceptScore W4313120197C83283714 @default.
- W4313120197 hasLocation W43131201971 @default.
- W4313120197 hasLocation W43131201972 @default.
- W4313120197 hasOpenAccess W4313120197 @default.
- W4313120197 hasPrimaryLocation W43131201971 @default.
- W4313120197 hasRelatedWork W1595151633 @default.
- W4313120197 hasRelatedWork W2092071486 @default.
- W4313120197 hasRelatedWork W2161537327 @default.
- W4313120197 hasRelatedWork W2553660239 @default.
- W4313120197 hasRelatedWork W2569819632 @default.
- W4313120197 hasRelatedWork W2745423290 @default.
- W4313120197 hasRelatedWork W2784494576 @default.
- W4313120197 hasRelatedWork W2891888092 @default.
- W4313120197 hasRelatedWork W3185293612 @default.
- W4313120197 hasRelatedWork W3217778767 @default.
- W4313120197 isParatext "false" @default.
- W4313120197 isRetracted "false" @default.
- W4313120197 workType "article" @default.