Matches in SemOpenAlex for { <https://semopenalex.org/work/W3084848114> ?p ?o ?g. }
- W3084848114 abstract "Abstract This paper introduces the HASTE Toolkit, a cloud-native software toolkit capable of partitioning data streams in order to prioritize usage of limited resources. This in turn enables more efficient data-intensive experiments. We propose a model that introduces automated, autonomous decision making in data pipelines, such that a stream of data can be partitioned into a tiered or ordered data hierarchy. Importantly, the partitioning is online and based on data content rather than a priori metadata. At the core of the model are interestingness functions and policies. Interestingness functions assign a quantitative measure of interestingness to a single data object in the stream, an interestingness score. Based on this score, a policy guides decisions on how to prioritize computational resource usage for a given object. The HASTE Toolkit is a collection of tools to adapt data stream processing to this pipeline model. The result is smart data pipelines capable of effective or even optimal use of e.g. storage, compute and network bandwidth, to support experiments involving rapid processing of scientific data characterized by large individual data object sizes. We demonstrate the proposed model and our toolkit through two microscopy imaging case studies, each with their own interestingness functions, policies, and data hierarchies. The first deals with a high content screening experiment, where images are analyzed in an on-premise container cloud with the goal of prioritizing the images for storage and subsequent computation. The second considers edge processing of images for upload into the public cloud for a real-time control loop for a transmission electron microscope. 
Key Points We propose a pipeline model for building intelligent pipelines for streams, accounting for actual information content in data rather than a priori metadata, and present the HASTE Toolkit, a cloud-native software toolkit for supporting rapid development according to the proposed model. We demonstrate how the HASTE Toolkit enables intelligent resource optimization in two image analysis case studies based on a) high-content imaging and b) transmission electron microscopy. We highlight the challenges of storage, processing and transfer in streamed high volume, high velocity scientific data for both cloud and cloud-edge use cases." @default.
- W3084848114 created "2020-09-21" @default.
- W3084848114 creator A5001646323 @default.
- W3084848114 creator A5009814800 @default.
- W3084848114 creator A5010333712 @default.
- W3084848114 creator A5028372092 @default.
- W3084848114 creator A5040423646 @default.
- W3084848114 creator A5046663348 @default.
- W3084848114 creator A5049765702 @default.
- W3084848114 creator A5065946530 @default.
- W3084848114 creator A5066637534 @default.
- W3084848114 creator A5075025484 @default.
- W3084848114 date "2020-09-14" @default.
- W3084848114 modified "2023-09-27" @default.
- W3084848114 title "Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit" @default.
- W3084848114 cites W1998614190 @default.
- W3084848114 cites W2048593198 @default.
- W3084848114 cites W2078583253 @default.
- W3084848114 cites W2104846587 @default.
- W3084848114 cites W2152012576 @default.
- W3084848114 cites W2340685884 @default.
- W3084848114 cites W2392395307 @default.
- W3084848114 cites W2424304400 @default.
- W3084848114 cites W2487200295 @default.
- W3084848114 cites W2585715405 @default.
- W3084848114 cites W2589440298 @default.
- W3084848114 cites W2660039237 @default.
- W3084848114 cites W2745296509 @default.
- W3084848114 cites W2766380068 @default.
- W3084848114 cites W2768694333 @default.
- W3084848114 cites W2805350385 @default.
- W3084848114 cites W2809003517 @default.
- W3084848114 cites W2811106513 @default.
- W3084848114 cites W2891018523 @default.
- W3084848114 cites W2950563407 @default.
- W3084848114 cites W2963045412 @default.
- W3084848114 cites W3029598567 @default.
- W3084848114 cites W3037672257 @default.
- W3084848114 doi "https://doi.org/10.1101/2020.09.13.274779" @default.
- W3084848114 hasPublicationYear "2020" @default.
- W3084848114 type Work @default.
- W3084848114 sameAs 3084848114 @default.
- W3084848114 citedByCount "0" @default.
- W3084848114 crossrefType "posted-content" @default.
- W3084848114 hasAuthorship W3084848114A5001646323 @default.
- W3084848114 hasAuthorship W3084848114A5009814800 @default.
- W3084848114 hasAuthorship W3084848114A5010333712 @default.
- W3084848114 hasAuthorship W3084848114A5028372092 @default.
- W3084848114 hasAuthorship W3084848114A5040423646 @default.
- W3084848114 hasAuthorship W3084848114A5046663348 @default.
- W3084848114 hasAuthorship W3084848114A5049765702 @default.
- W3084848114 hasAuthorship W3084848114A5065946530 @default.
- W3084848114 hasAuthorship W3084848114A5066637534 @default.
- W3084848114 hasAuthorship W3084848114A5075025484 @default.
- W3084848114 hasBestOaLocation W30848481141 @default.
- W3084848114 hasConcept C111919701 @default.
- W3084848114 hasConcept C120314980 @default.
- W3084848114 hasConcept C124101348 @default.
- W3084848114 hasConcept C127413603 @default.
- W3084848114 hasConcept C154945302 @default.
- W3084848114 hasConcept C199360897 @default.
- W3084848114 hasConcept C2781018962 @default.
- W3084848114 hasConcept C2781238097 @default.
- W3084848114 hasConcept C41008148 @default.
- W3084848114 hasConcept C43521106 @default.
- W3084848114 hasConcept C71901391 @default.
- W3084848114 hasConcept C77088390 @default.
- W3084848114 hasConcept C78519656 @default.
- W3084848114 hasConcept C79974875 @default.
- W3084848114 hasConcept C89198739 @default.
- W3084848114 hasConcept C93518851 @default.
- W3084848114 hasConceptScore W3084848114C111919701 @default.
- W3084848114 hasConceptScore W3084848114C120314980 @default.
- W3084848114 hasConceptScore W3084848114C124101348 @default.
- W3084848114 hasConceptScore W3084848114C127413603 @default.
- W3084848114 hasConceptScore W3084848114C154945302 @default.
- W3084848114 hasConceptScore W3084848114C199360897 @default.
- W3084848114 hasConceptScore W3084848114C2781018962 @default.
- W3084848114 hasConceptScore W3084848114C2781238097 @default.
- W3084848114 hasConceptScore W3084848114C41008148 @default.
- W3084848114 hasConceptScore W3084848114C43521106 @default.
- W3084848114 hasConceptScore W3084848114C71901391 @default.
- W3084848114 hasConceptScore W3084848114C77088390 @default.
- W3084848114 hasConceptScore W3084848114C78519656 @default.
- W3084848114 hasConceptScore W3084848114C79974875 @default.
- W3084848114 hasConceptScore W3084848114C89198739 @default.
- W3084848114 hasConceptScore W3084848114C93518851 @default.
- W3084848114 hasLocation W30848481141 @default.
- W3084848114 hasLocation W30848481142 @default.
- W3084848114 hasLocation W30848481143 @default.
- W3084848114 hasOpenAccess W3084848114 @default.
- W3084848114 hasPrimaryLocation W30848481141 @default.
- W3084848114 hasRelatedWork W14152614 @default.
- W3084848114 hasRelatedWork W14983537 @default.
- W3084848114 hasRelatedWork W1693865 @default.
- W3084848114 hasRelatedWork W1923150 @default.
- W3084848114 hasRelatedWork W215274 @default.
- W3084848114 hasRelatedWork W4049835 @default.
- W3084848114 hasRelatedWork W6430337 @default.
- W3084848114 hasRelatedWork W7346208 @default.
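
The pipeline model summarized in the abstract of this record (an interestingness function scores each data object, and a policy maps that score to a tier in a data hierarchy) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `DataObject`, `variance_interestingness`, `threshold_policy`, `partition_stream`, and the threshold values are hypothetical and are not taken from the HASTE Toolkit or its API. The variance-based score is only a stand-in for whatever content-based measure a real experiment would use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


@dataclass
class DataObject:
    """Hypothetical stand-in for one object in the stream (e.g. one image)."""
    name: str
    payload: List[float]  # e.g. flattened pixel intensities


def variance_interestingness(obj: DataObject) -> float:
    """Assumed interestingness function: payload variance squashed into [0, 1).

    The score depends only on the data content, not on a priori metadata.
    """
    n = len(obj.payload)
    if n == 0:
        return 0.0
    mean = sum(obj.payload) / n
    var = sum((x - mean) ** 2 for x in obj.payload) / n
    return var / (1.0 + var)


def threshold_policy(score: float,
                     thresholds: Sequence[float] = (0.25, 0.5, 0.75)) -> int:
    """Assumed policy: the tier is the number of thresholds the score reaches.

    Tier 0 is the lowest priority; len(thresholds) is the highest.
    """
    return sum(score >= t for t in thresholds)


def partition_stream(
    stream: Iterable[DataObject],
    score_fn: Callable[[DataObject], float],
    policy: Callable[[float], int],
) -> Iterator[Tuple[str, float, int]]:
    """Score and tier each object online, one at a time, as it arrives."""
    for obj in stream:
        score = score_fn(obj)
        yield obj.name, score, policy(score)


if __name__ == "__main__":
    stream = [
        DataObject("flat", [1.0, 1.0, 1.0, 1.0]),
        DataObject("mild", [0.0, 2.0, 0.0, 2.0]),
        DataObject("busy", [0.0, 10.0, 0.0, 10.0]),
    ]
    for name, score, tier in partition_stream(
        stream, variance_interestingness, threshold_policy
    ):
        print(f"{name}: score={score:.3f} tier={tier}")
```

Keeping the score function and the policy as separate callables mirrors the separation the abstract describes: the same stream can be re-tiered by swapping the policy without touching how interestingness is computed.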