Matches in SemOpenAlex for { <https://semopenalex.org/work/W3139484330> ?p ?o ?g. }
Showing items 1 to 94 of 94 with 100 items per page.
- W3139484330 abstract "Users submit their simulations to High Performance Computing (HPC) clusters through batch systems which allocate cluster resources to user jobs. While some resource managers and job schedulers, such as Slurm, have a generalized resource model, they end up monitoring and managing only computing resources (i.e., nodes) in nearly all modern HPC systems. Other resources, such as parallel file systems, are also important to job execution but resource managers and job schedulers remain blind to their impact on the overall cluster utilization and job performance. For example, contention for IO resources increases job runtime and delays execution. Furthermore, we observe the trend of an increasing gap between compute power and IO bandwidth, meaning that the bandwidth to file systems is outpaced by the rate of data production for IO-intensive applications. These problems can be addressed with IO-aware schedulers. Unfortunately, schedulers lack automatic, scalable, and general tools that support and enable IO-awareness by generating knowledge that the schedulers can leverage to prevent and mitigate IO contention while dealing with IO bandwidth constraints. To address the problems, in this thesis we propose AI4IO, a suite of Artificial Intelligence (AI) based tools that enable resource awareness on HPC systems. AI4IO consists of two tools: PRIONN and CanarIO. PRIONN automates predictions about user-submitted job resource usage; CanarIO detects, in real time, the presence of IO contention on HPC systems and predicts which jobs are affected by that contention. By working in concert, the AI4IO tools predict the a priori knowledge necessary to prevent and mitigate IO contention with IO-aware scheduling. We leverage the Flux simulator to implement a realistic simulation of an HPC environment and integrate AI4IO in the Flux simulation.
We first evaluate PRIONN and CanarIO separately and show that they improve performance with the prevention and mitigation of IO contention. We then use the two AI4IO tools in concert to produce greater improvements in performance: we observe up to 6.2% improvement in makespan of real HPC job workloads, which amounts to more than 18,000 node-hours saved per week on a production-size cluster." @default.
- W3139484330 created "2021-03-29" @default.
- W3139484330 creator A5058523082 @default.
- W3139484330 date "2020-01-01" @default.
- W3139484330 modified "2023-09-27" @default.
- W3139484330 title "AI4IO: a suite of AI-based tools for IO-aware HPC resource management" @default.
- W3139484330 cites W1494425918 @default.
- W3139484330 cites W1545469897 @default.
- W3139484330 cites W1581344573 @default.
- W3139484330 cites W1581615656 @default.
- W3139484330 cites W1603527427 @default.
- W3139484330 cites W1988404188 @default.
- W3139484330 cites W2003064893 @default.
- W3139484330 cites W2025024269 @default.
- W3139484330 cites W2038924755 @default.
- W3139484330 cites W2039373661 @default.
- W3139484330 cites W2076995567 @default.
- W3139484330 cites W2101234009 @default.
- W3139484330 cites W2115890460 @default.
- W3139484330 cites W2153579005 @default.
- W3139484330 cites W2157777898 @default.
- W3139484330 cites W2162342269 @default.
- W3139484330 cites W2332895340 @default.
- W3139484330 cites W2335889461 @default.
- W3139484330 cites W2401336103 @default.
- W3139484330 cites W2560558247 @default.
- W3139484330 cites W2566299966 @default.
- W3139484330 cites W2886514749 @default.
- W3139484330 cites W2953429524 @default.
- W3139484330 cites W2963729913 @default.
- W3139484330 cites W2982336568 @default.
- W3139484330 hasPublicationYear "2020" @default.
- W3139484330 type Work @default.
- W3139484330 sameAs 3139484330 @default.
- W3139484330 citedByCount "0" @default.
- W3139484330 crossrefType "dissertation" @default.
- W3139484330 hasAuthorship W3139484330A5058523082 @default.
- W3139484330 hasConcept C111873713 @default.
- W3139484330 hasConcept C111919701 @default.
- W3139484330 hasConcept C120314980 @default.
- W3139484330 hasConcept C153083717 @default.
- W3139484330 hasConcept C154945302 @default.
- W3139484330 hasConcept C166957645 @default.
- W3139484330 hasConcept C206345919 @default.
- W3139484330 hasConcept C2776257435 @default.
- W3139484330 hasConcept C31258907 @default.
- W3139484330 hasConcept C41008148 @default.
- W3139484330 hasConcept C48044578 @default.
- W3139484330 hasConcept C79581498 @default.
- W3139484330 hasConcept C79974875 @default.
- W3139484330 hasConcept C83283714 @default.
- W3139484330 hasConcept C95457728 @default.
- W3139484330 hasConceptScore W3139484330C111873713 @default.
- W3139484330 hasConceptScore W3139484330C111919701 @default.
- W3139484330 hasConceptScore W3139484330C120314980 @default.
- W3139484330 hasConceptScore W3139484330C153083717 @default.
- W3139484330 hasConceptScore W3139484330C154945302 @default.
- W3139484330 hasConceptScore W3139484330C166957645 @default.
- W3139484330 hasConceptScore W3139484330C206345919 @default.
- W3139484330 hasConceptScore W3139484330C2776257435 @default.
- W3139484330 hasConceptScore W3139484330C31258907 @default.
- W3139484330 hasConceptScore W3139484330C41008148 @default.
- W3139484330 hasConceptScore W3139484330C48044578 @default.
- W3139484330 hasConceptScore W3139484330C79581498 @default.
- W3139484330 hasConceptScore W3139484330C79974875 @default.
- W3139484330 hasConceptScore W3139484330C83283714 @default.
- W3139484330 hasConceptScore W3139484330C95457728 @default.
- W3139484330 hasLocation W31394843301 @default.
- W3139484330 hasOpenAccess W3139484330 @default.
- W3139484330 hasPrimaryLocation W31394843301 @default.
- W3139484330 hasRelatedWork W1545469897 @default.
- W3139484330 hasRelatedWork W1988027958 @default.
- W3139484330 hasRelatedWork W2045282436 @default.
- W3139484330 hasRelatedWork W2277643002 @default.
- W3139484330 hasRelatedWork W2461317517 @default.
- W3139484330 hasRelatedWork W2496391885 @default.
- W3139484330 hasRelatedWork W2560369805 @default.
- W3139484330 hasRelatedWork W2596462647 @default.
- W3139484330 hasRelatedWork W2892464762 @default.
- W3139484330 hasRelatedWork W2953590165 @default.
- W3139484330 hasRelatedWork W2998875332 @default.
- W3139484330 hasRelatedWork W3007323336 @default.
- W3139484330 hasRelatedWork W3022437934 @default.
- W3139484330 hasRelatedWork W3022455671 @default.
- W3139484330 hasRelatedWork W3047653192 @default.
- W3139484330 hasRelatedWork W3082705249 @default.
- W3139484330 hasRelatedWork W3084501734 @default.
- W3139484330 hasRelatedWork W3094736321 @default.
- W3139484330 hasRelatedWork W3106172335 @default.
- W3139484330 hasRelatedWork W3198232371 @default.
- W3139484330 isParatext "false" @default.
- W3139484330 isRetracted "false" @default.
- W3139484330 magId "3139484330" @default.
- W3139484330 workType "dissertation" @default.