Matches in SemOpenAlex for { <https://semopenalex.org/work/W3113708478> ?p ?o ?g. }
- W3113708478 abstract "HPC systems keep growing in size to meet the ever-increasing demand for performance and computational resources. Apart from increased performance, large-scale systems face two challenges that hinder further growth: energy efficiency and resiliency. At the same time, applications seeking increased performance rely on advanced parallelism to exploit system resources, which increases the pressure on system interconnects. At large system scales, increased communication locality can benefit both application performance and energy consumption. Towards this direction, several studies focus on deriving a mapping of an application's processes to system nodes such that communication cost is reduced. A common approach is to express both the application's communication patterns and the system architecture as graphs and then solve the corresponding mapping problem. Apart from communication cost, the completion time of a job can also be affected by node failures, which may result in job abortions that require job restarts. In this paper, we address the problem of assigning processes to system resources with the goal of reducing communication cost while also taking node failures into account. The proposed approach is integrated into the Slurm resource manager. Evaluation results show that, in scenarios where few nodes have a low outage probability, the proposed process placement approach achieves a notable decrease in the completion time of batches of MPI jobs. Compared to the default process placement approach in Slurm, the reduction is 18.9% and 31% for the two MPI applications evaluated, respectively." @default.
- W3113708478 created "2021-01-05" @default.
- W3113708478 creator A5003422611 @default.
- W3113708478 creator A5050587692 @default.
- W3113708478 creator A5079517640 @default.
- W3113708478 date "2020-12-29" @default.
- W3113708478 modified "2023-09-27" @default.
- W3113708478 title "Improving the Performance and Resilience of MPI Parallel Jobs with Topology and Fault-Aware Process Placement" @default.
- W3113708478 cites W1531896033 @default.
- W3113708478 cites W1538076361 @default.
- W3113708478 cites W1553890549 @default.
- W3113708478 cites W1575350781 @default.
- W3113708478 cites W1963853421 @default.
- W3113708478 cites W1965302146 @default.
- W3113708478 cites W1979146060 @default.
- W3113708478 cites W1984788566 @default.
- W3113708478 cites W1992432622 @default.
- W3113708478 cites W2019465613 @default.
- W3113708478 cites W2022272738 @default.
- W3113708478 cites W2033296651 @default.
- W3113708478 cites W2039631162 @default.
- W3113708478 cites W2081235423 @default.
- W3113708478 cites W2089536264 @default.
- W3113708478 cites W2099148634 @default.
- W3113708478 cites W2104119282 @default.
- W3113708478 cites W2107934817 @default.
- W3113708478 cites W2112121929 @default.
- W3113708478 cites W2119541875 @default.
- W3113708478 cites W2122797462 @default.
- W3113708478 cites W2124520572 @default.
- W3113708478 cites W2125441055 @default.
- W3113708478 cites W2131239319 @default.
- W3113708478 cites W2131940306 @default.
- W3113708478 cites W2138036770 @default.
- W3113708478 cites W2138108438 @default.
- W3113708478 cites W2142759683 @default.
- W3113708478 cites W2147176980 @default.
- W3113708478 cites W2153216629 @default.
- W3113708478 cites W2163189799 @default.
- W3113708478 cites W2164945803 @default.
- W3113708478 cites W2505535144 @default.
- W3113708478 cites W2514990236 @default.
- W3113708478 cites W2538164805 @default.
- W3113708478 cites W2561672146 @default.
- W3113708478 cites W2735418241 @default.
- W3113708478 cites W2781031449 @default.
- W3113708478 hasPublicationYear "2020" @default.
- W3113708478 type Work @default.
- W3113708478 sameAs 3113708478 @default.
- W3113708478 citedByCount "0" @default.
- W3113708478 crossrefType "posted-content" @default.
- W3113708478 hasAuthorship W3113708478A5003422611 @default.
- W3113708478 hasAuthorship W3113708478A5050587692 @default.
- W3113708478 hasAuthorship W3113708478A5079517640 @default.
- W3113708478 hasConcept C111335779 @default.
- W3113708478 hasConcept C111919701 @default.
- W3113708478 hasConcept C120314980 @default.
- W3113708478 hasConcept C121332964 @default.
- W3113708478 hasConcept C127413603 @default.
- W3113708478 hasConcept C138885662 @default.
- W3113708478 hasConcept C2524010 @default.
- W3113708478 hasConcept C2779585090 @default.
- W3113708478 hasConcept C2779808786 @default.
- W3113708478 hasConcept C33923547 @default.
- W3113708478 hasConcept C41008148 @default.
- W3113708478 hasConcept C41895202 @default.
- W3113708478 hasConcept C62611344 @default.
- W3113708478 hasConcept C63540848 @default.
- W3113708478 hasConcept C66938386 @default.
- W3113708478 hasConcept C97355855 @default.
- W3113708478 hasConcept C98045186 @default.
- W3113708478 hasConceptScore W3113708478C111335779 @default.
- W3113708478 hasConceptScore W3113708478C111919701 @default.
- W3113708478 hasConceptScore W3113708478C120314980 @default.
- W3113708478 hasConceptScore W3113708478C121332964 @default.
- W3113708478 hasConceptScore W3113708478C127413603 @default.
- W3113708478 hasConceptScore W3113708478C138885662 @default.
- W3113708478 hasConceptScore W3113708478C2524010 @default.
- W3113708478 hasConceptScore W3113708478C2779585090 @default.
- W3113708478 hasConceptScore W3113708478C2779808786 @default.
- W3113708478 hasConceptScore W3113708478C33923547 @default.
- W3113708478 hasConceptScore W3113708478C41008148 @default.
- W3113708478 hasConceptScore W3113708478C41895202 @default.
- W3113708478 hasConceptScore W3113708478C62611344 @default.
- W3113708478 hasConceptScore W3113708478C63540848 @default.
- W3113708478 hasConceptScore W3113708478C66938386 @default.
- W3113708478 hasConceptScore W3113708478C97355855 @default.
- W3113708478 hasConceptScore W3113708478C98045186 @default.
- W3113708478 hasLocation W31137084781 @default.
- W3113708478 hasOpenAccess W3113708478 @default.
- W3113708478 hasPrimaryLocation W31137084781 @default.
- W3113708478 hasRelatedWork W1563841372 @default.
- W3113708478 hasRelatedWork W1578976001 @default.
- W3113708478 hasRelatedWork W2119512272 @default.
- W3113708478 hasRelatedWork W2189377486 @default.
- W3113708478 hasRelatedWork W2244100950 @default.
- W3113708478 hasRelatedWork W2409375804 @default.
- W3113708478 hasRelatedWork W2523508832 @default.
- W3113708478 hasRelatedWork W2566744474 @default.
- W3113708478 hasRelatedWork W2587466995 @default.
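The abstract describes expressing an application's communication pattern and the system architecture as graphs and then solving the resulting mapping problem, while also accounting for node failures. The sketch below illustrates that idea on a toy scale and is not the authors' algorithm or their Slurm integration. Several assumptions are made: communication cost is taken as traffic volume multiplied by hop distance, failures are handled by simply excluding nodes whose outage probability exceeds a hypothetical threshold `max_fail`, and the mapping is found by brute force, which is only feasible for a handful of processes. The names `comm_cost` and `best_mapping` are illustrative.

```python
from itertools import permutations


def comm_cost(mapping, traffic, hops):
    """Sum of traffic[i][j] * hops[node_i][node_j] over all ordered process pairs."""
    n = len(traffic)
    return sum(
        traffic[i][j] * hops[mapping[i]][mapping[j]]
        for i in range(n)
        for j in range(n)
        if i != j
    )


def best_mapping(traffic, hops, fail_prob, max_fail=0.1):
    """Exhaustively search process-to-node assignments (one process per node).

    Hypothetical fault handling: nodes with fail_prob above max_fail are
    excluded before the topology-aware search.
    """
    n = len(traffic)
    usable = [k for k in range(len(hops)) if fail_prob[k] <= max_fail]
    best, best_cost = None, float("inf")
    for nodes in permutations(usable, n):
        cost = comm_cost(nodes, traffic, hops)
        if cost < best_cost:
            best, best_cost = list(nodes), cost
    return best, best_cost


# Toy example: 4 nodes on a line (hop distance = |a - b|), 3 processes.
hops = [[abs(a - b) for b in range(4)] for a in range(4)]
# Processes 0 and 1 exchange heavy traffic; processes 1 and 2 exchange little.
traffic = [[0, 10, 0], [10, 0, 1], [0, 1, 0]]
# Node 1 is assumed to be unreliable.
fail_prob = [0.01, 0.5, 0.01, 0.01]

mapping, cost = best_mapping(traffic, hops, fail_prob)
print(mapping, cost)  # [3, 2, 0] 24
```

In this toy case, the heavily communicating pair lands on adjacent nodes 3 and 2, and unreliable node 1 is never used. A production placer would replace the exhaustive search with a graph-mapping heuristic and would weigh failure probability against communication cost rather than applying a hard cutoff.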