Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287328462> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4287328462 abstract "Scaling supercomputers comes with an increase in failure rates due to the increasing number of hardware components. In standard practice, applications are made resilient through checkpointing data and restarting execution after a failure occurs to resume from the latest checkpoint. However, re-deploying an application incurs overhead by tearing down and re-instating execution, and possibly limiting checkpointing retrieval from slow permanent storage. In this paper we present Reinit++, a new design and implementation of the Reinit approach for global-restart recovery, which avoids application re-deployment. We extensively evaluate Reinit++ contrasted with the leading MPI fault-tolerance approach of ULFM, implementing global-restart recovery, and the typical practice of restarting an application to derive new insight on performance. Experimentation with three different HPC proxy applications made resilient to withstand process and node failures shows that Reinit++ recovers much faster than restarting, up to 6x, or ULFM, up to 3x, and that it scales excellently as the number of MPI processes grows." @default.
- W4287328462 created "2022-07-25" @default.
- W4287328462 creator A5026765811 @default.
- W4287328462 creator A5033868370 @default.
- W4287328462 creator A5081323651 @default.
- W4287328462 date "2021-02-13" @default.
- W4287328462 modified "2023-09-23" @default.
- W4287328462 title "Reinit++: Evaluating the Performance of Global-Restart Recovery Methods For MPI Fault Tolerance" @default.
- W4287328462 doi "https://doi.org/10.48550/arxiv.2102.06896" @default.
- W4287328462 hasPublicationYear "2021" @default.
- W4287328462 type Work @default.
- W4287328462 citedByCount "0" @default.
- W4287328462 crossrefType "posted-content" @default.
- W4287328462 hasAuthorship W4287328462A5026765811 @default.
- W4287328462 hasAuthorship W4287328462A5033868370 @default.
- W4287328462 hasAuthorship W4287328462A5081323651 @default.
- W4287328462 hasBestOaLocation W42873284621 @default.
- W4287328462 hasConcept C105339364 @default.
- W4287328462 hasConcept C111919701 @default.
- W4287328462 hasConcept C120314980 @default.
- W4287328462 hasConcept C121332964 @default.
- W4287328462 hasConcept C127413603 @default.
- W4287328462 hasConcept C173608175 @default.
- W4287328462 hasConcept C188198153 @default.
- W4287328462 hasConcept C2524010 @default.
- W4287328462 hasConcept C2776097996 @default.
- W4287328462 hasConcept C2779960059 @default.
- W4287328462 hasConcept C33923547 @default.
- W4287328462 hasConcept C41008148 @default.
- W4287328462 hasConcept C62611344 @default.
- W4287328462 hasConcept C63540848 @default.
- W4287328462 hasConcept C66938386 @default.
- W4287328462 hasConcept C78519656 @default.
- W4287328462 hasConcept C97355855 @default.
- W4287328462 hasConcept C98045186 @default.
- W4287328462 hasConcept C99844830 @default.
- W4287328462 hasConceptScore W4287328462C105339364 @default.
- W4287328462 hasConceptScore W4287328462C111919701 @default.
- W4287328462 hasConceptScore W4287328462C120314980 @default.
- W4287328462 hasConceptScore W4287328462C121332964 @default.
- W4287328462 hasConceptScore W4287328462C127413603 @default.
- W4287328462 hasConceptScore W4287328462C173608175 @default.
- W4287328462 hasConceptScore W4287328462C188198153 @default.
- W4287328462 hasConceptScore W4287328462C2524010 @default.
- W4287328462 hasConceptScore W4287328462C2776097996 @default.
- W4287328462 hasConceptScore W4287328462C2779960059 @default.
- W4287328462 hasConceptScore W4287328462C33923547 @default.
- W4287328462 hasConceptScore W4287328462C41008148 @default.
- W4287328462 hasConceptScore W4287328462C62611344 @default.
- W4287328462 hasConceptScore W4287328462C63540848 @default.
- W4287328462 hasConceptScore W4287328462C66938386 @default.
- W4287328462 hasConceptScore W4287328462C78519656 @default.
- W4287328462 hasConceptScore W4287328462C97355855 @default.
- W4287328462 hasConceptScore W4287328462C98045186 @default.
- W4287328462 hasConceptScore W4287328462C99844830 @default.
- W4287328462 hasLocation W42873284621 @default.
- W4287328462 hasOpenAccess W4287328462 @default.
- W4287328462 hasPrimaryLocation W42873284621 @default.
- W4287328462 hasRelatedWork W11122349 @default.
- W4287328462 hasRelatedWork W1874418 @default.
- W4287328462 hasRelatedWork W2559670 @default.
- W4287328462 hasRelatedWork W2991865 @default.
- W4287328462 hasRelatedWork W4479500 @default.
- W4287328462 hasRelatedWork W5574817 @default.
- W4287328462 hasRelatedWork W5664962 @default.
- W4287328462 hasRelatedWork W6262472 @default.
- W4287328462 hasRelatedWork W9819130 @default.
- W4287328462 hasRelatedWork W5406675 @default.
- W4287328462 isParatext "false" @default.
- W4287328462 isRetracted "false" @default.
- W4287328462 workType "article" @default.