Matches in SemOpenAlex for { <https://semopenalex.org/work/W3136605886> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W3136605886 abstract "Checkpoint/restart (C/R) provides fault-tolerant computing capability, enables long running applications, and provides scheduling flexibility for computing centers to support diverse workloads with different priorities. It is therefore vital to get transparent C/R capability working at NERSC. MANA, by Garg et al., is a transparent checkpointing tool that has been selected due to its MPI-agnostic and network-agnostic approach. However, originally written as a proof-of-concept code, MANA was not ready to use with NERSC's diverse production workloads, which are dominated by MPI and hybrid MPI+OpenMP applications. In this talk, we present ongoing work at NERSC to enable MANA for NERSC's production workloads, including fixing bugs that were exposed by the top applications at NERSC, adding new features to address system changes, evaluating C/R overhead at scale, etc. The lessons learned from making MANA production-ready for HPC applications will be useful for C/R tool developers, supercomputing centers and HPC end-users alike." @default.
- W3136605886 created "2021-03-29" @default.
- W3136605886 creator A5011670569 @default.
- W3136605886 creator A5012983999 @default.
- W3136605886 creator A5016236164 @default.
- W3136605886 creator A5029789868 @default.
- W3136605886 creator A5037645907 @default.
- W3136605886 creator A5062102265 @default.
- W3136605886 creator A5070983794 @default.
- W3136605886 creator A5078814947 @default.
- W3136605886 date "2021-03-15" @default.
- W3136605886 modified "2023-10-02" @default.
- W3136605886 title "Improving scalability and reliability of MPI-agnostic transparent checkpointing for production workloads at NERSC." @default.
- W3136605886 cites W1510894298 @default.
- W3136605886 cites W1965091139 @default.
- W3136605886 cites W2116115793 @default.
- W3136605886 cites W2139244298 @default.
- W3136605886 cites W2963147607 @default.
- W3136605886 cites W2964339509 @default.
- W3136605886 hasPublicationYear "2021" @default.
- W3136605886 type Work @default.
- W3136605886 sameAs 3136605886 @default.
- W3136605886 citedByCount "0" @default.
- W3136605886 crossrefType "posted-content" @default.
- W3136605886 hasAuthorship W3136605886A5011670569 @default.
- W3136605886 hasAuthorship W3136605886A5012983999 @default.
- W3136605886 hasAuthorship W3136605886A5016236164 @default.
- W3136605886 hasAuthorship W3136605886A5029789868 @default.
- W3136605886 hasAuthorship W3136605886A5037645907 @default.
- W3136605886 hasAuthorship W3136605886A5062102265 @default.
- W3136605886 hasAuthorship W3136605886A5070983794 @default.
- W3136605886 hasAuthorship W3136605886A5078814947 @default.
- W3136605886 hasConcept C105795698 @default.
- W3136605886 hasConcept C111919701 @default.
- W3136605886 hasConcept C120314980 @default.
- W3136605886 hasConcept C2779960059 @default.
- W3136605886 hasConcept C2780598303 @default.
- W3136605886 hasConcept C33923547 @default.
- W3136605886 hasConcept C41008148 @default.
- W3136605886 hasConcept C48044578 @default.
- W3136605886 hasConcept C63540848 @default.
- W3136605886 hasConcept C83283714 @default.
- W3136605886 hasConceptScore W3136605886C105795698 @default.
- W3136605886 hasConceptScore W3136605886C111919701 @default.
- W3136605886 hasConceptScore W3136605886C120314980 @default.
- W3136605886 hasConceptScore W3136605886C2779960059 @default.
- W3136605886 hasConceptScore W3136605886C2780598303 @default.
- W3136605886 hasConceptScore W3136605886C33923547 @default.
- W3136605886 hasConceptScore W3136605886C41008148 @default.
- W3136605886 hasConceptScore W3136605886C48044578 @default.
- W3136605886 hasConceptScore W3136605886C63540848 @default.
- W3136605886 hasConceptScore W3136605886C83283714 @default.
- W3136605886 hasLocation W31366058861 @default.
- W3136605886 hasOpenAccess W3136605886 @default.
- W3136605886 hasPrimaryLocation W31366058861 @default.
- W3136605886 hasRelatedWork W1983500457 @default.
- W3136605886 hasRelatedWork W2010644199 @default.
- W3136605886 hasRelatedWork W2015990710 @default.
- W3136605886 hasRelatedWork W2060671020 @default.
- W3136605886 hasRelatedWork W2114262241 @default.
- W3136605886 hasRelatedWork W2304337560 @default.
- W3136605886 hasRelatedWork W2519918969 @default.
- W3136605886 hasRelatedWork W2535102629 @default.
- W3136605886 hasRelatedWork W2657850787 @default.
- W3136605886 hasRelatedWork W2730389840 @default.
- W3136605886 hasRelatedWork W2748226655 @default.
- W3136605886 hasRelatedWork W2759764687 @default.
- W3136605886 hasRelatedWork W2760347133 @default.
- W3136605886 hasRelatedWork W2767112467 @default.
- W3136605886 hasRelatedWork W2768395119 @default.
- W3136605886 hasRelatedWork W2963106581 @default.
- W3136605886 hasRelatedWork W3033770600 @default.
- W3136605886 hasRelatedWork W3096588682 @default.
- W3136605886 hasRelatedWork W2023325599 @default.
- W3136605886 hasRelatedWork W2281641124 @default.
- W3136605886 isParatext "false" @default.
- W3136605886 isRetracted "false" @default.
- W3136605886 magId "3136605886" @default.
- W3136605886 workType "article" @default.