Matches in SemOpenAlex for { <https://semopenalex.org/work/W2916341479> ?p ?o ?g. }
Showing items 1 to 92 of 92 with 100 items per page.
- W2916341479 endingPage "12" @default.
- W2916341479 startingPage "1" @default.
- W2916341479 abstract "This paper compares the performance of different approaches to tolerate failures for applications executing on large-scale failure-prone platforms. We study (i) Rigid applications, which use a constant number of processors throughout execution; (ii) Moldable applications, which can use a different number of processors after each restart following a fail-stop error; and (iii) GridShaped applications, which are moldable applications restricted to use rectangular processor grids (such as many dense linear algebra kernels). We start with checkpoint/restart, the de-facto standard approach. For each application type, we compute the optimal number of failures (i.e. that maximizes the yield of the application) to tolerate before relinquishing the current allocation and waiting until a new resource can be allocated, and we determine the optimal yield that can be achieved. For GridShaped applications, we also investigate Application Based Fault Tolerance (ABFT) techniques and perform the same analysis, computing the optimal number of failures to tolerate and the associated yield. We instantiate our performance model with realistic applicative scenarios and make it publicly available for further usage. We show that using spare nodes grants a much better yield than currently used strategies that restart after each failure. Moreover, the yield is similar for Rigid, Moldable and GridShaped applications, while the optimal number of failures to tolerate is very high, even for a short wait time in between allocations. Finally, Moldable applications have the advantage to restart less frequently than Rigid applications." @default.
- W2916341479 created "2019-03-02" @default.
- W2916341479 creator A5001838181 @default.
- W2916341479 creator A5008117654 @default.
- W2916341479 creator A5010055736 @default.
- W2916341479 creator A5020765861 @default.
- W2916341479 creator A5054873210 @default.
- W2916341479 creator A5068707481 @default.
- W2916341479 creator A5075517045 @default.
- W2916341479 date "2019-07-01" @default.
- W2916341479 modified "2023-10-15" @default.
- W2916341479 title "Comparing the performance of rigid, moldable and grid-shaped applications on failure-prone HPC platforms" @default.
- W2916341479 cites W1983303752 @default.
- W2916341479 cites W2033656974 @default.
- W2916341479 cites W2052455844 @default.
- W2916341479 cites W2072072075 @default.
- W2916341479 cites W2081605270 @default.
- W2916341479 cites W2083613288 @default.
- W2916341479 cites W2089818961 @default.
- W2916341479 cites W2095059727 @default.
- W2916341479 cites W2128577831 @default.
- W2916341479 cites W2133046454 @default.
- W2916341479 cites W2174151054 @default.
- W2916341479 cites W4233783938 @default.
- W2916341479 cites W4237202528 @default.
- W2916341479 doi "https://doi.org/10.1016/j.parco.2019.02.002" @default.
- W2916341479 hasPublicationYear "2019" @default.
- W2916341479 type Work @default.
- W2916341479 sameAs 2916341479 @default.
- W2916341479 citedByCount "8" @default.
- W2916341479 countsByYear W29163414792020 @default.
- W2916341479 countsByYear W29163414792021 @default.
- W2916341479 countsByYear W29163414792022 @default.
- W2916341479 crossrefType "journal-article" @default.
- W2916341479 hasAuthorship W2916341479A5001838181 @default.
- W2916341479 hasAuthorship W2916341479A5008117654 @default.
- W2916341479 hasAuthorship W2916341479A5010055736 @default.
- W2916341479 hasAuthorship W2916341479A5020765861 @default.
- W2916341479 hasAuthorship W2916341479A5054873210 @default.
- W2916341479 hasAuthorship W2916341479A5068707481 @default.
- W2916341479 hasAuthorship W2916341479A5075517045 @default.
- W2916341479 hasBestOaLocation W29163414791 @default.
- W2916341479 hasConcept C120314980 @default.
- W2916341479 hasConcept C127413603 @default.
- W2916341479 hasConcept C134121241 @default.
- W2916341479 hasConcept C173608175 @default.
- W2916341479 hasConcept C187691185 @default.
- W2916341479 hasConcept C191897082 @default.
- W2916341479 hasConcept C192562407 @default.
- W2916341479 hasConcept C194648553 @default.
- W2916341479 hasConcept C2524010 @default.
- W2916341479 hasConcept C33923547 @default.
- W2916341479 hasConcept C41008148 @default.
- W2916341479 hasConcept C63540848 @default.
- W2916341479 hasConcept C78519656 @default.
- W2916341479 hasConceptScore W2916341479C120314980 @default.
- W2916341479 hasConceptScore W2916341479C127413603 @default.
- W2916341479 hasConceptScore W2916341479C134121241 @default.
- W2916341479 hasConceptScore W2916341479C173608175 @default.
- W2916341479 hasConceptScore W2916341479C187691185 @default.
- W2916341479 hasConceptScore W2916341479C191897082 @default.
- W2916341479 hasConceptScore W2916341479C192562407 @default.
- W2916341479 hasConceptScore W2916341479C194648553 @default.
- W2916341479 hasConceptScore W2916341479C2524010 @default.
- W2916341479 hasConceptScore W2916341479C33923547 @default.
- W2916341479 hasConceptScore W2916341479C41008148 @default.
- W2916341479 hasConceptScore W2916341479C63540848 @default.
- W2916341479 hasConceptScore W2916341479C78519656 @default.
- W2916341479 hasFunder F4320335353 @default.
- W2916341479 hasLocation W29163414791 @default.
- W2916341479 hasLocation W29163414792 @default.
- W2916341479 hasLocation W29163414793 @default.
- W2916341479 hasLocation W29163414794 @default.
- W2916341479 hasOpenAccess W2916341479 @default.
- W2916341479 hasPrimaryLocation W29163414791 @default.
- W2916341479 hasRelatedWork W2277714805 @default.
- W2916341479 hasRelatedWork W2319226115 @default.
- W2916341479 hasRelatedWork W2366601680 @default.
- W2916341479 hasRelatedWork W2376859990 @default.
- W2916341479 hasRelatedWork W2381161177 @default.
- W2916341479 hasRelatedWork W2392193501 @default.
- W2916341479 hasRelatedWork W2912704652 @default.
- W2916341479 hasRelatedWork W2970750595 @default.
- W2916341479 hasRelatedWork W4295074292 @default.
- W2916341479 hasRelatedWork W830772239 @default.
- W2916341479 hasVolume "85" @default.
- W2916341479 isParatext "false" @default.
- W2916341479 isRetracted "false" @default.
- W2916341479 magId "2916341479" @default.
- W2916341479 workType "article" @default.