Matches in SemOpenAlex for { <https://semopenalex.org/work/W2018637963> ?p ?o ?g. }
Showing items 1 to 78 of 78 with 100 items per page.
- W2018637963 endingPage "25" @default.
- W2018637963 startingPage "15" @default.
- W2018637963 abstract "Application developers are investigating Algorithm Based Fault Tolerance (ABFT) techniques to improve the efficiency of application recovery beyond what traditional techniques alone can provide. Applications will depend on libraries to sustain failure-free performance across process failure to continue to use High Performance Computing (HPC) systems efficiently even in the presence of process failure. Optimized Message Passing Interface (MPI) collective operations are a critical component of many scalable HPC applications. However, most of the collective algorithms are not able to handle process failure. Next generation MPI implementations must provide fault aware versions of such algorithms that can sustain performance across process failure. This paper discusses the design and implementation of fault aware collective algorithms for tree structured communication patterns. The three design approaches of rerouting, lookup avoiding and rebalancing are described, and analyzed for their performance impact relative to similar fault unaware barrier and broadcast collective algorithms. The analysis shows that the rerouting approach causes a significant performance degradation while the rebalancing approach can bring the performance within 1% of the fault unaware performance. This paper also presents the impact of the run-through stabilization prototype on point-to-point communication, and analyzes the time to rebalance the tree while accounting for process failures." @default.
- W2018637963 created "2016-06-24" @default.
- W2018637963 creator A5008485839 @default.
- W2018637963 creator A5076123370 @default.
- W2018637963 date "2012-01-01" @default.
- W2018637963 modified "2023-09-27" @default.
- W2018637963 title "Analyzing fault aware collective performance in a process fault tolerant MPI" @default.
- W2018637963 cites W1493735863 @default.
- W2018637963 cites W1551972236 @default.
- W2018637963 cites W1568637577 @default.
- W2018637963 cites W1825216778 @default.
- W2018637963 cites W1855495706 @default.
- W2018637963 cites W1973269641 @default.
- W2018637963 cites W1991732708 @default.
- W2018637963 cites W1993383198 @default.
- W2018637963 cites W2033579026 @default.
- W2018637963 cites W2036641664 @default.
- W2018637963 cites W2054129151 @default.
- W2018637963 cites W2081409107 @default.
- W2018637963 cites W2083613288 @default.
- W2018637963 cites W2089536264 @default.
- W2018637963 cites W2117163917 @default.
- W2018637963 cites W2133201251 @default.
- W2018637963 cites W2133943294 @default.
- W2018637963 cites W2133994512 @default.
- W2018637963 cites W2138660187 @default.
- W2018637963 cites W2143360884 @default.
- W2018637963 cites W2151984682 @default.
- W2018637963 cites W2157664008 @default.
- W2018637963 cites W2165863720 @default.
- W2018637963 cites W2296772319 @default.
- W2018637963 cites W346518175 @default.
- W2018637963 cites W1020185495 @default.
- W2018637963 cites W132638985 @default.
- W2018637963 cites W2912637334 @default.
- W2018637963 doi "https://doi.org/10.1016/j.parco.2011.10.010" @default.
- W2018637963 hasPublicationYear "2012" @default.
- W2018637963 type Work @default.
- W2018637963 sameAs 2018637963 @default.
- W2018637963 citedByCount "3" @default.
- W2018637963 countsByYear W20186379632012 @default.
- W2018637963 countsByYear W20186379632014 @default.
- W2018637963 crossrefType "journal-article" @default.
- W2018637963 hasAuthorship W2018637963A5008485839 @default.
- W2018637963 hasAuthorship W2018637963A5076123370 @default.
- W2018637963 hasConcept C111919701 @default.
- W2018637963 hasConcept C120314980 @default.
- W2018637963 hasConcept C173608175 @default.
- W2018637963 hasConcept C41008148 @default.
- W2018637963 hasConcept C63540848 @default.
- W2018637963 hasConcept C98045186 @default.
- W2018637963 hasConceptScore W2018637963C111919701 @default.
- W2018637963 hasConceptScore W2018637963C120314980 @default.
- W2018637963 hasConceptScore W2018637963C173608175 @default.
- W2018637963 hasConceptScore W2018637963C41008148 @default.
- W2018637963 hasConceptScore W2018637963C63540848 @default.
- W2018637963 hasConceptScore W2018637963C98045186 @default.
- W2018637963 hasIssue "1-2" @default.
- W2018637963 hasLocation W20186379631 @default.
- W2018637963 hasOpenAccess W2018637963 @default.
- W2018637963 hasPrimaryLocation W20186379631 @default.
- W2018637963 hasRelatedWork W1513409726 @default.
- W2018637963 hasRelatedWork W2027487876 @default.
- W2018637963 hasRelatedWork W2063848003 @default.
- W2018637963 hasRelatedWork W2124870959 @default.
- W2018637963 hasRelatedWork W2270308778 @default.
- W2018637963 hasRelatedWork W2351378856 @default.
- W2018637963 hasRelatedWork W2978510736 @default.
- W2018637963 hasRelatedWork W3102446781 @default.
- W2018637963 hasRelatedWork W4242263690 @default.
- W2018637963 hasRelatedWork W2147034415 @default.
- W2018637963 hasVolume "38" @default.
- W2018637963 isParatext "false" @default.
- W2018637963 isRetracted "false" @default.
- W2018637963 magId "2018637963" @default.
- W2018637963 workType "article" @default.
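
The abstract in this record contrasts a rerouting approach with a rebalancing approach for tree-structured collectives, but the record itself does not give the algorithms. The following is a minimal sketch of one plausible reading of those two terms, assuming a binary tree laid out over rank positions and a root that has not failed. The function names (`binary_tree_children`, `rerouted_tree`, `rebalanced_tree`) are hypothetical and are not taken from the paper or from any MPI implementation.

```python
def binary_tree_children(pos, size):
    """Child positions of `pos` in a binary tree over positions 0..size-1."""
    return [c for c in (2 * pos + 1, 2 * pos + 2) if c < size]


def rerouted_tree(ranks, failed):
    """Keep the original tree shape (assumed reading of 'rerouting').

    A parent skips a failed child and sends directly to that child's
    nearest surviving descendants, so its fan-out can grow.
    Assumes the root (ranks[0]) is alive.
    """
    size = len(ranks)

    def alive_descendants(pos):
        out = []
        for c in binary_tree_children(pos, size):
            if ranks[c] in failed:
                out.extend(alive_descendants(c))  # route around the failure
            else:
                out.append(c)
        return out

    return {
        ranks[p]: [ranks[c] for c in alive_descendants(p)]
        for p in range(size)
        if ranks[p] not in failed
    }


def rebalanced_tree(ranks, failed):
    """Rebuild the tree over survivors only (assumed reading of 'rebalancing')."""
    alive = [r for r in ranks if r not in failed]
    return {
        alive[p]: [alive[c] for c in binary_tree_children(p, len(alive))]
        for p in range(len(alive))
    }


ranks = list(range(7))
failed = {1}
rerouted = rerouted_tree(ranks, failed)
rebalanced = rebalanced_tree(ranks, failed)
print("rerouted:  ", rerouted)
print("rebalanced:", rebalanced)
```

In this toy case the rerouted tree leaves the root with three children, while the rebalanced tree keeps every node at two children or fewer. That is consistent with the abstract's claim that rebalancing recovers performance close to the fault-unaware case, although this sketch measures nothing and says nothing about the cost of rebalancing.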