Matches in SemOpenAlex for { <https://semopenalex.org/work/W3005478802> ?p ?o ?g. }
- W3005478802 endingPage "226" @default.
- W3005478802 startingPage "208" @default.
- W3005478802 abstract "With the increasing fault rate on high-end supercomputers, the topic of fault tolerance has been gathering attention. To cope with this situation, various fault-tolerance techniques are under investigation; these include user-level, algorithm-based fault-tolerance techniques and parallel execution environments that enable jobs to continue after a node failure. Even with these techniques, some programs with static load balancing, such as stencil computations, may underperform after failure recovery. Even when spare nodes are present, they are not always substituted for failed nodes effectively. This article considers how spare nodes should be allocated, how they should be substituted for faulty nodes, and how much communication performance is affected by such a substitution. The third question arises because node substitution modifies the rank mapping, which can incur additional message collisions. In a stencil computation, ranks are mapped straightforwardly onto a Cartesian network without incurring any message collisions. However, once a substitution has occurred, this optimal node-rank mapping may be destroyed. These questions must therefore be answered in a way that minimizes the degradation of communication performance. In this article, several spare node allocation and failed node substitution methods are proposed, analyzed, and compared in terms of communication performance following the substitution. The proposed substitution methods are named sliding methods. The sliding methods are analyzed with our simulation program and evaluated on the K computer, Blue Gene/Q (BG/Q), and TSUBAME 2.5. It is shown that when failures occur, stencil communication performance on the K computer and BG/Q can slow down by a factor of about 10, depending on the number of node failures. Barrier performance on the K computer can be cut in half, and on BG/Q it can slow down by a factor of 10. In contrast, almost no such communication performance degradation is observed on TSUBAME 2.5. This is because TSUBAME 2.5 uses an InfiniBand network with a fat-tree topology, while the K computer and BG/Q have dedicated Cartesian networks. Thus, the communication performance degradation depends on network characteristics." @default.
- W3005478802 created "2020-02-14" @default.
- W3005478802 creator A5008117654 @default.
- W3005478802 creator A5010055736 @default.
- W3005478802 creator A5020765861 @default.
- W3005478802 creator A5035283499 @default.
- W3005478802 creator A5053784620 @default.
- W3005478802 creator A5054873210 @default.
- W3005478802 date "2020-02-04" @default.
- W3005478802 modified "2023-10-18" @default.
- W3005478802 title "Overhead of using spare nodes" @default.
- W3005478802 cites W1977520786 @default.
- W3005478802 cites W1979572629 @default.
- W3005478802 cites W2001219263 @default.
- W3005478802 cites W2009852231 @default.
- W3005478802 cites W2017060126 @default.
- W3005478802 cites W2021234574 @default.
- W3005478802 cites W2025024269 @default.
- W3005478802 cites W2049644498 @default.
- W3005478802 cites W2063924830 @default.
- W3005478802 cites W2081413727 @default.
- W3005478802 cites W2087094132 @default.
- W3005478802 cites W2088112706 @default.
- W3005478802 cites W2096504919 @default.
- W3005478802 cites W2116886065 @default.
- W3005478802 cites W2128577831 @default.
- W3005478802 cites W2151984682 @default.
- W3005478802 cites W2157237396 @default.
- W3005478802 cites W2229245554 @default.
- W3005478802 cites W2234212075 @default.
- W3005478802 cites W2240709090 @default.
- W3005478802 cites W2247921729 @default.
- W3005478802 cites W4245507143 @default.
- W3005478802 doi "https://doi.org/10.1177/1094342020901885" @default.
- W3005478802 hasPublicationYear "2020" @default.
- W3005478802 type Work @default.
- W3005478802 sameAs 3005478802 @default.
- W3005478802 citedByCount "4" @default.
- W3005478802 countsByYear W30054788022020 @default.
- W3005478802 countsByYear W30054788022021 @default.
- W3005478802 countsByYear W30054788022022 @default.
- W3005478802 crossrefType "journal-article" @default.
- W3005478802 hasAuthorship W3005478802A5008117654 @default.
- W3005478802 hasAuthorship W3005478802A5010055736 @default.
- W3005478802 hasAuthorship W3005478802A5020765861 @default.
- W3005478802 hasAuthorship W3005478802A5035283499 @default.
- W3005478802 hasAuthorship W3005478802A5053784620 @default.
- W3005478802 hasAuthorship W3005478802A5054873210 @default.
- W3005478802 hasConcept C111919701 @default.
- W3005478802 hasConcept C11413529 @default.
- W3005478802 hasConcept C120314980 @default.
- W3005478802 hasConcept C127413603 @default.
- W3005478802 hasConcept C173608175 @default.
- W3005478802 hasConcept C194648553 @default.
- W3005478802 hasConcept C199360897 @default.
- W3005478802 hasConcept C2778220771 @default.
- W3005478802 hasConcept C2779960059 @default.
- W3005478802 hasConcept C31258907 @default.
- W3005478802 hasConcept C41008148 @default.
- W3005478802 hasConcept C45374587 @default.
- W3005478802 hasConcept C62611344 @default.
- W3005478802 hasConcept C63540848 @default.
- W3005478802 hasConcept C66938386 @default.
- W3005478802 hasConcept C78519656 @default.
- W3005478802 hasConceptScore W3005478802C111919701 @default.
- W3005478802 hasConceptScore W3005478802C11413529 @default.
- W3005478802 hasConceptScore W3005478802C120314980 @default.
- W3005478802 hasConceptScore W3005478802C127413603 @default.
- W3005478802 hasConceptScore W3005478802C173608175 @default.
- W3005478802 hasConceptScore W3005478802C194648553 @default.
- W3005478802 hasConceptScore W3005478802C199360897 @default.
- W3005478802 hasConceptScore W3005478802C2778220771 @default.
- W3005478802 hasConceptScore W3005478802C2779960059 @default.
- W3005478802 hasConceptScore W3005478802C31258907 @default.
- W3005478802 hasConceptScore W3005478802C41008148 @default.
- W3005478802 hasConceptScore W3005478802C45374587 @default.
- W3005478802 hasConceptScore W3005478802C62611344 @default.
- W3005478802 hasConceptScore W3005478802C63540848 @default.
- W3005478802 hasConceptScore W3005478802C66938386 @default.
- W3005478802 hasConceptScore W3005478802C78519656 @default.
- W3005478802 hasIssue "2" @default.
- W3005478802 hasLocation W30054788021 @default.
- W3005478802 hasOpenAccess W3005478802 @default.
- W3005478802 hasPrimaryLocation W30054788021 @default.
- W3005478802 hasRelatedWork W1501459447 @default.
- W3005478802 hasRelatedWork W2143273548 @default.
- W3005478802 hasRelatedWork W2145568978 @default.
- W3005478802 hasRelatedWork W2331290679 @default.
- W3005478802 hasRelatedWork W2391167130 @default.
- W3005478802 hasRelatedWork W2801842031 @default.
- W3005478802 hasRelatedWork W3005478802 @default.
- W3005478802 hasRelatedWork W3048889998 @default.
- W3005478802 hasRelatedWork W4242263690 @default.
- W3005478802 hasRelatedWork W91154980 @default.
- W3005478802 hasVolume "34" @default.
- W3005478802 isParatext "false" @default.
- W3005478802 isRetracted "false" @default.
- W3005478802 magId "3005478802" @default.