Matches in SemOpenAlex for { <https://semopenalex.org/work/W2008505510> ?p ?o ?g. }
- W2008505510 abstract "Reliability is increasingly becoming a challenge for high-performance computing (HPC) systems with thousands of nodes, such as IBM's Blue Gene/L. A shorter mean-time-to-failure can be addressed by adding fault tolerance to reconfigure working nodes to ensure that communication and computation can progress. However, existing approaches fall short in providing scalability and small reconfiguration overhead within the fault-tolerant layer. This paper contributes a scalable approach to reconfigure the communication infrastructure after node failures. We propose a decentralized (peer-to-peer) protocol that maintains a consistent view of active nodes in the presence of faults. Our protocol shows response times in the order of hundreds of microseconds and single-digit milliseconds for reconfiguration using MPI over BlueGene/L and TCP over Gigabit, respectively. The protocol can be adapted to match the network topology to further increase performance. We also verify experimental results against a performance model, which demonstrates the scalability of the approach. Hence, the membership service is suitable for deployment in the communication layer of MPI runtime systems, and we have integrated an early version into LAM/MPI." @default.
- W2008505510 created "2016-06-24" @default.
- W2008505510 creator A5019453159 @default.
- W2008505510 creator A5031749427 @default.
- W2008505510 creator A5055838753 @default.
- W2008505510 creator A5069911834 @default.
- W2008505510 creator A5083030982 @default.
- W2008505510 date "2006-06-28" @default.
- W2008505510 modified "2023-09-23" @default.
- W2008505510 title "Scalable, fault tolerant membership for MPI tasks on HPC systems" @default.
- W2008505510 cites W1498659283 @default.
- W2008505510 cites W1577580543 @default.
- W2008505510 cites W1825216778 @default.
- W2008505510 cites W1993383198 @default.
- W2008505510 cites W2045450417 @default.
- W2008505510 cites W2046136445 @default.
- W2008505510 cites W2046980086 @default.
- W2008505510 cites W2047686393 @default.
- W2008505510 cites W2064757275 @default.
- W2008505510 cites W2077671984 @default.
- W2008505510 cites W2081612620 @default.
- W2008505510 cites W2091876663 @default.
- W2008505510 cites W2102061396 @default.
- W2008505510 cites W2108095339 @default.
- W2008505510 cites W2110966151 @default.
- W2008505510 cites W2114035455 @default.
- W2008505510 cites W2130264930 @default.
- W2008505510 cites W2141266148 @default.
- W2008505510 cites W2145325071 @default.
- W2008505510 cites W2171371822 @default.
- W2008505510 doi "https://doi.org/10.1145/1183401.1183433" @default.
- W2008505510 hasPublicationYear "2006" @default.
- W2008505510 type Work @default.
- W2008505510 sameAs 2008505510 @default.
- W2008505510 citedByCount "19" @default.
- W2008505510 countsByYear W20085055102012 @default.
- W2008505510 countsByYear W20085055102013 @default.
- W2008505510 countsByYear W20085055102015 @default.
- W2008505510 countsByYear W20085055102019 @default.
- W2008505510 countsByYear W20085055102021 @default.
- W2008505510 crossrefType "proceedings-article" @default.
- W2008505510 hasAuthorship W2008505510A5019453159 @default.
- W2008505510 hasAuthorship W2008505510A5031749427 @default.
- W2008505510 hasAuthorship W2008505510A5055838753 @default.
- W2008505510 hasAuthorship W2008505510A5069911834 @default.
- W2008505510 hasAuthorship W2008505510A5083030982 @default.
- W2008505510 hasBestOaLocation W20085055102 @default.
- W2008505510 hasConcept C111919701 @default.
- W2008505510 hasConcept C120314980 @default.
- W2008505510 hasConcept C121332964 @default.
- W2008505510 hasConcept C127413603 @default.
- W2008505510 hasConcept C142724271 @default.
- W2008505510 hasConcept C163258240 @default.
- W2008505510 hasConcept C171250308 @default.
- W2008505510 hasConcept C173608175 @default.
- W2008505510 hasConcept C192562407 @default.
- W2008505510 hasConcept C204787440 @default.
- W2008505510 hasConcept C2779960059 @default.
- W2008505510 hasConcept C2780385302 @default.
- W2008505510 hasConcept C31258907 @default.
- W2008505510 hasConcept C41008148 @default.
- W2008505510 hasConcept C43214815 @default.
- W2008505510 hasConcept C48044578 @default.
- W2008505510 hasConcept C62520636 @default.
- W2008505510 hasConcept C62611344 @default.
- W2008505510 hasConcept C63540848 @default.
- W2008505510 hasConcept C66938386 @default.
- W2008505510 hasConcept C70388272 @default.
- W2008505510 hasConcept C71924100 @default.
- W2008505510 hasConcept C83283714 @default.
- W2008505510 hasConceptScore W2008505510C111919701 @default.
- W2008505510 hasConceptScore W2008505510C120314980 @default.
- W2008505510 hasConceptScore W2008505510C121332964 @default.
- W2008505510 hasConceptScore W2008505510C127413603 @default.
- W2008505510 hasConceptScore W2008505510C142724271 @default.
- W2008505510 hasConceptScore W2008505510C163258240 @default.
- W2008505510 hasConceptScore W2008505510C171250308 @default.
- W2008505510 hasConceptScore W2008505510C173608175 @default.
- W2008505510 hasConceptScore W2008505510C192562407 @default.
- W2008505510 hasConceptScore W2008505510C204787440 @default.
- W2008505510 hasConceptScore W2008505510C2779960059 @default.
- W2008505510 hasConceptScore W2008505510C2780385302 @default.
- W2008505510 hasConceptScore W2008505510C31258907 @default.
- W2008505510 hasConceptScore W2008505510C41008148 @default.
- W2008505510 hasConceptScore W2008505510C43214815 @default.
- W2008505510 hasConceptScore W2008505510C48044578 @default.
- W2008505510 hasConceptScore W2008505510C62520636 @default.
- W2008505510 hasConceptScore W2008505510C62611344 @default.
- W2008505510 hasConceptScore W2008505510C63540848 @default.
- W2008505510 hasConceptScore W2008505510C66938386 @default.
- W2008505510 hasConceptScore W2008505510C70388272 @default.
- W2008505510 hasConceptScore W2008505510C71924100 @default.
- W2008505510 hasConceptScore W2008505510C83283714 @default.
- W2008505510 hasLocation W20085055101 @default.
- W2008505510 hasLocation W20085055102 @default.
- W2008505510 hasOpenAccess W2008505510 @default.
- W2008505510 hasPrimaryLocation W20085055101 @default.
- W2008505510 hasRelatedWork W2018430781 @default.
- W2008505510 hasRelatedWork W2027487876 @default.
- W2008505510 hasRelatedWork W2064720525 @default.