Matches in SemOpenAlex for { <https://semopenalex.org/work/W2538903612> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W2538903612 abstract "Cost-effective high performance can be achieved using clusters of Commercial Off-The-Shelf (COTS) computers interconnected by high-speed networks. When clusters are used for critical applications and/or in hostile environments, the required system reliability can only be achieved using fault tolerance techniques that allow the system to continue to operate correctly despite component failure. Cluster management middleware (CMM) is a software layer above the operating system controlling individual nodes and below the applications. The CMM schedules tasks on a cluster, controls access to shared resources, provides for task submission and monitoring, and coordinates the cluster's fault tolerance mechanisms. Reliable operation of the cluster requires reliable, continuous operation of the management middleware. This dissertation is focused on the key challenges in building highly reliable CMM. The system is based on centralized decision making. However, unlike most other cluster middleware, the manager is protected by Byzantine fault-tolerant state machine replication and the ability to restore the management service to full functionality and full fault tolerance following arbitrary single faults. To this end, we use a low-cost fault-tolerant replication mechanism coupled with on-line self-diagnosis and reconfiguration. The robust replicated manager is coupled with less aggressive fault tolerance mechanisms for dealing with less critical system components and with a fault-tolerant system bootstrapping mechanism. A fault-tolerant cluster designed to operate autonomously must include a highly reliable trusted hardcore to control critical functions such as the initiation of a node reset. We describe the functionality required from this trusted hardcore and its interactions with the replicated cluster manager. The result of this work is a carefully balanced, integrated set of efficient, practical techniques for aggressive fault tolerance.
These techniques allow a highly reliable system to be built using mostly standard COTS hardware and software components. This is demonstrated in an operational system, called Ghidrah, that has been built at UCLA. This dissertation includes a preliminary performance evaluation of Ghidrah and validation of the fault tolerance mechanisms by fault injection experiments." @default.
- W2538903612 created "2016-10-28" @default.
- W2538903612 creator A5022191573 @default.
- W2538903612 creator A5076257646 @default.
- W2538903612 date "2006-01-01" @default.
- W2538903612 modified "2023-09-24" @default.
- W2538903612 title "Fault-tolerant cluster management" @default.
- W2538903612 hasPublicationYear "2006" @default.
- W2538903612 type Work @default.
- W2538903612 sameAs 2538903612 @default.
- W2538903612 citedByCount "1" @default.
- W2538903612 crossrefType "journal-article" @default.
- W2538903612 hasAuthorship W2538903612A5022191573 @default.
- W2538903612 hasAuthorship W2538903612A5076257646 @default.
- W2538903612 hasConcept C105795698 @default.
- W2538903612 hasConcept C106159729 @default.
- W2538903612 hasConcept C108074857 @default.
- W2538903612 hasConcept C111919701 @default.
- W2538903612 hasConcept C119701452 @default.
- W2538903612 hasConcept C120314980 @default.
- W2538903612 hasConcept C12590798 @default.
- W2538903612 hasConcept C127413603 @default.
- W2538903612 hasConcept C147494362 @default.
- W2538903612 hasConcept C149635348 @default.
- W2538903612 hasConcept C162324750 @default.
- W2538903612 hasConcept C169468491 @default.
- W2538903612 hasConcept C2779795794 @default.
- W2538903612 hasConcept C33923547 @default.
- W2538903612 hasConcept C41008148 @default.
- W2538903612 hasConcept C50712370 @default.
- W2538903612 hasConcept C62611344 @default.
- W2538903612 hasConcept C63540848 @default.
- W2538903612 hasConcept C66938386 @default.
- W2538903612 hasConceptScore W2538903612C105795698 @default.
- W2538903612 hasConceptScore W2538903612C106159729 @default.
- W2538903612 hasConceptScore W2538903612C108074857 @default.
- W2538903612 hasConceptScore W2538903612C111919701 @default.
- W2538903612 hasConceptScore W2538903612C119701452 @default.
- W2538903612 hasConceptScore W2538903612C120314980 @default.
- W2538903612 hasConceptScore W2538903612C12590798 @default.
- W2538903612 hasConceptScore W2538903612C127413603 @default.
- W2538903612 hasConceptScore W2538903612C147494362 @default.
- W2538903612 hasConceptScore W2538903612C149635348 @default.
- W2538903612 hasConceptScore W2538903612C162324750 @default.
- W2538903612 hasConceptScore W2538903612C169468491 @default.
- W2538903612 hasConceptScore W2538903612C2779795794 @default.
- W2538903612 hasConceptScore W2538903612C33923547 @default.
- W2538903612 hasConceptScore W2538903612C41008148 @default.
- W2538903612 hasConceptScore W2538903612C50712370 @default.
- W2538903612 hasConceptScore W2538903612C62611344 @default.
- W2538903612 hasConceptScore W2538903612C63540848 @default.
- W2538903612 hasConceptScore W2538903612C66938386 @default.
- W2538903612 hasLocation W25389036121 @default.
- W2538903612 hasOpenAccess W2538903612 @default.
- W2538903612 hasPrimaryLocation W25389036121 @default.
- W2538903612 hasRelatedWork W111121798 @default.
- W2538903612 hasRelatedWork W1488443159 @default.
- W2538903612 hasRelatedWork W1582967675 @default.
- W2538903612 hasRelatedWork W1659809143 @default.
- W2538903612 hasRelatedWork W1975701581 @default.
- W2538903612 hasRelatedWork W1980984075 @default.
- W2538903612 hasRelatedWork W2010507067 @default.
- W2538903612 hasRelatedWork W2094134528 @default.
- W2538903612 hasRelatedWork W2099560402 @default.
- W2538903612 hasRelatedWork W2118753804 @default.
- W2538903612 hasRelatedWork W2138541468 @default.
- W2538903612 hasRelatedWork W2156508032 @default.
- W2538903612 hasRelatedWork W2156832230 @default.
- W2538903612 hasRelatedWork W2288284517 @default.
- W2538903612 hasRelatedWork W2323592907 @default.
- W2538903612 hasRelatedWork W2579267047 @default.
- W2538903612 hasRelatedWork W2788009896 @default.
- W2538903612 hasRelatedWork W2908016640 @default.
- W2538903612 hasRelatedWork W2046932804 @default.
- W2538903612 hasRelatedWork W2272150896 @default.
- W2538903612 isParatext "false" @default.
- W2538903612 isRetracted "false" @default.
- W2538903612 magId "2538903612" @default.
- W2538903612 workType "article" @default.