Availability of a distributed computer system with failures (Q1069693)
From MaRDI portal
scientific article; zbMATH DE number 3936464
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Availability of a distributed computer system with failures | scientific article; zbMATH DE number 3936464 | |
Statements
Availability of a distributed computer system with failures (English)
1986
A model for distributed systems with failing components is presented. Each node may fail and during its recovery the load is distributed to other nodes that are up. The model assumes periodic checkpointing for error recovery and testing of the status of other nodes for the distribution of load. We consider the availability of a node, which is the proportion of time a node is available for processing, as the performance measure. A methodology for optimizing the availability of a node with respect to the checkpointing and testing intervals is given. A decomposition approach that uses the steady-state flow balance condition to estimate the load at the node is proposed. Numerical examples are presented to demonstrate the usefulness of the technique. For the case in which all nodes are identical, closed form solutions are obtained.
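The trade-off behind choosing a checkpointing interval can be illustrated with a minimal sketch. It does not reproduce the article's model, which also covers testing intervals and load redistribution; instead it assumes a standard first-order approximation in which a node loses time to writing checkpoints and to redoing, on average, half an interval of work after each failure. The function names and the parameters `checkpoint_cost` and `mtbf` (mean time between failures) are hypothetical, and Young's square-root formula stands in for the closed-form results mentioned in the abstract.

```python
import math


def availability(interval, checkpoint_cost, mtbf):
    """First-order availability of a node (assumed model, not the article's).

    checkpoint_cost / interval : fraction of time spent writing checkpoints
    interval / (2 * mtbf)      : expected rework after a failure, taking the
                                 failure to occur mid-interval on average
    """
    return 1.0 - checkpoint_cost / interval - interval / (2.0 * mtbf)


def young_interval(checkpoint_cost, mtbf):
    """Young's approximation for the interval that maximizes availability."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def grid_search_interval(checkpoint_cost, mtbf, lo, hi, steps):
    """Brute-force check: best interval on an evenly spaced grid in [lo, hi]."""
    candidates = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return max(candidates, key=lambda t: availability(t, checkpoint_cost, mtbf))


if __name__ == "__main__":
    cost, mtbf = 1.0, 200.0  # illustrative values in arbitrary time units
    t_closed = young_interval(cost, mtbf)
    t_grid = grid_search_interval(cost, mtbf, lo=1.0, hi=100.0, steps=9900)
    print(f"closed form: {t_closed:.2f}, grid search: {t_grid:.2f}")
    print(f"availability at optimum: {availability(t_closed, cost, mtbf):.3f}")
```

Checkpointing too often wastes time on the checkpoints themselves, while checkpointing too rarely increases the work lost per failure; the optimum balances the two terms. The grid search is included only as a cross-check on the closed-form interval under the same assumed model.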
distributed systems
failing components
error recovery
performance measure