The Fault Assumptions in Distributed Integrated Architectures 2007-01-3798
Distributed integrated architectures in the automotive and avionic domain result in hardware cost reduction, dependability improvements, and improved coordination between application subsystems compared to federated systems. In order to support safety-critical application subsystems, a distributed integrated architecture needs to support fault-tolerance strategies that enable the continued operation of the system in the presence of failures. The basis for the implementation and validation of fault-tolerance strategies are realistic fault assumptions, which are captured in a fault hypothesis. This paper describes a fault hypothesis for distributed integrated architectures, which takes into account the sharing of the communication and computational resources of a single distributed computer system among multiple application subsystems. Each node computer serves for the execution of multiple jobs. In analogy, the communication network interconnecting the node computers has to support message exchanges of more than one application subsystem. Using a generic system model of a distributed integrated architecture, we argue in favor of a differentiation of fault containment regions for hardware and software faults. Based on these fault containment regions, we discuss the failure modes, the failure rates, the maximum number of failures, and the recovery intervals. In particular, the fault hypothesis describes the assumptions concerning the respective frequencies of transient and permanent failures in consideration of recent semiconductor trends.