Reliability Architect Framework and Toolset

This proposal aims to build an architectural design tool for analyzing and managing hard and soft errors, as well as in-field repair capabilities. The tool will embed the functions of several existing in-house tools and provide new features.

rat-overview

The tool will consist of a single Excel spreadsheet document per analyzed system, organized as a collection of individual sheets. Some of the sheets will contain structural information about the system under analysis and will be populated with data entered by the user. Other sheets will be prepared automatically by the tool and will present overall reliability data computed from the user-provided information.

rat-structure

A dedicated section of the tool (the component library) will hold the failure rates and failure modes of the components used in the system. It is the user’s responsibility to provide the component failure data. Data can be entered manually, through copy/paste operations, or by importing CSV files. Component failure data may reside in existing corporate databases or may be sourced externally.
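
As an illustration of the CSV import path, the following minimal sketch parses component failure data into an in-memory library. The column names (`component`, `failure_mode`, `fit`), the record type, and the sample values are assumptions made for this example; the proposal does not define a file layout.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class FailureRecord:
    component: str      # component type name
    failure_mode: str   # e.g. "soft_error" or "hard_fault" (hypothetical labels)
    fit: float          # failures per 10^9 device-hours


def load_component_library(csv_text):
    """Parse CSV text into {component: [FailureRecord, ...]}."""
    library = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = FailureRecord(
            component=row["component"].strip(),
            failure_mode=row["failure_mode"].strip(),
            fit=float(row["fit"]),
        )
        library.setdefault(record.component, []).append(record)
    return library


# Illustrative data only; real figures come from the user's own sources.
SAMPLE_CSV = """component,failure_mode,fit
sram_1rw,soft_error,120.0
sram_1rw,hard_fault,5.0
pll,hard_fault,2.5
"""

library = load_component_library(SAMPLE_CSV)
for name, records in library.items():
    print(name, [(r.failure_mode, r.fit) for r in records])
```

Keeping one record per (component, failure mode) pair lets the same component carry several failure modes, which matches the library's stated purpose of holding both rates and modes.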

The component library information, combined with the structural information, will be used in a reliability analysis that provides failure rates and failure modes for the overall system. The results can be formatted as human-readable or machine-readable reports and output files that document the reliability claims of the design.
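
One simple way to picture this roll-up is sketched below. It assumes a series model in which every component failure counts toward the system failure rate and failure rates are constant and independent, so FIT values add up; the actual analysis may be more elaborate. The dictionaries and their values are hypothetical.

```python
from collections import defaultdict

# Hypothetical component library: component -> {failure_mode: FIT per instance}
LIBRARY = {
    "sram_1rw": {"soft_error": 120.0, "hard_fault": 5.0},
    "pll": {"hard_fault": 2.5},
}

# Hypothetical structural information: component -> number of instances
STRUCTURE = {"sram_1rw": 4, "pll": 2}


def system_fit_by_mode(library, structure):
    """Sum per-instance FIT rates over all instances, grouped by failure mode."""
    totals = defaultdict(float)
    for component, count in structure.items():
        for mode, fit in library[component].items():
            totals[mode] += count * fit
    return dict(totals)


by_mode = system_fit_by_mode(LIBRARY, STRUCTURE)
total_fit = sum(by_mode.values())
print(by_mode, total_fit)
```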

The tool will support reliability budgeting, so that system architects can allocate the FIT (failures in time) rate budget among the components of a system before detailed design gets underway. The user can manually specify several scenarios, and the tool will compute the reliability under each scenario.
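
A budgeting pass of this kind could look like the sketch below, which splits a system-level FIT budget among blocks in proportion to user-chosen weights, once per scenario. Proportional allocation, the scenario names, the weights, and the 1000 FIT budget are all assumptions for illustration; the proposal does not prescribe an allocation rule.

```python
def allocate_budget(total_fit, weights):
    """Split total_fit among blocks in proportion to their weights."""
    weight_sum = sum(weights.values())
    return {block: total_fit * w / weight_sum for block, w in weights.items()}


# Hypothetical scenarios: each maps a block name to a relative weight.
SCENARIOS = {
    "memory_heavy": {"memories": 6, "logic": 3, "io": 1},
    "logic_heavy": {"memories": 1, "logic": 2, "io": 1},
}

SYSTEM_BUDGET_FIT = 1000.0  # illustrative system-level target

budgets = {
    scenario: allocate_budget(SYSTEM_BUDGET_FIT, weights)
    for scenario, weights in SCENARIOS.items()
}
for scenario, allocation in budgets.items():
    print(scenario, allocation)
```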

The system architect will be able to use the tool very early in the specification phase to predict and budget the overall reliability figures of the system and to select among implementation and design choices.

ASIC design engineers will be able to use the tool to model the reliability of their design and ensure that the established targets are met.

Reliability engineers will use the tool to produce detailed reports with system-level failure rates and predicted availability.
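
To show how a system-level failure rate relates to predicted availability, the sketch below uses the standard steady-state relation availability = MTBF / (MTBF + MTTR), with MTBF in hours derived from a FIT rate (failures per 10^9 device-hours). It assumes a constant failure rate and that every counted failure causes downtime; the FIT and repair-time inputs are hypothetical.

```python
def mtbf_hours(fit):
    """Mean time between failures in hours for a constant rate given in FIT."""
    return 1e9 / fit


def availability(fit, mttr_hours):
    """Steady-state availability for a given FIT rate and mean time to repair."""
    mtbf = mtbf_hours(fit)
    return mtbf / (mtbf + mttr_hours)


system_fit = 500.0   # illustrative system-level failure rate
mttr = 4.0           # illustrative mean time to repair, in hours

print(f"MTBF: {mtbf_hours(system_fit):.0f} h")
print(f"Availability: {availability(system_fit, mttr):.8f}")
```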

The implementation of the tool will be modular.

A Soft Error Rate (SER) calculator will evaluate the high-level soft error (SE) rates for the various blocks of the design, including memories, flip-flops and combinational logic. The SER calculator will use intrinsic, raw SER figures for each considered component type in conjunction with de-rating factors. These factors are provided as default values or taken from a de-rating library (another sheet of the tool), where the user can select the type of de-rating that best fits the given system or application.
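
The calculation described above can be sketched as a raw rate scaled by the product of the applicable de-rating factors. The block names, raw figures, and factor names (`timing`, `logic`, `architectural`) below are assumptions for illustration, not values from the tool's de-rating library.

```python
from math import prod


def derated_ser(raw_fit_per_unit, units, derating):
    """Effective SER in FIT: raw rate times unit count times all de-rating factors."""
    return raw_fit_per_unit * units * prod(derating.values())


# Hypothetical design blocks. "units" is Mbit for memories, cell count for flops.
BLOCKS = [
    {"name": "l2_sram", "raw_fit_per_unit": 500.0, "units": 8.0,
     "derating": {"architectural": 0.5}},
    {"name": "ctrl_flops", "raw_fit_per_unit": 0.001, "units": 200000,
     "derating": {"timing": 0.5, "logic": 0.5}},
]

ser_by_block = {
    b["name"]: derated_ser(b["raw_fit_per_unit"], b["units"], b["derating"])
    for b in BLOCKS
}
total_ser = sum(ser_by_block.values())
print(ser_by_block, total_ser)
```

Treating the factors as a simple product assumes they are independent of one another, which is a common first-order approximation rather than a guaranteed property of a real design.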

Another module will assist ASIC designers in selecting the optimal memory protection for each memory instance (e.g. SECDED ECC, advanced ECC, parity, none), minimizing area overhead while achieving the reliability targets. This module relies on: intrinsic SER figures for single-bit upset (SBU) and multi-cell upset (MCU) events; an MCU to multi-bit upset (MBU) analysis based on the configuration of the memory; a library of available memory protection choices; a SER reduction formula for each defined memory protection scheme; and the area overhead required by each scheme. The tool will propose the protection approach that meets a target SER at minimum area overhead.
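
The selection step can be sketched as a search over the protection library for the cheapest scheme whose residual SER meets the target. Everything numeric below is a placeholder, not measured data: the residual fractions, area overheads, and the `mcu_to_mbu` ratio are assumptions, and the residual-SER formula is one plausible model in which MCUs that do not fall within a single word behave like single-bit upsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    area_overhead: float   # extra area as a fraction of the unprotected memory
    sbu_residual: float    # fraction of single-bit upsets left unmitigated
    mbu_residual: float    # fraction of same-word multi-bit upsets left unmitigated


# Hypothetical protection library with placeholder values.
SCHEMES = [
    Scheme("none", 0.00, 1.0, 1.0),
    Scheme("parity", 0.03, 0.0, 0.5),
    Scheme("secded", 0.12, 0.0, 0.1),
    Scheme("advanced_ecc", 0.25, 0.0, 0.0),
]


def residual_ser(scheme, sbu_fit, mcu_fit, mcu_to_mbu):
    """Residual FIT after protection; mcu_to_mbu is the share of MCUs hitting one word."""
    mbu_fit = mcu_fit * mcu_to_mbu
    spread_fit = mcu_fit - mbu_fit  # MCUs spread over several words
    return (sbu_fit + spread_fit) * scheme.sbu_residual + mbu_fit * scheme.mbu_residual


def select_scheme(schemes, sbu_fit, mcu_fit, mcu_to_mbu, target_fit):
    """Return the lowest-area scheme meeting target_fit, or None if none qualifies."""
    ok = [s for s in schemes
          if residual_ser(s, sbu_fit, mcu_fit, mcu_to_mbu) <= target_fit]
    return min(ok, key=lambda s: s.area_overhead) if ok else None


choice = select_scheme(SCHEMES, sbu_fit=1000.0, mcu_fit=100.0,
                       mcu_to_mbu=0.2, target_fit=5.0)
print(choice.name if choice else "no scheme meets the target")
```

An exhaustive search is adequate here because the protection library is small; the per-scheme residual fractions stand in for the SER reduction formulas mentioned in the text.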

The reporting module will produce detailed reports and reliability datasheets.

Another module will implement a high availability (HA) checklist process, similar to the existing HA gap analysis.

An in-field repair calculator can be integrated into the tool and will provide recommendations for memory repair and evaluate their effect on system reliability.

The initial tool is designed so that new modules can later be integrated to improve the system reliability analysis.