IBM System z10 design for RAS

by W. J. Clarke, L. C. Alves, T. J. Dell, H. Elfering, J. P. Kubala, C. Lin, M. J. Mueller, and K. Werner

The IBM System z10™ server reliability, availability, and
serviceability (RAS) design continues to reduce the sources of
server outages through innovative RAS architecture and
techniques. The z10™ server introduced functional improvements
that challenged the RAS design. Increases were made in the
performance of each processor, the total number of processors, the
total size of the memory, the amount of cache, the bandwidth of the
I/O, the thermal density, and the exposure to soft errors. These
changes demanded stronger RAS functions to prevent unscheduled
outages. Significant improvements were made to the IBM
e-business on demand® functions (concurrent, customer-requested
upgrades) that enable customers to better manage capacity without
having to take planned outages. The hypervisor simplified
configuration changes, such as adding cryptography or channel
subsystems to logical partitions, by eliminating the need for
preplanning. Single-core checkstopping and single transparent
CPU (central processing unit) sparing were added. The RAS
functions reduced the number of scheduled outages. Product
improvements were complemented by improvements in RAS
modeling. This paper describes these RAS improvements and how
they provide value to the customer.