IBM Skip to main content
  Home     Products & services     Support & downloads     My account  
  Select a country  
Journals Home  
  Systems Journal  
Journal of Research
and Development
  ·  Current Issue  
  ·  Recent Issues  
  ·  Papers in Progress  
  ·  Search/Index  
  ·  Orders  
  ·  Description  
  ·  Patents  
  ·  Recent publications  
  ·  Author's Guide  
  Staff  
  Contact Us  
Journal of Research and Development  
Volume 43, Numbers 5/6, 1999
IBM S/390 Server G5/G6
 Table of contents: arrowHTML arrowPDF arrowASCII   This article: HTML arrowPDF arrowASCII   DOI: 10.1147/rd.435.0723 arrowCopyright info
   

Architecture and software support in IBM S/390 Parallel Enterprise Servers for IEEE Floating-Point arithmetic

by P. H. Abbott, D. G. Brush, C. W. Clark III, C. J. Crone, J. R. Ehrman, G. W. Ewart, C. A. Goodrich, M. Hack, J. S. Kapernick, B. J. Minchau, W. C. Shepard, R. M. Smith, Sr., R. Tallman, S. Walkowiak, A. Watanabe, and W. R. White
IEEE Binary Floating-Point is an industry-standard architecture. The IBM System/360™ hexadecimal floating-point architecture predates the IEEE standard and has been carried forward through the System/370™ to current System/390® processors. The growing importance of industry standards and floating-point combined to produce a need for IEEE Floating-Point on System/390. At the same time, customer investment in IBM floating-point had to be preserved. This paper describes the architecture, hardware, and software efforts that combined to produce a conforming implementation of IEEE Floating-Point on System/390 while retaining compatibility with the original IBM architecture.

Introduction

In late 1995, the IBM System/390* Division decided to add support for IEEE Floating-Point arithmetic to theevolving S/390 Parallel Enterprise Servers*. The objective was to address the increasing prevalence of IEEE Floating-Point in new workloads that customers wanted to host on these new machines. This paper describes the resulting hardware architecture and software support provided to allow developers of these new workloads, as well as customer programmers, to employ the new facilities.

IBM System/360* implemented floating-point arithmetic in hardware and offered the feature in all models of the product line, a significant advance at the time. This capability brought the performance of hardware floating-point to customers using even the smallest S/360 machines. The facilities included four floating-point registers (FPRs), short- (32-bit) and long- (64-bit) precision data types, and an innovative hexadecimal representation allowing for compact exponent encoding and minimizing the shifting required to normalize operands. The elements of this architecture were implemented in several ways in the different models of S/360 to provide appropriate levels of performance when compared to other machines in the line and with other manufacturers’ processors. Later, an extended-precision (128-bit) data type was added to the hardware, providing significant additional precision to floating-point representation and computation. This support was an IBM architectural exclusive at the time of its introduction and remains one today. This architecture has been implemented on all IBM large-scale machines following S/360, up to and including the current S/390 Enterprise Server products.

In the interim, the industry, including IBM, looked for a standard floating-point system that could be implemented efficiently on smaller systems, especially the then-emerging line of RISC machines, and one that addressed deficiencies in existing floating-point systems. This work was conducted through the IEEE Floating-Point Working Group P754. IBM’s first “straw-man” architecture proposal for implementing the IEEE Floating-Point draft standard [1­5] in the then-current S/370* architecture dates from 1982. In 1985, ANSI/IEEE Standard 754-1985 [6] was approved. Today, most of the IBM systems with which S/390 Enterprise Servers interoperate implement this standard, including all Personal Computers, all RISC workstations (including RS/6000*), and AS/400*. Programming languages have begun to include IEEE Floating-Point as a required data type.

IBM’s current effort began in 1989, when the System Architecture group developed an initial proposal for an IEEE Floating-Point implementation. This architecture effort, augmented by hardware and software engineering activity, continued at varying levels of intensity until the current one began in 1995.

  S/390 transformation
S/390* is in the midst of a transformation that began in the early 1990s. The large S/390 mainframe of that time was an extremely capable system. It hosted one or more very robust S/390 operating systems, providing a comprehensive hardware/software system capable of running workloads inconceivable for other platforms, including those of the world’s largest banks, manufacturing companies, insurance companies, stock exchanges, and airline reservation systems.

However, this system was perceived to be too costly by both actual and prospective customers. It was also considered proprietary and was labeled a “legacy” system, a pejorative term with no concrete definition but an implied meaning of old or obsolete, to be replaced by inexpensive, modern, open systems based on client/server technology.

The large systems of the time supported a vertical model, in which customer growth requirements were met by providing larger systems featuring faster basic engines configured either as uniprocessors or as multiprocessors containing two to six engines. These machines were built from bipolar technology, which provided the high performance required to support vertical growth. However, this technology was also high-cost and high-risk, requiring special environmental support, including chilled water for cooling. These factors raised the machines’ total cost of ownership.

Those systems also featured software technologies and workloads characterized as proprietary or “legacy,” such as SNA, centralized OLTP (on-line transaction processing, as delivered through subsystems such as IMS and CICS), and batch processing. The technologies that provided the ability to support these workloads delivered solid value and were critical to the large business organizations that deployed them. They were, however, continuing to diminish in popularity within the industry. These systems continued support for the floating-point system first implemented with S/360, referred to in this paper as hexadecimal floating-point, or HFP.

  Current environment
At this point in the S/390 transformation, significant changes have been accomplished. The most important of these has been a change to a horizontal growth model, featuring the S/390 Parallel Sysplex*. Today, when customers need additional capacity, they can grow horizontally by adding an additional system and connecting it to the existing ones. This is accomplished by connecting each of the systems to an additional processor configured as a coupling facility, or CF, a form of electronic shared memory. The CF supports a set of operations that, with the accompanying subsystem software support, provides scalable, low-overhead, single-system-image operation across multiple physical machines, allowing functions such as N-way (where N is the number of systems) sharing of data managed by DB2* or DL1. Work for the IMS subsystem can be routed to any system in the Parallel Sysplex, providing another single-system view of the environment. The combined system also supports higher availability through facilities provided by the communications subsystems and the coupling facility.

In addition, OS/390*, the integrated MVS system, has embraced open-systems function by providing an implementation of an XPG4-compliant UNIX** kernel and a UNIX shell as an alternate execution environment to the others it supports. This simplifies the porting of UNIX applications and also facilitates the implementation of important middleware technologies on OS/390. Examples of such applications and technologies include SAP R/3 Data Base Server and Lotus Domino**, some of the new workloads enabled for the transformed S/390 Enterprise Server. S/390 is also providing an implementation of the Java** virtual machine [7], a technology facilitated by having a UNIX environment on OS/390. The native-float data type in the Java language [8] is an IEEE float. The implementation of IEEE Floating-Point hardware on S/390 will enable these new technologies to run on S/390 much more efficiently.

Design objectives

Our basic objective was to add IEEE Floating-Point as an alternative to the existing HFP, not as a replacement for it. A set of design objectives for the project developed, beginning with the previous architecture efforts and continuing as we extended them to provide an architecture and design for the S/390 IEEE Floating-Point hardware and software implementation. These objectives are described below.

  Affordability
The first objective was affordability. It led us to provide a full hardware implementation, but also to provide software to support a focused set of new applications and a customer development capability, as opposed to support across the OS/390 system and across the S/390 platforms. This resulted in support in the OS/390 BCP (Basic Control Program), the UNIX kernel, the High-Level Assembler (HLASM), the C/C++ Compiler, the C Run-time Library, the C++ Run-time Library, the Java virtual machine, Language Environment (LE) for OS/390, the dbx debugger, and the VisualAge* Remote Debugger, all in the initial release. To allow development of customer applications, we provided support in VM/ESA* for guest virtual machines that use IEEE Floating-Point.

  Performance
The second objective was performance. Any additional overhead in context switching created by the new additional floating-point registers (AFPRs) was important to control. We adopted a “pay-as-you-go” philosophy whereby the overhead is incurred only if the hardware feature is actually being used. The floating-point hardware feature must be activated before it is available for customer use. This activation is accomplished on a per-process basis by the first attempt to use the new instructions or registers.

  Serviceability
The third objective was serviceability. The FPRs (the existing four as well as the AFPRs) are made available to both existing HFP arithmetic and the new IEEE Binary Floating-Point (BFP) operations. An architecture tradeoff between serviceability and the desire to economize on instruction operation codes was settled in favor of serviceability. This led to a new, separate instruction set for IEEE Floating-Point, instructions visible to a debugger or in a memory dump, as opposed to a mode-set operation and reuse of the existing instruction set.

It is important to note that the new AFPRs are available to both forms of floating-point computation. As such, a dump of the registers means nothing without an understanding of which type of computations have been performed.

  Compatibility
The fourth objective was compatibility. It guided us when implementation issues arose, associated with areas of the specification that are not clearly spelled out. The decision made was to remain compatible with RS/6000 and AIX* in such circumstances. This approach facilitates porting applications to OS/390 that originated on UNIX platforms.

  Definitive support
The fifth design objective was definitive support. Specifically, this meant that we wanted to avoid issues of imprecision as to whether an environment does or does not support IEEE Floating-Point. The specific choice we made was to provide IEEE Floating-Point support in OS/390 Version 2 Release 6. This led to the development of an instruction simulator, included in the OS/390 Base Control Program (BCP), that provides support for the instructions when the operating system is running on a machine without the hardware feature. The machines providing the necessary hardware support are the G5 and its successors. However, any program compiled to use the IEEE instructions will run on an OS/390 Version 2 Release 6 system, albeit at degraded performance on the G4 and earlier machines.

Another aspect of definitive support was long-term coexistence between HFP and BFP. There is minimal support designed to encourage migration from HFP to BFP over time. Existing applications, if either not recompiled or recompiled without change, continue to use HFP and four FPRs. Such applications are encouraged to use, where possible, the AFPRs, but not to convert to BFP unless its specific advantages apply. Conversion to BFP requires application analysis to ensure that adequate range is available. Mechanically, using either BFP facilities or the AFPRs for HFP requires application recompilation and specific compiler directive-setting.

  Conformance to the standard
The sixth objective was conformance to the standard. This goal, taken from the hardware architecture, was to provide a fully conforming hardware implementation of the standard. The target was to put as few dependencies for conformance as possible on the OS/390 BCP. Meeting this objective allows us to enhance the control program independently of the state of the standard and conformance to it, a desirable degree of freedom with respect to future work. It will allow us to broaden the scope of software support for IEEE Floating-Point in OS/390 in response to market requirements, without depending on enhancements to the control program.

Architecture

The System/360 architecture, introduced in 1964, was the definition of the interface to the machine, both hardware and microcode, as observable by the “program.” This was one of the first instances of the use of the term “architecture” pertaining to a computer system. Since that time, the term has become very popular and has acquired diverse meanings. It is used to describe both hardware and software interfaces, as well as system structure. In this paper, the terms “architecture” and “hardware architecture” are used in the original System/360 sense. For S/390, the official term for the current version of this interface is Enterprise Systems Architecture/390* (ESA/390*) [9].

Architecture design process
Architecture design is an iterative process; it involves coordinating requirements from many areas, including hardware, programming, and marketing. These requirements are continually changing and seldom present a consistent picture. This problem is particularly acute for ESA/390, because it is supported by several different operating system platforms (OS/390, VM, TPF), each on different design and delivery schedules.

For any particular architecture facility, it is unusual for the hardware and corresponding software design to coincide; a more typical situation is that the hardware design for the first machine to implement a new architecture facility is frozen before the software design begins. Thus, a good deal of guesswork may be needed to predict requirements before they are known. Typically, new architecture facilities delivered on a machine must first be tolerated, and later exploited, by each software platform. The very nature of the support for IEEE Floating-Point exacerbated this problem.

Unlike some architecture facilities, which are implemented in microcode, the implementation of IEEE Floating-Point in the hardware requires fundamental changes in dataflow. Consequently, the designers needed much more lead time than usual. In fact, the basic architecture model was completed in 1992, and most of the basic dataflow changes were included in the hardware for G4 in preparation for introduction of the facility on G5.

Compatibility
Compatibility, one of the strengths of ESA/390, is also a very important factor in the architecture design process. For ESA/390, compatibility among hardware models is even more important than it is among releases of software. The typical life of a machine in the field is five to ten years. A single customer may have multiple machines with an age span of at least ten years. On the other hand, the same customer is not expected to run more than two releases of a particular operating system, and even then not for an extended period of time. Consequences of incompatible changes are much more serious in the hardware architecture than in the operating system. Occasional compatibility problems between versions of an operating system can be solved by running back levels under VM or LPAR. The corresponding situation for hardware—keeping old machines—is much less palatable.

Compatibility is of particular concern in the case of IEEE Floating-Point, as this facility is intended to be used by application programs running in the problem state. In the ESA/390 environment, application programs have learned to enjoy complete bit-for-bit compatibility across the entire product line for more than three decades, a trend that is expected to continue for several more.

Only one chance to get it right
The original System/360 architecture for HFP did not define a guard digit for long operands. Not until after the first machines were shipped was it discovered that this was a serious problem. The hardware was redesigned and the change (primarily in microcode) was retrofitted in the field. With the G5 implementation of IEEE Floating-Point, such a change would require hardware changes in the chip and would be much more expensive than the 1968 alteration. Since this early correction, the System/360 definition of HFP has been unchanged for more than 30 years. Even additions to the HFP architecture have been rare. The extended format and seven operation codes were added in December 1967, and two operation codes for square root were added in September 1991.1 In short, with IEEE Floating-Point, we had only one chance and would be bound essentially permanently to what we produced.

  ESA/390 design objectives
With the foregoing in mind, the following design objectives were used as a basis for developing the hardware architecture [10] for the ESA/390 floating-point (FP) extensions:

  • World-class definition for both current and future markets.
  • Fully conforming implementation of the IEEE standard (all of the “shalls” and most of the “shoulds”).
  • Full support for both single and double formats.
  • Performance matching or exceeding that of the original HFP.
  • Support for the extended format, not just for internal computation, but externalized to the application in the same manner as HFP.
  • A more complete set of extended-precision operations than previously available for HFP.
  • AFPRs to improve the support of complex arithmetic and extended formats.
  • Control-program-independent application interface.
  • Where practical, BFP additions also provided for HFP.
  • Coexistence of HFP and BFP applications and data.

  Dual-radix floating-point representation
Support of two floating-point radixes in the same machine presents some special problems. Among these are questions of whether the two should share the same FPRs, share the same exceptions, and share the same operation codes.

Common registers
The same FPRs are used for both BFP and HFP data, because separate sets increase the complexity of the design, the hardware cost, and the overhead for context switching. Further, separate registers do not solve any of the linkage problems associated with mixed mode.

Separate exception indications
Exception handling for BFP is quite different from that for HFP. BFP has five exceptions, with masks and status flags for each; HFP has four exceptions, two masks, and no status flags. Three exceptions are common to BFP and HFP (overflow, underflow, and division by zero), so using the same interruption codes for both modes was a possibility. But even for these three exceptions, the action taken by the machine is not the same, so there was no hardware advantage to using the same interruption codes. It was also expected that in a mixed BFP/HFP environment, different trap-handling routines would be invoked for the BFP and HFP exceptions, so new interruption codes were defined for all five BFP interruption conditions.

BFP/HFP conversion instructions
Coexistence and migration from HFP to BFP are provided by instructions that convert between the HFP long format and the BFP short and long formats. Instructions for HFP short operands are not necessary, as conversion between HFP long and short is trivial. Conversion of extended operands is expected to be too infrequent to justify implementation in hardware.

The BFP/HFP conversion instructions present some unique problems as to whether HFP or BFP exceptions should be reported and whether the BFP rounding modes should apply. It is expected that these instructions will be used in both BFP and HFP environments, so they were defined to be independent of both. Rounding is provided explicitly, independently of the BFP rounding modes, and exceptions are reported by means of the condition code rather than by program exceptions.

Mode bit or separate instructions
A key decision that must be made in such a design is whether to reuse the existing HFP operation codes for BFP, with the run-time interpretation controlled by a mode bit in the PSW, or to introduce a new set of operation codes. A similar choice had to be made in the IEEE standard for the four different styles of rounding. Although the standards committee could have left this choice of implementation open, it decided to design an explicit mode setting that would be available to the program.

Using a mode bit and reusing operation codes offers the clear advantages of simpler hardware and fewer new operation codes. It also allows for mode-independent subroutines (facilitated by a special LOAD MODAL CONSTANT instruction, which loads one of two operands from storage depending on the floating-point mode setting) and might simplify compiler code generation.

While the hardware advantages are real, the software benefits are less compelling. Algorithms for intrinsic functions are entirely different in the two floating-point formats, and the reuse potential of other subroutines is marginal, given the very different mechanisms provided for dealing with exceptions. Since compiler code generation is heavily table-driven, a separate set of operation codes is not a major issue.

A modal approach makes sense in the case of the IEEE rounding style because the different modes apply nearly uniformly to all instructions. In the BFP/HFP case, the underlying instruction sets are somewhat different, which weakens the leverage that might otherwise be possible. Further, the modal approach creates compatibility problems for old programs written for HFP if there is any possibility that they may be executed in the BFP mode. Finally, the modal approach presents a significant difficulty (even for new programs) in being able to diagnose a case in which a program that has been written for one mode has been inappropriately called in the other (resulting invariably in gibberish, which may or may not be quickly detected).

For these reasons, it was decided to pay the price in hardware and use separate operation codes to ensure the ability to produce robust applications.

  Registers
Several considerations determined the register-related aspects of the architecture.

Floating-point control register
BFP operations require several new control and status bits: five IEEE exception status flags, five IEEE exception trap masks, and two bits to specify any of four rounding modes defined by the standard. The traditional container for this type of information is the PSW, but there is not enough room. This information is therefore placed in a new register, called the floating-point control (FPC) register, which is accessible by the problem program. In addition to the above information, an 8-bit data exception code (DXC) is also placed in the FPC register when an IEEE trap occurs.

An overview of the 32-bit FPC register is shown in Figure 1. Details are given in Tables 1 and 2.

Figure 1Figure 1

Table 1   FPC register bit assignments.

Byte Bit(s) IEEE-invalid-operation mask Abbr.

0 0 IEEE-invalid-operation mask IMi
0 1 IEEE-division-by-zero mask IMz
0 2 IEEE-overflow mask IMo
0 3 IEEE-underflow mask IMu
0 4 IEEE-inexact mask IMx
0 5­7 (Reserved) 0
1 0 IEEE-invalid-operation flag SFi
1 1 IEEE-division-by-zero flag SFz
1 2 IEEE-overflow flag SFo
1 3 IEEE-underflow flag SFu
1 4 IEEE-inexact flag SFx
1 5­7 (Reserved) 0
2 0­7 Data-exception code DXC
3 0­5 (Reserved) 0
3 6­7 Rounding mode RM

Table 2   Rounding mode.

FPC byte 3 bits 6­7 Rounding mode

00 Round to nearest
01 Round toward 0
10 Round toward + infinity
11 Round toward ­ infinity

Additional floating-point registers
The original System/360 architecture provided four 64-bit FPRs. In 1964, four registers constituted an innovation and a not inconsiderable amount of hardware. Intrinsic functions written in assembly language performed quite well with four registers; compilers at that time could exploit only one or two.

Today, four FPRs are too few for optimum performance. The simplest math library routines require four FPRs; this number doubles for complex arithmetic and doubles again for extended precision. With the current technology, the cost to provide additional registers in the hardware is minimal. Support of BFP provided the opportunity to increase the number of FPRs, for both HFP and BFP.

The four original System/360 FPRs were numbered 0, 2, 4, and 6, making the numbering consistent with that for the even­odd general register pairs. Extended-precision operands, which were added later, were accommodated in pairs of registers (0/2 or 4/6), making only two registers available in that format.

The System/360 instruction format provides 4-bit register fields; thus, the bits in the instruction formats have always been there to expand the number of available registers to 16. Coupling register pairs for extended-precision operands was done to provide a compatible superset of the original System/360 implementation. Table 3 shows the layout of the FPRs, including the coupling of register pairs for extended-precision operands.

Table 3   Floating-point register layout.

128-bit FPR pair
64-bit FPR 64-bit FPR

0 2
1 3
4 6
5 7
8 10
9 11
12 14
13 15

Activating additional registers
Bit 13 of control register 0 (CR0.13) controls the use of the AFPRs. If a program attempts to use any of the 12 new floating-point registers or the FPC register when this bit is zero, a program interruption occurs. (Since all BFP instructions depend on the masks and the rounding mode in the FPC register, they implicitly use that register.)

This approach provides two advantages:

  1. An application can use the BFP instructions and AFPRs independently of the operating system platform. Thus, a control is required to prevent an application from using the registers if the control program is not prepared to save and restore them. If a new program attempts to use the new registers on a new machine running an old control program, a data exception is reported, and no harm is done.
  2. The additional registers need not be saved and restored when running programs that do not use the new registers.

  Conformance to the standard
To our knowledge, there is no hardware implementation that conforms to the standard. The standard states that “hardware components that require software support to conform to the standard shall not be said to conform apart from such software.” A great deal of effort was expended to provide a fully conforming hardware implementation of the standard. This was accomplished for the basic formats in all but one area: binary­decimal conversion.

Conversions between BFP format and decimal strings
The standard requires conversions between “basic format” floating-point numbers and decimal strings, but does not specify the format of “decimal strings.” This is the only area required by the standard and not provided by the hardware. The software conversions are described in the section on conversion between decimal and binary floating-point.

  BFP formats
The standard defines two “basic” formats: single and double. ESA/390 provides both, but uses the terms “short” and “long” (for 32-bit and 64-bit formats), as these terms have been used since the inception of System/360 for HFP operands of corresponding lengths.

The standard defines “extended” formats but permits a great deal of implementation flexibility in this area. These formats provide additional exponent range and precision above the widest basic format supported by the machine. They were primarily intended to be used to facilitate intermediate computations by math library routines and were not necessarily meant to be externalized. However, the S/390 implementation does externalize the extended format, so much care was taken in the definition. The ESA/390 extended format is 128 bits and meets or exceeds the requirements of the standard in every respect. The bit representation was chosen to be completely consistent with that used by the standard for single and double; it includes a sign bit, a 15-bit exponent, and a 112-bit fraction with an implicit leading one bit, and supports denormalized numbers.

The bit representation of the BFP data formats in storage is defined to be left-to-right in a manner that is uniform for all numeric operands in the ESA/390 architecture. Although the format diagrams in the IEEE Floating-Point standard appear to use the same left-to-right bit sequence, the standard only defines the meaning of the bits without specifying how they appear in storage; the storage arrangement is left to the implementation. Several implementations in fact use other sequences; this may affect programs that depend on the bit representations of floating-point data in storage.

BFP short format
Figure 2 shows the bit distribution of a BFP short-format operand. When a number or NaN (Not-a-Number, a binary representation of a non-numeric character or string) in the BFP short format is loaded into an FPR, it occupies the left half of the register; the right half remains unchanged.

Figure 2Figure 2

BFP long format
Figure 3 shows the bit distribution of a BFP long-format operand. When a number or NaN in the BFP long format is loaded into an FPR, it occupies the entire register.

Figure 3Figure 3

BFP extended format
Figure 4 shows the bit distribution of a BFP extended-format operand. A number or NaN in the BFP extended format occupies an FPR pair. The sign and biased exponent are in the leftmost 16 bits of the left register and are followed by the leftmost 48 bits of the fraction. The rightmost 64 bits of the fraction are in the right register of the pair.

Figure 4Figure 4

The properties of the three formats are presented in Table 4.

Table 4   Summary of BFP data formats.

Property Format

Short Long Extended

Format length (bits) 32 64 128
Biased-exponent length (bits) 8 11 15
Fraction length (bits) 23 52 112
Precision (p) 24 53 113
Maximum exponent (Emax) 127 1023 16383
Minimum exponent (Emin) ­126 ­1022 ­16382
Exponent bias 127 1023 16383
Nmax (1­2­24) × 2128 (1­2­53) × 21024 (1­2­113) × 216384
approximate approximate3.4 × 1038 approximate approximate1.8 × 10308 approximate approximate1.2 × 104932
Nmin 1.0 × 2­126 1.0 × 2­1022 1.0 × 2­16382
approximate approximate1.2 × 10­38 approximate approximate2.2 × 10­308 approximate approximate3.4 × 10­4932
Dmin 1.0 × 2­149 1.0 × 2­1074 1.0 × 2­16494
approximate approximate1.4 × 10­45 approximate approximate4.9 × 10­324 approximate approximate6.5 × 10­4966

Explanation:
approximate approximate
Dmin
Nmax
Nmin
Value is approximate.
Smallest (in magnitude) representable denormalized number.
Largest (in magnitude) representable number.
Smallest (in magnitude) representable normalized number.

NaN (Not-a-Number)
The standard requires two types of NaNs—signaling and quiet—where “Signaling NaNs afford values for […] enhancements […] not the subject of the standard” and “Quiet NaNs should […] afford retrospective diagnostic information […]” [6]. Quiet NaNs are normally propagated through arithmetic operations without signaling an exception. The standard does not specify the exact encoding of NaNs, but does indicate in an appendix that it is a mistake to use the sign bit to distinguish signaling from quiet NaNs. The encoding chosen indicates a quiet NaN by a one in the leftmost fraction bit. This encoding permits a signaling NaN to be converted to a quiet NaN by setting this bit to one.

When more than one NaN is encountered, the standard requires one of them to be propagated but does not specify which. Intel delivers the NaN with the larger fraction. RS/6000 delivers a NaN according to operand number. In ESA/390, signaling NaNs are chosen over Quiet NaNs, and NaNs of the same type are selected by operand number. This is similar to the technique used by the RS/6000.

  Rounding
The standard requires four rounding modes, with “round to nearest even” being the default. While most compilers do not specify the details for rounding for most operations, they do have special requirements in certain cases. For example, FLOAT, FIX, AMOD, AINT, and ANINT are normally expanded into in-line code that is sensitive to the details of rounding. To facilitate such operations, a special rounding-method field was defined to permit the current rounding mode to be overridden. This field permits the specification of all four rounding modes defined by IEEE, as well as “biased round to nearest,” which is used by ANINT. The rounding-method field is provided for the following BFP instructions: CONVERT TO FIXED, DIVIDE TO INTEGER, and LOAD FP INTEGER.

  Operations
Considerations related to the way in which operations are performed were important in several respects.

Separate operation codes for each format
Several implementations of IEEE (Intel x86 and RS/6000, in particular) are optimized for the largest length supported. These machines convert shorter formats to a common format in a register, provide most operations only on the common format, and round (when required) to the precision of the shorter format. For ESA/390, separate operation codes were defined for all three formats to match the implementation of HFP. This also provides maximum performance for all formats.

Storage-to-register operations
Most implementations of IEEE are optimized for register-to-register operations. The dataflow for S/390 processors is optimized for storage-to-register operations (inherited from System/360). To provide comparable performance for BFP and HFP, both register-to-register and storage-to-register operations were included. Storage-to-register versions of LOAD LENGTHENED provide loading and conversion as a single instruction. This also facilitates comparison between different formats, which is required by the standard. Storage-to-register operations for extended precision require additional data paths in the hardware and have not been implemented for either HFP or BFP.

Comparison
The standard permits comparison to be implemented either as a condition code or as a true­false response to a predicate. ESA/390 supports a condition code, whereas most compilers support predicates. Two instructions, COMPARE and COMPARE AND SIGNAL, were included to permit a compiler to support the predicate form.

Remainder
The remainder operation, required by the standard, is always exact. This operation may require thousands of cycles. The instruction DIVIDE TO INTEGER can be used to obtain the IEEE remainder by means of a simple two-instruction loop whose execution can be interrupted and restarted transparently.

Multiply­add
The instructions MULTIPLY AND ADD and MULTIPLY AND SUBTRACT are not required by the standard. They have the mathematical property of only one rounding. The definition is bit-for-bit compatible with the RS/6000, facilitating reuse of that system’s library packages.

List of instructions
Table 5 is a list of instructions supported by the architecture for BFP operations.

Table 5   Instructions operating on BFP data.
Instruction
name
Source (bits) 32 64 128
Result (bits) 32 64 128 32 64 128 32 64 128
ADD R, S      R, S      R
COMPAREa R, S     R, S      R
COMPARE AND SIGNALa R, S      R, S      R
CONVERT BFP TO HFP   R     R     
CONVERT HFP TO BFP*     R R     
CONVERT FROM FIXEDb R R R       
CONVERT TO FIXED*c R     R     R   
DIVIDE R, S       R, S       R
DIVIDE TO INTEGER*d R       R     
LOAD R, S       R, S       R
LOAD AND TEST R       R       R
LOAD COMPLEMENT R       R       R
LOAD FP INTEGER* R       R       R
LOAD LENGTHENED   R, S R, S     R, S    
LOAD NEGATIVE R       R       R
LOAD POSITIVE R       R       R
LOAD ROUNDED     R     R R  
LOAD ZEROe R       R       R
MULTIPLY R, S R, S     R, S R, S     R
MULTIPLY AND ADD R, S       R, S     
MULTIPLY AND SUBTRACT R, S       R, S     
SQUARE ROOT R, S       R, S       R
SUBTRACT R, S       R, S       R
STORE S       S     
TEST DATA CLASSa R       R       R
Explanation:
* Instruction includes an M field to control the rounding method.
a COMPARE, COMPARE AND SIGNAL, and TEST DATA CLASS set the condition code and do not produce a floating-point result.
b The source operand for CONVERT FROM FIXED comes from a general register.
c The result of CONVERT TO FIXED is placed in a general register.
d DIVIDE TO INTEGER provides the IEEE remainder function.
e LOAD ZERO does not have a source operand.
R Operation is provided in the register-to-register form.
S Operation is provided in the storage-to-register form.

  Nontrapped underflow exceptions
The standard defines underflow in terms of two events: tininess2 and loss of accuracy. It permits each of these events to be detected in either of two ways, but requires that the implementation detect them in the same way for all operations. Tininess may be detected either 1) after rounding3 or 2) before rounding. Loss of accuracy may be detected as either 3) a denormalization loss or 4) an inexact result. Methods 2 and 4 were selected as being simpler and mathematically cleaner. This choice matches the RS/6000 implementation.

  Traps
Dependency on the control program is minimized by extending an existing program interruption, the data exception. The new conditions are identified by a data exception code (DXC) placed in the FPC register. The application program can access this information directly without requiring a control program service.

Trapped overflow and underflow on LOAD ROUNDED
The standard defines an exponent bias adjustment to be applied to the result for trapped overflows and underflows for “all operations except conversions.” For conversions, the result must be returned in the same format or a wider format than the source. Exponent adjustment is permitted, but not required, for this case. In terms of the standard, the instruction LOAD ROUNDED is a “conversion.”

For LOAD ROUNDED, overflow or underflow may occur in two ways: 1) in the original format, as the result of rounding to the precision of a shorter format, or 2) if the exponent of the result lies in the range of the source but not the target. In the first case, the exponent must be adjusted to fit within the range permitted by the source format. In the second, application of the bias adjustment value defined by the standard may cause the result to lie outside the range permitted by the format. The ESA/390 solution was to define a different bias adjustment value for LOAD ROUNDED that can be consistently applied in both cases.

Trap result incremented
The standard requires an implementation that always traps to indicate whether the delivered result has been rounded up. Although the ESA/390 implementation does not always trap, the data exception code (DXC) includes separate indications for “inexact and truncated” and “inexact and incremented.” This information was included to assist the trap handler.

Conversion between decimal and binary floating-point

  General statement of the problem
Machine numbers are in binary; input to and output from programs is usually expected in decimal. Binary-to-decimal conversion is used for output of floating-point results; decimal-to-binary conversion is used for user-supplied input and for compile-time constants.

The IEEE standard on binary-to-decimal conversion strikes a compromise between what might be done efficiently in hardware (in early 1980s technology) and mathematical utility. For exponents in the center of the format’s range (i.e., numbers whose absolute value is not far less than or greater than one), and for a decimal precision not exceeding the format’s binary precision, a correctly rounded result is required; for numbers nearer the underflow or overflow thresholds, a bit more leeway is permitted. The standard explicitly requires that compile-time and run-time conversions of the same input yield the same result.

Floating-point decimal-to-binary conversion is usually provided by software, and the S/390 solution is no exception. Nevertheless, we wanted the conversion functions to carry the same expectation of exact reproducibility as one expects of a hardware instruction.

One way to achieve exact reproducibility in software is to prescribe a specific implementation in great detail. A better way is to prescribe a mathematically correct definition of the expected result; this gives the implementer the freedom to make tradeoffs suitable to a particular situation or, more significantly, to take advantage in the future of algorithmic improvements in terms of speed or resource consumption. (This is how the basic IEEE arithmetic operations are specified in the standard.)

For floating-point conversion, the mathematically precise definition is simply this: Perform the conversion as if infinite precision were available, then apply a single rounding step according to the intended rounding mode. In the case of the normal round-to-nearest mode, this means the result must be the single nearest representable result, or (in the case of an exact midpoint between two representable values) the nearest one whose low-order bit is zero.

The following observation reduces the problem to a finite-precision statement: Every binary (or hexadecimal) floating-point number has an exact decimal representation, because all factors of the input base (2) are also factors of the output base (10). Since the range of machine numbers is bounded, there is a longest decimal representation, and this bounds the required precision if decimal arithmetic is used. For example, the largest possible number of significant digits in the exact decimal representation of an IEEE Double is 751. The bound is one more than this, since we need to be able to distinguish the exact midpoint between two representable values. This number is 1/2 Dmin, or half the smallest denormal (approximate approximate2.47 × 10­324):

2­(1023+52)=0.000000 · · · 0000002470328 · · · 6328125
323 zeros     and     752 digits

It may be interesting to illustrate how easy it is to do exact conversion using arbitrary-precision decimal arithmetic. The Rexx program shown in Figure 5 displays the exact decimal value of an IEEE Double given in hexadecimal.

Figure 5Figure 5

  History
Gordon Slishman gives a nice account [11] of how floating-point conversion accuracy has improved over the years, from quite poor in the 1960s to mostly acceptable in the late 1970s (in part due to the availability of extended precision for intermediate calculations). Within IBM, Assembler H and FORTRAN have had the highest standards, although an inconsistency between equally precise but different roundings in FORTRAN led Slishman to develop and publish correctly rounding conversions for HFP in the late 1980s. David Matula had published the theoretical underpinnings (e.g., [12]) and Jerry Coonen’s analysis [13] led to the correctness bounds allowed by the IEEE standard. Unpublished work at IBM includes a decimal-arithmetic-based conversion routine for an IBM internal debugger (1977; the routines were also used in VM/XA in the mid-1980s). Unpublished work at MIT in the 1970s includes routines in MacLisp and ZetaLisp; this was disclosed in two seminal papers at the 1990 SIGPLAN conference [14, 15]. The work described here started in 1992 and evolved from an attempt to obtain provably correct routines for the debugger mentioned above, stimulated by the discovery of a few incorrectly rounded numbers (the search for which was in turn triggered by the discovery of a similar number for Assembler H).

Most programming languages or run-time environments that make any claim of accuracy imply no more than a (reasonably) bounded error. Rounding carry-propagation implies that several low-order binary (or decimal) digits may be wrong if they are all 0 or 1 (or 9). The Java language may be the first to require correctly rounded fixed-precision conversions. (Languages that support arbitrary-precision arithmetic present a different situation.)

  Conversion basics
Floating-point numbers are traditionally expressed as

fraction×baseexponent.

An equivalent representation that exposes the precision (expressed as a number of bits or digits) is

(integer+fractionbaseexponent­precision.

When the precision of the input to conversion is implicit in the number of bits or digits given, the fraction (in the exposed-precision representation shown above) will be zero. The fraction may be nonzero when this representation is used to show the last intermediate result of conversion, just before the rounding step. The rounding rule takes the fraction into account when determining whether the integer in the final result should be one more than the integer in the intermediate result.

The basic method for converting a floating-point number from one base to another is through a sequence of transformations that leave the value intact. The starting point is

value = infract × inbaseinexp(inbase = 2 or 10).

The ending point should be (for the same value)

value = outfract × outbaseoutexp (outbase = 10 or 2).

In principle, the internal representation used during the conversion process can differ from both input and output representations—it might be fixed-point binary, for example, when converting between decimal character strings and machine floating-point formats. This leads to the following three-step overall description of conversion:

  1. Input conversion to internal format.
  2. Value-preserving transformation.
  3. Output conversion from internal format to final result.
The internal format used for input conversion is chosen to make that step as straightforward as possible, and the same argument applies to the internal format expected by the output conversion. These internal representations are not the same; hence the need for a value-preserving transformation.

We know how to convert integers from one base to another. There are machine instructions to convert integers that fit in a register, and techniques for converting larger integers using multiple-precision fixed-point arithmetic are well known. The actual conversion between decimal and binary (whether input, output, or intermediate) is easiest when the value is in the exposed-precision representation, where the numbers to be converted are integers.

  Using higher-precision floating-point arithmetic
Mathematically, the transformation from the internal input representation to the internal output representation can be written very simply:

outfract=infract×factor,

where

factor = inbaseinexp .

outbaseoutexp

The output exponent is chosen to make this factor close to one. The number of possible exponent values is limited by the format, so this computation is typically carried out by a combination of table lookup and a small number of floating-point multiplications. If the computation can be carried out at a higher precision than the precision of the desired result, the error due to repeated rounding can be kept reasonably small—but the final result will in general not be the same as if infinite precision had been used (in the constants as well as in the arithmetic), followed by a single rounding.

If the constants are carefully chosen and the rounding errors are carefully tracked, it is possible but nontrivial to determine when the conversion result is in fact the correctly rounded result; this is the case when the rounding error is too small to bridge the gap to the next representable number. Some correctly rounding conversion methods do this, and retry the conversion of difficult numbers with higher precision—typically using multiple-precision arithmetic. (A difficult number is one whose value is uncomfortably close to a rounding threshold—see the subsection below on difficult numbers.)

Note that when the precision of the available machine is no greater than the precision of the desired result, multiple-precision arithmetic may be needed even in the common case. The S/390 conversion routines apply the same standard of correctness to 128-bit extended-precision conversion, so we have to face this issue.

  Using multiple-precision integer arithmetic
The conversion routines provided with the S/390 IEEE Floating-Point support do not actually use floating-point arithmetic at all; they treat the floating-point formats as bit patterns to be processed, and use multiple-precision integer arithmetic. This has several advantages:

  • The core conversion routines can be independent of the input and output formats; only generic parameters such as precision and exponent range are required. Extended-precision conversion is no more difficult than single-precision; it simply requires more resources. (The High-Level Assembler also takes advantage of generic conversion by offering improved conversion for hexadecimal floating-point together with support for IEEE binary floating-point.)
  • The handling of difficult numbers is not essentially different from the easy case. This improves the robustness of the implementation.
  • Proving correctness is conceptually easier than for floating-point arithmetic. This improves confidence in the implementation.
  • The conversion routines can run efficiently on hardware that does not yet support the new floating-point instructions. This is especially useful for compilers and assemblers.

The internal representation is as a “bignum” fixed-point number multiplied by a power of two as well as a power of ten:

(integer+fraction) × 2binexp × 10decexp.

By bignum we mean a number expressed in a large base, such as base 109 or 232, where each digit can be processed as a 32-bit machine integer. A decimal bignum uses a power of ten as a base; 109 is a good choice because it nearly fills a 32-bit register. This representation allows for simple and efficient multiple-precision arithmetic. A binary bignum uses a power of two as a base. Conversion to and from bignums of a compatible base is easy, because it can be done simply by grouping smallnum digits or bits (as the case may be).

A fixed-point bignum consists of an array of 32-bit words in storage; the position of the radix point is implicit (i.e., the software knows where it is), and is chosen to reflect the desired output precision. It should be large enough to accommodate both the given input precision and the desired output precision. In particular, the rounding position should be in the integer part, so that the only knowledge required of the fraction is whether it is exactly zero or not. The actual fraction will then not have to participate in output conversion.

The input and output conversions convert to or from a compatible bignum base. Let us describe decimal-to-binary conversion, to be definite, and revisit the three essential conversion steps mentioned earlier:

  1. Input conversion from a decimal character string to a decimal bignum times a power of ten.
  2. Transformation of the decimal bignum times a power of ten into a binary bignum times a power of two.
  3. Output conversion from a binary bignum times a power of two to the desired binary machine format.
Input conversion is straightforward. The only point worth mentioning is that an indefinite number of digits is accepted, but if more digits are given than are necessary to represent the longest exact decimal representation, the extra digits are only checked to see whether they are all zero; in effect, they are coalesced into a single “sticky” digit. This is necessary in order to be able to break rounding ties correctly, and useful in that it allows the internal storage requirements to be bounded by the result format, without restricting the input format.

Output conversion turns out to be the trickiest part, even though it is not conceptually demanding. This is where the single rounding step takes place, and one needs to be careful to account for all of the bits that are (or might have been) beyond the rounding point, in order to break ties correctly in all cases. The rounding position has to be determined relative to the high-order bit, and has to take possible denormalization into account.

  A core conversion algorithm
The core of the conversion routine is the transformation of the internal representation. We illustrate here the algorithm for decimal-to-binary conversion. Binary-to-decimal conversion is essentially the same, but in the opposite direction.

The algorithm transforms the internal representation without changing the represented value. At all points, the representation is as follows, with value being invariant:

value = (integer + fraction) × 2binexp × 5decexp.

At the beginning, we have a decimal bignum with a null fraction, and binexp = decexp (because we are given a power of ten, not a power of five).

At the end, we want decexp to be zero, with a binary bignum. The size (number of bigdigits) of the integer part of the binary bignum must be such that the rounding point will be inside the integer part, with at least one bit to the right of the rounding point. It is then sufficient to know whether the fraction is exactly zero or not in order to perform correct rounding, including correct tie-breaking.

This transformation is called exponent reduction. It is achieved by multiplying the bignum by a sequence of factors of the form 5a/2b, where a and b have the same sign as decexp. Each factor should be close to one, so as not to change the size of the integral part of the bignum. Indeed, the algorithm switches dynamically between a factor that is smaller than one and a factor that is greater than one, in order to keep the high-order bigdigit within a narrow range. The cleverness of the algorithm lies in the careful choice of the exponents a and b.

When the initial decexp is positive, we would be multiplying by a power of five and dividing by a power of two. This suggests that a binary bignum is most appropriate, because if we choose 2b to equal the binary bigbase (i.e., b = 32), the division is free, because it amounts to no more than occupying a new fraction bigdigit, since the result of the multiplication is implicitly shifted one bigdigit to the right. Since the factor should be close to one, this choice of b suggests a = 14 for a factor of 514/232 approximate approximate 1.4 > 1, or a = 13 for a factor of 513/232 approximate approximate 0.28 < 1.

When the initial decexp is negative, we would be multiplying by a power of two and dividing by a power of five. This suggests that a decimal bignum is most appropriate, because if we pick 10a to equal the decimal bigbase (i.e., a = 9), the division comes cost-free again. The two factors used to achieve a balanced negative exponent reduction are 230/109 approximate approximate 1.05 > 1 and 229/109 approximate approximate 0.54 < 1.

At some point, the bignum must be converted from bigdecimal to bigbinary. It is now clear that this conversion should be done before the multiplicative transformation when the exponent is positive, and afterward when it is negative. In other words, we use either bigbinary or bigdecimal arithmetic, depending on whether we are converting large or small magnitudes. Performing the conversion as the first or last step also avoids the need to convert the fraction; it is zero initially, and at the end its actual value does not matter—only whether it is zero or not.

Multiplication by the factors considered here is much simpler than general bignum multiplication; it almost degenerates to the case of multiplying one bignum by a single-digit bignum. The factor that is smaller than one is clearly representable by a single bigdigit. The factor that is larger than one does not exceed twice the bignum base, and it turns out that this is only marginally more difficult than single-bigdigit multiplication when the bigbase is 232, and comes cost-free when the bigbase is 109 if we allow the high-order bigdigit to exceed the bigbase (this works because decimal bigdigits are stored in a binary box of size 232, and 2 × 109 < 231).

Exponent reduction by means of the factors described above reduces the absolute value of the decimal exponent by 9, 13, or 14 per step, so the last step (to reduce the remaining decimal exponent to zero) may have to use a different factor, chosen from a small table of residual factors of the same general form. For example, to reduce a residual decimal exponent of +5, use the factor 55/212, which is (215 × 105)/232, and to reduce a residual decimal exponent of ­5, use 212/55, or (217 × 104)/109.

Note that, because of the choice of decimal arithmetic for reducing negative exponents, all calculations described above are exact if we allow the bignum fraction to grow by up to one bigdigit per multiplication. The maximum number of multiplications is determined by the valid exponent range, and is therefore bounded by the format of the binary floating-point number (whether input or output). There are no difficult numbers for this method. The long fraction resulting from the conversion of extreme exponents will be discarded after checking whether it is zero or not, because that is all that is needed to figure out how to round the integral part correctly.

The determination of the required bignum size (which controls the resource requirements of the algorithm) must take into account several factors. The integral portion must accommodate the result precision, even when the initial high-order bigdigit is only 1/10th of the base and when the cumulative factor applied during exponent reduction is as small as it can be. The self-balancing nature of exponent reduction makes the cumulative factor come close to one, but one must know a lower bound. The fraction must accommodate the largest possible number of reduction steps (each can contribute at most one new fraction bigdigit), and this is determined by the format-derived exponent bound.

Figure 6 shows a worked-out example. We use a decimal bigbase of 1000 in order to illustrate the procedure. Each group of three digits represents one bigdigit. The reduction factors would be 1.024 and 0.512 in this case. The given input is 3.71448848e­5 (or 371448848 × 10­13), to be converted to IEEE Single (with a 24-bit fraction). The result is 163364931 × 2­42 plus a very small increment whose only significance is that there will be no tie-breaking during rounding after 24 bits, which is indeed safely within the 28-bit hexadecimal number 9BCC043. Normalizing to one bit before the binary point yields 1.001101111001100000001002 × 2­15 for a biased exponent of 127 ­ 15 = 112 (hex 70). Putting the pieces together yields the hex representation of 381BCC04 for this IEEE Single.

Figure 6Figure 6

  Improving performance using truncated bignum multiplication
The exact fraction can grow to be quite large, especially for extended precision—and yet it will be discarded at the end. It would seem certain that a better approach is possible.

Suppose we keep only one fraction bigdigit (i.e., we discard all new fraction bigdigits after the first). (We do need to keep track of the zeroness of the fraction.) The result will certainly not be greater than the exact result, but it may be smaller. Each truncated multiplication drops data beyond the last surviving fraction bigdigit, so the incremental error is bounded by one in the value of this bigdigit. Each error is multiplied by the cumulative reduction factors that follow, but those are close to one, and the largest relevant factor is bounded independently of the number of multiplications. The cumulative error is therefore proportional to the number of multiplications. This number can be determined from the exponent to be reduced, so we have a useful bound on the total truncation error. With a bigbase of 109, this error is at most a few thousand across the entire exponent range of BFP extended precision. A blind error threshold of 5000 would be sufficient (in fact, the program picks the error threshold dynamically). With this, any first-fraction bigdigit exceeding 999995000 would be suspicious; with “random” inputs, this might happen once out of 100000 conversions.

When the truncated result is suspicious, we can retry the exponent reduction with higher precision, but that might still not be enough, as revealed by the example shown in Figure 7. The cost of exact multiplication is not grossly worse than that of truncated multiplication— factors of 2 to 10 in execution time were measured (for double precision). Since the actual retry probability (using tight error bounds, and two fraction bigdigits when the number of multiplications is large) is less than 1 in 107, any retry is in fact done exactly.

Figure 7Figure 7

The conversion example shown earlier can be used to illustrate the effect. We repeat the exponent reduction in Figure 7 with truncation after one and after two bigdigits (base 1000 in this example). The error threshold in this simple example would have been 5, and this last fraction bigdigit exceeding 995 would have been judged “suspicious,” for a good reason: The exact answer is indeed different before the decimal point.

The example shown in Figure 8 illustrates that a truncated result may look very suspicious and yet be correct. It shows a different kind of difficult number.The exact and truncated values are shown side by side here. The number is 5.29097127e­7.

Figure 8Figure 8

  Difficult numbers
A difficult number is one whose binary (decimal) representation has a very long run of zeros or ones (nines) near the rounding point. Such a long run permits tiny errors to affect the bits that are to be retained, because of carry propagation.

In the context of our conversion algorithm, they are numbers that cause the initial (integral) bignum bigbeg to be transformed to a near-integer result bigend by multiplying it by a factor of the form 2x/5y whose value is close to 1 (in the range 1/5 to 5; x and y are integers of the same sign):

bigend = bigbeg × open parenthesis 2x close parenthesis ± tinyfraction.

5y

If tinyfraction = 0, then one of bigbeg or bigend is a power of 2 and the other is a power of 5.

The continued-fraction expansion of a real number leads to a sequence of progressively better “best” rational approximations of the number, where “best” means that there is no better rational approximation with a smaller (reduced) denominator. Continued-fraction expansion of a rational number such as 2x/5y leads to a finite sequence of rational approximations, the last of which is exact. The point is that the given numerator and denominator may be very large, whereas the partial convergents start out as very simple fractions (the crudest being 1/1 when the rational number is between 1/2 and 2), and require progressively more digits as the approximations converge to the true value.

Our difficult numbers correspond to partial convergents of such fractions. The examples used above were found by looking for nine-digit partial convergents for increasing exponents x in 2x/5y (with y chosen to keep the ratio close to one). One can find thousands of examples of this kind for single and double precision, and hundreds of thousands for extended precision, dispelling any hope of simply checking for a handful of exceptions. (For large exponents, as are encountered with extended precision, each continued-fraction expansion may yield several thousand partial convergents, and for each power of ten there are four or five ratios in the range (1/5, 5) to try, since 24 < 52 < 25. Not every convergent leads to a good difficult number (because the run of zeros or nines may be too short, or may not occur near enough to the rounding point), but a significant proportion of them does.

The length of a run of zeros or nines (ones) can be shown to be bounded by the size (in digits or bits) of the initial bignum plus the size of the partial quotients at the point where the partial convergent is taken. This means that if we could bound the size of the partial quotients that occur in the continued-fraction expansions of near-unity ratios of the form 2x/5y, we could bound the bignum fraction length required to guarantee the correctness of the integral part without having to resort to exact multiplication.

The number of possibly relevant expansions is bounded by the exponent range of the format being converted, so a bound on partial quotient size can be obtained by exhaustive search. This is time-consuming and not very elegant, but preliminary experiments suggest that this bound may be surprisingly small: The largest partial quotient encountered across the range of possible exponents had nine digits (28 bits). This is small, considering that the corresponding convergents contain up to 21000 digits. (For the range of exponents possible with double precision, the largest partial quotient was the seven-digit number 7651576 (23 bits).

A future implementation of the conversion algorithm may be able to take advantage of this observation, if confirmed or rigorously proved. It would be especially advantageous (in terms of the required resources) in cases in which the size of the decimal input or output is restricted to a number of digits commensurate with the binary precision (9, 17, or 35 digits), because the bignum fraction would have to be only one or two bigdigits longer than the integral part (which in turn would require at most four bigdigits).

Thanks to a single Assembler H rounding error discovered in the late 1980s, we are now able to generate many more difficult numbers. Here are some examples:

DC E'.1053771313464019060319004056804E-41'
31-digit HFP single
DC D'.303325544866797714604E-10'
21-digit HFP double
DC L'.8031692147E-10'
10-digit HFP extended

High-Level Assembler Release 3 (see the subsection on the High-Level Assembler) supports the new conversion routines in addition to the old ones, so the reader might like to compare results for these difficult numbers. In particular, comparing types D, DH, and L for the HFP double example, or types E, EH, and D for the HFP single example, demonstrates that the old types (E and D) fail to round up. The single-precision number is particularly stunning, because it looks like an exact tie even in extended precision; it takes 15 bits more than 128-bit extended precision to show that there are additional nonzero bits, and that the number should be rounded up even if IEEE rounding were applied, where an exact tie should have rounded down.

  Conversion resource requirements
The resources required for conversion are time and scratch storage. The primary factors affecting storage requirements are the covered exponent range and the number of decimal digits allowed. The decimal size may be unbounded, because there is a natural bound derived from the exponent range, as was mentioned earlier. This bound is 126 digits for single, 752 for double, and 11503 for extended precision.

The most expensive single operation (in terms of execution time) is the decimal-to-binary conversion of the bignum integer; it is quadratic in the number of bigdigits. This is an issue only when thousands of digits (leading to hundreds of bigdigits) are given for extended precision (because of the implied maximum for double and single precision). In practice, only as many digits as make sense for the desired precision are given (some languages do not even allow more), in which case there are at most four integral bigdigits, and this is not an issue.

Exponents near the extreme of the range require many exponent reduction steps (bignum multiplications where one factor has a single bigdigit). This number is roughly one-tenth of the absolute value of the decimal exponent. For difficult numbers, the exact retry can be expensive (because of the growing fraction), but one would have to compile a large table where all numbers are difficult for this to be a problem.

The storage requirements for the maximum possible fraction growth must be met, however, because no matter how rare the need may be, a failure cannot be tolerated. In the case of decimal-to-binary conversion, negative exponents are doubly expensive: Exponent reduction reduces the remaining decimal exponent by only 9 per step (instead of 13 or 14 for positive exponent reduction), and the largest starting exponent (in absolute value) can be very large because it must account for the input precision (the given number of digits) as well as for possible denormals. The worst case is extended-precision Dmin (roughly 4e­4966) given as an 11503-digit exact decimal. The starting decimal exponent in that case would be ­16470, requiring up to 1830 fraction bigdigits! (This is in addition to the 1279 bigdigits required to hold the integer part of this bignum.)

The work area size (in bytes) turns out to be 256 + 0.45Emax + 1.34Dmax, where Emax is the highest decimal fraction exponent (38, 307, or 4915) and Dmax is the given or implied bound on the number of decimal digits. This is at most 18 KB (extended precision, unbounded decimal length). It is 2.5 KB when restricted to 36 digits and only a few hundred bytes when further restricted to double. For double with unbounded decimal length (e.g., as required for the Java language) it is about 1.5 KB. When constraints are known in advance, a fixed work area can be used; alternatively, when the conversion fails because of insufficient room, one can expand the work area dynamically and then retry the conversion.

(If the bound on continued-fraction expansions is confirmed, the dependency on Emax would be removed from the space requirement, and a few hundred bytes would suffice even for extended precision if the decimal precision is limited to, say, 36 digits.)

  Extra considerations for HLASM
The Assembler has traditionally supported the generation of artificially shortened constants by means of an explicit length declaration (overriding the natural length of the format). This length could even be specified in bits, e.g., DL.12'1.2,3.4' would assemble two 12-bit “double-precision” floating-point numbers in three bytes, with rounding in the last retained bit position.

The generic conversion algorithm described here is totally unconcerned about this, since it supports any fraction length, and there is nothing special about the lengths implied by the standard formats.

The High-Level Assembler was thus able to retain this idiosyncrasy for binary floating-point formats (it is difficult to call DBL.27'123' an “IEEE Double”).

Software support

Support for IEEE Floating-Point involves both hardware and software. In this section, software-related aspects are described.

  OS/390 Control Program support
The OS/390 Base Control Program (BCP) and Base UNIX System Services were enhanced to support IEEE Floating-Point.

Base Control Program support
The OS/390 Base Control Program [27­29] support for IEEE Floating-Point underpins all of the higher layers of software. Since it is more convenient for the upper layers, especially applications, to base their initial use of IEEE Floating-Point on a given level of OS/390, rather than being restricted to S/390 processors that support the FP extensions, the BCP also simulates the new instructions and registers at the machine-code level.

Support of hardware
BCP support for the FP extensions consists of detecting the presence of the new hardware and enabling its use by software, including the new registers in status saving, handling new hardware exceptions, and extending RAS support to include the new registers.

Detection and enablement     Although the FP extensions are always available on all of the CPUs of a G5 processor, it would be dangerous to allow an application to use these facilities on an older level of the BCP, since the AFPRs and FPC register would not be saved and restored on interrupts and task switches. Therefore, as described above in the subsection on activating additional registers, the BCP must enable use of these facilities by setting on the CPU’s AFPR-control bit (CR 0.13). Besides providing compatibility, this requirement has the additional benefit of reducing supervisor overhead for saving and restoring the new registers; this overhead is incurred only for dispatchable units4 that actually use them.

During IPL the BCP determines whether the FP extensions are available. Because multiple programs may wish to check for its presence, a general use programming interface (GUPI) is provided by a flag in the communications vector table (CVT). For programs that require the capabilities of the FP extensions but make only light use of them, a second CVT flag indicates whether floating-point simulation is available.

After IPL, if the FP extensions are available, the BCP selectively enables its use by application programs. Enablement occurs transparently to the dispatchable unit as follows:

  1. When a program tries to execute a BFP instruction or use an AFPR with CR 0.13 off, the CPU presents a data exception with a special DXC value.
  2. The program check handler in the BCP recognizes the special exception and enables CR 0.13 and supervisor status saving of the AFPRs and FPC register. For a task, a flag is set in the secondary task control block (STCB) to allow upper software layers to determine whether it is using the new registers.
  3. The program check handler then redrives the instruction, and the program continues execution.

The IEEE 754 standard defines default settings for the rounding mode and exception mask. When the BCP enables a dispatchable unit to use the FP extensions, it establishes these defaults by clearing the FPC register to zero.5

Normally, once a dispatchable unit is enabled to use the FP extensions, it remains enabled for its lifetime, and the system incurs a small status-saving overhead even if the dispatchable unit stops using the new registers. While not a significant penalty for normal applications, a long-lived dispatchable unit might wish to disable use of the new registers and discontinue the associated status saving when these are no longer needed. For example, a transaction manager scheduling many transactions under the same task, where only a few transactions use FP extensions, might find this advantageous. A BCP service provides the disable function.

New register status saving     The supervisor must include the AFPRs and FPC register when saving and restoring status for dispatchable units that are enabled to use them. This occurs when the dispatcher switches the current dispatchable unit and on some interrupts. Although floating-point is normally used by applications running in task mode, the new registers must also be included in service request block (SRB) status saving for subsystems such as DB2 that use SRBs to process application requests.

Another area where the BCP could choose to save and restore the AFPRs and FPC register is in system linkages, such as LINK and XCTL, and supervisor calls (SVCs). Similarly, the hardware could add the new registers to the status saved or restored by the PC/PR/PT and BAKR instructions. It was decided, however, to continue the previous policy of allowing the FPRs to be managed by the application within the scope of a dispatchable unit. Besides being compatible with the past, this approach

  • Reduces status-saving overhead, since system code rarely uses the FPRs.
  • Avoids development cost in the BCP to add floating-point status saving at multiple points.
  • Avoids behavior and performance changes for the PC/PR/PT and BAKR instructions.
  • Allows the application program full control of the FPRs.

The last point is also a drawback, however, since an application program must be aware that it is responsible for saving and restoring the new registers across these linkages, if needed.

Hardware exceptions     The BCP has minimal involvement in handling IEEE Floating-Point exceptions. The extended synchronous program interrupt exit (ESPIE) interface allows an application (or run-time environment) to enable for certain HFP exceptions (in the program mask field of the PSW) and gain control in an exit routine when they occur. For BFP, all IEEE exceptions occur as data exceptions. The application manages the exception mask and flags in the FPC register. When an IEEE exception occurs, the program check first-level interrupt handler (FLIH) processes the data exception normally. The only additional support required is to copy the DXC value from low storage and pass it to the application program’s ESPIE routine or, via the recovery termination manager (RTM), to the recovery routine for the dispatchable unit [functional recovery routine (FRR) or extended specify task abnormal exit (ESTAE)], if either of these is established. The ESPIE or recovery routine accesses the FPRs directly, if needed.

Reliability, availability, and serviceability (RAS)     A critical area of system reliability and availability involves machine check handling and alternate CPU recovery. In these cases, an error in the CPU has occurred and its status may have to be saved for software recovery, including possibly moving the current dispatchable unit to another CPU. To enable such recovery, the G5 processor includes the AFPRs and FPC register as part of the CPU status. For serviceability, the AFPRs and FPC register are included and formatted in the logging of CPU errors. The new registers are saved when a standalone dump is taken in the event of a system failure. Application and system dumps and formatters also include the new registers.

Scheduling IEEE Floating-Point work in a sysplex     Applications using the FP extensions and running in a parallel sysplex environment must be scheduled on a system with the hardware, or at least the software, that supports them. This requirement was anticipated in an earlier OS/390 release by the WorkLoad Manager (WLM) [16]. The WLM allows an installation to identify jobs that require specific hardware, software, or data resources and to define systems as having, or not having, those resources. The WLM then ensures that jobs are scheduled only on systems with the required resources. This capability can be used to direct applications using the FP extensions to systems in a sysplex running on a G5 processor or, for light use, to systems with floating-point simulation.

Floating-point instruction simulator
The primary goal of the simulator is to allow application programs that exploit the FP extensions to run on OS/390 Version 2, Release 6 when a G5 processor is not available. With the exception of speed, a program will function identically. Secondary goals are to minimize the differences to ancillary functions such as tracing when using simulation and to make the simulation paths relatively efficient.

Implementation     Simulation of the FP extensions entails not only performing floating-point operations and detecting associated exception cases, but also detecting most other conditions that can occur when a CPU executes an S/390 instruction (e.g., storage access and specification exceptions). To perform this kind of simulation, software must

  1. Receive control on each instruction to be simulated.
  2. Determine the instruction (operation code) to be simulated.
  3. Decode the instruction, generate the result, and detect any exceptions that would occur in hardware execution.
  4. Return to the point that would receive control in hardware execution (continue to the next sequential instruction or process an exception).
It is more efficient to perform this level of simulation very close to the hardware, so it is implemented as a second-level interrupt handler (SLIH) that receives control from the program check first-level interrupt handler (FLIH).

The FLIH receives control for instruction simulation when the FP extensions are not available and the CPU encounters either a new floating-point instruction (operation exception) or an existing floating-point instruction using an AFPR (specification exception). Normally these exceptions would be presented to the program’s ESPIE or recovery routine as an error. In OS/390 Version 2, Release 6, however, when not running on a G5 processor, the FLIH invokes the simulation SLIH. To avoid clogging the system trace table with unnecessary entries, program interruptions for simulated instructions are not traced.

The simulation SLIH fetches the instruction and determines whether the operation is one of those to be simulated. To detect storage access exceptions that the user’s program would trigger if running on the actual hardware, the SLIH runs in the address-space control (ASC) mode of the user’s program to fetch the instruction, and accesses user storage with the user’s key.

The job of fetching the instruction is further complicated by the fact that the instruction typically resides in pageable storage, but the SLIH is running with the CPU disabled for interrupts. If a page fault occurs, it must be treated like a normal page fault. This is done by setting a footprint flag before the instruction fetch, so that if the FLIH is re-entered for a page fault, it is known that this happened on a simulated instruction. The FLIH resets the environment so that the page fault appears to have occurred on the original instruction and is handled normally.

A table-driven routine is used to decode the instruction and invoke the processing routine for it. The task of creating software routines to faithfully imitate the more than 100 new floating-point instructions in the G5 processor was made less daunting by reusing the algorithms from a hardware test tool. Using the same algorithms in the software simulation also helped to ensure that results were identical with those of the actual G5 hardware.

The algorithms simulate all functions of the hardware, including setting the condition code, checking and setting fields in the FPC register, and detecting exception conditions. The original FPRs (0, 2, 4, and 6) are used to improve performance of the simulation. Page faults that occur when fetching operands or storing a result are handled as described earlier for instruction fetch.

If no exceptions are detected by the simulation, the SLIH returns to the FLIH to continue with the next sequential instruction in the user’s program. If an exception is detected, the SLIH directs the FLIH to change the original program interruption to the appropriate exception type and handle it as if it had occurred in the hardware.

Validation     To ensure that it produces results identical to those of the G5 processor, the floating-point simulator underwent rigorous testing using the same test suite as the hardware test tool.

Performance and side effects     Naturally, the CPU time used to simulate a floating-point instruction is vastly greater than with the G5 hardware. On a per-instruction basis, the simulator is at least 100 times slower.

Time spent in the simulator could be counted as CPU time for the user’s program (“captured” time) or considered to be system overhead (“uncaptured” time). Although in all other ways the simulator behaves identically to the hardware, the need to account for the CPU resource consumed by user programs requires that simulator time be captured. As a result, performance reporting tools such as the Resource Measurement Facility (RMF) show higher CPU times for jobs run using the simulator.

Base UNIX System Services support
OS/390 includes standard UNIX functions that replicate, reuse, or create new environments [17­