INTEL 80287 PROGRAMMER'S REFERENCE MANUAL 1987 Intel Corporation makes no warranty for the use of its products and assumes no responsibility for any errors which may appear in this document nor does it make a commitment to update the information contained herein. Intel retains the right to make changes to these specifications at any time, without notice. Contact your local sales office to obtain the latest specifications before placing your order. The following are trademarks of Intel Corporation and may only be used to identify Intel Products: Above, BITBUS, COMMputer, CREDIT, Data Pipeline, FASTPATH, Genius, i, î, ICE, iCEL, iCS, iDBP, iDIS, I²ICE, iLBX, im, iMDDX, iMMX, Inboard, Insite, Intel, intel, intelBOS, Intelevision, inteligent Identifier, inteligent Programming, Intellec, Intellink, iOSP, iPDS, iPSC, iRMX, iSBC, iSBX, iSDM, iSXM, KEPROM, Library Manager, MAP-NET, MCS, Megachassis, MICROMAINFRAME, MULTIBUS, MULTICHANNEL, MULTIMODULE, MultiSERVER, ONCE, OpenNET, OTP, PC-BUBBLE, Plug-A-Bubble, PROMPT, Promware, QUEST, QueX, Quick-Pulse Programming, Ripplemode, RMX/80, RUPI, Seamless, SLD, UPI, and VLSiCEL, and the combination of ICE, iCS, iRMX, iSBC, iSBX, MCS, or UPI and a numerical suffix, 4-SITE. MDS is an ordering code only and is not used as a product name or trademark. MDS(R) is a registered trademark of Mohawk Data Sciences Corporation. *MULTIBUS is a patented Intel bus. Additional copies of this manual or other Intel literature may be obtained from: Intel Corporation Literature Distribution Mail Stop SC6-59 3065 Bowers Avenue Santa Clara, CA 95051 (c)INTEL CORPORATION 1987 CG-10/86 Preface ─────────────────────────────────────────────────────────────────────────── An Introduction to the 80287 This supplement describes the 80287 Numeric Processor Extension (NPX) for the 80286 microprocessor. Below is a brief overview of 80286 concepts, along with some of the nomenclature used throughout this and other Intel publications. 
The 80286 Microsystem The 80286 is a new VLSI microprocessor system with exceptional capabilities for supporting large-system applications. Based on a new-generation CPU (the Intel 80286), this powerful microsystem is designed to support multiuser reprogrammable and real-time multitasking applications. Its dedicated system support circuits simplify system hardware; sophisticated hardware and software tools reduce both the time and the cost of product development. The 80286 is a virtual-memory microprocessor with on-chip memory management and protection. The 80286 microsystem offers a total-solution approach, enabling you to develop high-speed, interactive, multiuser, multitasking── and multiprocessor──systems more rapidly and at higher performance than ever before. ■ Reliability and system up-time are becoming increasingly important in all applications. Information must be protected from misuse or accidental loss. The 80286 includes a sophisticated and flexible four-level protection mechanism that isolates layers of operating system programs from application programs to maintain a high degree of system integrity. ■ The 80286 provides 16 megabytes of physical address space to support today's application requirements. This large physical memory enables the 80286 to keep many large programs and data structures simultaneously in memory for high-speed access. ■ For applications with dynamically changing memory requirements, such as multiuser business systems, the 80286 CPU provides on-chip memory management and virtual memory support. On an 80286-based system, each user can have up to a gigabyte (2^(30) bytes) of virtual-address space. This large address space virtually eliminates restrictions on the number or size of programs that may be part of the system. ■ Large multiuser or real-time multitasking systems are easily supported by the 80286. 
High-performance features, such as a very high-speed task switch, fast interrupt-response time, inter-task protection, and a quick and direct operating system interface, make the 80286 highly suited to multiuser/multitasking applications. ■ The 80286 has two operating modes: Real-Address mode and Protected-Address mode. In Real-Address mode, the 80286 is fully compatible with the 8086, 8088, 80186, and 80188 microprocessors; all of the extensive libraries of 8086 and 8088 software execute four to six times faster on the 80286, without any modification. ■ In Protected-Address mode, the advanced memory management and protection features of the 80286 become available, without any reduction in performance. Upgrading 8086 and 8088 application programs to use these new memory management and protection features usually requires only reassembly or recompilation (some programs may require minor modification). This compatibility between 80286 and 8086 processor families reduces both the time and the cost of software development. The Organization of This Manual This manual describes the 80287 Numeric Processor Extension (NPX) for the 80286 microprocessor. The material in this manual is presented from the perspective of software designers, both at an applications and at a systems software level. ■ Chapter One, "Overview of Numeric Processing," gives an overview of the 80287 NPX and reviews the concepts of numeric computation using the 80287. ■ Chapter Two, "Programming Numeric Applications," provides detailed information for software designers generating applications for systems containing an 80286 CPU with an 80287 NPX. The 80286/80287 instruction set mnemonics are explained in detail, along with a description of programming facilities for these systems. A comparative 80287 programming example is given. 
■ Chapter Three, "System-Level Numeric Programming," provides information of interest to systems software writers, including details of the 80287 architecture and operational characteristics.
■ Chapter Four, "Numeric Programming Examples," provides several detailed programming examples for the 80287, including conditional branching, the conversion between floating-point values and their ASCII representations, and the calculation of several trigonometric functions. These examples illustrate assembly-language programming on the 80287 NPX.
■ Appendix A, "Machine Instruction Encoding and Decoding," gives reference information on the encoding of NPX instructions.
■ Appendix B, "Compatibility between the 80287 NPX and the 8087," describes the differences between the 80287 and the 8087.
■ Appendix C, "Implementing the IEEE P754 Standard," gives details of the IEEE P754 Standard.
■ The Glossary defines 80287 and floating-point terminology. Refer to it as needed.

Related Publications

To best use the material in this manual, readers should be familiar with the operation and architecture of 80286 systems. The following manuals contain information related to the content of this supplement and of interest to programmers of 80287 systems:

■ Introduction to the 80286, order number 210308
■ ASM286 Assembly Language Reference Manual, order number 121924
■ 80286 Operating System Writer's Guide, order number 121960
■ 80286 Hardware Reference Manual, order number 210760
■ Microprocessor and Peripheral Handbook, order number 210844
■ PL/M-286 User's Guide, order number 121945
■ 80287 Support Library Reference Manual, order number 122129
■ 8086 Software Toolbox Manual, order number 122203 (includes information about 80287 Emulator Software)

Notational Conventions

This manual uses special notation to represent sub- and superscript characters. Subscript characters are surrounded by {curly brackets}, for example 10{2} = 10 base 2.
Superscript characters are preceded by a caret and enclosed within (parentheses), for example 10^(3) = 10 to the third power.

Table of Contents
───────────────────────────────────────────────────────────────────────────

Preface

Chapter 1  Overview of Numeric Processing
  Introduction to the 80287 Numeric Processor Extension
  Performance
  Ease of Use
  Applications
  Upgradability
  Programming Interface
  Hardware Interface
  80287 Numeric Processor Architecture
  The NPX Register Stack
  The NPX Status Word
  Control Word
  The NPX Tag Word
  The NPX Instruction and Data Pointers
  Computation Fundamentals
  Number System
  Data Types and Formats
  Binary Integers
  Decimal Integers
  Real Numbers
  Rounding Control
  Precision Control
  Infinity Control
  Special Computational Situations
  Special Numeric Values
  Nonnormal Real Numbers
  Denormals and Gradual Underflow
  Unnormals──Descendents of Denormal Operands
  Zeros and Pseudo Zeros
  Infinity
  NaN (Not a Number)
  Indefinite
  Encoding of Data Types
  Numeric Exceptions
  Invalid Operation
  Zero Divisor
  Denormalized Operand
  Numeric Overflow and Underflow
  Inexact Result
  Handling Numeric Errors
  Automatic Exception Handling
  Software Exception Handling

Chapter 2  Programming Numeric Applications
  The 80287 NPX Instruction Set
  Compatibility with the 8087 NPX
  Numeric Operands
  Data Transfer Instructions
  Arithmetic Instructions
  Comparison Instructions
  Transcendental Instructions
  Constant Instructions
  Processor Control Instructions
  Instruction Set Reference Information
  Instruction Execution Time
  Bus Transfers
  Instruction Length
  Programming Facilities
  High-Level Languages
  PL/M-286
  ASM286
  Defining Data
  Records and Structures
  Addressing Modes
  Comparative Programming Example
  80287 Emulation
  Concurrent Processing with the 80287
  Managing Concurrency
  Instruction Synchronization
  Data Synchronization
  Error Synchronization
  Incorrect Error Synchronization
  Proper Error Synchronization

Chapter 3  System-Level Numeric Programming
  80287 Architecture
  Processor Extension Data Channel
  Real-Address Mode and Protected Virtual-Address Mode
  Dedicated and Reserved I/O Locations
  Processor Initialization and Control
  System Initialization
  Recognizing the 80287 NPX
  Configuring the Numerics Environment
  Initializing the 80287
  80287 Emulation
  Handling Numeric Processing Exceptions
  Simultaneous Exception Response
  Exception Recovery Examples

Chapter 4  Numeric Programming Examples
  Conditional Branching Examples
  Exception Handling Examples
  Floating-Point to ASCII Conversion Examples
  Function Partitioning
  Exception Considerations
  Special Instructions
  Description of Operation
  Scaling the Value
  Inaccuracy in Scaling
  Avoiding Underflow and Overflow
  Final Adjustments
  Output Format
  Trigonometric Calculation Examples
  FPTAN and FPREM
  Cosine Uses Sine Code

Appendix A  Machine Instruction Encoding and Decoding

Appendix B  Compatibility Between the 80287 NPX and the 8087

Appendix C  Implementing The IEEE P754 Standard
  Options implemented in the 80287
  Areas of the Standard Implemented in Software
  Additional Software to Meet the Standard

Glossary of 80287 and Floating-Point Terminology

Index

Figures

1-1   Evolution and Performance of Numeric Processors
1-2   80287 NPX Block Diagram
1-3   80287 Register Set
1-4   80287 Status Word
1-5   80287 Control Word Format
1-6   80287 Tag Word Format
1-7   80287 Instruction and Data Pointer Image in Memory
1-8   80287 Number System
1-9   Data Formats
1-10  Projective versus Affine Closure
1-11  Arithmetic Example Using Infinity
2-1   FSAVE/FRSTOR Memory Layout
2-2   FSTENV/FLDENV Memory Layout
2-3   Sample 80287 Constants
2-4   Status Word RECORD Definition
2-5   Structure Definition
2-6   Sample PL/M-286 Program
2-7   Sample ASM286 Program
2-8   Instructions and Register Stack
2-9   Synchronizing References to Shared Data
2-10  Documenting Data Synchronization
2-11  Nonconcurrent FIST Instruction Code Macro
2-12  Error Synchronization Examples
3-1   Software Routine to Recognize the 80287
4-1   Conditional Branching for Compares
4-2   Conditional Branching for FXAM
4-3   Full-State Exception Handler
4-4   Reduced-Latency Exception Handler
4-5   Reentrant Exception Handler
4-6   Floating-Point to ASCII Conversion Routine
4-7   Calculating Trigonometric Functions

Tables

1-1   Numeric Processing Speed Comparisons
1-2   Numeric Data Types
1-3   Principal NPX Instructions
1-4   Interpreting the NPX Condition Codes
1-5   Real Number Notation
1-6   Rounding Modes
1-7   Denormalization Process
1-8   Exceptions Due to Denormal Operands
1-9   Unnormal Operands and Results
1-10  Zero Operands and Results
1-11  Masked Overflow Response with Directed Rounding
1-12  Infinity Operands and Results
1-13  Binary Integer Encodings
1-14  Packed Decimal Encodings
1-15  Real and Long Real Encodings
1-16  Temporary Real Encodings
1-17  Exception Conditions and Masked Responses
2-1   Data Transfer Instructions
2-2   Arithmetic Instructions
2-3   Basic Arithmetic Instructions and Operands
2-4   Condition Code Interpretation after FPREM
2-5   Comparison Instructions
2-6   Condition Code Interpretation after FCOM
2-7   Condition Code Interpretation after FTST
2-8   FXAM Condition Code Settings
2-9   Transcendental Instructions
2-10  Constant Instructions
2-11  Processor Control Instructions
2-12  Key to Operand Types
2-13  Execution Penalties
2-14  Instruction Set Reference Data
2-15  PL/M-286 Built-In Procedures
2-16  80287 Storage Allocation Directives
2-17  Addressing Mode Examples
3-1   NPX Processor State Following Initialization
3-2   Precedence of NPX Exceptions
A-1   80287 Instruction Encoding
A-2   Machine Instruction Decoding Guide

Chapter 1  Overview of Numeric Processing
───────────────────────────────────────────────────────────────────────────

The 80287 NPX is a high-performance numerics processing element that extends the 80286 architecture by adding significant numeric capabilities and direct support for floating-point, extended-integer, and BCD data types.
The 80286 CPU with 80287 NPX easily supports powerful and accurate numeric applications through its implementation of the proposed IEEE 754 Standard for Binary Floating-Point Arithmetic.

Introduction to the 80287 Numeric Processor Extension

The 80287 Numeric Processor Extension (NPX) is highly compatible with its predecessor, the Intel 8087 NPX.

The 8087 NPX was designed for use in 8086-family systems. The 8086 was the first microprocessor family to partition the processing unit to permit high-performance numeric capabilities. The 8087 NPX for this processor family implemented a complete numeric processing environment in compliance with the proposed IEEE 754 Floating-Point Standard.

With the 80287 Numeric Processor Extension, high-speed numeric computations have been extended to 80286 high-performance multi-tasking and multi-user systems. Multiple tasks using the numeric processor extension are afforded the full protection of the 80286 memory management and protection features.

Figure 1-1 illustrates the relative performance of 8-MHz 8086/8087 and 80286/80287 systems in executing numerics-oriented applications.

Figure 1-1. Evolution and Performance of Numeric Processors

[Plot: double-precision Whetstone performance (KOPS, axis marked at 100 and 200) against year introduced (axis marked at 1980 and 1983), with the 80286/80287 plotted above the 8086/8087.]

Performance

Table 1-1 compares the execution times of several 80287 instructions with the equivalent operations executed in software on an 8-MHz 80286. The software equivalents are highly optimized assembly-language procedures from the 80287 emulator. As indicated in the table, the 80287 NPX provides about 50 to 100 times the performance of software numeric routines on the 80286 CPU. An 8-MHz 80287 multiplies 32-bit and 64-bit real numbers in about 11.9 and 16.9 microseconds, respectively.
Of course, the actual performance of the NPX in a given system depends on the characteristics of the individual application.

Although the performance figures shown in table 1-1 refer to operations on real (floating-point) numbers, the 80287 also manipulates fixed-point binary and decimal integers of up to 64 bits or 18 digits, respectively. The 80287 can improve the speed of multiple-precision software algorithms for integer operations by 10 to 100 times.

Because the 80287 NPX is an extension of the 80286 CPU, no software overhead is incurred in setting up the NPX for computation. The 80287 and 80286 processors coordinate their activities in a manner transparent to software. Moreover, built-in coordination facilities allow the 80286 CPU to proceed with other instructions while the 80287 NPX is simultaneously executing numeric instructions. Programs can exploit this concurrency of execution to further increase system performance and throughput.

Table 1-1. Numeric Processing Speed Comparisons

                                                      Approximate Performance Ratios:
                                                      8 MHz 80287 to 8 MHz Protected
Floating-Point Instruction                            Mode iAPX using E80287

FADD ST,ST (Temp Real)             Addition           1: 42
FDIV DWORD PTR (Single-Precision)  Division           1:266
FXAM (Stack(0) assumed)            Examine            1:139
FYL2X (Stack(0),(1) assumed)       Logarithm          1: 99
FPATAN (Stack(0) assumed)          Arctangent         1:153
F2XM1 (Stack(0) assumed)           Exponentiation     1: 41

Ease of Use

The 80287 NPX offers more than raw execution speed for computation-intensive tasks. The 80287 brings the functionality and power of accurate numeric computation into the hands of the general user.

Like the 8087 NPX that preceded it, the 80287 is explicitly designed to deliver stable, accurate results when programmed using straightforward "pencil and paper" algorithms. The IEEE 754 standard specifically addresses this issue, recognizing the fundamental importance of making numeric computations both easy and safe to use.
For example, most computers can overflow when two single-precision floating-point numbers are multiplied together and then divided by a third, even if the final result is a perfectly valid 32-bit number. The 80287 delivers the correctly rounded result. Other typical examples of undesirable machine behavior in straightforward calculations occur when solving for the roots of a quadratic equation:

     (-b ± √(b² - 4ac)) / 2a

or computing financial rate of return, which involves the expression: (1+i)^(n). On most machines, straightforward algorithms will not deliver consistently correct results (and will not indicate when they are incorrect). To obtain correct results on traditional machines under all conditions usually requires sophisticated numerical techniques that are foreign to most programmers. General application programmers using straightforward algorithms will produce much more reliable programs using the 80287. This simple fact greatly reduces the software investment required to develop safe, accurate computation-based products.

Beyond traditional numerics support for scientific applications, the 80287 has built-in facilities for commercial computing. It can process decimal numbers of up to 18 digits without round-off errors, performing exact arithmetic on integers as large as 2^(64) or 10^(18). Exact arithmetic is vital in accounting applications where rounding errors may introduce monetary losses that cannot be reconciled.

The NPX contains a number of optional facilities that can be invoked by sophisticated users. These advanced features include two models of infinity, directed rounding, gradual underflow, and either automatic or programmed exception-handling facilities. These automatic exception-handling facilities permit a high degree of flexibility in numeric processing software, without burdening the programmer.
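The intermediate-overflow case mentioned earlier in this section (two single-precision numbers multiplied together and then divided by a third) can be sketched in Python. This is only an illustration under stated assumptions, not a model of the NPX itself: `to_single` is a hypothetical helper that rounds a value to the 32-bit short-real format, and Python's 64-bit float merely stands in for a wider intermediate format (the 80287's internal format is wider still).

```python
import struct

def to_single(x):
    """Round x to the nearest 32-bit real; return infinity on overflow.

    Hypothetical helper for illustration only.
    """
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")

def mul_div_single(a, b, c):
    """(a * b) / c with every intermediate result forced into 32-bit format."""
    return to_single(to_single(a * b) / c)

def mul_div_wide(a, b, c):
    """(a * b) / c with intermediates kept in a wider format, rounded once at the end."""
    return to_single((a * b) / c)

# Three operands that are all valid 32-bit reals.
a = b = c = to_single(1e30)

narrow = mul_div_single(a, b, c)  # the product overflows the 32-bit range
wide = mul_div_wide(a, b, c)      # the product fits in the wider format

print("32-bit intermediates:", narrow)
print("wide intermediates:  ", wide)
```

With 32-bit intermediates the product overflows and the final result is lost, even though the true quotient is representable; with wider intermediates the same straightforward expression returns a valid 32-bit value.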
While performing numeric calculations, the NPX automatically detects exception conditions that can potentially damage a calculation. By default, on-chip exception handlers may be invoked to field these exceptions so that a reasonable result is produced, and execution may proceed without program interruption. Alternatively, the NPX can signal the CPU, invoking a software exception handler whenever various types of exceptions are detected. Applications The NPX's versatility and performance make it appropriate to a broad array of numeric applications. In general, applications that exhibit any of the following characteristics can benefit by implementing numeric processing on the 80287: ■ Numeric data vary over a wide range of values, or include nonintegral values. ■ Algorithms produce very large or very small intermediate results. ■ Computations must be very precise; i.e., a large number of significant digits must be maintained. ■ Performance requirements exceed the capacity of traditional microprocessors. ■ Consistently safe, reliable results must be delivered using a programming staff that is not expert in numerical techniques. Note also that the 80287 can reduce software development costs and improve the performance of systems that use not only real numbers, but operate on multiprecision binary or decimal integer values as well. A few examples, which show how the 80287 might be used in specific numerics applications, are described below. In many cases, these types of systems have been implemented in the past with minicomputers. The advent of the 80287 brings the size and cost savings of microprocessor technology to these applications for the first time. ■ Business data processing──The NPX's ability to accept decimal operands and produce exact decimal results of up to 18 digits greatly simplifies accounting programming. Financial calculations that use power functions can take advantage of the 80287's exponentiation and logarithmic instructions. 
■ Process control──The 80287 solves dynamic range problems automatically, and its extended precision allows control functions to be fine-tuned for more accurate and efficient performance. Control algorithms implemented with the NPX also contribute to improved reliability and safety, while the 80287's speed can be exploited in real-time operations. ■ Computer numerical control (CNC)──The 80287 can move and position machine tool heads with accuracy in real-time. Axis positioning also benefits from the hardware trigonometric support provided by the 80287. ■ Robotics──Coupling small size and modest power requirements with powerful computational abilities, the NPX is ideal for on-board six-axis positioning. ■ Navigation──Very small, lightweight, and accurate inertial guidance systems can be implemented with the 80287. Its built-in trigonometric functions can speed and simplify the calculation of position from ■ Graphics terminals──The 80287 can be used in graphics terminals to locally perform many functions that normally demand the attention of a main computer; these include rotation, scaling, and interpolation. By also using an 82720 Graphics Display Controller to perform high speed data transfers, very powerful and highly self-sufficient terminals can be built from a relatively small number of 80286 family parts. ■ Data acquisition──The 80287 can be used to scan, scale, and reduce large quantities of data as it is collected, thereby lowering storage requirements and time required to process the data for analysis. The preceding examples are oriented toward traditional numerics applications. There are, in addition, many other types of systems that do not appear to the end user as computational, but can employ the 80287 to advantage. Indeed, the 80287 presents the imaginative system designer with an opportunity similar to that created by the introduction of the microprocessor itself. 
Many applications can be viewed as numerically-based if sufficient computational power is available to support this view. This is analogous to the thousands of successful products that have been built around "buried" microprocessors, even though the products themselves bear little resemblance to computers. Upgradability The architecture of the 80286 CPU is specifically adapted to allow easy upgradability to use an 80287, simply by plugging in the 80287 NPX. For this reason, designers of 80286 systems may wish to incorporate the 80287 NPX into their designs in order to offer two levels of price and performance at little additional cost. Two features of the 80286 CPU make the design and support of upgradable 80286 systems particularly simple: ■ The 80286 can be programmed to recognize the presence of an 80287 NPX; that is, software can recognize whether it is running on an 80286 or an 80287 system. ■ After determining whether the 80287 NPX is available, the 80286 CPU can be instructed to let the NPX execute all numeric instructions. If an 80287 NPX is not available, the 80286 CPU can emulate all 80287 numeric instructions in software. This emulation is completely transparent to the application software──the same object code may be used by both 80286 and 80287 systems. No relinking or recompiling of application software is necessary; the same code will simply execute faster on the 80287 than on the 80286 system. To facilitate this design of upgradable 80286 systems, Intel provides a software emulator for the 80287 that provides the functional equivalent of the 80287 hardware, implemented in software on the 80286. Except for timing, the operation of this 80287 emulator (E80287) is the same as for the 80287 NPX hardware. When the emulator is combined as part of the systems software, the 80286 system with 80287 emulation and the 80286 with 80287 hardware are virtually indistinguishable to an application program. 
This capability makes it easy for software developers to maintain a single set of programs for both systems. System manufacturers can offer the NPX as a simple plug-in performance option without necessitating any changes in the user's software. Programming Interface The 80286/80287 pair is programmed as a single processor; all of the 80287 registers appear to a programmer as extensions of the basic 80286 register set. The 80286 has a class of instructions known as ESCAPE instructions, all having a common format. These ESC instructions are numeric instructions for the 80287 NPX. These numeric instructions for the 80287 are simply encoded into the instruction stream along with 80286 instructions. All of the CPU memory-addressing modes may be used in programming the NPX, allowing convenient access to record structures, numeric arrays, and other memory-based data structures. All of the memory management and protection features of the CPU are extended to the NPX as well. Numeric processing in the 80287 centers around the NPX register stack. Programmers can treat these eight 80-bit registers as either a fixed register set, with instructions operating on explicitly-designated registers, or a classical stack, with instructions operating on the top one or two stack elements. Internally, the 80287 holds all numbers in a uniform 80-bit temporary-real format. Operands that may be represented in memory as 16-, 32-, or 64-bit integers, 32-, 64-, or 80-bit floating-point numbers, or 18-digit packed BCD numbers, are automatically converted into temporary-real format as they are loaded into the NPX registers. Computation results are subsequently converted back into one of these destination data formats when they are stored into memory from the NPX registers. Table 1-2 lists each of the seven data types supported by the 80287, showing the data format for each type. All operands are stored in memory with the least significant digits starting at the initial (lowest) memory address. 
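As a rough illustration of the storage rule just stated (least significant digits at the lowest memory address), the following Python sketch lays out a few of the memory formats using the standard `struct` module. The names `FORMATS` and `memory_image` are hypothetical and exist only for this example. The sketch assumes that the short and long real formats match the 32- and 64-bit encodings `struct` produces, and it does not attempt the 80-bit temporary-real or packed-decimal formats.

```python
import struct

# struct format codes for some of the memory formats; "<" selects
# least-significant-byte-first ordering.
FORMATS = {
    "word integer": "<h",    # 16-bit integer
    "short integer": "<i",   # 32-bit integer
    "long integer": "<q",    # 64-bit integer
    "short real": "<f",      # 32-bit real
    "long real": "<d",       # 64-bit real
}

def memory_image(type_name, value):
    """Return the bytes of value in ascending address order (hypothetical helper)."""
    return struct.pack(FORMATS[type_name], value)

samples = [
    ("word integer", 0x1234),
    ("short integer", 0x12345678),
    ("short real", 1.0),
]
for name, value in samples:
    image = memory_image(name, value)
    # The first byte printed is the one at the operand's initial address.
    print(f"{name:14s} {' '.join(f'{byte:02X}' for byte in image)}")
```

The first byte printed for each operand is the low-order byte, which is the address a numeric instruction would use to reference the operand.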
Numeric instructions access and store memory operands using only this initial address. For maximum system performance, all operands should start at even memory addresses.

Table 1-3 lists the 80287 instructions by class. No special programming tools are necessary to use the 80287, because all of the NPX instructions and data types are directly supported by the ASM286 Assembler and Intel's appropriate high-level languages.

Software routines for the 80287 may be written in ASM286 Assembler or any of the following higher-level languages:

     PL/M-286
     PASCAL-286
     FORTRAN-286
     C-286

In addition, all of the development tools supporting the 8086 and 8087 can also be used to develop software for the 80286 and 80287 operating in Real-Address mode.

All of these high-level languages provide programmers with access to the computational power and speed of the 80287 without requiring an understanding of the architecture of the 80286 and 80287 chips. Such architectural considerations as concurrency and data synchronization are handled automatically by these high-level languages. For the ASM286 programmer, specific rules for handling these issues are discussed in a later section of this supplement.

Table 1-2. Numeric Data Types

                         Significant
Data Type         Bits   Digits       Approximate Range (Decimal)
                         (Decimal)

Word integer       16     4           -32,768 ≤ X ≤ +32,767
Short integer      32     9           -2*10^(9) ≤ X ≤ +2*10^(9)
Long integer       64    18           -9*10^(18) ≤ X ≤ +9*10^(18)
Packed decimal     80    18           -99...99 ≤ X ≤ +99...99 (18 digits)
Short real         32     6-7         8.43*10^(-37) ≤ │X│ ≤ 3.37*10^(38)
Long real          64    15-16        4.19*10^(-307) ≤ │X│ ≤ 1.67*10^(308)
Temporary real     80    19           3.4*10^(-4932) ≤ │X│ ≤ 1.2*10^(4932)

Table 1-3. Principal NPX Instructions

Class               Instruction Types

Data Transfer       Load (all data types), Store (all data types), Exchange
Arithmetic          Add, Subtract, Multiply, Divide, Subtract Reversed,
                    Divide Reversed, Square Root, Scale, Remainder,
                    Integer Part, Change Sign, Absolute Value, Extract
Comparison          Compare, Examine, Test
Transcendental      Tangent, Arctangent, 2^(X) - 1, Y*Log{2}(X+1), Y*Log{2}(X)
Constants           0, 1, π, Log{10}2, Log{e}2, Log{2}10, Log{2}e
Processor Control   Load Control Word, Store Control Word, Store Status Word,
                    Load Environment, Store Environment, Save, Restore,
                    Clear Exceptions, Initialize, Set Protected Mode

Hardware Interface

As an extension of the 80286 processor, the 80287 is wired very much in parallel with the 80286 CPU. Four special status signals, PEREQ, PEACK, BUSY, and ERROR, permit the two processors to coordinate their activities. The 80287 NPX also monitors the 80286 S1, S0, COD/INTA, READY, HLDA, and CLK pins to track the execution of ESC instructions (numeric instructions) by the 80286.

As shown in figure 1-2, the 80287 NPX is divided internally into two processing elements: the Bus Interface Unit (BIU) and the Numeric Execution Unit (NEU). The two units operate independently of one another: the BIU receives and decodes instructions, requests operand transfers with memory, and executes processor control instructions, whereas the NEU processes individual numeric instructions.

The BIU handles all of the status and signal lines between the 80287 and the 80286. The NEU executes all instructions that involve the register stack. These instructions include arithmetic, logical, transcendental, constant, and data transfer instructions. The data path in the NEU is 84 bits wide (68 fraction bits, 15 exponent bits, and a sign bit), allowing internal operand transfers to be performed at very high speeds.

The 80287 executes a single numeric instruction at a time.
Before executing most ESC instructions, the 80286 tests the BUSY pin and, before initiating the command, waits until the 80287 indicates that it is not busy. Once the instruction has been initiated, the 80286 continues program execution while the 80287 executes the numeric instruction.

The 8087 required a WAIT instruction to test the BUSY signal before each ESC opcode; in 80287 programs these WAIT instructions are permissible, but not necessary.

In all cases, a WAIT or ESC instruction should be inserted after any 80287 store to memory (except FSTSW or FSTCW) or load from memory (except FLDENV, FLDCW, or FRSTOR) before the 80286 reads or changes the memory value.

When needed, all data transfers between memory and the 80287 NPX are performed by the 80286 CPU, using its Processor Extension Data Channel. Numeric data transfers performed by the 80286 use the same timing as any other bus cycle, and all such transfers come under the supervision of the 80286 memory management and protection mechanisms. The 80286 Processor Extension Data Channel and the hardware interface between the 80286 and 80287 processors are described in Chapter Six of the 80286 Hardware Reference Manual.

From the programmer's perspective, the 80287 can be considered just an extension of the 80286 processor. All interaction between the 80286 and the 80287 processors on the hardware level is handled automatically by the 80286 and is transparent to the software.

To communicate with the 80287, the 80286 uses the reserved I/O port addresses 00F8H, 00FAH, and 00FCH (I/O ports numbered 00F8H through 00FFH are reserved for the 80286/80287 interface). These I/O operations are performed automatically by the 80286 and are distinct from I/O operations that result from program I/O instructions. I/O operations resulting from the execution of ESC instructions are completely transparent to software. Any program may execute ESCAPE (numeric) instructions, without regard to its current I/O Privilege Level (IOPL).
To guarantee correct operation of the 80287, programs must not perform any explicit I/O operations to any of the eight ports reserved for the 80287. The IOPL of the 80286 can be used to protect the integrity of 80287 computations in multiuser reprogrammable applications, preventing any accidental or other tampering with the 80287 (see Chapter Eight of the 80286 Operating System Writer's Guide).

Figure 1-2. 80287 NPX Block Diagram

[Block diagram. Bus Interface Unit: control word, status word, data buffer, and control unit, with the external Data, Status, and Address connections. Numeric Execution Unit: microcode control unit (fed by the NEU instruction path), operands queue, exponent module, programmable shifter, arithmetic module, temporary registers, tag word, and the 80-bit register stack (registers 0 through 7). The blocks of the Numeric Execution Unit are linked through an internal interface by the exponent bus and the fraction bus.]

80287 Numeric Processor Architecture

To the programmer, the 80287 NPX appears as a set of additional registers complementing those of the 80286.
These additional registers consist of

■ Eight individually-addressable 80-bit numeric registers, organized as a register stack

■ Three sixteen-bit registers containing:
     an NPX status word
     an NPX control word
     a tag word

■ Four 16-bit registers containing the NPX instruction and data pointers

All of the NPX numeric instructions focus on the contents of these NPX registers.

The NPX Register Stack

The 80287 register stack is shown in figure 1-3. Each of the eight numeric registers in the 80287's register stack is 80 bits wide and is divided into fields corresponding to the NPX's temporary-real data type.

Numeric instructions address the data registers relative to the register on the top of the stack. At any point in time, this top-of-stack register is indicated by the ST (Stack Top) field in the NPX status word. Load or push operations decrement ST by one and load a value into the new top register. A store-and-pop operation stores the value from the current ST register and then increments ST by one. Like 80286 stacks in memory, the 80287 register stack grows down toward lower-addressed registers.

Many numeric instructions have several addressing modes that permit the programmer to implicitly operate on the top of the stack, or to explicitly operate on specific registers relative to the ST. The ASM286 Assembler supports these register addressing modes, using the expression ST(0), or simply ST, to represent the current Stack Top and ST(i) to specify the ith register from ST in the stack (0 ≤ i ≤ 7). For example, if ST contains 011B (register 3 is the top of the stack), the following statement would add the contents of two registers in the stack (registers 3 and 5):

     FADD ST, ST(2)

The stack organization and top-relative addressing of the numeric registers simplify subroutine programming by allowing routines to pass parameters on the register stack.
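The top-relative addressing described above amounts to modular arithmetic on the three-bit ST field. The following Python sketch is an illustration only and is not part of the NPX or of ASM286; the function names physical_register, push, and pop are hypothetical, and the wrap-around at the ends of the stack is assumed to follow from ST being a three-bit field.

```python
def physical_register(top: int, i: int) -> int:
    """Return the physical register number addressed by ST(i).

    top is the 3-bit ST field from the status word; i is the
    stack-relative index (0 <= i <= 7).
    """
    if not (0 <= top <= 7 and 0 <= i <= 7):
        raise ValueError("top and i must be in the range 0..7")
    return (top + i) & 7  # modulo 8: the field is three bits wide


def push(top: int) -> int:
    """A load (push) decrements ST by one before writing the new top."""
    return (top - 1) & 7


def pop(top: int) -> int:
    """A store-and-pop increments ST by one after reading the old top."""
    return (top + 1) & 7


# The FADD ST, ST(2) example from the text: ST = 011B (register 3).
top = 0b011
print(physical_register(top, 0), physical_register(top, 2))  # prints "3 5"
```

The same subroutine code therefore refers to different physical registers depending on the value of ST at the time of the call, which is what allows parameters to be passed on the register stack.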
By using the stack to pass parameters rather than using "dedicated" registers, calling routines gain more flexibility in how they use the stack. As long as the stack is not full, each routine simply loads the parameters onto the stack before calling a particular subroutine to perform a numeric calculation. The subroutine then addresses its parameters as ST, ST(1), etc., even though ST may, for example, refer to physical register 3 in one invocation and physical register 5 in another.

Figure 1-3. 80287 Register Set

                      80287 STACK:                              TAG FIELD
      79  78     64 63                                      0     1    0
     ╔════╤════════╤════════════════════════════════════════╗   ╔══════╗
  R1 ║SIGN│EXPONENT│              SIGNIFICAND               ║   ║      ║
     ╟────┼────────┼────────────────────────────────────────╢   ╟──────╢
  R2 ║    │        │                                        ║   ║      ║
     ╟────┼────────┼────────────────────────────────────────╢   ╟──────╢
  R3 ║    │        │                                        ║   ║      ║
     ╟────┼────────┼────────────────────────────────────────╢   ╟──────╢
  R4 ║    │        │                                        ║   ║      ║
     ╟────┼────────┼────────────────────────────────────────╢   ╟──────╢
  R5 ║    │        │                                        ║   ║      ║
     ╟────┼────────┼────────────────────────────────────────╢   ╟──────╢
  R6 ║    │        │                                        ║   ║      ║
     ╟────┼────────┼────────────────────────────────────────╢   ╟──────╢
  R7 ║    │        │                                        ║   ║      ║
     ╟────┼────────┼────────────────────────────────────────╢   ╟──────╢
  R8 ║    │        │                                        ║   ║      ║
     ╚════╧════════╧════════════════════════════════════════╝   ╚══════╝

      15                  0
     ╔═════════════════════╗
     ║  CONTROL REGISTER   ║
     ╟─────────────────────╢
     ║   STATUS REGISTER   ║
     ╟─────────────────────╢
     ║      TAG WORD       ║
     ╟─────────────────────╢
     ║                     ║
     ╟─INSTRUCTION POINTER─╢
     ║                     ║
     ╟─────────────────────╢
     ║                     ║
     ╟─   DATA POINTER    ─╢
     ║                     ║
     ╚═════════════════════╝

The NPX Status Word

The 16-bit status word shown in figure 1-4 reflects the overall state of the 80287. This status word may be stored into memory using the FSTSW/FNSTSW, FSTENV/FNSTENV, and FSAVE/FNSAVE instructions, and can be transferred into the 80286 AX register with the FSTSW AX/FNSTSW AX instructions, allowing the NPX status to be inspected by the CPU.
The Busy bit (bit 15) and the BUSY pin indicate whether the 80287's execution unit is idle (B = 0) or is executing a numeric instruction or signalling an exception (B = 1). (The instructions FNSTSW, FNSTSW AX, FNSTENV, and FNSAVE do not set the Busy bit themselves, nor do they require the Busy bit to be clear in order to execute.)

The four NPX condition code bits (C{0}-C{3}) are similar to the flags in a CPU: the 80287 updates these bits to reflect the outcome of arithmetic operations. The effect of the compare, test, remainder, and examine instructions on the condition code bits is summarized in table 1-4. These condition code bits are used principally for conditional branching. The FSTSW AX instruction stores the NPX status word directly into the CPU AX register, allowing these condition codes to be inspected efficiently by 80286 code.

Bits 11-13 of the status word point to the 80287 register that is the current Stack Top (ST). The significance of the stack top has been described in the section on the Register Stack.

Figure 1-4 shows the six error flags in bits 0-5 of the status word. Bits 0-5 indicate whether the NPX has detected one of six possible exception conditions since these status bits were last cleared or reset. Bit 7 is the error summary status (ES) bit. ES is set if any unmasked exception bits are set, and is cleared otherwise. If this bit is set, the ERROR signal is asserted.

Table 1-4.
Interpreting the NPX Condition Codes

Instruction
Type            C{3}  C{2}  C{1}  C{0}   Interpretation
──────────────  ────  ────  ────  ────   ─────────────────────────────────
Compare, Test    0     0     X     0     ST > Source or 0 (FTST)
                 0     0     X     1     ST < Source or 0 (FTST)
                 1     0     X     0     ST = Source or 0 (FTST)
                 1     1     X     1     ST is not comparable

Remainder       Q{1}   0    Q{0}  Q{2}   Complete reduction with three low
                                         bits of quotient in C{0}, C{3},
                                         and C{1}
                 U     1     U     U     Incomplete reduction

Examine          0     0     0     0     Valid, positive, unnormalized
                 0     0     0     1     Invalid, positive, exponent = 0
                 0     0     1     0     Valid, negative, unnormalized
                 0     0     1     1     Invalid, negative, exponent = 0
                 0     1     0     0     Valid, positive, normalized
                 0     1     0     1     Infinity, positive
                 0     1     1     0     Valid, negative, normalized
                 0     1     1     1     Infinity, negative
                 1     0     0     0     Zero, positive
                 1     0     0     1     Empty Register
                 1     0     1     0     Zero, negative
                 1     0     1     1     Empty Register
                 1     1     0     0     Invalid, positive, exponent = 0
                 1     1     0     1     Empty Register
                 1     1     1     0     Invalid, negative, exponent = 0
                 1     1     1     1     Empty Register

NOTES:
  ST   = Top of stack
  X    = value is not affected by instruction
  U    = value is undefined following instruction
  Q{n} = Quotient bit n following complete reduction (C{2}=0)

Figure 1-4. 80287 Status Word

 15                                      0
╔═╤══╤═══╤══╤══╤══╤══╤═╤══╤══╤══╤══╤══╤══╗
║B│C3│S T│C2│C1│C0│ES│X│PE│UE│OE│ZE│DE│IE║
╚═╧══╧═══╧══╧══╧══╧══╧═╧══╧══╧══╧══╧══╧══╝

Field   Bit(s)   Meaning
─────   ──────   ──────────────────────────────────────────────────────
IE      0        INVALID OPERATION      ┐
DE      1        DENORMALIZED OPERAND   │ EXCEPTION FLAGS
ZE      2        ZERO DIVIDE            │ (1 = EXCEPTION HAS OCCURRED)
OE      3        OVERFLOW               │ For definitions, see the
UE      4        UNDERFLOW              │ section on exception handling.
PE      5        PRECISION              ┘
X       6        (RESERVED)
ES      7        ERROR SUMMARY STATUS. ES is set if any unmasked
                 exception bit is set, cleared otherwise.
C0-C2   8-10     CONDITION CODE. See Table 1-4 for condition code
C3      14       interpretation.
S T     11-13    STACK TOP POINTER
                   000 = REGISTER 0 IS TOP OF STACK
                   001 = REGISTER 1 IS TOP OF STACK
                    ∙
                    ∙
                   111 = REGISTER 7 IS TOP OF STACK
B       15       NEU BUSY

Control Word

The NPX provides the programmer with several processing options, which are selected by loading a word from memory into the control word. Figure 1-5 shows the format and encoding of the fields in the control word.

The low-order byte of this control word configures the 80287 error and exception masking. Bits 0-5 of the control word contain individual masks for each of the six exception conditions recognized by the 80287. The high-order byte of the control word configures the 80287 processing options, including

■ Precision control

■ Rounding control

■ Infinity control

The precision control bits (bits 8-9) can be used to set the 80287 internal operating precision at less than the default precision (64-bit significand). These control bits can be used to provide compatibility with the earlier-generation arithmetic processors having less precision than the 80287, as required by the IEEE 754 standard. Setting a lower precision, however, will not affect the execution time of numeric calculations.
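As a minimal sketch of how the fields of the status word (figure 1-4) and the control word (figure 1-5) can be unpacked once the words have been stored to memory or to AX, the Python fragment below extracts each field by shifting and masking. The bit positions are taken from the two figures; the function names decode_status_word and decode_control_word, the dictionary keys, and the sample values 5881H and 0F3FH are illustrative assumptions, not values produced by any particular program.

```python
EXCEPTION_NAMES = ("IE", "DE", "ZE", "OE", "UE", "PE")  # bits 0-5

# Encodings from figure 1-5; PC value 01 is reserved.
PRECISION_BITS = {0b00: 24, 0b10: 53, 0b11: 64}
ROUNDING_MODES = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "chop"}


def decode_status_word(sw: int) -> dict:
    """Split a 16-bit status word into the fields of figure 1-4."""
    return {
        "busy": (sw >> 15) & 1,
        "c3": (sw >> 14) & 1,
        "top": (sw >> 11) & 7,          # stack top pointer (ST)
        "c2": (sw >> 10) & 1,
        "c1": (sw >> 9) & 1,
        "c0": (sw >> 8) & 1,
        "es": (sw >> 7) & 1,            # error summary status
        "flags": [name for bit, name in enumerate(EXCEPTION_NAMES)
                  if (sw >> bit) & 1],
    }


def decode_control_word(cw: int) -> dict:
    """Split a 16-bit control word into the fields of figure 1-5."""
    return {
        "masked": [name for bit, name in enumerate(EXCEPTION_NAMES)
                   if (cw >> bit) & 1],
        "precision_bits": PRECISION_BITS.get((cw >> 8) & 3),  # None if reserved
        "rounding": ROUNDING_MODES[(cw >> 10) & 3],
        "infinity": "affine" if (cw >> 12) & 1 else "projective",
    }


# Hypothetical sample values chosen only to exercise every field.
print(decode_status_word(0x5881))
print(decode_control_word(0x0F3F))
```

For the sample status word 5881H the sketch reports register 3 as the stack top, C3 set, ES set, and the invalid-operation flag set; for the sample control word 0F3FH it reports all six exceptions masked, a 64-bit significand, chop rounding, and projective closure.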
The rounding control bits (bits 10-11) provide for directed rounding and true chop as well as the unbiased round-to-nearest-even mode specified in the IEEE 754 standard.

The infinity control bit (bit 12) determines the manner in which the 80287 treats the special values of infinity. Either affine closure (where positive infinity is distinct from negative infinity) or projective closure (infinity is treated as a single unsigned quantity) may be specified. These two alternative views of infinity are discussed in the section on Computation Fundamentals.

Figure 1-5. 80287 Control Word Format

╔═════╤═══╤═══╤═══╤═╤═╤══╤══╤══╤══╤══╤══╗
║X X X│I C│R C│P C│X│X│PM│UM│OM│ZM│DM│IM║
╚═════╧═══╧═══╧═══╧═╧═╧══╧══╧══╧══╧══╧══╝

Field   Bit(s)   Meaning
─────   ──────   ──────────────────────────────────────────────────────
IM      0        INVALID OPERATION      ┐
DM      1        DENORMALIZED OPERAND   │
ZM      2        ZERO DIVIDE            │ EXCEPTION MASKS
OM      3        OVERFLOW               │ (1 = EXCEPTION IS MASKED)
UM      4        UNDERFLOW              │
PM      5        PRECISION              ┘
X       6        (RESERVED)
X       7        (RESERVED)
P C     8-9      PRECISION CONTROL
                   00 = 24-BIT SIGNIFICAND
                   01 = RESERVED
                   10 = 53-BIT SIGNIFICAND
                   11 = 64-BIT SIGNIFICAND
R C     10-11    ROUNDING CONTROL
                   00 = ROUND TO NEAREST OR EVEN
                   01 = ROUND DOWN (TOWARD -∞)
                   10 = ROUND UP (TOWARD +∞)
                   11 = CHOP (TRUNCATE TOWARD ZERO)
I C     12       INFINITY CONTROL
                   (0 = PROJECTIVE, 1 = AFFINE)
X X X   13-15    (RESERVED)

The NPX Tag Word

The tag word indicates the contents of each register in the register stack, as shown in figure 1-6. The tag word is used by the NPX itself in order to track its numeric registers and optimize performance.
Programmers may use this tag information to interpret the contents of the numeric registers. The tag values are stored in the tag word corresponding to the physical registers 0-7. Programmers must use the current Stack Top (ST) pointer stored in the NPX status word to associate these tag values with the relative stack registers ST(0) through ST(7).

Figure 1-6. 80287 Tag Word Format

 15                                                                     0
╔════════╤════════╤════════╤════════╤════════╤════════╤════════╤════════╗
║ TAG(7) │ TAG(6) │ TAG(5) │ TAG(4) │ TAG(3) │ TAG(2) │ TAG(1) │ TAG(0) ║
╚════════╧════════╧════════╧════════╧════════╧════════╧════════╧════════╝

TAG VALUES:
  00 = VALID
  01 = ZERO
  10 = INVALID OR INFINITY
  11 = EMPTY

The NPX Instruction and Data Pointers

The NPX instruction and data pointers provide support for programmed exception handlers. Whenever the 80287 executes a math instruction, the NPX internally saves the instruction address, the operand address (if present), and the instruction opcode. The 80287 FSTENV and FSAVE instructions store this data into memory, allowing exception handlers to determine the precise nature of any numeric exceptions that may be encountered.

When stored in memory, the instruction and data pointers appear in one of two formats, depending on the operating mode of the 80287. Figure 1-7 shows these pointers as they are stored following an FSTENV instruction. In Real-Address mode, these values are the 20-bit physical address and 11-bit opcode formatted like the 8087. In Protected mode, these values are the 32-bit virtual addresses used by the program that executed the ESC instruction.

The instruction address saved in the 80287 will point to any prefixes that preceded the instruction. This is different from the 8087, for which the instruction address pointed only to the ESC instruction opcode.

Figure 1-7.
80287 Instruction and Data Pointer Image in Memory

                                    MEMORY
            REAL MODE               OFFSET
╔═══════════════════════════════╗
║         CONTROL WORD          ║     +0
╟───────────────────────────────╢
║          STATUS WORD          ║     +2
╟───────────────────────────────╢
║           TAG WORD            ║     +4
╟───────────────────────────────╢
║   INSTRUCTION POINTER(15-0)   ║     +6
╟───────────┬─┬─────────────────╢
║INSTRUCTION│ │   INSTRUCTION   ║
║  POINTER  │0│     OPCODE      ║     +8
║  (19-16)  │ │     (10-0)      ║
╟───────────┴─┴─────────────────╢
║      DATA POINTER(15-0)       ║     +10
╟───────────┬───────────────────╢
║   DATA    │                   ║
║  POINTER  │         0         ║     +12
║  (19-16)  │                   ║
╚═══════════╧═══════════════════╝
 15       12 11                0

                                    MEMORY
         PROTECTED MODE             OFFSET
╔═══════════════════════════════╗
║         CONTROL WORD          ║     +0
╟───────────────────────────────╢
║          STATUS WORD          ║     +2
╟───────────────────────────────╢
║           TAG WORD            ║     +4
╟───────────────────────────────╢
║           IP OFFSET           ║     +6
╟───────────────────────────────╢
║          CS SELECTOR          ║     +8
╟───────────────────────────────╢
║      DATA OPERAND OFFSET      ║     +10
╟───────────────────────────────╢
║     DATA OPERAND SELECTOR     ║     +12
╚═══════════════════════════════╝
 15                            0

Computation Fundamentals

This section covers 80287 programming concepts that are common to all applications. It describes the 80287's internal number system and the various types of numbers that can be employed in NPX programs. The most commonly used options for rounding, precision, and infinity (selected by fields in the control word) are described, with exhaustive coverage of less frequently used facilities deferred to later sections. Exception conditions that may arise during execution of NPX instructions are also described along with the options that are available for responding to these exceptions.

Number System

The system of real numbers that people use for pencil and paper calculations is conceptually infinite and continuous. There is no upper or lower limit to the magnitude of the numbers one can employ in a calculation, or to the precision (number of significant digits) that the numbers can represent.
When considering any real number, there is always an infinity of numbers both larger and smaller. There is also an infinity of numbers between (i.e., with more significant digits than) any two real numbers. For example, between 2.5 and 2.6 are 2.51, 2.5897, 2.500001, etc. While ideally it would be desirable for a computer to be able to operate on the entire real number system, in practice this is not possible. Computers, no matter how large, ultimately have fixed-size registers and memories that limit the system of numbers that can be accommodated. These limitations determine both the range and the precision of numbers. The result is a set of numbers that is finite and discrete, rather than infinite and continuous. This sequence is a subset of the real numbers that is designed to form a useful approximation of the real number system. Figure 1-8 superimposes the basic 80287 real number system on a real number line (decimal numbers are shown for clarity, although the 80287 actually represents numbers in binary). The dots indicate the subset of real numbers the 80287 can represent as data and final results of calculations. The 80287's range is approximately ±4.19*10^(-307) to ±1.67*10^(308). Applications that are required to deal with data and final results outside this range are rare. For reference, the range of the IBM 370 is about ±0.54*10^(-78) to ±0.72*10^(76). The finite spacing in figure 1-8 illustrates that the NPX can represent a great many, but not all, of the real numbers in its range. There is always a gap between two adjacent 80287 numbers, and it is possible for the result of a calculation to fall in this space. When this occurs, the NPX rounds the true result to a number that it can represent. Thus, a real number that requires more digits than the 80287 can accommodate (e.g., a 20-digit number) is represented with some loss of accuracy. Notice also that the 80287's representable numbers are not distributed evenly along the real number line. 
In fact, an equal number of representable numbers exists between successive powers of 2 (i.e., as many representable numbers exist between 2 and 4 as between 65,536 and 131,072). Therefore, the gaps between representable numbers are larger as the numbers increase in magnitude. All integers in the range ±2^(64) (approximately ±1.8*10^(19)), however, are exactly representable.

In its internal operations, the 80287 actually employs a number system that is a substantial superset of that shown in figure 1-8. The internal format (called temporary real) extends the 80287's range to about ±3.4*10^(-4932) to ±1.2*10^(4932), and its precision to about 19 (equivalent decimal) digits. This format is designed to provide extra range and precision for constants and intermediate results, and is not normally intended for data or final results.

From a practical standpoint, the 80287's set of real numbers is sufficiently large and dense so as not to limit the vast majority of microprocessor applications. Compared to most computers, including mainframes, the NPX provides a very good approximation of the real number system. It is important to remember, however, that it is not an exact representation, and that arithmetic on real numbers is inherently approximate.

Conversely, and equally important, the 80287 does perform exact arithmetic on integer operands. That is, an operation on two integers returns an exact integral result, provided that the true result is an integer and is in range. For example, 4 ÷ 2 yields an exact integer, 1 ÷ 3 does not, and 2^(40) * 2^(30) + 1 does not, because the result requires greater than 64 bits of precision.

Figure 1-8.
80287 Number System

[Real number line marked with the values the 80287 can represent. The negative normalized range runs from -1.67*10^(308) to -4.19*10^(-307), and the positive normalized range from 4.19*10^(-307) to 1.67*10^(308); integer tick marks -5 through -1 and 1 through 5 are shown within each range. An enlarged inset shows two adjacent representable numbers, 1.99999999999999999 and 2.00000000000000000; the values between them are not representable. Precision: 18 digits.]

Data Types and Formats

The 80287 recognizes seven numeric data types, divided into three classes: binary integers, packed decimal integers, and binary reals. A later section describes how these formats are stored in memory (the sign is always located in the highest-addressed byte). Figure 1-9 summarizes the format of each data type. In the figure, the most significant digits of all numbers (and fields within numbers) are the leftmost digits. Table 1-5 provides the range and number of significant (decimal) digits that each format can accommodate.

Table 1-5.
Real Number Notation

Notation               Value
─────────────────────  ──────────────────────────────────────────────────
Ordinary Decimal       178.125

Scientific Decimal     1{▲}78125E2

Scientific Binary      1{▲}0110010001E111

Scientific Binary      1{▲}0110010001E10000110
(Biased Exponent)

80287 Short Real       Sign   Biased Exponent   Significand
(Normalized)            0        10000110       ▲01100100010000000000000
                                                └──── 1{▲} (implicit)

Figure 1-9. Data Formats

                 ◄──────── INCREASING SIGNIFICANCE

            ╔═╤═════════╗
WORD        ║S│MAGNITUDE║   (TWO'S COMPLEMENT)
INTEGER     ╚═╧═════════╝
             15        0

            ╔═╤══════════════════════╗
SHORT       ║S│      MAGNITUDE       ║   (TWO'S COMPLEMENT)
INTEGER     ╚═╧══════════════════════╝
             31                     0

            ╔═╤════════════════════════════════════════════════╗
LONG        ║S│                   MAGNITUDE                    ║   (TWO'S COMPLEMENT)
INTEGER     ╚═╧════════════════════════════════════════════════╝
             63                                               0

            ╔═╤═══╤════════════════════════MAGNITUDE════════════════════════╗
PACKED      ║S│ X │d17 d16 • • • d11 d10  d9 d8 d7 d6 d5 d4 d3 d2 d1 d0     ║
DECIMAL     ╚═╧═══╧═════════════════════════════════════════════════════════╝
             79  72                                                        0

            ╔═╤════════╤═════════════╗
SHORT       ║S│ BIASED │ SIGNIFICAND ║
REAL        ║ │EXPONENT│             ║
            ╚═╧════════▲═════════════╝
             31        22           0

            ╔═╤════════════╤══════════════════════════════════╗
LONG        ║S│   BIASED   │           SIGNIFICAND            ║
REAL        ║ │  EXPONENT  │                                  ║
            ╚═╧════════════▲══════════════════════════════════╝
             63            51                                0

            ╔═╤═════════════╤═╤═════════════════════════════════════════════╗
TEMPORARY   ║S│   BIASED    │1│                 SIGNIFICAND                 ║
REAL        ║ │  EXPONENT   │ │                                             ║
            ╚═╧═════════════╧═▲═════════════════════════════════════════════╝
             79           64 63                                            0

NOTES:
  S  = Sign bit (0 = positive, 1 = negative)
  dn = Decimal digit (two per byte)
  X  = Bits have no significance; 80287 ignores when loading, zeros when
       storing.
  ▲  = Position of implicit binary point
  1  = Integer bit of significand; stored in temporary real, implicit
       (always 1) in short and long real
  Exponent Bias (normal values):
       Short Real:     127   (7FH)
       Long Real:      1023  (3FFH)
       Temporary Real: 16383 (3FFFH)

Binary Integers

The three binary integer formats are identical except for length, which governs the range that can be accommodated in each format. The leftmost bit is interpreted as the number's sign: 0 = positive and 1 = negative. Negative numbers are represented in standard two's complement notation (the binary integers are the only 80287 format to use two's complement). The quantity zero is represented with a positive sign (all bits are 0). The 80287 word integer format is identical to the 16-bit signed integer data type of the 80286.

Decimal Integers

Decimal integers are stored in packed decimal notation, with two decimal digits "packed" into each byte, except the leftmost byte, which carries the sign bit (0 = positive, 1 = negative). Negative numbers are not stored in two's complement form and are distinguished from positive numbers only by the sign bit. The most significant digit of the number is the leftmost digit. All digits must be in the range 0H-9H.

Real Numbers

The 80287 stores real numbers in a three-field binary format that resembles scientific, or exponential, notation.
The number's significant digits are held in the significand field, the exponent field locates the binary point within the significant digits (and therefore determines the number's magnitude), and the sign field indicates whether the number is positive or negative. (The exponent and significand are analogous to the terms "characteristic" and "mantissa" used to describe floating point numbers on some computers.) Negative numbers differ from positive numbers only in the sign bits of their significands. Table 1-5 shows how the real number 178.125 (decimal) is stored in the 80287 short real format. The table lists a progression of equivalent notations that express the same value to show how a number can be converted from one form to another. The ASM286 and PL/M-286 language translators perform a similar process when they encounter programmer-defined real number constants. Note that not every decimal fraction has an exact binary equivalent. The decimal number 1/10, for example, cannot be expressed exactly in binary (just as the number 1/3 cannot be expressed exactly in decimal). When a translator encounters such a value, it produces a rounded binary approximation of the decimal value. The NPX usually carries the digits of the significand in normalized form. This means that, except for the value zero, the significand is an integer and a fraction as follows: 1{▲}fff...ff where ▲ indicates an assumed binary point. The number of fraction bits varies according to the real format: 23 for short, 52 for long, and 63 for temporary real. By normalizing real numbers so that their integer bit is always a 1, the 80287 eliminates leading zeros in small values (│X│ < 1). This technique maximizes the number of significant digits that can be accommodated in a significand of a given width. Note that, in the short and long real formats, the integer bit is implicit and is not actually stored; the integer bit is physically present in the temporary real format only. 
If one were to examine only the significand with its assumed binary point, all normalized real numbers would have values between 1 and 2. The exponent field locates the actual binary point in the significant digits. Just as in decimal scientific notation, a positive exponent has the effect of moving the binary point to the right, and a negative exponent effectively moves the binary point to the left, inserting leading zeros as necessary. An unbiased exponent of zero indicates that the position of the assumed binary point is also the position of the actual binary point. The exponent field, then, determines a real number's magnitude.

In order to simplify comparing real numbers (e.g., for sorting), the 80287 stores exponents in a biased form. This means that a constant is added to the true exponent described above. The value of this bias is different for each real format (see figure 1-9). It has been chosen so as to force the biased exponent to be a positive value. This allows two real numbers (of the same format and sign) to be compared as if they are unsigned binary integers. That is, when comparing them bitwise from left to right (beginning with the leftmost exponent bit), the first bit position that differs orders the numbers; there is no need to proceed further with the comparison. A number's true exponent can be determined simply by subtracting the bias value of its format.

The short and long real formats exist in memory only. If a number in one of these formats is loaded into an 80287 register, it is automatically converted to temporary real, the format used for all internal operations. Likewise, data in registers can be converted to short or long real for storage in memory. The temporary real format may be used in memory also, typically to store intermediate results that cannot be held in registers.
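The conversion traced in table 1-5 can be checked mechanically. The Python sketch below is an illustration only; it assumes that the struct module's 4-byte "f" format (IEEE 754 single precision) has the same sign, biased exponent, and significand layout as the short real format of figure 1-9, and the function name short_real_fields is hypothetical.

```python
import struct

SHORT_REAL_BIAS = 127  # exponent bias for short real (figure 1-9)


def short_real_fields(x: float) -> tuple:
    """Return (sign, biased_exponent, fraction) of x stored as a short real."""
    # Pack as a big-endian 32-bit real, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF  # 23 stored bits; the integer bit is implicit
    return sign, biased_exponent, fraction


sign, biased, fraction = short_real_fields(178.125)
print(sign, format(biased, "08b"), format(fraction, "023b"))
# prints: 0 10000110 01100100010000000000000

# Subtracting the bias recovers the true exponent (7), and restoring the
# implicit integer bit recovers the original value.
true_exponent = biased - SHORT_REAL_BIAS
value = (1 + fraction / 2**23) * 2**true_exponent
print(true_exponent, value)  # prints: 7 178.125
```

Because the biased exponent occupies the bits immediately below the sign, two positive short reals compare in the same order as their 32-bit patterns taken as unsigned integers, which is the property the biased form is chosen to provide.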
Most applications should use the long real form to store real number data and results; it provides sufficient range and precision to return correct results with a minimum of programmer attention. The short real format is appropriate for applications that are constrained by memory, but it should be recognized that this format provides a smaller margin of safety. It is also useful for debugging algorithms, because roundoff problems will manifest themselves more quickly in this format. The temporary real format should normally be reserved for holding intermediate results, loop accumulations, and constants. Its extra length is designed to shield final results from the effects of rounding and overflow/underflow in intermediate calculations. However, the range and precision of the long real form are adequate for most microcomputer applications. Rounding Control Internally, the 80287 employs three extra bits (guard, round, and sticky bits) that enable it to represent the infinitely precise true result of a computation; these bits are not accessible to programmers. Whenever the destination can represent the infinitely precise true result, the 80287 delivers it. Rounding occurs in arithmetic and store operations when the format of the destination cannot exactly represent the infinitely precise true result. For example, a real number may be rounded if it is stored in a shorter real format, or in an integer format. Or, the infinitely precise true result may be rounded when it is returned to a register. The NPX has four rounding modes, selectable by the RC field in the control word (see figure 1-5). Given a true result b that cannot be represented by the target data type, the 80287 determines the two representable numbers a and c that most closely bracket b in value (a < b < c). The processor then rounds (changes) b to a or to c according to the mode selected by the RC field as shown in table 1-6. 
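The choice between a and c can be sketched in a few lines. The Python fragment below is an illustration under simplifying assumptions, not a model of the NPX datapath: the destination is taken to be an integer format, so that a and c are the integers just below and just above b, and exact rational arithmetic (the fractions module) stands in for the infinitely precise true result. The function name round_to_integer is hypothetical; the RC encodings are those of figure 1-5.

```python
from fractions import Fraction
import math


def round_to_integer(b, rc: int) -> int:
    """Round the exact value b to an integer according to the RC field."""
    b = Fraction(b)
    a = math.floor(b)            # largest representable value <= b
    if a == b:
        return a                 # exactly representable: no rounding occurs
    c = a + 1                    # smallest representable value > b
    if rc == 0b00:               # round to nearest, ties to the even value
        if b - a < c - b:
            return a
        if b - a > c - b:
            return c
        return a if a % 2 == 0 else c
    if rc == 0b01:               # round down (toward -infinity)
        return a
    if rc == 0b10:               # round up (toward +infinity)
        return c
    if rc == 0b11:               # chop (toward zero)
        return a if abs(a) < abs(c) else c
    raise ValueError("rc must be a two-bit value")


for value in (2.5, -2.5, 3.5):
    print(value, [round_to_integer(value, rc) for rc in range(4)])
# prints:
# 2.5 [2, 2, 3, 2]
# -2.5 [-2, -3, -2, -2]
# 3.5 [4, 3, 4, 3]
```

The tie cases (2.5 and 3.5) show why the default mode is described as unbiased: halves are rounded to the even neighbor, so they go down as often as up.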
Rounding introduces an error in a result that is less than one unit in the last place to which the result is rounded. "Round to nearest" is the default mode and is suitable for most applications; it provides the most accurate and statistically unbiased estimate of the true result. The chop mode is provided for integer arithmetic applications.

"Round up" and "round down" are termed directed rounding and can be used to implement interval arithmetic. Interval arithmetic generates a certifiable result independent of the occurrence of rounding and other errors. The upper and lower bounds of an interval may be computed by executing an algorithm twice, rounding up in one pass and down in the other.

Table 1-6. Rounding Modes

┌────────┬────────────────────────┬────────────────────────────────────────┐
│RC Field│ Rounding Mode          │ Rounding Action                        │
├────────┼────────────────────────┼────────────────────────────────────────┤
│   00   │ Round to nearest       │ Closer to b of a or c; if equally      │
│        │                        │ close, select even number (the one     │
│        │                        │ whose least significant bit is zero).  │
│        │                        │                                        │
│   01   │ Round down (toward -∞) │ a                                      │
│        │                        │                                        │
│   10   │ Round up (toward +∞)   │ c                                      │
│        │                        │                                        │
│   11   │ Chop (toward 0)        │ Smaller in magnitude of a or c         │
└────────┴────────────────────────┴────────────────────────────────────────┘

───────────────────────────────────────────────────────────────────────────
NOTE
  a < b < c; a and c are representable, b is not.
───────────────────────────────────────────────────────────────────────────

Precision Control

The 80287 allows results to be calculated with either 64, 53, or 24 bits of precision in the significand as selected by the precision control (PC) field of the control word. The default setting, and the one that is best suited for most applications, is the full 64 bits of significance provided by the temporary-real format.
The other settings are required by the proposed IEEE standard, and are provided to obtain compatibility with the specifications of certain existing programming languages. Specifying less precision nullifies the advantages of the temporary real format's extended fraction length, and does not increase execution speed. When reduced precision is specified, the rounding of the fractional value clears the unused bits on the right to zeros. Infinity Control The 80287's system of real numbers may be closed by either of two models of infinity. These two means of closing the number system, projective and affine closure, are illustrated schematically in figure 1-10. The setting of the IC field in the control word selects one model or the other. The default means of closure is projective, and this is recommended for most computations. When projective closure is selected, the NPX treats the special values +∞ and -∞ as a single unsigned infinity (similar to its treatment of signed zeros). In the affine mode the NPX respects the signs of +∞ and -∞. While affine mode may provide more information than projective, there are occasions when the sign may in fact represent misinformation. For example, consider an algorithm that yields an intermediate result x of +0 and -0 (the same numeric value) in different executions. If 1/x were then computed in affine mode, two entirely different values (+∞ and -∞) would result from numerically identical values of x. Projective mode, on the other hand, provides less information but never returns misinformation. In general, then, projective mode should be used globally, with affine mode reserved for local computations where the programmer can take advantage of the sign and knows for certain that the nature of the computations will not produce a misleading result. Figure 1-10. 
Projective versus Affine Closure

     PROJECTIVE CLOSURE                    AFFINE CLOSURE

             ∞
       ┌────►•◄────┐
       │           │
     - │           │ +          -∞       -       +       +∞
       │           │             •◄──────────┼──────────►•
       └─────┼─────┘                         0
             0

Special Computational Situations

Besides being able to represent positive and negative numbers, the 80287 data formats may be used to describe other entities. These special values provide extra flexibility, but most users will not need to understand them in order to use the 80287 successfully. This section describes the special values that may occur in certain cases and the significance of each. The 80287 exceptions are also described, for writers of exception handlers and for those interested in probing the limits of computation using the 80287.

The material presented in this section is mainly of interest to programmers concerned with writing exception handlers. For many readers, this section can be browsed lightly.

Special Numeric Values

The 80287 data formats encompass encodings for a variety of special values in addition to the typical real or integer data values that result from normal calculations. These special values have significance and can express relevant information about the computations or operations that produced them. The various types of special values are

■ Non-normal real numbers, including denormals and unnormals
■ Zeros and pseudo zeros
■ Positive and negative infinity
■ NaN (Not-a-Number)
■ Indefinite

The following description explains the origins and significance of each of these special values. Tables 1-13 through 1-16 at the end of this section show how each of these special values is encoded for each of the numeric data types.

Nonnormal Real Numbers

As described previously, the 80287 generally stores nonzero real numbers in normalized floating-point form; that is, the integer (leading) bit of the significand is always a 1. This bit is explicitly stored in the temporary real format, and is implicitly assumed to be a one (1{▲}) in the short- and long-real formats.
Since leading zeros are eliminated, normalized storage allows the maximum number of significant digits to be held in a significand of a given width.

When a floating-point numeric value becomes very close to zero, normalized storage cannot be used to express the value accurately. To accommodate these instances, the 80287 can store and operate on reals that are not normalized, i.e., whose significands contain one or more leading zeros. Nonnormals typically arise when the result of a calculation yields a value that is too small to be represented in normal form.

Nonnormal values can exist in one of two forms:

■ The floating-point exponent may be stored at its most negative value (a Denormal),
■ The integer bit (and perhaps other leading bits) of the significand may be zero (an Unnormal).

The leading zeros of nonnormals permit smaller numbers to be represented, at the cost of some lost precision (the number of significant bits is reduced by the leading zeros). In typical algorithms, extremely small values are most likely to be generated as intermediate, rather than final, results. By using the NPX's temporary real format for holding intermediates, values as small as ±3.4*10^(-4932) can be represented; this makes the occurrence of nonnormal numbers a rare phenomenon in 80287 applications. Nevertheless, the NPX can load, store, and operate on nonnormalized real numbers when they do occur.

Denormals and Gradual Underflow

A denormal is the result of the NPX's response to an underflow exception when that exception has been masked by the programmer (see the 80287 control word, figure 1-5). Underflow occurs when the absolute value of a real number becomes too small to be represented in the destination format, that is, when the exponent of the true result is too negative to be represented in the destination format. For example, a true exponent of -130 will cause underflow if the destination is short real, because -126 is the smallest exponent this format can accommodate.
No underflow would occur if the destination were long real or temporary real, since these formats can handle exponents down to -1022 and -16,382, respectively.

Most computers underflow "abruptly": they simply return a zero result, which is likely to produce an unacceptable final result if computation continues. The 80287, on the other hand, underflows "gradually" when the underflow exception is masked. Gradual underflow is accomplished by denormalizing the result until it is just within the exponent range of the destination format. Denormalizing means incrementing the true result's exponent and inserting a corresponding leading zero in the significand, shifting the rest of the significand one place to the right. Denormal values may occur in any of the short-real, long-real, or temporary-real formats. Table 1-7 illustrates how a result might be denormalized to fit a short-real destination.

The intent of the 80287's masked response to underflow is to allow computation to continue without program intervention, while introducing an error that carries about the same risk of contaminating the final result as roundoff error. Roundoff (precision) errors occur frequently in real number calculations; sometimes they spoil the result of computation, but often they do not. Recognizing that roundoff errors are often nonfatal, computation usually proceeds, and the programmer inspects the final results to see if these errors have had a significant effect. The 80287's masked underflow response allows programmers to treat underflows in a similar manner; the computation continues and the programmer can examine the final result to determine if an underflow has had important consequences. (If the underflow has had a significant effect, an invalid operation will probably be signalled later in the computation.)

Denormalization produces a denormal or a zero.
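The shift-and-increment procedure can be sketched in a few lines of Python. This is an illustrative model, not the NPX's internal algorithm: the significand is held as an integer whose top bit is the integer bit, the function name denormalize is hypothetical, and the rounding that precedes the store is not modeled (bits shifted off the right are simply dropped and reported). The sample values are those of table 1-7.

```python
def denormalize(exponent: int, significand: int, min_exponent: int = -126):
    """Shift a significand right until the exponent reaches min_exponent.

    Returns (exponent, significand, bits_lost). Low-order bits are dropped;
    the rounding step is deliberately left out of this sketch.
    """
    bits_lost = False
    while exponent < min_exponent:
        bits_lost = bits_lost or bool(significand & 1)
        significand >>= 1   # insert a leading zero
        exponent += 1       # compensate by incrementing the exponent
    return exponent, significand, bits_lost


if __name__ == "__main__":
    # True result from table 1-7: exponent -129, significand 1.01011100...00
    true_significand = int("101011100" + "0" * 15, 2)   # 24 bits
    e, s, lost = denormalize(-129, true_significand)
    print(e, format(s, "024b"), lost)
```

With the table 1-7 operand the sketch stops at exponent -126 with three leading zeros in the significand and no bits lost. If the true exponent is far enough below the minimum, every significand bit is shifted out and the result is zero.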
Denormals are readily identified by their exponents, which are always the minimum for their formats; in biased form, this is always the bit string: 00...00. This same exponent value is also assigned to the zeros, but a denormal has a nonzero significand. A denormal in a register is tagged special. Tables 1-15 and 1-16 later in this chapter show how denormal values are encoded in each of the real data formats.

The denormalization process may cause the loss of low-order significand bits as they are shifted off the right. In a severe case, all the significand bits of the true result are shifted out and replaced by the leading zeros. In this case, the result of denormalization is a true zero, and if the value is in a register, it is tagged as such. However, this is a comparatively rare occurrence and, in any case, is no worse than "abrupt" underflow.

Denormals are rarely encountered in most applications. Typical debugged algorithms generate extremely small results during the evaluation of intermediate subexpressions; the final result is usually of an appropriate magnitude for its short or long real destination. If intermediate results are held in temporary real, as is recommended, the great range of this format makes underflow very unlikely. Denormals are likely to arise only when an application generates a great many intermediates, so many that they cannot be held on the register stack or in temporary real memory variables. If storage limitations force the use of short or long reals for intermediates, and small values are produced, underflow may occur, and, if masked, may generate denormals.

Accessing a denormal may produce an exception as shown in table 1-8. (The denormalized exception signals that a denormal has been fetched.) Denormals may have reduced significance due to lost low-order bits, and an option of the proposed IEEE standard precludes operations on nonnormalized operands.
This option may be implemented in the form of an exception handler that responds to unmasked denormalized exceptions. Most users will mask this exception so that computation may proceed; any loss of accuracy will be analyzed by the user when the final result is delivered.

As table 1-8 shows, the division and remainder operations do not accept denormal divisors and raise the invalid operation exception. Recall also that the transcendental instructions require normalized operands and do not check for exceptions. In all other cases, the NPX converts denormals to unnormals, and the rules governing unnormal arithmetic then apply (unnormals are described in the following section).

Table 1-7. Denormalization Process

┌──────────────────┬────────┬────────────┬───────────────────────────┐
│Operation         │  Sign  │ Exponent   │ Significand               │
│                  │        │ (unbiased, │                           │
│                  │        │  decimal)  │                           │
├──────────────────┼────────┼────────────┼───────────────────────────┤
│True Result       │   0    │   -129     │ 1{▲}01011100...00         │
│Denormalize       │   0    │   -128     │ 0{▲}101011100...00        │
│Denormalize       │   0    │   -127     │ 0{▲}0101011100...00       │
│Denormalize       │   0    │   -126     │ 0{▲}00101011100...00      │
│Denormal Result   │   0    │   -126     │ 0{▲}00101011100...00      │
└──────────────────┴────────┴────────────┴───────────────────────────┘

───────────────────────────────────────────────────────────────────────────
NOTE
  Denormal Result: before storing, significand is rounded to 24 bits,
  integer bit is dropped, and exponent is biased by adding 126.
───────────────────────────────────────────────────────────────────────────

Table 1-8. Exceptions Due to Denormal Operands

┌─────────────────────────────┬───────────┬──────────────────────────────┐
│Operation                    │ Exception │ Masked Response              │
├─────────────────────────────┼───────────┼──────────────────────────────┤
│FLD (short/long real)        │     D     │ Load as equivalent unnormal  │
├─────────────────────────────┼───────────┼──────────────────────────────┤
│Arithmetic (except following)│     D     │ Convert (in a work area)     │
│                             │           │ denormal to equivalent       │
│                             │           │ unnormal and proceed         │
├─────────────────────────────┼───────────┼──────────────────────────────┤
│Compare and test             │     D     │ Convert (in a work area)     │
│                             │           │ denormal to equivalent       │
│                             │           │ unnormal and proceed         │
├─────────────────────────────┼───────────┼──────────────────────────────┤
│Division or FPREM with       │     I     │ Return real indefinite       │
│denormal divisor             │           │                              │
└─────────────────────────────┴───────────┴──────────────────────────────┘

Unnormals──Descendants of Denormal Operands

An unnormal is the result of a computation using denormal operands and is therefore the descendant of the 80287's masked underflow response. An unnormal may exist only in the temporary real format; it may have any exponent that a normal value may have (that is, in biased form any nonzero value), but it is distinguished from a normal by the integer bit of its significand, which is always 0. An unnormal in a register is tagged valid.

Unnormals are distinct from denormals, which have an exponent of 00...00 in biased form. Unnormals allow arithmetic to continue following an underflow while still retaining their identity as numbers that may have reduced significance. That is, unnormal operands generate unnormal results, so long as their unnormality has a significant effect on the result. Unnormals are thus prevented from "masquerading" as normals, numbers that have full significance. On the other hand, if an unnormal has an insignificant effect on a calculation with a normal, the result will be normal.
For example, adding a small unnormal to a large normal yields a normal result. The converse situation yields an unnormal.

Table 1-9 shows how the instruction set deals with unnormal operands. Note that the unnormal may be the original operand or a temporary created by the 80287 from a denormal.

Table 1-9. Unnormal Operands and Results

Operation                            Result

Addition/subtraction                 Normalization of operand with larger
                                     absolute value determines
                                     normalization of result.
Multiplication                       If either operand is unnormal, result
                                     is unnormal.
Division (unnormal dividend only)    Result is unnormal.
FPREM (unnormal dividend only)       Result is normalized.
Division/FPREM (unnormal divisor)    Signal invalid operation.
Compare/FTST                         Normalize as much as possible before
                                     making comparison.
FRNDINT                              Normalize as much as possible before
                                     rounding.
FSQRT                                Signal invalid operation.
FST, FSTP (short/long real           If value is above destination's
destination)                         underflow boundary, then signal
                                     invalid operation; else signal
                                     underflow.
FSTP (temporary real destination)    Store as usual.
FIST, FISTP, FBSTP                   Signal invalid operation.
FLD                                  Load as usual.
FXCH                                 Exchange as usual.
Transcendental instructions          Undefined; operands must be normal
                                     and are not checked.

Zeros and Pseudo Zeros

The value zero in the real and decimal integer formats may be signed either positive or negative, although the sign of a binary integer zero is always positive. For computational purposes, the value of zero always behaves identically, regardless of sign, and typically the fact that a zero may be signed is transparent to the programmer. If necessary, the FXAM instruction may be used to determine a zero's sign.

The zeros discussed above are called true zeros; if one of them is loaded or generated in a register, the register is tagged zero. Table 1-10 lists the results of instructions executed with zero operands and also shows how a true zero may be created from nonzero operands.
Only the temporary real format may contain a special class of values called pseudo zeros. A pseudo zero is an unnormal whose significand is all zeros, but whose (biased) exponent is nonzero (true zeros have a zero exponent). Neither is a pseudo zero's exponent all ones, since this encoding is reserved for infinities and NANs. A pseudo zero result will be produced if two unnormals, containing a total of more than 64 leading zero bits in their significands, are multiplied together. This is a remote possibility in most applications, but it can happen. Pseudo zero operands behave like unnormals, except in the following cases where they produce the same results as true zeros: ■ Compare and test instructions ■ FRNDINT (round to integer) ■ Division, where the dividend is either a true zero or a pseudo zero (the divisor is a pseudo zero) In addition and subtraction of a pseudo zero and a true zero or another pseudo zero, the pseudo zero(s) behaves like unnormals, except for the determination of the result's sign. The sign is determined as shown in table 1-10 for two true zero operands. Infinity The real formats support signed representations of infinities. These values are encoded with a biased exponent of all ones and a significand of 1{▲}00...00; if the infinity is in a register, it is tagged special. The significand distinguishes infinities from NANs, including real indefinite. A programmer may code an infinity, or it may be created by the NPX as its masked response to an overflow or a zero divide exception. Note that when rounding is up or down, the masked response may create the largest valid value representable in the destination rather than infinity. See table 1-11 for details. As operands, infinities behave somewhat differently depending on how the infinity control field in the control word is set (see table 1-12). 
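The zero, denormal, infinity, and NaN encodings can be told apart from the exponent and significand fields alone. The following Python sketch classifies a 32-bit short real bit pattern; it assumes the usual short real layout of 1 sign bit, 8 exponent bits, and 23 stored significand bits, and the function name classify_short_real is hypothetical. Unnormals and pseudo zeros do not appear because they exist only in the temporary real format.

```python
def classify_short_real(bits: int) -> str:
    """Classify a 32-bit short real bit pattern by exponent and fraction."""
    sign = "-" if (bits >> 31) & 1 else "+"
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent
    fraction = bits & 0x7FFFFF          # 23 stored significand bits
    if exponent == 0xFF:                # all ones: infinity or NaN
        if fraction == 0:
            return sign + "infinity"
        if sign == "-" and fraction == 0x400000:
            return "real indefinite"    # negative NaN, fraction 10...00
        return sign + "NaN"
    if exponent == 0:                   # all zeros: true zero or denormal
        return sign + ("zero" if fraction == 0 else "denormal")
    return sign + "normal"


if __name__ == "__main__":
    for pattern in (0x7F800000, 0xFF800000, 0xFFC00000,
                    0x00000001, 0x80000000, 0x3F800000):
        print(format(pattern, "08X"), classify_short_real(pattern))
```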
When the projective model of infinity is selected, the infinities behave as a single unsigned representation; because of this, infinity cannot be compared with any value except infinity. In affine mode, the signs of the infinities are observed, and comparisons are possible.

Table 1-10. Zero Operands and Results

Operation/Operands                   Result

FLD, FBLD (1)
  +0                                 +0
  -0                                 -0
FILD (1)
  +0                                 +0
FST, FSTP
  +0                                 +0
  -0                                 -0
  +X (2)                             +0
  -X (2)                             -0
FBSTP
  +0                                 +0
  -0                                 -0
FIST, FISTP
  +0                                 +0
  -0                                 +0
  +X (3)                             +0
  -X (3)                             +0
Addition
  +0 plus +0                         +0
  -0 plus -0                         -0
  +0 plus -0, -0 plus +0             *0 (4)
  -X plus +X, +X plus -X             *0 (4)
  ±0 plus ±X, ±X plus ±0             ┼X (5)
Subtraction
  +0 minus -0                        +0
  -0 minus +0                        -0
  +0 minus +0, -0 minus -0           *0 (4)
  +X minus +X, -X minus -X           *0 (4)
  ±0 minus ±X, ±X minus ±0           ┼X (5)
Multiplication
  +0 * +0, -0 * -0                   +0
  +0 * -0, -0 * +0                   -0
  +0 * +X, +X * +0                   +0
  +0 * -X, -X * +0                   -0
  -0 * +X, +X * -0                   -0
  -0 * -X, -X * -0                   +0
  +X * +Y, -X * -Y (6)               +0, underflow
  +X * -Y, -X * +Y (6)               -0, underflow
Division
  ±0 ÷ ±0                            Invalid operation
  ±X ÷ ±0                            Zerodivide
  +0 ÷ +X, -0 ÷ -X                   +0
  +0 ÷ -X, -0 ÷ +X                   -0
  -X ÷ -Y, +X ÷ +Y (7)               +0, underflow
  -X ÷ +Y, +X ÷ -Y (7)               -0, underflow
FPREM
  ±0 rem ±0                          Invalid operation
  ±X rem ±0                          Invalid operation
  +0 rem +X, +0 rem -X               +0
  -0 rem +X, -0 rem -X               -0
  +X rem +Y, +X rem -Y (8)           +0
  -X rem -Y, -X rem +Y (8)           -0
FSQRT
  -0                                 -0
  +0                                 +0
Compare
  ±0: +X                             A < B
  ±0: ±0                             A = B
  ±0: -X                             A > B
FTST
  ±0                                 Zero
FCHS
  +0                                 -0
  -0                                 +0
FABS
  ±0                                 +0
F2XM1
  +0                                 +0
  -0                                 -0
FRNDINT
  +0                                 +0
  -0                                 -0
FXTRACT
  +0                                 Both +0
  -0                                 Both -0

───────────────────────────────────────────────────────────────────────────
NOTES
  (1) Arithmetic and compare operations with binary integers interpret the
      integer sign in the same manner.
  (2) Severe underflows in storing to short or long real may generate
      zeros.
  (3) Small values (│X│ < 1) stored into integers may round to zero.
  (4) Sign is determined by round mode: * = + for nearest, up, or chop;
      * = - for down.
  (5) ┼ = sign of X.
  (6) Very small values of X and Y may yield zeros, after rounding of true
      result. NPX signals underflow to warn that zero has been yielded by
      nonzero operands.
  (7) Very small X and very large Y may yield zero, after rounding of true
      result. NPX signals underflow to warn that zero has been yielded
      from nonzero operands.
  (8) When Y divides into X exactly.
───────────────────────────────────────────────────────────────────────────

NaN (Not a Number)

A NaN (Not a Number) is a member of a class of special values that exist in the real formats only.
A NaN has an exponent of 11..11B, may have either sign, and may have any significand except 1{▲}00..00B, which is assigned to the infinities. A NaN in a register is tagged special. The 80287 will generate the special NaN, real indefinite, as its masked response to an invalid operation exception. This NaN is signed negative; its significand is encoded 1{▲}100..00. All other NaNs represent programmer-created values. Whenever the NPX uses an operand that is a NaN, it signals an invalid operation exception in its status word. If this exception is masked in the 80287 control word, the 80287's masked exception response is to return the NaN as the operation result. If both operands of an instruction are NaNs, the result is the NaN with the larger absolute value. In this way, a NaN that enters a computation propagates through the computation and will eventually be delivered as the final result. Note, however, that the transcendental instructions do not check their operands, and a NaN will produce an undefined result. By unmasking the invalid operation exception, the programmer can use NaNs to trap to the exception handler. The generality of this approach and the large number of NaN values that are available provide the sophisticated programmer with a tool that can be applied to a variety of special situations. For example, a compiler could use NaNs as references to uninitialized (real) array elements. The compiler could preinitialize each array element with a NaN whose significand contained the index (relative position) of the element. If an application program attempted to access an element that it had not initialized, it would use the NaN placed there by the compiler. If the invalid operation exception were unmasked, an interrupt would occur, and the exception handler would be invoked. 
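The preinitialization scheme can be sketched at the bit level in Python. The sketch works on long real (64-bit) bit patterns as plain integers and is only one possible encoding: it stores the element index plus one in the significand (so the significand is never all zeros, which would encode infinity) and uses a positive sign (so the value cannot collide with real indefinite, which is negative). The names make_index_nan and nan_index are hypothetical.

```python
import struct

EXPONENT_ALL_ONES = 0x7FF << 52     # long real: 11-bit exponent of 11...11
FRACTION_MASK = (1 << 52) - 1       # 52 stored significand bits


def make_index_nan(index: int) -> int:
    """Return the bit pattern of a positive NaN that carries an array index."""
    payload = index + 1             # avoid an all-zero fraction (infinity)
    if not 0 < payload <= FRACTION_MASK:
        raise ValueError("index does not fit in the significand")
    return EXPONENT_ALL_ONES | payload


def nan_index(bits: int) -> int:
    """Recover the array index from a NaN built by make_index_nan."""
    fraction = bits & FRACTION_MASK
    if (bits & EXPONENT_ALL_ONES) != EXPONENT_ALL_ONES or fraction == 0:
        raise ValueError("bit pattern is not a NaN")
    return fraction - 1


if __name__ == "__main__":
    # Memory image of a 4-element array, each element preset to "its" NaN.
    image = b"".join(struct.pack("<Q", make_index_nan(i)) for i in range(4))
    third = struct.unpack_from("<Q", image, 2 * 8)[0]
    print(format(third, "016X"), nan_index(third))
```

Keeping the values as integers rather than Python floats is deliberate: it avoids any dependence on how the host floating-point hardware treats NaN operands.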
The exception handler could determine which element had been accessed, since the operand address field of the exception pointers would point to the NaN, and the NaN would contain the index number of the array element.

NaNs could also be used to speed up debugging. In its early testing phase, a program often contains multiple errors. An exception handler could be written to save diagnostic information in memory whenever it was invoked. After storing the diagnostic data, it could supply a NaN as the result of the erroneous instruction, and that NaN could point to its associated diagnostic area in memory. The program would then continue, creating a different NaN for each error. When the program ended, the NaN results could be used to access the diagnostic data saved at the time the errors occurred. Many errors could thus be diagnosed and corrected in one test run.

Table 1-11. Masked Overflow Response with Directed Rounding

┌───────────────────┬──────────┬────────────────────────────────────────┐
│    True Result    │          │                                        │
├──────────────┬────┤ Rounding │ Result Delivered                       │
│Normalization │Sign│   Mode   │                                        │
├──────────────┼────┼──────────┼────────────────────────────────────────┤
│Normal        │ +  │ Up       │ +∞                                     │
│Normal        │ +  │ Down     │ Largest finite positive number (1)     │
│Normal        │ -  │ Up       │ Largest finite negative number (1)     │
│Normal        │ -  │ Down     │ -∞                                     │
│Unnormal      │ +  │ Up       │ +∞                                     │
│Unnormal      │ +  │ Down     │ Largest exponent, result's             │
│              │    │          │ significand (2)                        │
│Unnormal      │ -  │ Up       │ Largest exponent, result's             │
│              │    │          │ significand (2)                        │
│Unnormal      │ -  │ Down     │ -∞                                     │
└──────────────┴────┴──────────┴────────────────────────────────────────┘

───────────────────────────────────────────────────────────────────────────
NOTES
  (1) The largest valid representable reals are encoded:
      exponent: 11...10B
      significand: (1){▲}11...10B
  (2) The significand retains its identity as an unnormal; the true result
      is rounded as usual (effectively chopped toward 0 in this case). The
      exponent is encoded 11...10B.
───────────────────────────────────────────────────────────────────────────

Table 1-12. Infinity Operands and Results

Key to symbols used in this table

  X = zero or nonzero operand
  Y = nonzero operand
  * = sign of original operand
  ┼ = sign is complement of original operand's sign
  Φ = sign is "exclusive or" of original operand signs (+ if operands had
      same sign, - if operands had different signs)

Operation               Projective Result         Affine Result

Addition
  +∞ plus +∞            Invalid operation         +∞
  -∞ plus -∞            Invalid operation         -∞
  +∞ plus -∞            Invalid operation         Invalid operation
  -∞ plus +∞            Invalid operation         Invalid operation
  ±∞ plus ±X            *∞                        *∞
  ±X plus ±∞            *∞                        *∞
Subtraction
  +∞ minus -∞           Invalid operation         +∞
  -∞ minus +∞           Invalid operation         -∞
  +∞ minus +∞           Invalid operation         Invalid operation
  -∞ minus -∞           Invalid operation         Invalid operation
  ±∞ minus ±X           *∞                        *∞
  ±X minus ±∞           ┼∞                        ┼∞
Multiplication
  ±∞ * ±∞               Φ                         Φ
  ±∞ * ±Y               Φ                         Φ
  ±0 * ±∞, ±∞ * ±0      Invalid operation         Invalid operation
Division
  ±∞ ÷ ±∞               Invalid operation         Invalid operation
  ±∞ ÷ ±X               Φ                         Φ
  ±X ÷ ±∞               Φ                         Φ
FSQRT
  -∞                    Invalid operation         Invalid operation
  +∞                    Invalid operation         +∞
FPREM
  ±∞ rem ±∞             Invalid operation         Invalid operation
  ±∞ rem ±X             Invalid operation         Invalid operation
  ±Y rem ±∞             *Y                        *Y
  ±0 rem ±∞             *0                        *0
FRNDINT
  ±∞                    *∞                        *∞
FSCALE
  ±∞ scaled by ±∞       Invalid operation         Invalid operation
  ±∞ scaled by ±X       *∞                        *∞
  ±0 scaled by ±∞       *0                        *0
  ±Y scaled by ±∞       Invalid operation         Invalid operation
FXTRACT
  ±∞                    Invalid operation         Invalid operation
Compare
  ±∞: ±∞                A = B                     -∞ < +∞
  ±∞: ±Y                A ? B (and) invalid       -∞ < Y < +∞
                        operation
  ±∞: ±0                A ? B (and) invalid       -∞ < 0 < +∞
                        operation
FTST
  ±∞                    A ? B (and) invalid       *∞
                        operation

Indefinite

For every 80287 numeric data type, one unique encoding is reserved for representing the special value indefinite. The 80287 produces this encoding as its response to a masked invalid-operation exception.
In the case of reals, the indefinite value can be stored and loaded like any NaN, and it always retains its special identity; programmers are advised not to use this encoding for any other purpose. Packed decimal indefinite may be stored by the NPX in a FBSTP instruction; attempting to use this encoding in a FBLD instruction, however, will have an undefined result. In the binary integers, the same encoding may represent either indefinite or the largest negative number supported by the format (-2^(15), -2^(31), or -2^(63)). The 80287 will store this encoding as its masked response to an invalid operation, or when the value in a source register represents or rounds to the largest negative integer representable by the destination. In situations where its origin may be ambiguous, the invalid operation exception flag can be examined to see if the value was produced by an exception response. When this encoding is loaded, or used by an integer arithmetic or compare operation, it is always interpreted as a negative number; thus indefinite cannot be loaded from a packed decimal or binary integer. Encoding of Data Types Tables 1-13 through 1-16 show how each of the special values just described is encoded for each of the numeric data types. In these tables, the least-significant bits are shown to the right and are stored in the lowest memory addresses. The sign bit is always the left-most bit of the highest-addressed byte. Table 1-13. 
Binary Integer Encodings

            Class                      Sign     Magnitude
          ┌────────────────────────────────────────────────
          │ (Largest)                   0       11...11
          │                             ∙          ∙
Positives │                             ∙          ∙
          │                             ∙          ∙
          │ (Smallest)                  0       00...01
          └────────────────────────────────────────────────
Zero                                    0       00...00
          ┌────────────────────────────────────────────────
          │ (Smallest)                  1       11...11
          │                             ∙          ∙
Negatives │                             ∙          ∙
          │                             ∙          ∙
          │ (Largest/Indefinite)        1       00...00
          └────────────────────────────────────────────────
                                       Word:    ───15 bits───
                                       Short:   ───31 bits───
                                       Long:    ───63 bits───

───────────────────────────────────────────────────────────────────────────
NOTE
  Largest/Indefinite: if this encoding is used as a source operand (as in
  an integer load or integer arithmetic instruction), the 80287 interprets
  it as the largest negative number representable in the format: -2^(15),
  -2^(31), or -2^(63). The 80287 will deliver this encoding to an integer
  destination in two cases:
  1. If the result is the largest negative number
  2. As the response to a masked invalid operation exception, in which
     case it represents the special value integer indefinite.
───────────────────────────────────────────────────────────────────────────

Table 1-14. Packed Decimal Encodings

                                   ┌──────────────── Magnitude ───────────────┐
            Class        Sign            digit   digit   digit   digit ...  digit
          ┌──────────────────────────────────────────────────────────────────────
          │ (Largest)     0   0000000   1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 ... 1 0 0 1
          │               ∙                ∙                                  ∙
Positives │               ∙                ∙                                  ∙
          │ (Smallest)    0   0000000   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 1
          │
          │ Zero          0   0000000   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0
          └──────────────────────────────────────────────────────────────────────
          ┌──────────────────────────────────────────────────────────────────────
          │ Zero          1   0000000   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0
          │
          │ (Smallest)    1   0000000   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 1
Negatives │               ∙                ∙                                  ∙
          │               ∙                ∙                                  ∙
          │ (Largest)     1   0000000   1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 ... 1 0 0 1
          └──────────────────────────────────────────────────────────────────────
Indefinite                1   1111111   1 1 1 1 1 1 1 1 U U U U U U U U ... U U U U
                         ─── 1 byte ──  ──────────────── 9 bytes ────────────────

───────────────────────────────────────────────────────────────────────────
NOTES
  The packed decimal indefinite encoding is stored by FBSTP in response to
  a masked invalid operation exception. Attempting to load this value via
  FBLD produces an undefined result.
  UUUU means bit values are undefined and may contain any value.
───────────────────────────────────────────────────────────────────────────

Table 1-15. Real and Long Real Encodings

                                           Biased      Significand
                    Class           Sign   Exponent    {▲}ff...ff
          ┌────────────────────────────────────────────────────────
          │         NaNs             0     11...11     11...11
          │                          ∙        ∙           ∙
          │                          0     11...11     00...01
          │       ──────────────────────────────────────────────────
          │         ∞                0     11...11     00...00
          │       ┌──────────────────────────────────────────────────
          │       │ Normals          0     11...10     11...11
Positives │       │                  ∙        ∙           ∙
          │       │                  0     00...01     00...00
          │ Reals │ ────────────────────────────────────────────────
          │       │ Denormals        0     00...00     11...11
          │       │                  ∙        ∙           ∙
          │       │                  0     00...00     00...01
          │       │ ────────────────────────────────────────────────
          │       │ Zero             0     00...00     00...00
          └────────────────────────────────────────────────────────
          ┌────────────────────────────────────────────────────────
          │       │ Zero             1     00...00     00...00
          │       │ ────────────────────────────────────────────────
          │       │ Denormals        1     00...00     00...01
          │ Reals │                  ∙        ∙           ∙
          │       │                  1     00...00     11...11
          │       │ ────────────────────────────────────────────────
          │       │ Normals          1     00...01     00...00
Negatives │       │                  ∙        ∙           ∙
          │       │                  1     11...10     11...11
          │       └──────────────────────────────────────────────────
          │         ∞                1     11...11     00...00
          │       ┌──────────────────────────────────────────────────
          │       │                  1     11...11     00...01
          │       │                  ∙        ∙           ∙
          │ NaNs  │ ────────────────────────────────────────────────
          │       │ Indefinite       1     11...11     10...00
          │       │ ────────────────────────────────────────────────
          │       │                  ∙        ∙           ∙
          │       │                  1     11...11     11...11
          └────────────────────────────────────────────────────────
                                   Short:  ──8 bits──  ──23 bits──
                                   Long:   ──11 bits── ──52 bits──

───────────────────────────────────────────────────────────────────────────
NOTE
  Integer bit is implied and not stored.
───────────────────────────────────────────────────────────────────────────

Table 1-16. Temporary Real Encodings

                                           Biased
                    Class           Sign   Exponent    Significand
          ┌────────────────────────────────────────────────────────
          │         NaNs             0     11...11     111...11
          │                          ∙        ∙           ∙
          │                          0     11...11     100...01
          │       ──────────────────────────────────────────────────
          │         ∞                0     11...11     100...00
          │       ┌──────────────────────────────────────────────────
          │       │ Normals          0     11...10     111...11
          │       │                  ∙        ∙           ∙
          │       │                  0     00...01     100...00
          │       │ ────────────────────────────────────────────────
Positives │       │ Unnormals        0     11...10     011...11
          │ Reals │                  ∙        ∙           ∙
          │       │                  0     00...01     000...00
          │       │ ────────────────────────────────────────────────
          │       │ Denormals        0     00...00     011...11
          │       │                  ∙        ∙           ∙
          │       │                  0     00...00     000...01
          │       │ ────────────────────────────────────────────────
          │       │ Zero             0     00...00     000...00
          └────────────────────────────────────────────────────────
          ┌────────────────────────────────────────────────────────
          │       │ Zero             1     00...00     000...00
          │       │ ────────────────────────────────────────────────
          │       │ Denormals        1     00...00     000...01
          │       │                  ∙        ∙           ∙
          │       │                  1     00...00     011...11
          │       │ ────────────────────────────────────────────────
          │ Reals │ Unnormals        1     00...01     000...00
          │       │                  ∙        ∙           ∙
Negatives │       │                  1     11...10     011...11
          │       │ ────────────────────────────────────────────────
          │       │ Normals          1     00...01     100...00
          │       │                  ∙        ∙           ∙
          │       │                  1     11...10     111...11
          │       └──────────────────────────────────────────────────
          │         ∞                1     11...11     100...00
          │       ┌──────────────────────────────────────────────────
          │       │                  1     11...11     100...01
          │       │                  ∙        ∙           ∙
          │       │ ────────────────────────────────────────────────
          │ NaNs  │ Indefinite       1     11...11     110...00
          │       │ ────────────────────────────────────────────────
          │       │                  ∙        ∙           ∙
          │       │                  1     11...11     111...11
          └────────────────────────────────────────────────────────
                                           │─15 bits─│ │─64 bits─│

───────────────────────────────────────────────────────────────────────────
NOTE
  The leftmost significand bit shown is the integer bit, which is stored
  explicitly in the temporary real format. Normals and unnormals may each
  take any biased exponent from 00...01 through 11...10.
───────────────────────────────────────────────────────────────────────────

Numeric Exceptions

Whenever the 80287 NPX attempts a numeric
operation with invalid operands or produces a result that cannot be represented, the 80287 recognizes a numeric exception condition. Altogether, the 80287 checks for the following six classes of exceptions while executing numeric instructions:

  1. Invalid operation
  2. Divide-by-zero
  3. Denormalized operand
  4. Numeric overflow
  5. Numeric underflow
  6. Inexact result (precision)

Invalid Operation

The 80287 reports an invalid operation if any of the following occurs:

  ■ An attempt to load a register that is not empty (stack overflow).

  ■ An attempt to pop an operand from an empty register (stack underflow).

  ■ An operand is a NaN.

  ■ The operands cause the operation to be indeterminate (square root of a negative number, 0/0).

An invalid operation generally indicates a program error.

Zero Divisor

If an instruction attempts to divide a finite nonzero operand by zero, the 80287 will report a zero divide exception.

Denormalized Operand

If an instruction attempts to operate on a denormal, the NPX reports the denormalized operand exception. This exception allows users to implement in software an option of the proposed IEEE standard specifying that operands must be prenormalized before they are used.

Numeric Overflow and Underflow

If the exponent of a numeric result is too large for the destination real format, the 80287 signals a numeric overflow. Conversely, if the exponent of a result is too small to be represented in the destination format, a numeric underflow is signaled. If either of these exceptions occurs, the result of the operation is outside the range of the destination real format.

Typical algorithms are most likely to produce extremely large and small numbers in the calculation of intermediate, rather than final, results. Because of the great range of the temporary real format (recommended as the destination format for intermediates), overflow and underflow are relatively rare events in most 80287 applications.
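The value of a wide destination format for intermediates can be illustrated with a small Python sketch. It is a model only, not 80287 code: Python's float (the long real format) stands in for the temporary real format, the names to_short_real, magnitude_narrow, and magnitude_wide are invented for this illustration, and the OverflowError raised by the struct module stands in for the numeric overflow exception.

```python
import math
import struct

def to_short_real(x):
    """Round x to short real (IEEE single) and return it as a Python float.

    struct.pack raises OverflowError when x is too large for the format;
    that error stands in for the overflow exception in this sketch.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]

def magnitude_narrow(x, y):
    # Every intermediate is forced into short real, so x*x can overflow.
    xx = to_short_real(x * x)
    yy = to_short_real(y * y)
    return to_short_real(math.sqrt(to_short_real(xx + yy)))

def magnitude_wide(x, y):
    # Intermediates stay in the wider format; only the final result
    # is converted to short real.
    return to_short_real(math.sqrt(x * x + y * y))

x = y = 1.0e30
try:
    narrow = magnitude_narrow(x, y)
except OverflowError:
    narrow = None            # the intermediate 1e60 does not fit
wide = magnitude_wide(x, y)  # the final result fits in short real
```

In this model the final magnitude is representable in short real, yet the narrow version never reaches it because the squared intermediates exceed the short real range. Keeping intermediates in the wider format avoids the problem without any change to the algorithm.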
Inexact Result

If the result of an operation is not exactly representable in the destination format, the 80287 rounds the number and reports the precision exception. For example, the fraction 1/3 cannot be precisely represented in binary form. This exception occurs frequently and indicates that some (generally acceptable) accuracy has been lost; it is provided for applications that need to perform exact arithmetic only.

Handling Numeric Errors

When numeric errors occur, the NPX takes one of two possible courses of action:

  ■ The NPX can itself handle the error, producing the most reasonable result and allowing numeric program execution to continue undisturbed.

  ■ A software exception handler can be invoked by the CPU to handle the error.

Each of the six exception conditions described above has a corresponding flag bit in the 80287 status word and a mask bit in the 80287 control word. If an exception is masked (the corresponding mask bit in the control word = 1), the 80287 takes an appropriate default action and continues with the computation. If the exception is unmasked (mask = 0), the 80287 asserts the ERROR output to the 80286 to signal the exception and invoke a software exception handler.

The NPX reports an exception by setting the corresponding flag in the NPX status word to 1. The NPX then checks the corresponding exception mask in the control word to determine if it should "field" the exception (mask = 1), or if it should signal the exception to the CPU to invoke a software exception handler (mask = 0). If the mask is set, the exception is said to be masked (from user software), and the NPX executes its on-chip masked response for that exception. If the mask is not set (mask = 0), the exception is unmasked, and the NPX performs its unmasked response. The masked response always produces a standard result, then proceeds with the instruction.
The unmasked response always traps to a software exception handler, allowing the CPU to recognize and take action on the exception. Table 1-17 gives a complete description of all exception conditions and the NPX's masked response.

Note that when exceptions are masked, the NPX may detect multiple exceptions in a single instruction, because it continues executing the instruction after performing its masked response. For example, the 80287 could detect a denormalized operand, perform its masked response to this exception, and then detect an underflow.

Table 1-17. Exception Conditions and Masked Responses

───────────────────────────────────────────────────────────────────────────
Condition                              Masked Response
───────────────────────────────────────────────────────────────────────────
                             Invalid Operation
───────────────────────────────────────────────────────────────────────────
Source register is tagged empty        Return real indefinite.
(usually due to stack underflow).

Destination register is not tagged     Return real indefinite
empty (usually due to stack            (overwrite destination value).
overflow).

One or both operands is a NaN.         Return NaN with larger absolute
                                       value (ignore signs).

(Compare and test operations only):    Set condition codes "not
one or both operands is a NaN.         comparable."

(Addition operations only): closure    Return real indefinite.
is affine and operands are
opposite-signed infinities; or
closure is projective and both
operands are ∞ (signs immaterial).

(Subtraction operations only):         Return real indefinite.
closure is affine and operands are
like-signed infinities; or closure
is projective and both operands are
∞ (signs immaterial).

(Multiplication operations only):      Return real indefinite.
∞ * 0; or 0 * ∞.

(Division operations only):            Return real indefinite.
∞ ÷ ∞; or 0 ÷ 0; or 0 ÷ pseudo zero;
or divisor is denormal or unnormal.

(FPREM instruction only): modulus      Return real indefinite, set
(divisor) is unnormal or denormal;     condition code = "complete
or dividend is ∞.
                                       remainder."

(FSQRT instruction only): operand      Return real indefinite.
is nonzero and negative; or operand
is denormal or unnormal; or closure
is affine and operand is -∞; or
closure is projective and operand
is ∞.

(Compare operations only): closure     Set condition code = "not
is projective and ∞ is being           comparable."
compared with 0, a normal or ∞.

(FTST instruction only): closure is    Set condition code = "not
projective and operand is ∞.           comparable."

(FIST, FISTP instructions only):       Store integer indefinite.
source register is empty, a NaN,
denormal, unnormal, ∞, or exceeds
representable range of destination.

(FBSTP instruction only): source       Store packed decimal
register is empty, a NaN, denormal,    indefinite.
unnormal, ∞, or exceeds 18 decimal
digits.

(FST, FSTP instructions only):         Store real indefinite.
destination is short or long real
and source register is an unnormal
with exponent in range.

(FXCH instruction only): one or        Change empty register(s) to
both registers is tagged empty.        real indefinite and then
                                       perform exchange.
───────────────────────────────────────────────────────────────────────────
                           Denormalized Operand
───────────────────────────────────────────────────────────────────────────
(FLD instruction only): source         No special action; load as usual.
operand is denormal.

(Arithmetic operations only): one      Convert (in a work area) the
or both operands is denormal.          operand to the equivalent
                                       unnormal and proceed.

(Compare and test operations only):    Convert (in a work area) any
one or both operands is denormal       denormal to the equivalent
or unnormal (other than pseudo         unnormal; normalize as much as
zero).                                 possible, and proceed with
                                       operation.
───────────────────────────────────────────────────────────────────────────
                                Zero Divide
───────────────────────────────────────────────────────────────────────────
(Division operations only):            Return ∞ signed with "exclusive or"
divisor = 0.                           of operand signs.
───────────────────────────────────────────────────────────────────────────
                                 Overflow
───────────────────────────────────────────────────────────────────────────
(Arithmetic operations only):          Return properly signed ∞ and signal
rounding is nearest or chop, and       precision exception.
exponent of true result > 16,383.

(FST, FSTP instructions only):         Return properly signed ∞ and signal
rounding is nearest or chop, and       precision exception.
exponent of true result > +127
(short real destination) or > +1023
(long real destination).
───────────────────────────────────────────────────────────────────────────
                                 Underflow
───────────────────────────────────────────────────────────────────────────
(Arithmetic operations only):          Denormalize until exponent rises to
exponent of true result < -16,382      -16,382 (true), round significand
(true).                                to 64 bits. If denormalized rounded
                                       significand = 0, then return true
                                       0; else, return denormal (tag =
                                       special, biased exponent = 0).

(FST, FSTP instructions only):         Denormalize until exponent rises to
destination is short real and          -126 (true), round significand to
exponent of true result < -126         24 bits, store true 0 if
(true).                                denormalized rounded significand
                                       = 0; else, store denormal (biased
                                       exponent = 0).

(FST, FSTP instructions only):         Denormalize until exponent rises to
destination is long real and           -1022 (true), round significand to
exponent of true result < -1022        53 bits, store true 0 if rounded
(true).                                denormalized significand = 0; else,
                                       store denormal (biased exponent
                                       = 0).
───────────────────────────────────────────────────────────────────────────
                                 Precision
───────────────────────────────────────────────────────────────────────────
True rounding error occurs.            No special action.
Masked response to overflow            No special action.
exception earlier in instruction.
───────────────────────────────────────────────────────────────────────────

Automatic Exception Handling

As described in the previous section, when the 80287 NPX encounters an exception condition whose corresponding mask bit in the NPX control word is set, the NPX automatically performs an internal fix-up (masked-exception) response. The 80287 NPX has a default fix-up activity for every possible exception condition it may encounter. These masked-exception responses are designed to be safe and are generally acceptable for most numeric applications.

As an example of how even severe exceptions can be handled safely and automatically using the NPX's default exception responses, consider a calculation of the parallel resistance of several values using only the standard formula (figure 1-11). If R{1} becomes zero, the circuit resistance becomes zero. With the divide-by-zero and precision exceptions masked, the 80287 NPX will produce the correct result: the masked zero-divide response returns ∞ for 1/R{1}, the sum of the reciprocals is therefore ∞, and the reciprocal of that sum is zero.

By masking or unmasking specific numeric exceptions in the NPX control word, NPX programmers can delegate responsibility for most exceptions to the NPX, reserving the most severe exceptions for programmed exception handlers. Exception-handling software is often difficult to write, and the NPX's masked responses have been tailored to deliver the most reasonable result for each condition. For the majority of applications, programmers will find that masking all exceptions other than Invalid Operation will yield satisfactory results with the least programming effort. An Invalid Operation exception normally indicates a fatal error in a program that must be corrected; this exception should not normally be masked.

The exception flags in the NPX status word provide a cumulative record of exceptions that have occurred since these flags were last cleared.
Once set, these flags can be cleared only by executing the FCLEX (clear exceptions) instruction, by reinitializing the NPX, or by overwriting the flags with an FRSTOR or FLDENV instruction. This allows a programmer to mask all exceptions (except invalid operation), run a calculation, and then inspect the status word to see if any exceptions were detected at any point in the calculation.

Figure 1-11. Arithmetic Example Using Infinity

   │                          │
   │           R{1}           │
   █─────────/\/\/\/──────────█
   │                          │                            1
   │           R{2}           │       EQUIVALENT = ──────────────────
   █─────────/\/\/\/──────────█       RESISTANCE     1      1      1
   │                          │                     ──── + ──── + ────
   │           R{3}           │                     R{1}   R{2}   R{3}
   └─────────/\/\/\/──────────┘

Software Exception Handling

If the NPX encounters an unmasked exception condition, it signals the exception to the 80286 CPU using the ERROR status line between the two processors.

The next time the 80286 CPU encounters a WAIT or ESC instruction in its instruction stream, the 80286 will detect the active condition of the ERROR status line and automatically trap to an exception response routine using interrupt #16──the Processor Extension Error exception.

This exception response routine is typically a part of the systems software. Typical exception responses may include:

  ■ Incrementing an exception counter for later display or printing

  ■ Printing or displaying diagnostic information (e.g., the 80287 environment and registers)

  ■ Aborting further execution

  ■ Using the exception pointers to build an instruction that will run without exception and executing it

Application programmers on 80286 systems having systems software support for the 80287 NPX should consult their references for the appropriate system response to NPX exceptions. For systems programmers, specific details on writing software exception handlers are included in the section "System-Level Numeric Programming" later in this manual.
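The interplay of mask bits, cumulative flags, and the two kinds of response can be sketched in Python using the parallel-resistance calculation of figure 1-11. This is a simplified model under stated assumptions, not 80287 code: the class ModelNPX, the exception NumericTrap, and the function parallel_resistance are hypothetical names, only the zero-divide and invalid-operation conditions of a division are modelled, the precision exception is ignored, and returning ∞ for ∞ divided by zero without setting a flag is an assumption of the sketch.

```python
import math

ZE = "ZE"   # zero-divide flag (label chosen for this sketch)
IE = "IE"   # invalid-operation flag (label chosen for this sketch)

class NumericTrap(Exception):
    """Stands in for the trap to a software exception handler."""

class ModelNPX:
    def __init__(self, masked):
        self.masked = set(masked)   # exceptions whose mask bit is 1
        self.flags = set()          # status-word flags; they stay set

    def _report(self, exc):
        self.flags.add(exc)         # the flag is set in both cases
        if exc not in self.masked:
            raise NumericTrap(exc)  # unmasked response: go to handler

    def fclex(self):
        self.flags.clear()          # models FCLEX

    def div(self, a, b):
        if b != 0.0:
            return a / b
        if a == 0.0 or math.isnan(a):
            self._report(IE)        # 0/0 is an invalid operation
            return math.nan         # stands in for real indefinite
        if math.isfinite(a):
            self._report(ZE)        # finite nonzero divided by zero
        # Masked response: infinity signed with the exclusive-or of
        # the operand signs.
        negative = (math.copysign(1.0, a) < 0) != (math.copysign(1.0, b) < 0)
        return -math.inf if negative else math.inf

def parallel_resistance(npx, resistors):
    total = 0.0
    for r in resistors:
        total = total + npx.div(1.0, r)
    return npx.div(1.0, total)

npx = ModelNPX(masked={ZE})
shorted = parallel_resistance(npx, [0.0, 100.0, 220.0])   # R1 = 0
flag_recorded = ZE in npx.flags
```

With the zero-divide exception masked, the model returns a resistance of zero and leaves the flag set for later inspection. Constructing ModelNPX with an empty mask set makes the same call raise NumericTrap instead, which corresponds to the unmasked response.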
The 80287 NPX differs from the 8087 NPX in the manner in which numeric exceptions are signalled to the CPU; the 8087 requires an interrupt controller (8259A) to interrupt the CPU, while the 80287 does not. Programmers upgrading 8087 software to operate on an 80287 should be aware of these differences and any implications they might have on numeric exception-handling software. Appendix B explains the differences between the 80287 and the 8087 NPX in greater detail.

Chapter 2  Programming Numeric Applications
───────────────────────────────────────────────────────────────────────────

Programmers developing applications for the 80287 have a wide range of instructions and programming alternatives from which to choose.

The following sections describe the 80287 instruction set in detail, and follow up with a discussion of several of the programming facilities that are available to programmers of the 80287.

The 80287 NPX Instruction Set

This section describes the operation of all 80287 instructions. Within this section, the instructions are divided into six functional classes:

  ■ Data Transfer instructions
  ■ Arithmetic instructions
  ■ Comparison instructions
  ■ Transcendental instructions
  ■ Constant instructions
  ■ Processor Control instructions

At the end of this section, each of the instructions is described in terms of its execution speed, bus transfers, and exceptions, as well as a coding example for each combination of operands accepted by the instruction. For easy reference, this information is concentrated into a table, organized alphabetically by instruction mnemonic.

Throughout this section, the instruction set is described as it appears to the ASM286 programmer who is coding a program. Appendix A covers the actual machine instruction encodings, which are principally of use to those reading unformatted memory dumps, monitoring instruction fetches on the bus, or writing exception handlers.
Compatibility with the 8087 NPX

The instruction set for the 80287 NPX is largely the same as that for the 8087 NPX used with 8086 and 8088 systems. Most object programs generated for the 8087 will execute without change on the 80287. Several instructions are new to the 80287, and several 8087 instructions perform no useful function on the 80287. Appendix B at the back of this manual gives details of these instruction set differences and of the differences in the ASM86 and ASM286 assemblers.

Numeric Operands

The typical NPX instruction accepts one or two operands as inputs, operates on these, and produces a result as an output. Operands are most often (the contents of) register or memory locations. The operands of some instructions are predefined; for example, FSQRT always takes the square root of the number in the top stack element. Others allow, or require, the programmer to explicitly code the operand(s) along with the instruction mnemonic. Still others accept one explicit operand and one implicit operand, which is usually the top stack element.

Whether supplied by the programmer or utilized automatically, the two basic types of operands are sources and destinations. A source operand simply supplies one of the inputs to an instruction; it is not altered by the instruction. Even when an instruction converts the source operand from one format to another (e.g., real to integer), the conversion is actually performed in an internal work area to avoid altering the source operand. A destination operand may also provide an input to an instruction. It is distinguished from a source operand, however, because its content may be altered when it receives the result produced by the operation; that is, the destination is replaced by the result.

Many instructions allow their operands to be coded in more than one way. For example, FADD (add real) may be written without operands, with only a source or with a destination and a source.
The instruction descriptions in this section employ the simple convention of separating alternative operand forms with slashes; the slashes, however, are not coded. Consecutive slashes indicate an option of no explicit operands. The operands for FADD are thus described as

     //source/destination, source

This means that FADD may be written in any of three ways:

     FADD
     FADD source
     FADD destination, source

When reading this section, it is important to bear in mind that memory operands may be coded with any of the CPU's memory addressing modes. To review these modes──direct, register indirect, based, indexed, based indexed──refer to the 80286 Programmer's Reference Manual. Table 2-17 later in this chapter also provides several addressing mode examples.

Data Transfer Instructions

These instructions (summarized in table 2-1) move operands among elements of the register stack, and between the stack top and memory. Any of the seven data types can be converted to temporary real and loaded (pushed) onto the stack in a single operation; they can be stored to memory in the same manner. The data transfer instructions automatically update the 80287 tag word to reflect the register contents following the instruction.

FLD source

FLD (load real) loads (pushes) the source operand onto the top of the register stack. This is done by decrementing the stack pointer by one and then copying the content of the source to the new stack top. The source may be a register on the stack (ST(i)) or any of the real data types in memory. Short and long real source operands are converted to temporary real automatically. Coding FLD ST(0) duplicates the stack top.

FST destination

FST (store real) transfers the stack top to the destination, which may be another register on the stack or a short or long real memory operand.
If the destination is short or long real, the significand is rounded to the width of the destination according to the RC field of the control word, and the exponent is converted to the width and bias of the destination format.

If, however, the stack top is tagged special (it contains ∞, a NaN, or a denormal) then the stack top's significand is not rounded but is chopped (on the right) to fit the destination. Neither is the exponent converted, but it also is chopped on the right and transferred "as is." This preserves the value's identification as ∞ or a NaN (exponent all ones) or a denormal (exponent all zeros) so that it can be properly loaded and tagged later in the program if desired.

Table 2-1. Data Transfer Instructions

┌─────────────────────────────────────────────────────────┐
│                     Real Transfers                      │
├───────────┬─────────────────────────────────────────────┤
│   FLD     │ Load real                                   │
│   FST     │ Store real                                  │
│   FSTP    │ Store real and pop                          │
│   FXCH    │ Exchange registers                          │
├───────────┴─────────────────────────────────────────────┤
│                    Integer Transfers                    │
├───────────┬─────────────────────────────────────────────┤
│   FILD    │ Integer load                                │
│   FIST    │ Integer store                               │
│   FISTP   │ Integer store and pop                       │
├───────────┴─────────────────────────────────────────────┤
│                Packed Decimal Transfers                 │
├───────────┬─────────────────────────────────────────────┤
│   FBLD    │ Packed decimal (BCD) load                   │
│   FBSTP   │ Packed decimal (BCD) store and pop          │
└───────────┴─────────────────────────────────────────────┘

FSTP destination

FSTP (store real and pop) operates identically to FST except that the stack is popped following the transfer. This is done by tagging the top stack element empty and then incrementing ST. FSTP permits storing to a temporary real memory variable, whereas FST does not. Coding FSTP ST(0) is equivalent to popping the stack with no data transfer.

FXCH //destination

FXCH (exchange registers) swaps the contents of the destination and the stack top registers.
If the destination is not coded explicitly, ST(1) is used. Many 80287 instructions operate only on the stack top; FXCH provides a simple means of effectively using these instructions on lower stack elements. For example, the following sequence takes the square root of the third register from the top:

     FXCH ST(3)
     FSQRT
     FXCH ST(3)

FILD source

FILD (integer load) converts the source memory operand from its binary integer format (word, short, or long) to temporary real and loads (pushes) the result onto the stack. The (new) stack top is tagged zero if all bits in the source were zero, and is tagged valid otherwise.

FIST destination

FIST (integer store) rounds the content of the stack top to an integer according to the RC field of the control word and transfers the result to the destination. The destination may define a word or short integer variable. Negative zero is stored in the same encoding as positive zero: 0000...00.

FISTP destination

FISTP (integer store and pop) operates like FIST and also pops the stack following the transfer. The destination may be any of the binary integer data types.

FBLD source

FBLD (packed decimal (BCD) load) converts the content of the source operand from packed decimal to temporary real and loads (pushes) the result onto the stack. The sign of the source is preserved, including the case where the value is negative zero. FBLD is an exact operation; the source is loaded with no rounding error.

The packed decimal digits of the source are assumed to be in the range 0-9H. The instruction does not check for invalid digits (A-FH) and the result of attempting to load an invalid encoding is undefined.

FBSTP destination

FBSTP (packed decimal (BCD) store and pop) converts the content of the stack top to a packed decimal integer, stores the result at the destination in memory, and pops the stack. FBSTP produces a rounded integer from a nonintegral value by adding 0.5 to the value and then chopping.
Users who are concerned about rounding may precede FBSTP with FRNDINT.

Arithmetic Instructions

The 80287's arithmetic instruction set (table 2-2) provides a wealth of variations on the basic add, subtract, multiply, and divide operations, and a number of other useful functions. These range from a simple absolute value to a square root instruction that executes faster than ordinary division; 80287 programmers no longer need to spend valuable time eliminating square roots from algorithms because they run too slowly. Other arithmetic instructions perform exact modulo division, round real numbers to integers, and scale values by powers of two.

The 80287's basic arithmetic instructions (addition, subtraction, multiplication, and division) are designed to encourage the development of very efficient algorithms. In particular, they allow the programmer to minimize memory references and to make optimum use of the NPX register stack.

Table 2-3 summarizes the available operation/operand forms that are provided for basic arithmetic. In addition to the four normal operations, two "reversed" instructions make subtraction and division "symmetrical" like addition and multiplication. The variety of instruction and operand forms gives the programmer unusual flexibility:

  ■ Operands may be located in registers or memory.

  ■ Results may be deposited in a choice of registers.

  ■ Operands may be a variety of NPX data types: temporary real, long real, short real, short integer or word integer, with automatic conversion to temporary real performed by the 80287.

Five basic instruction forms may be used across all six operations, as shown in table 2-3. The classical stack form may be used to make the 80287 operate like a classical stack machine. No operands are coded in this form, only the instruction mnemonic. The NPX picks the source operand from the stack top and the destination from the next stack element.
It then pops the stack, performs the operation, and returns the result to the new stack top, effectively replacing the operands by the result.

The register form is a generalization of the classical stack form; the programmer specifies the stack top as one operand and any register on the stack as the other operand. Coding the stack top as the destination provides a convenient way to access a constant, held elsewhere in the stack, from the stack top. The converse coding (ST is the source operand) allows, for example, adding the top into a register used as an accumulator.

Often the operand in the stack top is needed for one operation but then is of no further use in the computation. The register pop form can be used to pick up the stack top as the source operand, and then discard it by popping the stack. Coding operands of ST(1), ST with a register pop mnemonic is equivalent to a classical stack operation: the top is popped and the result is left at the new top.

The two memory forms increase the flexibility of the 80287's arithmetic instructions. They permit a real number or a binary integer in memory to be used directly as a source operand. This is a very useful facility in situations where operands are not used frequently enough to justify holding them in registers. Note that any memory addressing mode may be used to define these operands, so they may be elements in arrays, structures, or other data organizations, as well as simple scalars.

The six basic operations are discussed further in the paragraphs following table 2-3, and descriptions of the remaining seven arithmetic operations follow.

Table 2-2.
Arithmetic Instructions

  Addition
     FADD       Add real
     FADDP      Add real and pop
     FIADD      Integer add

  Subtraction
     FSUB       Subtract real
     FSUBP      Subtract real and pop
     FISUB      Integer subtract
     FSUBR      Subtract real reversed
     FSUBRP     Subtract real reversed and pop
     FISUBR     Integer subtract reversed

  Multiplication
     FMUL       Multiply real
     FMULP      Multiply real and pop
     FIMUL      Integer multiply

  Division
     FDIV       Divide real
     FDIVP      Divide real and pop
     FIDIV      Integer divide
     FDIVR      Divide real reversed
     FDIVRP     Divide real reversed and pop
     FIDIVR     Integer divide reversed

  Other Operations
     FSQRT      Square root
     FSCALE     Scale
     FPREM      Partial remainder
     FRNDINT    Round to integer
     FXTRACT    Extract exponent and significand
     FABS       Absolute value
     FCHS       Change sign

Table 2-3. Basic Arithmetic Instruction and Operands

┌───────────────┬─────────┬────────────────────────────┬───────────────┐
│  Instruction  │ Mnemonic│       Operand Forms        │ ASM286 Example│
│     Form      │   Form  │    destination, source     │               │
├───────────────┼─────────┼────────────────────────────┼───────────────┤
│Classical stack│ Fop     │ {ST(1),ST}                 │ FADD          │
│Register       │ Fop     │ ST(i),ST or ST,ST(i)       │ FSUB  ST,ST(3)│
│Register pop   │ FopP    │ ST(i),ST                   │ FMULP ST(2),ST│
│Real memory    │ Fop     │ {ST,} short-real/long-real │ FDIV  AZIMUTH │
│Integer memory │ FIop    │ {ST,} word-integer/        │ FIDIV N_PULSES│
│               │         │       short-integer        │               │
└───────────────┴─────────┴────────────────────────────┴───────────────┘

───────────────────────────────────────────────────────────────────────────
NOTES
  Braces ({ }) surround implicit operands; these are not coded, and are shown here for information only.
  op = ADD     destination ← destination + source
       SUB     destination ← destination - source
       SUBR    destination ← source - destination
       MUL     destination ← destination ∙ source
       DIV     destination ← destination ÷ source
       DIVR    destination ← source ÷ destination
───────────────────────────────────────────────────────────────────────────

ADDITION

     FADD     //source/destination,source
     FADDP    destination,source
     FIADD    source

The addition instructions (add real, add real and pop, integer add) add the source and destination operands and return the sum to the destination. The operand at the stack top may be doubled by coding:

     FADD ST,ST(0)

NORMAL SUBTRACTION

     FSUB     //source/destination,source
     FSUBP    destination,source
     FISUB    source

The normal subtraction instructions (subtract real, subtract real and pop, integer subtract) subtract the source operand from the destination and return the difference to the destination.

REVERSED SUBTRACTION

     FSUBR    //source/destination,source
     FSUBRP   destination,source
     FISUBR   source

The reversed subtraction instructions (subtract real reversed, subtract real reversed and pop, integer subtract reversed) subtract the destination from the source and return the difference to the destination.

MULTIPLICATION

     FMUL     //source/destination,source
     FMULP    destination,source
     FIMUL    source

The multiplication instructions (multiply real, multiply real and pop, integer multiply) multiply the source and destination operands and return the product to the destination. Coding FMUL ST,ST(0) squares the content of the stack top.

NORMAL DIVISION

     FDIV     //source/destination,source
     FDIVP    destination,source
     FIDIV    source

The normal division instructions (divide real, divide real and pop, integer divide) divide the destination by the source and return the quotient to the destination.
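The destination/source rules listed in the notes to table 2-3 can be tried out with a short Python model of the register stack. It is a sketch for illustration only: the names OPS, classical, and register are invented here, a Python list stands in for the register stack with its last element as ST, and tags, rounding, and exceptions are not modelled.

```python
# The six basic operations, written as result = f(destination, source).
OPS = {
    "ADD":  lambda dst, src: dst + src,
    "SUB":  lambda dst, src: dst - src,
    "SUBR": lambda dst, src: src - dst,
    "MUL":  lambda dst, src: dst * src,
    "DIV":  lambda dst, src: dst / src,
    "DIVR": lambda dst, src: src / dst,
}

def classical(stack, op):
    """Classical stack form: source is ST, destination is ST(1).

    Both operands are removed and the result becomes the new stack top.
    """
    src = stack.pop()
    dst = stack.pop()
    stack.append(OPS[op](dst, src))

def register(stack, op, dst_i, src_i):
    """Register form: Fop ST(dst_i),ST(src_i); one operand must be ST."""
    if dst_i != 0 and src_i != 0:
        raise ValueError("one operand of the register form must be ST")
    d = len(stack) - 1 - dst_i
    s = len(stack) - 1 - src_i
    stack[d] = OPS[op](stack[d], stack[s])

normal = [8.0, 2.0]            # ST(1) = 8.0, ST = 2.0
classical(normal, "SUB")       # destination - source
reversed_form = [8.0, 2.0]
classical(reversed_form, "SUBR")   # source - destination

doubled = [3.0, 5.0]
register(doubled, "ADD", 0, 0)     # models FADD ST,ST(0)
```

Running the normal and reversed forms on the same operands shows why the reversed instructions exist: the operands do not have to be exchanged on the stack first to obtain the other order of subtraction or division.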
REVERSED DIVISION

     FDIVR    //source/destination,source
     FDIVRP   destination,source
     FIDIVR   source

The reversed division instructions (divide real reversed, divide real reversed and pop, integer divide reversed) divide the source operand by the destination and return the quotient to the destination.

FSQRT

FSQRT (square root) replaces the content of the top stack element with its square root. (Note: The square root of -0 is defined to be -0.)

FSCALE

FSCALE (scale) interprets the value contained in ST(1) as an integer and adds this value to the exponent of the number in ST. This is equivalent to

     ST ← ST * 2^(ST(1))

Thus FSCALE provides rapid multiplication or division by integral powers of 2. It is particularly useful for scaling the elements of a vector.

Note that FSCALE assumes the scale factor in ST(1) is an integral value in the range -2^(15) ≤ x < 2^(15). If the value is not integral, but is in-range and is greater in magnitude than 1, FSCALE uses the nearest integer smaller in magnitude; i.e., it chops the value toward 0. If the value is out of range, or 0 < │x│ < 1, the instruction will produce an undefined result and will not signal an exception. The recommended practice is to load the scale factor from a word integer to ensure correct operation.

FPREM

FPREM (partial remainder) performs modulo division of the top stack element by the next stack element, i.e., ST(1) is the modulus. FPREM produces an exact result; the precision exception does not occur. The sign of the remainder is the same as the sign of the original dividend.

FPREM operates by performing successive scaled subtractions; obtaining the exact remainder when the operands differ greatly in magnitude can consume large amounts of execution time. Because the 80287 can only be preempted between instructions, the remainder function could seriously increase interrupt latency in these cases. Accordingly, the instruction is designed to be executed iteratively in a software-controlled loop.
FPREM can reduce a magnitude difference of up to 2^(64) in one execution. If FPREM produces a remainder that is less than the modulus, the function is complete and bit C2 of the status word condition code is cleared. If the function is incomplete, C2 is set to 1; the result in ST is then called the partial remainder. Software can inspect C2 by storing the status word following execution of FPREM and re-execute the instruction (using the partial remainder in ST as the dividend), until C2 is cleared. Alternatively, a program can determine when the function is complete by comparing ST to ST(1). If ST > ST(1), then FPREM must be executed again; if ST = ST(1), then the remainder is 0; if ST < ST(1), then the remainder is ST. A higher priority interrupting routine that needs the 80287 can force a context switch between the instructions in the remainder loop.

An important use for FPREM is to reduce arguments (operands) of periodic transcendental functions to the range permitted by these instructions. For example, the FPTAN (tangent) instruction requires its argument to be less than π/4. Using π/4 as a modulus, FPREM will reduce an argument so that it is in range of FPTAN. Because FPREM produces an exact result, the argument reduction does not introduce roundoff error into the calculation, even if several iterations are required to bring the argument into range. (The rounding of π does not create the effect of a rounded argument, but of a rounded period.)

FPREM also provides the least-significant three bits of the quotient generated by FPREM (in C{3}, C{1}, C{0}). This is also important for transcendental argument reduction, because it locates the original angle in the correct one of eight π/4 segments of the unit circle (see table 2-4). If the quotient is less than 4, then C0 will be the value of C3 before FPREM was executed. If the quotient is less than 2, then C3 will be the value of C1 before FPREM was executed.
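The software-controlled remainder loop and the use of the quotient bits can be sketched in Python as follows. This is a behavioral model under stated assumptions, not a description of the 80287's internal algorithm: the function names fprem and reduce_argument and the constant MAX_SHIFT are invented for the sketch, Python floats (long real) stand in for temporary real operands, the modulus is assumed to be nonzero and finite, and the three quotient bits are returned as a single number from 0 to 7 rather than as separate C3, C1, and C0 bits. The way the incomplete step chooses its scaled modulus is also an assumption, chosen so that the low three quotient bits come from the final step.

```python
import math
from fractions import Fraction

MAX_SHIFT = 64   # one execution reduces a magnitude difference of up to 2**64

def fprem(st, st1):
    """Model one FPREM execution; st1 is the modulus and must be nonzero.

    Returns (new_st, c2, quotient_bits). quotient_bits is None while
    the reduction is incomplete (c2 == 1).
    """
    diff = math.frexp(st)[1] - math.frexp(st1)[1]   # exponent difference
    if st == 0.0 or diff < MAX_SHIFT:
        # Complete: exact remainder whose sign follows the dividend.
        quotient = math.trunc(Fraction(st) / Fraction(st1))
        return math.fmod(st, st1), 0, abs(quotient) & 7
    # Incomplete: remove a multiple of the modulus scaled by a power of
    # two. Scaling by at least 2**3 leaves the low three quotient bits
    # to be produced by the final, complete step.
    scaled = math.ldexp(st1, max(diff - (MAX_SHIFT - 1), 3))
    return math.fmod(st, scaled), 1, None

def reduce_argument(x, modulus):
    """Re-execute fprem until C2 is cleared, as the text describes."""
    executions = 0
    while True:
        x, c2, qbits = fprem(x, modulus)
        executions += 1
        if c2 == 0:
            return x, qbits, executions

angle = 1000.0
reduced, octant, count = reduce_argument(angle, math.pi / 4)
```

Here reduced is the argument brought into the range accepted by the tangent instruction, and octant identifies which of the eight π/4 segments the original angle fell in. For operands that differ by more than 2^(64) in magnitude, reduce_argument reports more than one execution, mirroring the loop on C2.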
FRNDINT

FRNDINT (round to integer) rounds the top stack element to an integer. For example, assume that ST contains the 80287 real number encoding of the decimal value 155.625. FRNDINT will change the value to 155 if the RC field of the control word is set to down or chop, or to 156 if it is set to up or nearest.

FXTRACT

FXTRACT (extract exponent and significand) "decomposes" the number in the stack top into two numbers that represent the actual value of the operand's exponent and significand fields. The "exponent" replaces the original operand on the stack and the "significand" is pushed onto the stack. Following execution of FXTRACT, ST (the new stack top) contains the value of the original significand expressed as a real number: its sign is the same as the operand's, its exponent is 0 true (16,383 or 3FFFH biased), and its significand is identical to the original operand's. ST(1) contains the value of the original operand's true (unbiased) exponent expressed as a real number. If the original operand is zero, FXTRACT produces zeros in ST and ST(1) and both are signed as the original operand.

To clarify the operation of FXTRACT, assume ST contains a number whose true exponent is +4 (i.e., its exponent field contains 4003H). After executing FXTRACT, ST(1) will contain the real number +4.0; its sign will be positive, its exponent field will contain 4001H (+2 true) and its significand field will contain 1{▲}00...00B. In other words, the value in ST(1) will be 1.0 * 2^(2) = 4.

If ST contains an operand whose true exponent is -7 (i.e., its exponent field contains 3FF8H), then FXTRACT will return an "exponent" of -7.0; after the instruction executes, ST(1)'s sign and exponent fields will contain C001H (negative sign, true exponent of 2), and its significand will be 1{▲}1100...00B. In other words, the value in ST(1) will be -1.11 * 2^(2) = -7.0.
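The two cases above can be checked against a small Python model. It is a sketch only: it works on 64-bit floats instead of 80287 temporary reals, and the function name `fxtract` is a hypothetical helper that returns the pair (new ST, new ST(1)).

```python
import math


def fxtract(x):
    """Return (significand, exponent) as FXTRACT would leave them in ST and ST(1)."""
    if x == 0.0:
        return x, x  # zeros in both results, signed as the operand
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    # Rescale so that 1 <= |significand| < 2, i.e., a true exponent of 0.
    return m * 2.0, float(e - 1)


if __name__ == "__main__":
    print(fxtract(16.0))        # operand with true exponent +4
    print(fxtract(2.0 ** -7))   # operand with true exponent -7
```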
In both cases, following FXTRACT, ST's sign and significand fields will be the same as the original operand's, and its exponent field will contain 3FFFH (0 true).

FXTRACT is useful in conjunction with FBSTP for converting numbers in 80287 temporary real format to decimal representations (e.g., for printing or displaying). It can also be useful for debugging, because it allows the exponent and significand parts of a real number to be examined separately.

FABS

FABS (absolute value) changes the top stack element to its absolute value by making its sign positive.

FCHS

FCHS (change sign) complements (reverses) the sign of the top stack element.

Table 2-4. Condition Code Interpretation after FPREM

  C3   C2   C1   C0    Interpretation after FPREM
  ──   ──   ──   ──    ─────────────────────────────────────────
  X    1    X    X     Incomplete Reduction; further iteration
                       is required for complete reduction.

                       Complete Reduction; C1, C3, and C0 contain
                       the three least-significant bits of quotient:
  0    0    0    0     (Quotient) MOD 8 = 0
  0    0    0    1     (Quotient) MOD 8 = 4
  0    0    1    0     (Quotient) MOD 8 = 1
  0    0    1    1     (Quotient) MOD 8 = 5
  1    0    0    0     (Quotient) MOD 8 = 2
  1    0    0    1     (Quotient) MOD 8 = 6
  1    0    1    0     (Quotient) MOD 8 = 3
  1    0    1    1     (Quotient) MOD 8 = 7

Comparison Instructions

Each of these instructions (table 2-5) analyzes the top stack element, often in relationship to another operand, and reports the result in the status word condition code. The basic operations are compare, test (compare with zero), and examine (report tag, sign, and normalization).
Special forms of the compare operation are provided to optimize algorithms by allowing direct comparisons with binary integers and real numbers in memory, as well as popping the stack after a comparison. The FSTSW (store status word) instruction may be used following a comparison to transfer the condition code to memory for inspection. Note that instructions other than those in the comparison group may update the condition code. To ensure that the status word is not altered inadvertently, store it immediately following a comparison operation. FCOM //source FCOM (compare real) compares the stack top to the source operand. The source operand may be a register on the stack, or a short or long real memory operand. If an operand is not coded, ST is compared to ST(1). Positive and negative forms of zero compare identically as if they were unsigned. Following the instruction, the condition codes reflect the order of the operands as shown in table 2-6. NaNs and ∞ (projective) cannot be compared and return C3 = C0 = 1 as shown in the table. FCOMP //source FCOMP (compare real and pop) operates like FCOM, and in addition pops the stack. FCOMPP FCOMPP (compare real and pop twice) operates like FCOM and additionally pops the stack twice, discarding both operands. The comparison is of the stack top to ST(1); no operands may be explicitly coded. FICOM source FICOM (integer compare) converts the source operand, which may reference a word or short binary integer variable, to temporary real and compares the stack top to it. FICOMP source FICOMP (integer compare and pop) operates identically to FICOM and additionally discards the value in ST by popping the stack. FTST FTST (test) tests the top stack element by comparing it to zero. The result is posted to the condition codes as shown in table 2-7. FXAM FXAM (examine) reports the content of the top stack element as positive/negative and NaN/unnormal/denormal/normal/zero, or empty. 
Table 2-8 lists and interprets all the condition code values that FXAM generates. Although four different encodings may be returned for an empty register, bits C3 and C0 of the condition code are both 1 in all encodings. Bits C2 and C1 should be ignored when examining for empty.

Table 2-5. Comparison Instructions

  FCOM      Compare real
  FCOMP     Compare real and pop
  FCOMPP    Compare real and pop twice
  FICOM     Integer compare
  FICOMP    Integer compare and pop
  FTST      Test
  FXAM      Examine

Table 2-6. Condition Code Interpretation after FCOM

  C3   C2   C1   C0    Interpretation after FCOM
  ──   ──   ──   ──    ─────────────────────────
  0    0    X    0     ST > source
  0    0    X    1     ST < source
  1    0    X    0     ST = source
  1    1    X    1     ST is not comparable

Table 2-7. Condition Code Interpretation after FTST

  C3   C2   C1   C0    Interpretation after FTST
  ──   ──   ──   ──    ─────────────────────────
  0    0    X    0     ST > 0
  0    0    X    1     ST < 0
  1    0    X    0     ST = 0
  1    1    X    1     ST is not comparable (i.e., it is a NaN
                       or projective infinity)

Table 2-8. FXAM Condition Code Settings

  C3   C2   C1   C0    Interpretation
  ──   ──   ──   ──    ──────────────
  0    0    0    0     + Unnormal
  0    0    0    1     + NaN
  0    0    1    0     - Unnormal
  0    0    1    1     - NaN
  0    1    0    0     + Normal
  0    1    0    1     + ∞
  0    1    1    0     - Normal
  0    1    1    1     - ∞
  1    0    0    0     + 0
  1    0    0    1     Empty
  1    0    1    0     - 0
  1    0    1    1     Empty
  1    1    0    0     + Denormal
  1    1    0    1     Empty
  1    1    1    0     - Denormal
  1    1    1    1     Empty

Transcendental Instructions

The instructions in this group (table 2-9) perform the time-consuming core calculations for all common trigonometric, inverse trigonometric, hyperbolic, inverse hyperbolic, logarithmic, and exponential functions. Prologue and epilogue software may be used to reduce arguments to the range accepted by the instructions and to adjust the result to correspond to the original arguments if necessary. The transcendentals operate on the top one or two stack elements, and they also return their results to the stack.

───────────────────────────────────────────────────────────────────────────
NOTE
The transcendental instructions assume that their operands are valid and in-range.
The instruction descriptions in this section provide the allowed operand range of each instruction. ─────────────────────────────────────────────────────────────────────────── All operands to a transcendental must be normalized; denormals, unnormals, infinities, and NaNs are considered invalid. (Zero operands are accepted by some functions and are considered out-of-range by others). If a transcendental operand is invalid or out-of-range, the instruction will produce an undefined result without signalling an exception. It is the programmer's responsibility to ensure that operands are valid and in-range before executing a transcendental. For periodic functions, FPREM may be used to bring a valid operand into range. FPTAN 0 ≤ ST(0) ≤ π/4 FPTAN (partial tangent) computes the function Y/X = TAN(Θ). Θ is taken from the top stack element; it must lie in the range 0 ≤ Θ ≤ π/4. The result of the operation is a ratio; Y replaces Θ in the stack and X is pushed, becoming the new stack top. The ratio result of FPTAN and the ratio argument of FPATAN are designed to optimize the calculation of the other trigonometric functions, including SIN, COS, ARCSIN, and ARCCOS. These can be derived from TAN and ARCTAN via standard trigonometric identities. FPATAN 0 ≤ ST(1) < ST(0) < ∞ FPATAN (partial arctangent) computes the function Θ = ARCTAN(Y/X). X is taken from the top stack element and Y from ST(1). Y and X must observe the inequality 0 ≤ Y < X < ∞. The instruction pops the stack and returns Θ to the (new) stack top, overwriting the Y operand. F2XM1 0 ≤ ST(0) ≤ 0.5 F2XM1 (2 to the X minus 1) calculates the function Y = 2^(X) - 1. X is taken from the stack top and must be in the range 0 ≤ X ≤ 0.5. The result Y replaces X at the stack top. This instruction is designed to produce a very accurate result even when X is close to 0. To obtain Y = 2^(X), add 1 to the result delivered by F2XM1. 
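As an illustration of how the restricted range of F2XM1 is handled in practice, the following Python sketch computes 2^(X) for an arbitrary X by splitting X into an integer part (applied with an FSCALE-like step) and a fraction (applied with an F2XM1-like step). It is a model under stated assumptions, not 80287 code: `f2xm1`, `fscale`, and `pow2` are hypothetical helper names, 64-bit floats stand in for temporary reals, and the √2 adjustment for fractions above 0.5 is one possible prologue, not a method prescribed by this manual.

```python
import math

SQRT2 = math.sqrt(2.0)


def f2xm1(x):
    """Model of F2XM1: 2**x - 1, defined only for 0 <= x <= 0.5."""
    if not 0.0 <= x <= 0.5:
        raise ValueError("operand out of range for F2XM1")
    return math.expm1(x * math.log(2.0))


def fscale(st, st1):
    """Model of FSCALE: st * 2**st1, with st1 chopped toward zero."""
    return math.ldexp(st, math.trunc(st1))


def pow2(x):
    """Compute 2**x from the two models above."""
    n = math.floor(x)          # integer part, applied by fscale
    f = x - n                  # fraction, 0 <= f < 1
    if f > 0.5:
        # Bring the fraction into range: 2**f = 2**(f - 0.5) * sqrt(2)
        y = (f2xm1(f - 0.5) + 1.0) * SQRT2
    else:
        y = f2xm1(f) + 1.0     # add 1 to obtain 2**f
    return fscale(y, n)


if __name__ == "__main__":
    print(pow2(10.0), pow2(3.75), pow2(-1.25))
```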
The following formulas show how values other than 2 may be raised to a power of X:

     10^(x) = 2^(x * LOG{2}10)
     e^(x)  = 2^(x * LOG{2}e)
     y^(x)  = 2^(x * LOG{2}y)

As shown in the next section, the 80287 has built-in instructions for loading the constants LOG{2}10 and LOG{2}e, and the FYL2X instruction may be used to calculate X * LOG{2}Y.

FYL2X     0 < ST(0) < ∞     -∞ < ST(1) < ∞

FYL2X (Y log base 2 of X) calculates the function Z = Y * LOG{2}X. X is taken from the stack top and Y from ST(1). The operands must be in the ranges 0 < X < ∞ and -∞ < Y < +∞. The instruction pops the stack and returns Z at the (new) stack top, replacing the Y operand.

This function optimizes the calculations of log to any base other than two, because a multiplication is always required:

     LOG{n}X = LOG{n}2 * LOG{2}X

FYL2XP1     0 ≤ │ST(0)│ < (1 - (√2/2))     -∞ < ST(1) < ∞

FYL2XP1 (Y log base 2 of (X + 1)) calculates the function Z = Y * LOG{2}(X+1). X is taken from the stack top and must be in the range 0 ≤ │X│ < (1 - (√2/2)). Y is taken from ST(1) and must be in the range -∞ < Y < ∞. FYL2XP1 pops the stack and returns Z at the (new) stack top, replacing Y.

The instruction provides improved accuracy over FYL2X when computing the log of a number very close to 1, for example 1 + ε where ε << 1. Providing ε rather than 1 + ε as the input to the function allows more significant digits to be retained.

Table 2-9. Transcendental Instructions

  FPTAN      Partial tangent
  FPATAN     Partial arctangent
  F2XM1      2^(X) - 1
  FYL2X      Y * log{2}X
  FYL2XP1    Y * log{2}(X + 1)

Constant Instructions

Each of these instructions (table 2-10) loads (pushes) a commonly-used constant onto the stack. The values have full temporary real precision (64 bits) and are accurate to approximately 19 decimal digits. Because a temporary real constant occupies 10 memory bytes, the constant instructions, which are only two bytes long, save storage and improve execution speed, in addition to simplifying programming.

FLDZ

FLDZ (load zero) loads (pushes) +0.0 onto the stack.
FLD1

FLD1 (load one) loads (pushes) +1.0 onto the stack.

FLDPI

FLDPI (load π) loads (pushes) π onto the stack.

FLDL2T

FLDL2T (load log base 2 of 10) loads (pushes) the value LOG{2}10 onto the stack.

FLDL2E

FLDL2E (load log base 2 of e) loads (pushes) the value LOG{2}e onto the stack.

FLDLG2

FLDLG2 (load log base 10 of 2) loads (pushes) the value LOG{10}2 onto the stack.

FLDLN2

FLDLN2 (load log base e of 2) loads (pushes) the value LOG{e}2 onto the stack.

Table 2-10. Constant Instructions

  FLDZ      Load +0.0
  FLD1      Load +1.0
  FLDPI     Load π
  FLDL2T    Load log{2}10
  FLDL2E    Load log{2}e
  FLDLG2    Load log{10}2
  FLDLN2    Load log{e}2

Processor Control Instructions

The processor control instructions shown in table 2-11 are not typically used in calculations; they provide control over the 80287 NPX for system-level activities. These activities include initialization, exception handling, and task switching.

As shown in table 2-11, many of the NPX processor control instructions have two forms of assembler mnemonic:

■ A wait form, where the mnemonic is prefixed only with an F, such as FSTSW. This form checks for unmasked numeric errors.

■ A no-wait form, where the mnemonic is prefixed with an FN, such as FNSTSW. This form ignores unmasked numeric errors.

When the control instruction is coded using the no-wait form of the mnemonic, the ASM286 assembler does not precede the ESC instruction with a wait instruction, and the CPU does not test the ERROR status line from the NPX before executing the processor control instruction.

Only the processor control class of instructions has this alternate no-wait form. All numeric instructions are automatically synchronized by the 80286, with the CPU testing the BUSY status line and only executing the numeric instruction when this line is inactive. Because of this automatic synchronization by the 80286, numeric instructions for the 80287 need not be preceded by a CPU wait instruction in order to execute correctly.
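The difference between the two forms can be made concrete by looking at the bytes involved. The Python sketch below is a model, not assembler output: it assumes the standard encodings FNINIT = DB E3, FNCLEX = DB E2, and FNSTSW AX = DF E0, and the one-byte CPU WAIT opcode 9B. The table `NO_WAIT` and the function `encode` are hypothetical helpers introduced for illustration.

```python
WAIT = 0x9B  # one-byte CPU WAIT opcode

# No-wait encodings (ESC instruction bytes only), assumed from the opcode map.
NO_WAIT = {
    "FNINIT": bytes([0xDB, 0xE3]),
    "FNCLEX": bytes([0xDB, 0xE2]),
    "FNSTSW AX": bytes([0xDF, 0xE0]),
}


def encode(mnemonic):
    """Return the bytes emitted for a wait or no-wait control mnemonic."""
    if mnemonic in NO_WAIT:
        return NO_WAIT[mnemonic]
    # Wait form: same ESC bytes, preceded by a WAIT instruction.
    no_wait_name = "FN" + mnemonic[1:]
    if mnemonic.startswith("F") and no_wait_name in NO_WAIT:
        return bytes([WAIT]) + NO_WAIT[no_wait_name]
    raise ValueError("mnemonic not in this illustrative table: " + mnemonic)


if __name__ == "__main__":
    for name in ("FNSTSW AX", "FSTSW AX", "FNINIT", "FINIT"):
        print(name, encode(name).hex())
```

In this model the wait form is always exactly one byte longer than the corresponding no-wait form.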
It should also be noted that the 8087 instructions FENI and FDISI perform no function in the 80287. If these opcodes are detected in an 80286/80287 instruction stream, the 80287 will perform no specific operation and no internal states will be affected. For programmers interested in porting numeric software from 8087 environments to the 80286, however, it should be noted that program sections containing these exception-handling instructions are not likely to be completely portable to the 80287. Appendix B contains a more complete description of the differences between the 80287 and the 8087 NPX.

Table 2-11. Processor Control Instructions

  FINIT/FNINIT          Initialize processor
  FSETPM                Set Protected Mode
  FLDCW                 Load control word
  FSTCW/FNSTCW          Store control word
  FSTSW/FNSTSW          Store status word
  FSTSW AX/FNSTSW AX    Store status word to AX
  FCLEX/FNCLEX          Clear exceptions
  FSTENV/FNSTENV        Store environment
  FLDENV                Load environment
  FSAVE/FNSAVE          Save state
  FRSTOR                Restore state
  FINCSTP               Increment stack pointer
  FDECSTP               Decrement stack pointer
  FFREE                 Free register
  FNOP                  No operation
  FWAIT                 CPU Wait

FINIT/FNINIT

FINIT/FNINIT (initialize processor) sets the 80287 NPX into a known state, unaffected by any previous activity. The no-wait form of this instruction will cause the 80287 to abort any previous numeric operations currently executing in the NEU. This instruction performs the functional equivalent of a hardware RESET, with one exception; FINIT/FNINIT does not affect the current 80287 operating mode (either Real-Address mode or Protected mode). FINIT checks for unmasked numeric exceptions, FNINIT does not. Note that if FNINIT is executed while a previous 80287 memory-referencing instruction is running, 80287 bus cycles in progress will be aborted. This instruction may be necessary to clear the 80287 if a Processor Extension Segment Overrun Exception (Interrupt 9) is detected by the CPU.

FSETPM

FSETPM (set Protected mode) sets the operating mode of the 80287 to Protected Virtual-Address mode.
When the 80287 is first initialized following hardware RESET, it operates in Real-Address mode, just as does the 80286 CPU. Once the 80287 NPX has been set into Protected mode, only a hardware RESET can return the NPX to operation in Real-Address mode.

When the 80287 operates in Protected mode, the NPX exception pointers are represented differently than they are in Real-Address mode (see the FSAVE and FSTENV instructions that follow). This distinction is evident primarily to writers of numeric exception handlers, however. For general application programmers, the operating mode of the 80287 need not be a concern.

FLDCW source

FLDCW (load control word) replaces the current processor control word with the word defined by the source operand. This instruction is typically used to establish or change the 80287's mode of operation. Note that if an exception bit in the status word is set, loading a new control word that unmasks that exception and clears the interrupt enable mask will generate an immediate interrupt request before the next instruction is executed. When changing modes, the recommended procedure is to first clear any exceptions and then load the new control word.

FSTCW/FNSTCW destination

FSTCW/FNSTCW (store control word) writes the current processor control word to the memory location defined by the destination. FSTCW checks for unmasked numeric exceptions, FNSTCW does not.

FSTSW/FNSTSW destination

FSTSW/FNSTSW (store status word) writes the current value of the 80287 status word to the destination operand in memory. The instruction is used to

■ Implement conditional branching following a comparison or FPREM instruction (FSTSW)

■ Poll the 80287 to determine if it is busy (FNSTSW)

■ Invoke exception handlers in environments that do not use interrupts (FSTSW).

FSTSW checks for unmasked numeric exceptions, FNSTSW does not.
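Once the status word has been stored, the branch decision after a comparison is a matter of masking out the condition code bits and applying table 2-6. The Python sketch below models that step. It assumes the condition code occupies bits 8 (C0), 9 (C1), 10 (C2), and 14 (C3) of the status word; the function name `fcom_relation` and the returned strings are hypothetical.

```python
# Assumed bit positions of the condition code within the status word.
C0 = 1 << 8
C1 = 1 << 9
C2 = 1 << 10
C3 = 1 << 14


def fcom_relation(status_word):
    """Interpret a stored status word after FCOM, following table 2-6."""
    code = (bool(status_word & C3), bool(status_word & C2), bool(status_word & C0))
    table = {
        (False, False, False): "ST > source",
        (False, False, True): "ST < source",
        (True, False, False): "ST = source",
        (True, True, True): "not comparable",
    }
    # C1 is a "don't care" after FCOM; other combinations are not listed.
    return table.get(code, "undefined")


if __name__ == "__main__":
    print(fcom_relation(0x4000))  # C3 set only
    print(fcom_relation(0x0100))  # C0 set only
```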
FSTSW AX/FNSTSW AX

FSTSW AX/FNSTSW AX (store status word to AX) is a special 80287 instruction that writes the current value of the 80287 status word directly into the 80286 AX register. This instruction optimizes conditional branching in numeric programs, where the 80286 CPU must test the condition of various NPX status bits. The waited form checks for unmasked numeric exceptions, the non-waited form does not.

When this instruction is executed, the 80286 AX register is updated with the NPX status word before the CPU executes any further instructions. In this way, the 80286 can immediately test the NPX status word without any WAIT or other synchronization instructions required.

FCLEX/FNCLEX

FCLEX/FNCLEX (clear exceptions) clears all exception flags, the error status flag and the busy flag in the status word. As a consequence, the 80287's ERROR line goes inactive. FCLEX checks for unmasked numeric exceptions, FNCLEX does not.

FSAVE/FNSAVE destination

FSAVE/FNSAVE (save state) writes the full 80287 state──environment plus register stack──to the memory location defined by the destination operand. Figure 2-1 shows the layout of the 94-byte save area; typically the instruction will be coded to save this image on the CPU stack. FNSAVE delays its execution until all NPX activity completes normally. Thus, the save image reflects the state of the NPX following the completion of any running instruction. After writing the state image to memory, FSAVE/FNSAVE initializes the 80287 as if FINIT/FNINIT had been executed.

FSAVE/FNSAVE is useful whenever a program wants to save the current state of the NPX and initialize it for a new routine. Three examples are

■ An operating system needs to perform a context switch (suspend the task that had been running and give control to a new task).

■ An exception handler needs to use the 80287.

■ An application task wants to pass a "clean" 80287 to a subroutine.

FSAVE checks for unmasked numeric errors before executing, FNSAVE does not.
An FWAIT should be executed before CPU interrupts are enabled or any subsequent 80287 instruction is executed. Other CPU instructions may be executed between the FNSAVE/FSAVE and the FWAIT.

Figure 2-1. FSAVE/FRSTOR Memory Layout

  Memory
  offset   Real-Address mode                       Protected mode
  ──────   ─────────────────────────────────────   ─────────────────────
  +0       Control word                            Control word
  +2       Status word                             Status word
  +4       Tag word                                Tag word
  +6       Instruction pointer (15-0)              IP offset
  +8       Bits 15-12: instruction pointer (19-16) CS selector
           Bit 11: 0
           Bits 10-0: instruction opcode (10-0)
  +10      Data pointer (15-0)                     Data operand offset
  +12      Bits 15-12: data pointer (19-16)        Data operand selector
           Bits 11-0: 0

  Memory
  offset   Register stack (both modes)
  ──────   ─────────────────────────────────────
  +14      ST      Significand 15-0
  +16      ST      Significand 31-16
  +18      ST      Significand 47-32
  +20      ST      Significand 63-48
  +22      ST      S (bit 15), exponent 14-0
  +24      ST(1)   Significand 15-0
  +26      ST(1)   Significand 31-16
  +28      ST(1)   Significand 47-32
  +30      ST(1)   Significand 63-48
  +32      ST(1)   S (bit 15), exponent 14-0
   ≈        ≈
  +84      ST(7)   Significand 15-0
  +86      ST(7)   Significand 31-16
  +88      ST(7)   Significand 47-32
  +90      ST(7)   Significand 63-48
  +92      ST(7)   S (bit 15), exponent 14-0

───────────────────────────────────────────────────────────────────────────
NOTES:
  Addresses increase downward in the table.
  S = Sign
  Bit 0 of each field is rightmost, least significant bit of corresponding register field.
  Bit 63 of significand is integer bit (assumed binary point is immediately to the right.)
───────────────────────────────────────────────────────────────────────────

FRSTOR source

FRSTOR (restore state) reloads the 80287 from the 94-byte memory area defined by the source operand. This information should have been written by a previous FSAVE/FNSAVE instruction and not altered by any other instruction. An FWAIT is not required after FRSTOR. FRSTOR will automatically wait and check for interrupts until all data transfers are completed before continuing to the next instruction.
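For debugging or for examining a saved task state offline, the 94-byte image of figure 2-1 can be unpacked with a short script. The Python sketch below assumes a Real-Address mode image in little-endian byte order; the names `temp_real_to_float` and `parse_fsave_real_mode`, and the dictionary keys, are hypothetical. Register values are converted to 64-bit floats, so some precision is lost, and infinities and NaNs are not decoded.

```python
import math
import struct


def temp_real_to_float(significand, sign_exp):
    """Convert a temporary real (64-bit significand, sign + 15-bit exponent)."""
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    exponent = sign_exp & 0x7FFF
    # Bit 63 of the significand is the integer bit; the exponent bias is 16,383.
    return sign * math.ldexp(significand, exponent - 16383 - 63)


def parse_fsave_real_mode(image):
    """Unpack a 94-byte FSAVE image laid out as in figure 2-1 (real mode)."""
    if len(image) != 94:
        raise ValueError("an FSAVE image is 94 bytes long")
    cw, sw, tw, ip_lo, ip_hi_op, dp_lo, dp_hi = struct.unpack_from("<7H", image, 0)
    stack = []
    for i in range(8):
        sig, sign_exp = struct.unpack_from("<QH", image, 14 + 10 * i)
        stack.append(temp_real_to_float(sig, sign_exp))
    return {
        "control": cw,
        "status": sw,
        "tag": tw,
        "instruction_pointer": ip_lo | ((ip_hi_op >> 12) << 16),
        "opcode": ip_hi_op & 0x07FF,
        "data_pointer": dp_lo | ((dp_hi >> 12) << 16),
        "stack": stack,  # stack[0] is ST, stack[7] is ST(7)
    }


if __name__ == "__main__":
    # Fabricated image for demonstration only: ST holds +1.0, other registers 0.
    env = struct.pack("<7H", 0x037F, 0x3800, 0x3FFF, 0x2345, 0x11C1, 0x0010, 0x2000)
    image = env + struct.pack("<QH", 1 << 63, 0x3FFF) + bytes(70)
    print(parse_fsave_real_mode(image))
```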
Note that the 80287 "reacts" to its new state at the conclusion of the FRSTOR; it will, for example, generate an exception request if the exception and mask bits in the memory image so indicate when the next WAIT or error-checking-ESC instruction is executed. FSTENV/FNSTENV destination FSTENV/FNSTENV (store environment) writes the 80287's basic status──control, status, and tag words, and exception pointers──to the memory location defined by the destination operand. Typically, the environment is saved on the CPU stack. FSTENV/FNSTENV is often used by exception handlers because it provides access to the exception pointers that identify the offending instruction and operand. After saving the environment, FSTENV/FNSTENV sets all exception masks in the processor. FSTENV checks for pending errors before executing, FNSTENV does not. Figure 2-2 shows the format of the environment data in memory. FNSTENV does not store the environment until all NPX activity has completed. Thus, the data saved by the instruction reflects the 80287 after any previously decoded instruction has been executed. After writing the environment image to memory, FNSTENV/FSTENV initializes the 80287 state as if FNINIT/FINIT had been executed. FSTENV/FNSTENV must be allowed to complete before any other 80287 instruction is decoded. When FSTENV is coded, an explicit FWAIT, or assembler-generated WAIT, should precede any subsequent 80287 instruction. Figure 2-2. 
FSTENV/FLDENV Memory Layout

  Memory
  offset   Real-Address mode                       Protected mode
  ──────   ─────────────────────────────────────   ─────────────────────
  +0       Control word                            Control word
  +2       Status word                             Status word
  +4       Tag word                                Tag word
  +6       Instruction pointer (15-0)              IP offset
  +8       Bits 15-12: instruction pointer (19-16) CS selector
           Bit 11: 0
           Bits 10-0: instruction opcode (10-0)
  +10      Data pointer (15-0)                     Data operand offset
  +12      Bits 15-12: data pointer (19-16)        Data operand selector
           Bits 11-0: 0

FLDENV source

FLDENV (load environment) reloads the environment from the memory area defined by the source operand. This data should have been written by a previous FSTENV/FNSTENV instruction. CPU instructions (that do not reference the environment image) may immediately follow FLDENV. An FWAIT is not required after FLDENV. FLDENV will automatically wait for all data transfers to complete before executing the next instruction.

Note that loading an environment image that contains an unmasked exception will cause a numeric exception when the next WAIT or error-checking-ESC instruction is executed.

FINCSTP

FINCSTP (increment stack pointer) adds 1 to the stack top pointer (ST) in the status word. It does not alter tags or register contents, nor does it transfer data. It is not equivalent to popping the stack, because it does not set the tag of the previous stack top to empty. Incrementing the stack pointer when ST = 7 produces ST = 0.
FDECSTP

FDECSTP (decrement stack pointer) subtracts 1 from ST, the stack top pointer in the status word. No tags or registers are altered, nor is any data transferred. Executing FDECSTP when ST = 0 produces ST = 7.

FFREE destination

FFREE (free register) changes the destination register's tag to empty; the content of the register is unaffected.

FNOP

FNOP (no operation) stores the stack top to the stack top (FST ST,ST(0)) and thus effectively performs no operation.

FWAIT (CPU Instruction)

FWAIT is not actually an 80287 instruction, but an alternate mnemonic for the CPU WAIT instruction. The FWAIT or WAIT mnemonic should be coded whenever the programmer wants to synchronize the CPU to the NPX, that is, to suspend further instruction decoding until the NPX has completed the current instruction. FWAIT will check for unmasked numeric exceptions.

───────────────────────────────────────────────────────────────────────────
NOTE
A CPU instruction should not attempt to access a memory operand until the 80287 instruction has completed. For example, the following coding shows how FWAIT can be used to force the CPU instruction to wait for the 80287:

     FIST    VALUE
     FWAIT               ; Wait for FIST to complete
     MOV     AX,VALUE
───────────────────────────────────────────────────────────────────────────

More information on when to code an FWAIT instruction is given in a following section of this chapter, "Concurrent Processing with the 80287."

Instruction Set Reference Information

Table 2-14 later in this chapter lists the operating characteristics of all the 80287 instructions. There is one table entry for each instruction mnemonic; the entries are in alphabetical order for quick lookup. Each entry provides the general operand forms accepted by the instruction as well as a list of all exceptions that may be detected during the operation.

One entry exists for each combination of operand types that can be coded with the mnemonic. Table 2-12 explains the operand identifiers allowed in table 2-14.
Following this entry are columns that provide execution time in clocks, the number of bus transfers run during the operation, the length of the instruction in bytes, and an ASM286 coding sample. Instruction Execution Time The execution of an 80287 instruction involves three principal activities, each of which may contribute to the overall execution time of the instruction: ■ 80286 CPU overhead involved in handling the ESC instruction opcode and setting up the 80287 NPX ■ Instruction execution by the 80287 NPX ■ Operand transfers between the 80287 NPX and memory or a CPU register The timing of these various activities is affected by the individual clock frequencies of the 80286 CPU and the 80287 NPX. In addition, slow memories requiring the insertion of wait states in bus cycles, and bus contention due to other processors in the system, may lengthen operand transfer times. In calculating an overall execution time for an individual numeric instruction, analysts must take each of these activities into account. In most cases, it can be assumed that the numeric instructions have already been prefetched by the 80286 and are awaiting execution. ■ The CPU overhead in handling the ESC instruction opcode takes only a single CPU bus cycle before the 80287 begins its execution of the numeric instruction. The timing of this bus cycle is determined by the CPU clock. Additional CPU activity is required to set up the 80287's instruction and data pointer registers, but this activity occurs after the 80287 has begun executing its instruction, and so this parallel activity does not affect total execution time. ■ The duration of individual numeric instructions executing on the 80287 varies for each instruction. Table 2-14 quotes a typical execution clock count and a range for each 80287 instruction. Dividing the figures in the table by 10 (for a 10-MHz 80287 NPX clock) produces an execution time in microseconds. 
The typical case is an estimate for operand values that normally characterize most applications. The range encompasses best- and worst-case operand values that may be found in extreme circumstances.

■ The operand transfer time required to transfer operands between the 80287 and memory or a CPU register depends on the number of words to be transferred, the frequency of the CPU clock controlling bus timing, the number of wait states added to accommodate slower memories, and whether operands are based at even or odd memory addresses. Some (small) additional number of bus cycles may also be lost due to the asynchronous nature of the PEREQ/PEACK handshaking between the 80286 and 80287, and this interaction varies with relative frequencies of the CPU and NPX clocks.

The execution clock counts for the NPX execution of instructions shown in table 2-14 assume that no exceptions are detected during execution. Invalid operation, denormalized operand (unmasked), and zero divide exceptions usually decrease execution time from the typical figure, but execution still falls within the indicated range. The precision exception has no effect on execution time. Unmasked overflow and underflow, and masked denormalized exceptions impose additional execution penalties as shown in table 2-13. Absolute worst-case execution times are therefore the high range figure plus the largest penalty that may be encountered.

Table 2-12. Key to Operand Types

  Identifier        Explanation
  ──────────        ───────────
  ST                Stack top; the register currently at the top of
                    the stack.
  ST(i)             A register in the stack i (0≤i≤7) stack elements
                    from the top. ST(1) is the next-on-stack register,
                    ST(2) is below ST(1), etc.
  Short-real        A short real (32 bits) number in memory.
  Long-real         A long real (64 bits) number in memory.
  Temp-real         A temporary real (80 bits) number in memory.
  Packed-decimal    A packed decimal integer (18 digits, 10 bytes)
                    in memory.
  Word-integer      A word binary integer (16 bits) in memory.
  Short-integer     A short binary integer (32 bits) in memory.
  Long-integer      A long binary integer (64 bits) in memory.
  nn-bytes          A memory area nn bytes long.

Bus Transfers

NPX instructions that reference memory require bus cycles to transfer operands between the NPX and memory. The actual number of transfers depends on the length of the operand and the alignment of the operand in memory. In table 2-14, the first figure gives execution clocks for even-addressed operands, while the second gives the clock count for odd-addressed operands.

For operands aligned at word boundaries, that is, based at even memory addresses, each word to be transferred requires one bus cycle between the 80286 data channel and memory, and one bus cycle to the NPX. For operands based at odd memory addresses, each word transfer requires two bus cycles to transfer individual bytes between the 80286 data channel and memory, and one bus cycle to the NPX.

───────────────────────────────────────────────────────────────────────────
NOTE
For best performance, operands for the 80287 should be aligned along word boundaries; that is, based at even memory addresses. Operands based at odd memory addresses are transferred to memory essentially byte-at-a-time and may take half again as long to transfer as word-aligned operands.
───────────────────────────────────────────────────────────────────────────

Additional transfer time is required if slow memories are being used, requiring the insertion of wait states into the CPU bus cycle. In multiprocessor environments, the bus may not be available immediately; this overhead can also increase effective transfer time.

Table 2-13. Execution Penalties

  Exception                Additional Clocks
  ─────────                ─────────────────
  Overflow (unmasked)             14
  Underflow (unmasked)            16
  Denormalized (masked)           33

Instruction Length

80287 instructions that do not reference memory are two bytes long. Memory reference instructions vary between two and four bytes.
The third and fourth bytes are for the 8- or 16-bit displacement values used in conjunction with the standard 80286 memory-addressing modes.

Note that the lengths quoted in table 2-14 for the processor control instructions (FNINIT, FNSTCW, FNSTSW, FNSTSW AX, FNCLEX, FNSTENV, and FNSAVE) do not include the one-byte CPU wait instruction inserted by the ASM286 assembler if the control instruction is coded using the wait form of the mnemonic (e.g., FINIT, FSTCW, FSTSW, FSTSW AX, FCLEX, FSTENV, and FSAVE). Wait and no-wait forms of the processor control instructions have been described in the preceding section titled "Processor Control Instructions."

Table 2-14. Instruction Set Reference Data

                                            Operand
                        ⌐Execution Clocks¬  Word       Code
  Operands              Typical  Range      Transfers  Bytes  Coding Example
───────────────────────────────────────────────────────────────────────────────────────────────────
FABS          FABS (no operands)    Absolute value    Exceptions: I

  (no operands)         14       10-17      0          2      FABS
───────────────────────────────────────────────────────────────────────────────────────────────────
FADD          FADD //source/destination,source    Add real    Exceptions: I, D, O, U, P

  //ST,ST(i)/ST(i),ST   85       70-100     0          2      FADD ST,ST(4)
  short-real            105      90-120     2          2-4    FADD AIR_TEMP [SI]
  long-real             110      95-125     4          2-4    FADD [BX].MEAN
───────────────────────────────────────────────────────────────────────────────────────────────────
FADDP         FADDP destination, source    Add real and pop    Exceptions: I, D, O, U, P

  ST(i),ST              90       75-105     0          2      FADDP ST(2),ST
───────────────────────────────────────────────────────────────────────────────────────────────────
FBLD          FBLD source    Packed decimal (BCD) load    Exceptions: I

  packed-decimal        300      290-310    5          2-4    FBLD YTD_SALES
───────────────────────────────────────────────────────────────────────────────────────────────────
FBSTP         FBSTP destination    Packed decimal (BCD) store and pop    Exceptions: I

  packed-decimal        530      520-540    5          2-4    FBSTP [BX].FORECAST
───────────────────────────────────────────────────────────────────────────────────────────────────
FCHS          FCHS (no operands)    Change sign    Exceptions: I

  (no operands)         15       10-17      0          2      FCHS
───────────────────────────────────────────────────────────────────────────────────────────────────
FCLEX/FNCLEX  FCLEX/FNCLEX (no operands)    Clear exceptions    Exceptions: None

  (no operands)         5        2-8        0          2      FNCLEX
───────────────────────────────────────────────────────────────────────────────────────────────────
FCOM          FCOM //source    Compare real    Exceptions: I, D

  //ST(i)               45       40-50      0          2      FCOM ST(1)
  short-real            65       60-70      2          2-4    FCOM [BP].UPPER_LIMIT
  long-real             70       65-75      4          2-4    FCOM WAVELENGTH
───────────────────────────────────────────────────────────────────────────────────────────────────
FCOMP         FCOMP //source    Compare real and pop    Exceptions: I, D

  //ST(i)               47       42-52      0          2      FCOMP ST(2)
  short-real            68       63-73      2          2-4    FCOMP [BP + 2].N_READINGS
  long-real             72       67-77      4          2-4    FCOMP DENSITY
───────────────────────────────────────────────────────────────────────────────────────────────────
FCOMPP        FCOMPP (no operands)    Compare real and pop twice    Exceptions: I, D

  (no operands)         50       45-55      0          2      FCOMPP
───────────────────────────────────────────────────────────────────────────────────────────────────
FDECSTP       FDECSTP (no operands)    Decrement stack pointer    Exceptions: None

  (no operands)         9        6-12       0          2      FDECSTP
───────────────────────────────────────────────────────────────────────────────────────────────────
FDIV          FDIV //source/destination,source    Divide real    Exceptions: I, D, Z, O, U, P

  //ST(i),ST            198      193-203    0          2      FDIV
  short-real            220      215-225    2          2-4    FDIV DISTANCE
  long-real             225      220-230    4          2-4    FDIV ARC [DI]
───────────────────────────────────────────────────────────────────────────────────────────────────
FDIVP         FDIVP destination, source    Divide real and pop    Exceptions: I, D, Z, O, U, P

  ST(i),ST              202      197-207    0          2      FDIVP ST(4),ST
───────────────────────────────────────────────────────────────────────────────────────────────────
FDIVR         FDIVR //source/destination, source    Divide real reversed    Exceptions: I, D, Z, O, U, P

  //ST,ST(i)/ST(i),ST   199      194-204    0          2      FDIVR ST(2),ST
  short-real            221      216-226    2          2-4    FDIVR [BX].PULSE_RATE
  long-real             226      221-231    4          2-4    FDIVR RECORDER.FREQUENCY
───────────────────────────────────────────────────────────────────────────────────────────────────
FDIVRP        FDIVRP destination, source    Divide real reversed and pop    Exceptions: I, D, Z, O, U, P

  ST(i),ST              203      198-208    0          2      FDIVRP ST(1),ST
───────────────────────────────────────────────────────────────────────────────────────────────────
FFREE         FFREE destination    Free register    Exceptions: None

  ST(i)                 11       9-16       0          2      FFREE ST(1)
───────────────────────────────────────────────────────────────────────────────────────────────────
FIADD         FIADD source    Integer add    Exceptions: I, D, O, P

  word-integer          120      102-137    1          2-4    FIADD DISTANCE_TRAVELLED
  short-integer         125      108-143    2          2-4    FIADD PULSE_COUNT [SI]
───────────────────────────────────────────────────────────────────────────────────────────────────
FICOM         FICOM source    Integer compare    Exceptions: I, D

  word-integer          80       72-86      1          2-4    FICOM TOOL.N_PASSES
  short-integer         85       78-91      2          2-4    FICOM [BP+4].PARM_COUNT
───────────────────────────────────────────────────────────────────────────────────────────────────
FICOMP        FICOMP source    Integer compare and pop    Exceptions: I, D

  word-integer          82       74-88      1          2-4    FICOMP [BP].LIMIT [SI]
  short-integer         87       80-93      2          2-4    FICOMP N_SAMPLES
───────────────────────────────────────────────────────────────────────────────────────────────────
FIDIV         FIDIV source    Integer divide    Exceptions: I, D, Z, O, U, P

  word-integer          230      224-238    1          2-4    FIDIV SURVEY.OBSERVATIONS
  short-integer         236      230-243    2          2-4    FIDIV RELATIVE_ANGLE [DI]
───────────────────────────────────────────────────────────────────────────────────────────────────
FIDIVR        FIDIVR source    Integer divide reversed    Exceptions: I, D, Z, O, U, P

  word-integer          230      225-239    1          2-4    FIDIVR [BP].X_COORD
  short-integer         237      231-245    2          2-4    FIDIVR FREQUENCY
───────────────────────────────────────────────────────────────────────────────────────────────────
FILD          FILD source    Integer load    Exceptions: I

  word-integer          50       46-54      1          2-4    FILD [BX].SEQUENCE
  short-integer         56       52-60      2          2-4    FILD STANDOFF [DI]
  long-integer          64       60-68      4          2-4    FILD RESPONSE.COUNT
───────────────────────────────────────────────────────────────────────────────────────────────────
FIMUL         FIMUL source    Integer multiply    Exceptions: I, D, O, P

  word-integer          130      124-138    1          2-4    FIMUL BEARING
  short-integer         136      130-144    2          2-4    FIMUL POSITION.Z_AXIS
───────────────────────────────────────────────────────────────────────────────────────────────────
FINCSTP       FINCSTP (no operands)    Increment stack pointer    Exceptions: None

  (no operands)         9        6-12       0          2      FINCSTP
───────────────────────────────────────────────────────────────────────────────────────────────────
FINIT/FNINIT  FINIT/FNINIT (no operands)    Initialize processor    Exceptions: None

  (no operands)         5        2-8        0          2      FINIT
───────────────────────────────────────────────────────────────────────────────────────────────────
FIST          FIST destination    Integer store    Exceptions: I, P

  word-integer          86       80-90      1          2-4    FIST OBS.COUNT[SI]
  short-integer         88       82-92      2          2-4    FIST [BP].FACTORED_PULSES
───────────────────────────────────────────────────────────────────────────────────────────────────
FISTP         FISTP destination    Integer store and pop    Exceptions: I, P

  word-integer          88       82-92      1          2-4    FISTP [BX].ALPHA_COUNT [SI]
  short-integer         90       84-94      2          2-4    FISTP CORRECTED_TIME
  long-integer          100      94-105     4          2-4    FISTP PANEL.N_READINGS
───────────────────────────────────────────────────────────────────────────────────────────────────
FISUB         FISUB source    Integer subtract    Exceptions: I, D, O, P

  word-integer          120      102-137    1          2-4    FISUB BASE_FREQUENCY
  short-integer         125      108-143    2          2-4    FISUB TRAIN_SIZE [DI]
───────────────────────────────────────────────────────────────────────────────────────────────────
FISUBR        FISUBR source    Integer subtract reversed    Exceptions: I, D, O, P

  word-integer          120      103-139    1          2-4    FISUBR FLOOR [BX] [SI]
  short-integer         125      109-144    2          2-4    FISUBR BALANCE
───────────────────────────────────────────────────────────────────────────────────────────────────
FLD           FLD source    Load real    Exceptions: I, D

  ST(i)                 20       17-22      0          2      FLD ST(0)
  short-real            43       38-56      2          2-4    FLD READING [SI].PRESSURE
  long-real             46       40-60      4          2-4    FLD [BP].TEMPERATURE
  temp-real             57       53-65      5          2-4    FLD SAVEREADING
───────────────────────────────────────────────────────────────────────────────────────────────────
FLDCW         FLDCW source    Load control word    Exceptions: None

  2-bytes               10       7-14       1          2-4    FLDCW CONTROL_WORD
───────────────────────────────────────────────────────────────────────────────────────────────────
FLDENV        FLDENV source    Load environment    Exceptions: None

  14-bytes              40       35-45      7          2-4    FLDENV [BP + 6]
───────────────────────────────────────────────────────────────────────────────────────────────────
FLDLG2        FLDLG2 (no operands)    Load log{10}2    Exceptions: I

  (no operands)         21       18-24      0          2      FLDLG2
───────────────────────────────────────────────────────────────────────────────────────────────────
FLDLN2        FLDLN2 (no operands)    Load log{e}2    Exceptions: I

  (no operands)         20       17-23      0          2      FLDLN2
───────────────────────────────────────────────────────────────────────────────────────────────────
FLDL2E        FLDL2E (no operands)    Load log{2}e    Exceptions: I

  (no operands)         18       15-21      0          2      FLDL2E
───────────────────────────────────────────────────────────────────────────────────────────────────
FLDL2T        FLDL2T (no operands)    Load log{2}10    Exceptions: I

  (no operands)         19       16-22      0          2      FLDL2T
───────────────────────────────────────────────────────────────────────────────────────────────────
FLDPI         FLDPI (no operands)    Load π    Exceptions: I

  (no operands)         19       16-22      0          2      FLDPI
───────────────────────────────────────────────────────────────────────────────────────────────────
FLDZ          FLDZ (no operands)    Load +0.0    Exceptions: I

  (no operands)         14       11-17      0          2      FLDZ
───────────────────────────────────────────────────────────────────────────────────────────────────
FLD1          FLD1 (no operands)    Load +1.0    Exceptions: I

  (no operands)         18       15-21      0          2      FLD1
───────────────────────────────────────────────────────────────────────────────────────────────────
FMUL          FMUL //source/destination,source    Multiply real    Exceptions: I, D, O, U, P

  //ST(i),ST/ST,ST(i)*  97       90-105     0          2      FMUL ST,ST(3)
  //ST(i),ST/ST,ST(i)   138      130-145    0          2      FMUL ST,ST(3)
  short-real            118      110-125    2          2-4    FMUL SPEED_FACTOR
  long-real*            120      112-126    4          2-4    FMUL [BP].HEIGHT
  long-real             161      154-168    4          2-4    FMUL [BP].HEIGHT

  * Occurs when one or both operands is "short", that is, it has 40 trailing zeros in its fraction (e.g., it was loaded from a short-real memory operand).
───────────────────────────────────────────────────────────────────────────────────────────────────
FMULP         FMULP destination, source    Multiply real and pop    Exceptions: I, D, O, U, P

  ST(i),ST*             100      94-108     0          2      FMULP ST(1),ST
  ST(i),ST              142      134-148    0          2      FMULP ST(1),ST

  * Occurs when one or both operands is "short", that is, it has 40 trailing zeros in its fraction (e.g., it was loaded from a short-real memory operand).
───────────────────────────────────────────────────────────────────────────────────────────────────
FNOP          FNOP (no operands)    No operation    Exceptions: None

  (no operands)         13       10-16      0          2      FNOP
───────────────────────────────────────────────────────────────────────────────────────────────────
FPATAN        FPATAN (no operands)    Partial arctangent    Exceptions: U, P (operands not checked)

  (no operands)         650      250-800    0          2      FPATAN
───────────────────────────────────────────────────────────────────────────────────────────────────
FPREM         FPREM (no operands)    Partial remainder    Exceptions: I, D, U

  (no operands)         125      15-190     0          2      FPREM
───────────────────────────────────────────────────────────────────────────────────────────────────
FPTAN         FPTAN (no operands)    Partial tangent    Exceptions: I, P (operands not checked)

  (no operands)         450      30-540     0          2      FPTAN
───────────────────────────────────────────────────────────────────────────────────────────────────
FRNDINT       FRNDINT (no operands)    Round to integer    Exceptions: I, P

  (no operands)         45       16-50      0          2      FRNDINT
───────────────────────────────────────────────────────────────────────────────────────────────────
FRSTOR        FRSTOR source    Restore saved state    Exceptions: None

  94-bytes              *                   47         2-4    FRSTOR [BP]

  * The 80287 execution clock count for this instruction is not meaningful in determining overall instruction execution time. For typical frequency ratios of the 80286 and 80287 clocks, 80287 execution occurs in parallel with the operand transfers, with the operand transfers determining the overall execution time of the instruction. For 80286:80287 clock frequency ratios of 4:8, 1:1, and 8:5, the overall execution clock count for this instruction is estimated at 490, 302, and 227 80287 clocks, respectively.
───────────────────────────────────────────────────────────────────────────────────────────────────
FSAVE/FNSAVE  FSAVE/FNSAVE destination    Save state    Exceptions: None

  94-bytes              *                   47         2-4    FSAVE [BP]

  * The 80287 execution clock count for this instruction is not meaningful in determining overall instruction execution time. For typical frequency ratios of the 80286 and 80287 clocks, 80287 execution occurs in parallel with the operand transfers, with the operand transfers determining the overall execution time of the instruction. For 80286:80287 clock frequency ratios of 4:8, 1:1, and 8:5, the overall execution clock count for this instruction is estimated at 376, 233, and 174 80287 clocks, respectively.
───────────────────────────────────────────────────────────────────────────────────────────────────
FSCALE        FSCALE (no operands)    Scale    Exceptions: I, O, U

  (no operands)         35       32-38      0          2      FSCALE
───────────────────────────────────────────────────────────────────────────────────────────────────
FSETPM        FSETPM (no operands)    Set protected mode    Exceptions: None

  (no operands)                  2-8        0          2      FSETPM
───────────────────────────────────────────────────────────────────────────────────────────────────
FSQRT         FSQRT (no operands)    Square root    Exceptions: I, D, P

  (no operands)         183      80-186     0          2      FSQRT
───────────────────────────────────────────────────────────────────────────────────────────────────
FST           FST destination    Store real    Exceptions: I, O, U, P

  ST(i)                 18       15-22      0          2      FST ST(3)
  short-real            87       84-90      2          2-4    FST CORRELATION [DI]
  long-real             100      96-104     4          2-4    FST MEAN_READING
───────────────────────────────────────────────────────────────────────────────────────────────────
FSTCW/FNSTCW  FSTCW destination    Store control word    Exceptions: None

  2-bytes               15       12-18      1          2-4    FSTCW SAVE_CONTROL
───────────────────────────────────────────────────────────────────────────────────────────────────
FSTENV/FNSTENV  FSTENV destination    Store environment    Exceptions: None

  14-bytes              45       40-50      7          2-4    FSTENV [BP]
───────────────────────────────────────────────────────────────────────────────────────────────────
FSTP          FSTP destination    Store real and pop    Exceptions: I, O, U, P

  ST(i)                 20       17-24      0          2      FSTP ST(2)
  short-real            89       86-92      2          2-4    FSTP [BX].ADJUSTED_RPM
  long-real             102      98-106     4          2-4    FSTP TOTAL_DOSAGE
  temp-real             55       52-58      5          2-4    FSTP REG_SAVE [SI]
───────────────────────────────────────────────────────────────────────────────────────────────────
FSTSW/FNSTSW  FSTSW destination    Store status word    Exceptions: None

  2-bytes               15       12-18      1          2-4    FSTSW SAVE_STATUS
───────────────────────────────────────────────────────────────────────────────────────────────────
FSTSW AX/FNSTSW AX  FSTSW AX    Store status word to AX    Exceptions: None

  AX                             10-16      1          2      FSTSW AX
───────────────────────────────────────────────────────────────────────────────────────────────────
FSUB          FSUB //source/destination,source    Subtract real    Exceptions: I, D, O, U, P

  //ST,ST(i)/ST(i),ST   85       70-100     0          2      FSUB ST,ST(2)
  short-real            105      90-120     2          2-4    FSUB BASE_VALUE
  long-real             110      95-125     4          2-4    FSUB COORDINATE.X
───────────────────────────────────────────────────────────────────────────────────────────────────
FSUBP         FSUBP destination, source    Subtract real and pop    Exceptions: I, D, O, U, P

  ST(i),ST              90       75-105     0          2      FSUBP ST(2),ST
───────────────────────────────────────────────────────────────────────────────────────────────────
FSUBR         FSUBR //source/destination, source    Subtract real reversed    Exceptions: I, D, O, U, P

  //ST,ST(i)/ST(i),ST   87       70-100     0          2      FSUBR ST,ST(1)
  short-real            105      90-120     2          2-4    FSUBR VECTOR[SI]
  long-real             110      95-125     4          2-4    FSUBR [BX].INDEX
───────────────────────────────────────────────────────────────────────────────────────────────────
FSUBRP        FSUBRP destination, source    Subtract real reversed and pop    Exceptions: I, D, O, U, P

  ST(i),ST              90       75-105     0          2      FSUBRP ST(1),ST
───────────────────────────────────────────────────────────────────────────────────────────────────
FTST          FTST (no operands)    Test stack top against +0.0    Exceptions: I, D

  (no operands)         42       38-48      0          2      FTST
───────────────────────────────────────────────────────────────────────────────────────────────────
FWAIT         FWAIT (no operands)    (CPU) Wait while 80287 is busy    Exceptions: None (CPU instruction)

  (no operands)         3+5n*    3+5n*      0          1      FWAIT

  * n = number of times CPU examines BUSY line before 80287 completes execution of previous instruction.
───────────────────────────────────────────────────────────────────────────────────────────────────
FXAM          FXAM (no operands)    Examine stack top    Exceptions: None

  (no operands)         17       12-23      0          2      FXAM
───────────────────────────────────────────────────────────────────────────────────────────────────
FXCH          FXCH //destination    Exchange registers    Exceptions: I

  //ST(i)               12       10-15      0          2      FXCH ST(2)
───────────────────────────────────────────────────────────────────────────────────────────────────
FXTRACT       FXTRACT (no operands)    Extract exponent and significand    Exceptions: I

  (no operands)         50       27-55      0          2      FXTRACT
───────────────────────────────────────────────────────────────────────────────────────────────────
FYL2X         FYL2X (no operands)    Y * log{2}X    Exceptions: P (operands not checked)

  (no operands)         950      900-1100   0          2      FYL2X
───────────────────────────────────────────────────────────────────────────────────────────────────
FYL2XP1       FYL2XP1 (no operands)    Y * log{2}(X + 1)    Exceptions: P (operands not checked)

  (no operands)         850      700-1000   0          2      FYL2XP1
───────────────────────────────────────────────────────────────────────────────────────────────────
F2XM1         F2XM1 (no operands)    2^(X) - 1    Exceptions: U, P (operands not checked)

  (no operands)         500      310-630    0          2      F2XM1
───────────────────────────────────────────────────────────────────────────────────────────────────

Programming Facilities

As described previously, the 80287 NPX is programmed simply as an extension of the 80286 CPU.
This section describes how programmers in ASM286 and in a variety of higher-level languages can work with the 80287. The level of detail in this section is intended to give programmers a basic understanding of the software tools that can be used with the 80287, but this information does not document the full capabilities of these facilities. For a complete list of documentation on all the languages available for 80286 systems, readers should consult Intel's Literature Guide.

High-Level Languages

For programmers using high-level languages, the programming and operation of the NPX is handled automatically by the compiler. A variety of Intel high-level languages are available that automatically make use of the 80287 NPX when appropriate. These languages include

   PL/M-286
   FORTRAN-286
   PASCAL-286
   C-286

Each of these high-level languages has special numeric libraries allowing programs to take advantage of the capabilities of the 80287 NPX. No special programming conventions are necessary to make use of the 80287 NPX when programming numeric applications in any of these languages.

Programmers in PL/M-286 and ASM286 can also make use of many of these library routines by using routines contained in the 80287 Support Library, described in the 80287 Support Library Reference Manual, Order Number 122129. These library routines provide many of the functions provided by higher-level languages, including exception handlers, ASCII-to-floating-point conversions, and a more complete set of transcendental functions than that provided by the 80287 instruction set.

PL/M-286

Programmers in PL/M-286 can access a very useful subset of the 80287's numeric capabilities. The PL/M-286 REAL data type corresponds to the NPX's short real (32-bit) format. This data type provides a range of about 8.43*10^(-37) ≤ ABS(X) ≤ 3.38*10^(38), with about seven significant decimal digits. This representation is adequate for the data manipulated by many microcomputer applications.
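The "about seven significant decimal digits" of the short real format can be checked on any machine that supports 32-bit IEEE-style floats. The following is a minimal sketch in Python (not an Intel tool); the helper name to_short_real is hypothetical, and it assumes that Python's struct "f" format matches the NPX short real layout.

```python
import struct

def to_short_real(x: float) -> float:
    """Round a Python float to the nearest 32-bit short real and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# A short real carries a 24-bit significand, so every integer up to 2**24
# is exact, but 2**24 + 1 cannot be represented and rounds to 2**24.
exact = to_short_real(16777216.0)
rounded = to_short_real(16777217.0)

# Most decimal fractions are held only to about seven significant digits.
tenth = to_short_real(0.1)
relative_error = abs(tenth - 0.1) / 0.1

print(exact, rounded)
print(tenth, relative_error)
```

The relative error of a correctly rounded short real is bounded by 2^(-24), or roughly 6*10^(-8), which is where the seven-digit figure comes from.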
The utility of the REAL data type is extended by the PL/M-286 compiler's practice of holding intermediate results in the 80287's temporary real format. This means that the full range and precision of the processor are utilized for intermediate results. Underflow, overflow, and rounding errors are most likely to occur during intermediate computations rather than during calculation of an expression's final result. Holding intermediate results in temporary real format greatly reduces the likelihood of overflow and underflow and eliminates roundoff as a serious source of error until the final assignment of the result is performed.

The compiler generates 80287 code to evaluate expressions that contain REAL data types, whether variables or constants or both. This means that addition, subtraction, multiplication, division, comparison, and assignment of REALs will be performed by the NPX. INTEGER expressions, on the other hand, are evaluated on the CPU.

Five built-in procedures (table 2-15) give the PL/M-286 programmer access to 80287 functions manipulated by the processor control instructions. Prior to any arithmetic operations, a typical PL/M-286 program will set up the NPX after power up using the INIT$REAL$MATH$UNIT procedure and then issue SET$REAL$MODE to configure the NPX. SET$REAL$MODE loads the 80287 control word, and its 16-bit parameter has the format shown in figure 1-5. The recommended value of this parameter is 033EH (projective closure, round to nearest, 64-bit precision, all exceptions masked except invalid operation). Other settings may be used at the programmer's discretion.

If any exceptions are unmasked, an exception handler must be provided in the form of an interrupt procedure that is designated to be invoked by CPU interrupt pointer (vector) number 16. The exception handler can use the GET$REAL$ERROR procedure to obtain the low-order byte of the 80287 status word and to then clear the exception flags.
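As an illustration of the two values discussed here, the sketch below decodes a SET$REAL$MODE parameter and a GET$REAL$ERROR result field by field. It is written in Python rather than PL/M-286, the function and table names are hypothetical, and the bit positions are an assumption based on the standard 80287 control and status word layouts (exception bits I, D, Z, O, U, P in bits 0-5; precision control in bits 8-9; rounding control in bits 10-11; infinity control in bit 12).

```python
# Bits 0-5 name the same six exceptions in the control word (as masks)
# and in the low-order byte of the status word (as flags).
EXCEPTIONS = ("invalid operation", "denormalized operand", "zero divide",
              "overflow", "underflow", "precision")
PRECISION = {0b00: "24 bits", 0b10: "53 bits", 0b11: "64 bits"}  # 0b01 reserved
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "chop"}

def decode_control_word(cw):
    """Split a 16-bit control word into its programmer-visible fields."""
    return {
        # A mask bit of 0 means the exception is unmasked.
        "unmasked": [name for bit, name in enumerate(EXCEPTIONS)
                     if not (cw >> bit) & 1],
        "precision": PRECISION.get((cw >> 8) & 0b11, "reserved"),
        "rounding": ROUNDING[(cw >> 10) & 0b11],
        "infinity": "affine" if (cw >> 12) & 1 else "projective",
    }

def decode_exception_byte(status_low):
    """List the exceptions flagged in the low-order status byte."""
    return [name for bit, name in enumerate(EXCEPTIONS)
            if (status_low >> bit) & 1]

print(decode_control_word(0x033E))   # the recommended SET$REAL$MODE value
print(decode_exception_byte(0x05))   # a byte with the I and Z flags set
```

Decoding 033EH this way reproduces the description given for the recommended setting: only invalid operation is unmasked, with 64-bit precision, round to nearest, and projective closure.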
The byte returned by GET$REAL$ERROR contains the exception flags; these can be examined to determine the source of the exception.

The SAVE$REAL$STATUS and RESTORE$REAL$STATUS procedures are provided for multi-tasking environments where a running task that uses the 80287 may be preempted by another task that also uses the 80287. It is the responsibility of the preempting task to issue SAVE$REAL$STATUS before it executes any statements that affect the 80287; these include the INIT$REAL$MATH$UNIT and SET$REAL$MODE procedures as well as arithmetic expressions. SAVE$REAL$STATUS saves the 80287 state (registers, status, and control words, etc.) on the CPU's stack. RESTORE$REAL$STATUS reloads the state information; the preempting task must invoke this procedure before terminating in order to restore the 80287 to its state at the time the running task was preempted. This enables the preempted task to resume execution from the point of its preemption.

Table 2-15. PL/M-286 Built-In Procedures

                          80287
Procedure                 Instruction        Description

INIT$REAL$MATH$UNIT*      FINIT              Initialize processor.
SET$REAL$MODE             FLDCW              Set exception masks, rounding precision, and infinity controls.
GET$REAL$ERROR**          FNSTSW & FNCLEX    Store, then clear, exception flags.
SAVE$REAL$STATUS          FNSAVE             Save processor state.
RESTORE$REAL$STATUS       FRSTOR             Restore processor state.

*  Also initializes interrupt pointers for emulation.
** Returns low-order byte of status word.

ASM286

The ASM286 assembly language provides programmers with complete access to all of the facilities of the 80286 and 80287 processors. The programmer's view of the 80286/80287 hardware is a single machine with these resources:

   ■  160 instructions
   ■  12 data types
   ■  8 general registers
   ■  4 segment registers
   ■  8 floating-point registers, organized as a stack

Defining Data

The ASM286 directives shown in table 2-16 allocate storage for 80287 variables and constants.
As with other storage allocation directives, the assembler associates a type with any variable defined with these directives. The type value is equal to the length of the storage unit in bytes (10 for DT, 8 for DQ, etc.). The assembler checks the type of any variable coded in an instruction to be certain that it is compatible with the instruction. For example, the coding FIADD ALPHA will be flagged as an error if ALPHA's type is not 2 or 4, because integer addition is only available for word and short integer data types. The operand's type also tells the assembler which machine instruction to produce; although to the programmer there is only an FIADD instruction, a different machine instruction is required for each operand type.

On occasion it is desirable to use an instruction with an operand that has no declared type. For example, if register BX points to a short integer variable, a programmer may want to code FIADD [BX]. This can be done by informing the assembler of the operand's type in the instruction, coding FIADD DWORD PTR [BX]. The corresponding overrides for the other storage allocations are WORD PTR, QWORD PTR, and TBYTE PTR.

The assembler does not, however, check the types of operands used in processor control instructions. Coding FRSTOR [BP] implies that the programmer has set up register BP to point to the stack location where the processor's 94-byte state record has been previously saved.

The initial values for 80287 constants may be coded in several different ways. Binary integer constants may be specified as bit strings, decimal integers, octal integers, or hexadecimal strings. Packed decimal values are normally written as decimal integers, although the assembler will accept and convert other representations of integers. Real values may be written as ordinary decimal real numbers (decimal point required), as decimal numbers in scientific notation, or as hexadecimal strings.
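To see which bit pattern each notation ultimately stands for, the following sketch computes the stored form of one value, -126, in each 80287 data type. It is written in Python rather than ASM286 and the helper names are hypothetical. It assumes that the struct formats "h", "i", "q", "f", and "d" correspond to the word, short, and long integer and short and long real layouts; the packed decimal and temporary real layouts are built by hand from their standard definitions (a sign byte followed by 18 BCD digits, and a sign bit, a 15-bit exponent biased by 16383, and a 64-bit significand with an explicit integer bit).

```python
import math
import struct

VALUE = -126

def hex_msb_first(memory_bytes):
    """Show little-endian memory bytes most-significant byte first."""
    return memory_bytes[::-1].hex().upper()

def packed_decimal(n):
    """10 bytes: sign byte (80H if negative) followed by 18 decimal digits."""
    return ("80" if n < 0 else "00") + f"{abs(n):018d}"

def temp_real(x):
    """80-bit temporary real for a normalized, finite, nonzero value."""
    mantissa, exponent = math.frexp(abs(x))   # abs(x) == mantissa * 2**exponent
    significand = int(mantissa * 2 ** 64)     # integer bit lands in bit 63
    biased = exponent - 1 + 16383
    sign_and_exponent = (0x8000 if x < 0 else 0) | biased
    return f"{sign_and_exponent:04X}{significand:016X}"

encodings = {
    "word integer (DW)": hex_msb_first(struct.pack("<h", VALUE)),
    "short integer (DD)": hex_msb_first(struct.pack("<i", VALUE)),
    "long integer (DQ)": hex_msb_first(struct.pack("<q", VALUE)),
    "short real (DD)": hex_msb_first(struct.pack("<f", float(VALUE))),
    "long real (DQ)": hex_msb_first(struct.pack("<d", float(VALUE))),
    "packed decimal (DT)": packed_decimal(VALUE),
    "temporary real (DT)": temp_real(float(VALUE)),
}

for name, digits in encodings.items():
    print(f"{name:22} {digits}")
```

The integer rows show the two's complement pattern directly, while the real rows show why decimal notation is usually the more convenient way to write a real constant.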
Using hexadecimal strings is primarily intended for defining special values such as infinities, NaNs, and nonnormalized numbers. Most programmers will find that ordinary decimal and scientific decimal provide the simplest way to initialize 80287 constants. Figure 2-3 compares several ways of setting the various 80287 data types to the same initial value.

Note that preceding 80287 variables and constants with the ASM286 EVEN directive ensures that the operands will be word-aligned in memory. This will produce the best system performance. All 80287 data types occupy integral numbers of words so that no storage is "wasted" if blocks of variables are defined together and preceded by a single EVEN declarative.

Table 2-16. 80287 Storage Allocation Directives

Directive    Interpretation       Data Types

DW           Define Word          Word integer
DD           Define Doubleword    Short integer, short real
DQ           Define Quadword      Long integer, long real
DT           Define Tenbyte       Packed decimal, temporary real

Records and Structures

The ASM286 RECORD and STRUC (structure) declaratives can be very useful in NPX programming. The record facility can be used to define the bit fields of the control, status, and tag words. Figure 2-4 shows one definition of the status word and how it might be used in a routine that polls the 80287 until it has completed an instruction.

Because STRUCtures allow different but related data types to be grouped together, they often provide a natural way to represent "real world" data organizations. The fact that the structure template may be "moved" about in memory adds to its flexibility. Figure 2-5 shows a simple structure that might be used to represent data consisting of a series of test score samples. A structure could also be used to define the organization of the information stored and loaded by the FSTENV and FLDENV instructions.

Figure 2-3. Sample 80287 Constants

; THE FOLLOWING ALL ALLOCATE THE CONSTANT: -126
; NOTE TWO'S COMPLEMENT STORAGE OF NEGATIVE BINARY INTEGERS.
;
                EVEN                           ; FORCE WORD ALIGNMENT
WORD_INTEGER    DW  1111111110000010B          ; BIT STRING
SHORT_INTEGER   DD  0FFFFFF82H                 ; HEX STRING MUST START
                                               ; WITH DIGIT
LONG_INTEGER    DQ  -126                       ; ORDINARY DECIMAL
SHORT_REAL      DD  -126.0                     ; NOTE PRESENCE OF '.'
LONG_REAL       DQ  -1.26E2                    ; "SCIENTIFIC"
PACKED_DECIMAL  DT  -126                       ; ORDINARY DECIMAL INTEGER
; IN THE FOLLOWING, SIGN AND EXPONENT IS 'C005'
; SIGNIFICAND IS 'FC00...00', 'R' INFORMS ASSEMBLER THAT
; THE STRING REPRESENTS A REAL DATA TYPE.
;
TEMP_REAL       DT  0C005FC00000000000000R     ; HEX STRING

Figure 2-4. Status Word RECORD Definition

; RESERVE SPACE FOR STATUS WORD
STATUS_WORD     DW  ?
; LAY OUT STATUS WORD FIELDS
STATUS          RECORD
&               BUSY:        1,
&               COND_CODE3:  1,
&               STACK_TOP:   3,
&               COND_CODE2:  1,
&               COND_CODE1:  1,
&               COND_CODE0:  1,
&               INT_REQ:     1,
&               RESERVED:    1,
&               P_FLAG:      1,
&               U_FLAG:      1,
&               O_FLAG:      1,
&               Z_FLAG:      1,
&               D_FLAG:      1,
&               I_FLAG:      1
; POLL STATUS WORD UNTIL 80287 IS NOT BUSY
POLL:           FNSTSW  STATUS_WORD
                TEST    STATUS_WORD, MASK BUSY
                JNZ     POLL

Figure 2-5. Structure Definition

SAMPLE          STRUC
N_OBS           DD  ?                ; SHORT INTEGER
MEAN            DQ  ?                ; LONG REAL
MODE            DW  ?                ; WORD INTEGER
STD_DEV         DQ  ?                ; LONG REAL
; ARRAY OF OBSERVATIONS -- WORD INTEGER
TEST_SCORES     DW  1000 DUP (?)
SAMPLE          ENDS

Addressing Modes

80287 memory data can be accessed with any of the CPU's five memory addressing modes. This means that 80287 data types can be incorporated in data aggregates ranging from simple to complex according to the needs of the application. The addressing modes, and the ASM286 notation used to specify them in instructions, make the accessing of structures, arrays, arrays of structures, and other organizations direct and straightforward. Table 2-17 gives several examples of 80287 instructions coded with operands that illustrate different addressing modes.

Table 2-17. Addressing Mode Examples

Coding                        Interpretation

FIADD ALPHA                   ALPHA is a simple scalar (mode is direct).
FDIVR ALPHA.BETA              BETA is a field in a structure that is "overlaid" on ALPHA (mode is direct).
FMUL  QWORD PTR [BX]          BX contains the address of a long real variable (mode is register indirect).
FSUB  ALPHA [SI]              ALPHA is an array and SI contains the offset of an array element from the start of the array (mode is indexed).
FILD  [BP].BETA               BP contains the address of a structure on the CPU stack and BETA is a field in the structure (mode is based).
FBLD  TBYTE PTR [BX] [DI]     BX contains the address of a packed decimal array and DI contains the offset of an array element (mode is based indexed).

Comparative Programming Example

Figures 2-6 and 2-7 show the PL/M-286 and ASM286 code for a simple 80287 program, called ARRSUM. The program references an array (X$ARRAY), which contains 0-100 short real values; the integer variable N$OF$X indicates the number of array elements the program is to consider. ARRSUM steps through X$ARRAY accumulating three sums:

   ■  SUM$X, the sum of the array values

   ■  SUM$INDEXES, the sum of each array value times its index, where the index of the first element is 1, the second is 2, etc.

   ■  SUM$SQUARES, the sum of each array element squared

(A true program, of course, would go beyond these steps to store and use the results of these calculations.) The control word is set with the recommended values: projective closure, round to nearest, 64-bit precision, interrupts enabled, and all exceptions masked except invalid operation. It is assumed that an exception handler has been written to field the invalid operation, if it occurs, and that it is invoked by interrupt pointer 16. Either version of the program will run on an actual or an emulated 80287 without altering the code shown.

The PL/M-286 version of ARRSUM (figure 2-6) is very straightforward and illustrates how easily the 80287 can be used in this language. After declaring variables, the program calls built-in procedures to initialize the processor (or its emulator) and to load the control word. The program clears the sum variables and then steps through X$ARRAY with a DO-loop.
The loop control takes into account PL/M-286's practice of considering the index of the first element of an array to be 0. In the computation of SUM$INDEXES, the built-in procedure FLOAT converts I+1 from integer to real because the language does not support "mixed mode" arithmetic. One of the strengths of the NPX, of course, is that it does support arithmetic on mixed data types (because all values are converted internally to the 80-bit temporary real format). The ASM286 version (figure 2-7) defines the external procedure INIT287, which makes the different initialization requirements of the processor and its emulator transparent to the source code. After defining the data and setting up the segment registers and stack pointer, the program calls INIT287 and loads the control word. The computation begins with the next three instructions, which clear three registers by loading (pushing) zeros onto the stack. As shown in figure 2-8, these registers remain at the bottom of the stack throughout the computation while temporary values are pushed on and popped off the stack above them. The program uses the CPU LOOP instruction to control its iteration through X_ARRAY; register CX, which LOOP automatically decrements, is loaded with N_OF_X, the number of array elements to be summed. Register SI is used to select (index) the array elements. The program steps through X_ARRAY from back to front, so SI is initialized to point at the element just beyond the first element to be processed. The ASM286 TYPE operator is used to determine the number of bytes in each array element. This permits changing X_ARRAY to a long real array by simply changing its definition (DD to DQ) and reassembling. Figure 2-8 shows the effect of the instructions in the program loop on the NPX register stack. The figure assumes that the program is in its first iteration, that N_OF_X is 20, and that X_ARRAY(19) (the 20th element) contains the value 2.5. 
When the loop terminates, the three sums are left as the top stack elements so that the program ends by simply popping them into memory variables.

Figure 2-6. Sample PL/M-286 Program

PL/M-286 COMPILER    ARRAYSUM

SERIES-III PL/M-286 V 1.0 COMPILATION OF MODULE ARRAYSUM
OBJECT MODULE PLACED IN :F6:D.OBJ
COMPILER INVOKED BY PLM286.86 :F6:D.SRC XREF

              /******************************************
               *                                        *
               *              ARRAYSUM MOD              *
               *                                        *
               ******************************************/

   1          array$sum: do;
   2   1        declare (sum$x,sum$indexes,sum$squares) real;
   3   1        declare x$array(100) real;
   4   1        declare (n$of$x,i) integer;
   5   1        declare control$287 literally '033eh';

                /* Assume x$array and n$of$x are initialized */

                /* Prepare the 80287 or its emulator */
   6   1        call init$real$math$unit;
   7   1        call set$real$mode(control$287);

                /* Clear sums */
   8   1        sum$x, sum$indexes, sum$squares = 0.0;

                /* Loop through array, accumulating sums */
   9   1        do i = 0 to n$of$x-1;
  10   2          sum$x = sum$x + x$array(i);
  11   2          sum$indexes = sum$indexes + (x$array(i) * float(i+1));
  12   2          sum$squares = sum$squares + (x$array(i)*x$array(i));
  13   2        end;

                /* etc. */

  14   1      end array$sum;

PL/M-286 COMPILER    ARRAYSUM

CROSS-REFERENCE LISTING

DEFN  ADDR   SIZE  NAME, ATTRIBUTES, AND REFERENCES

   1  0006H   117  ARRAYSUM          PROCEDURE STACK=002H
   5               CONTROL287        LITERALLY '033eh'  7
                   FLOAT             BUILTIN  11
   4  019EH     2  I                 INTEGER  9* 9 10 11 12 13
                   INITREALMATHUNIT  BUILTIN  6
   4  019CH     2  NOFX              INTEGER  9
                   SETREALMODE       BUILTIN  7
   2  0004H     4  SUMINDEXES        REAL  8* 11 11*
   2  0008H     4  SUMSQUARES        REAL  8* 12 12*
   2  0000H     4  SUMX              REAL  8* 10 10*
   3  000CH   400  XARRAY            REAL ARRAY(100)  10 11 12

MODULE INFORMATION

   CODE AREA SIZE      = 0077H    119D
   CONSTANT AREA SIZE  = 0004H      4D
   VARIABLE AREA SIZE  = 01A0H    416D
   MAXIMUM STACK SIZE  = 0002H      2D
   33 LINES READ
   0 PROGRAM WARNINGS
   0 PROGRAM ERRORS

DICTIONARY SUMMARY

   96KB MEMORY AVAILABLE
   3KB MEMORY USED (3%)
   0KB DISK SPACE USED

END OF PL/M-286 COMPILATION

Figure 2-7. Sample ASM286 Program

iAPX286 MACRO ASSEMBLER    EXAMPLE_ASM286_PROGRAM

SERIES-III iAPX286 MACRO ASSEMBLER X108 ASSEMBLY OF MODULE EXAMPLE_ASM286_PROGRAM
OBJECT MODULE PLACED IN :F6:287EXP.OBJ
ASSEMBLER INVOKED BY ASM286.86 :F6:287EXP.SRC XREF

LOC   OBJ             LINE   SOURCE

                         1   name  example_ASM286_program
                         2   ; Define initialization routine
                         3   extrn init287:far
                         4
                         5   ; Allocate space for data
----                     6   data segment rw public
0000  3E03               7   control_287  dw  033eh
0002  ????               8   n_of_x       dw  ?
0004  (100               9   x_array      dd  100 dup (?)
      ????????
      )
0194  ????????          10   sum_squares  dd  ?
0198  ????????          11   sum_indexes  dd  ?
019C  ????????          12   sum_x        dd  ?
----                    13   data ends
                        14
                        15   ; Allocate CPU stack space
----                    16   stack stackseg 400
                        17
                        18   ; Begin code
----                    19   code segment er public
                        20   assume ds: data, ss: stack, es: nothing
0000                    21   start:
0000  B8----      R     22     mov   ax,data
0003  8ED8              23     mov   ds,ax
0005  B8----      R     24     mov   ax,stack
0008  8ED0              25     mov   ss,ax
000A  BCFEFF      R     26     mov   sp,stackstart stack
                        27
                        28   ; Assume x_array and n_of_x are initialized
                        29   ; this program zeroes n_of_x
                        30
                        31   ; Prepare the 80287 or its emulator
000D  9A0000----  E     32     call  init287
0012  D92E0000    R     33     fldcw control_287
                        34
                        35   ; Clear three registers to hold running sums
0016  D9EE              36     fldz
0018  D9EE              37     fldz
001A  D9EE              38     fldz
                        39
                        40   ; Setup CX as loop counter and
                        41   ; SI as index to x_array
001C  8B0E0200    R     42     mov   cx,n_of_x
0020  F7E9              43     imul  cx
0022  8BF0              44     mov   si,ax
                        45
                        46   ; SI now contains index of last element + 1
                        47   ; Loop thru x_array, accumulating sums
0024                    48   sum_next:
0024  83EE04            49     sub   si,type x_array  ;backup one element
0027  D9840400    R     50     fld   x_array[si]      ;push it on the stack
002B  DCC3              51     fadd  st(3),st         ;add into sum of x
002D  D9C0              52     fld   st               ;duplicate x on top
002F  DCC8              53     fmul  st,st            ;square it
0031  DEC2              54     faddp st(2),st         ;add into sum of (index+x)
                        55                            ; and discard
0033  FF0E0200    R     56     dec   n_of_x           ;reduce index for next iteration
0037  E2EB              57     loop  sum_next         ;continue
                        58
                        59   ; Pop running sums into memory
0039                    60   pop_results:
0039  D91E9401    R     61     fstp  sum_squares
003D  D91E9801    R     62     fstp  sum_indexes
0041  D91E9C01    R     63     fstp  sum_x
0045  9B                64     fwait
                        65
                        66   ;
                        67   ; Etc.
                        68   ;
----                    69   code ends
                        70   end start

iAPX286 MACRO ASSEMBLER    EXAMPLE_ASM286_PROGRAM

XREF SYMBOL TABLE LISTING

NAME          TYPE      VALUE   ATTRIBUTES, XREFS

CODE          SEGMENT           SIZE=0046H ER PUBLIC  19# 69
CONTROL_287   V WORD    0000H   DATA  7# 33
DATA          SEGMENT           SIZE=01A0H RW PUBLIC  6# 13 20 22
INIT287       L FAR     0000H   EXTR  3# 32
N_OF_X        V WORD    0002H   DATA  8# 42 56
POP_RESULTS   L NEAR    0039H   CODE  60#
STACK         STACK             SIZE=0190H RW PUBLIC  16# 20 24 26
START         L NEAR    0000H   CODE  21# 70
SUM_INDEXES   V DWORD   0198H   DATA  11# 62
SUM_NEXT      L NEAR    0024H   CODE  48# 57
SUM_SQUARES   V DWORD   0194H   DATA  10# 61
SUM_X         V DWORD   019CH   DATA  12# 63
X_ARRAY       V DWORD   0004H   (100) DATA  9# 49 50

END OF SYMBOL TABLE LISTING

ASSEMBLY COMPLETE, NO ERRORS

Figure 2-8. Instructions and Register Stack

FLDZ, FLDZ, FLDZ
    ST(0)   0.0    SUM_SQUARES
    ST(1)   0.0    SUM_INDEXES
    ST(2)   0.0    SUM_X

FLD X_ARRAY[SI]
    ST(0)   2.5    X_ARRAY(19)
    ST(1)   0.0    SUM_SQUARES
    ST(2)   0.0    SUM_INDEXES
    ST(3)   0.0    SUM_X

FADD ST(3),ST
    ST(0)   2.5    X_ARRAY(19)
    ST(1)   0.0    SUM_SQUARES
    ST(2)   0.0    SUM_INDEXES
    ST(3)   2.5    SUM_X

FLD ST
    ST(0)   2.5    X_ARRAY(19)
    ST(1)   2.5    X_ARRAY(19)
    ST(2)   0.0    SUM_SQUARES
    ST(3)   0.0    SUM_INDEXES
    ST(4)   2.5    SUM_X

FMUL ST,ST
    ST(0)   6.25   X_ARRAY(19)^(2)
    ST(1)   2.5    X_ARRAY(19)
    ST(2)   0.0    SUM_SQUARES
    ST(3)   0.0    SUM_INDEXES
    ST(4)   2.5    SUM_X

FADDP ST(2),ST
    ST(0)   2.5    X_ARRAY(19)
    ST(1)   6.25   SUM_SQUARES
    ST(2)   0.0    SUM_INDEXES
    ST(3)   2.5    SUM_X

FIMUL N_OF_X
    ST(0)   50.0   X_ARRAY(19)*20
    ST(1)   6.25   SUM_SQUARES
    ST(2)   0.0    SUM_INDEXES
    ST(3)   2.5    SUM_X

FADDP ST(2),ST
    ST(0)   6.25   SUM_SQUARES
    ST(1)   50.0   SUM_INDEXES
    ST(2)   2.5    SUM_X

80287 Emulation

The programming of applications to execute on 80286 systems both with and without an 80287 is made much easier by the existence of an 80287 emulator for 80286 systems. The Intel E80287 emulator offers a complete software counterpart to the 80287 hardware; NPX instructions can be simply emulated in software rather than being executed in hardware. With software emulation, the distinction between 80286 and 80287 systems is reduced to a simple performance differential (see Table 1-2 for a performance comparison between an actual 80287 and an emulated 80287). Identical numeric programs will simply execute more slowly on 80286 systems (using software emulation of NPX instructions) than on systems that execute NPX instructions directly.

When incorporated into the systems software, the emulation of NPX instructions on 80286 systems is completely transparent to the programmer. Applications software needs no special libraries, linking, or other activity to allow it to run on an 80286 with 80287 emulation.

To the applications programmer, the development of programs for 80286 systems is the same whether the 80287 NPX hardware is available or not. The full 80287 instruction set is available for use, with NPX instructions being either emulated or executed directly. Applications programmers need not be concerned with the hardware configuration of the computer systems on which their applications will eventually run.

For systems programmers, details relating to 80287 emulators are described in a later section of this supplement.

An E80287 software emulator for 80286 systems is contained in the iMDX 364 8086 Software Toolbox, available from Intel and described in the 8086 Software Toolbox Manual.
Concurrent Processing with the 80287 Because the 80286 CPU and the 80287 NPX have separate execution units, it is possible for the NPX to execute numeric instructions in parallel with instructions executed by the CPU. This simultaneous execution of different instructions is called concurrency. No special programming techniques are required to gain the advantages of concurrent execution; numeric instructions for the NPX are simply placed in line with the instructions for the CPU. CPU and numeric instructions are initiated in the same order as they are encountered by the CPU in its instruction stream. However, because numeric operations performed by the NPX generally require more time than operations performed by the CPU, the CPU can often execute several of its instructions before the NPX completes a numeric instruction previously initiated. This concurrency offers obvious advantages in terms of execution performance, but concurrency also imposes several rules that must be observed in order to assure proper synchronization of the 80286 CPU and 80287 NPX. All Intel high-level languages automatically provide for and manage concurrency in the NPX. Assembly-language programmers, however, must understand and manage some areas of concurrency in exchange for the flexibility and performance of programming in assembly language. This section is for the assembly-language programmer or well-informed high-level-language programmer. Managing Concurrency Concurrent execution of the host and 80287 is easy to establish and maintain. The activities of numeric programs can be split into two major areas: program control and arithmetic. The program control part performs activities such as deciding what functions to perform, calculating addresses of numeric operands, and loop control. The arithmetic part simply adds, subtracts, multiplies, and performs other operations on the numeric operands. The NPX and host are designed to handle these two parts separately and efficiently. 
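The split between program control and arithmetic can be sketched in a few instructions. The fragment below is illustrative only and is not taken from an Intel listing; it borrows X_ARRAY, SI, and CX from the ARRSUM example and assumes that ST(1) already holds a running sum.

```asm
; Hypothetical fragment: CPU bookkeeping overlapped with a long NPX operation.
; Assumes X_ARRAY is a short real array indexed by SI and that ST(1) holds
; a running sum before the fragment is entered.
        fld     x_array[si]       ; NPX: push the next array element
        fsqrt                     ; NPX: begins a lengthy numeric operation
        add     si, type x_array  ; CPU: address arithmetic proceeds while
                                  ;      the NPX is still busy with FSQRT
        dec     cx                ; CPU: loop control, also overlapped
        fadd    st(1), st         ; next ESC: the CPU waits for the NPX to
                                  ;      finish FSQRT before starting FADD
```

Only register values (SI and CX) are changed by the CPU while the NPX is busy, so no shared memory operand is involved and no explicit FWAIT is needed in this sketch.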
Managing concurrency is necessary because both the arithmetic and control areas must converge to a well-defined state before starting another numeric operation. A well-defined state means all previous arithmetic and control operations are complete and valid. Normally, the host waits for the 80287 to finish the current numeric operation before starting another. This waiting is called synchronization. Managing concurrent execution of the 80287 involves three types of synchronization: 1. Instruction synchronization 2. Data synchronization 3. Error synchronization For programmers in higher-level languages, all three types of synchronization are automatically provided by the appropriate compiler. For assembly-language programmers, instruction synchronization is guaranteed by the NPX interface, but data and error synchronization are the responsibility of the assembly-language programmer. Instruction Synchronization Instruction synchronization is required because the 80287 can perform only one numeric operation at a time. Before any numeric operation is started, the 80287 must have completed all activity from its previous instruction. Instruction synchronization is guaranteed for most ESC instructions because the 80286 automatically checks the BUSY status line from the 80287 before commencing execution of most ESC instructions. No explicit WAIT instructions are necessary to ensure proper instruction synchronization. Data Synchronization Data synchronization addresses the issue of both the CPU and the NPX referencing the same memory values within a given block of code. Synchronization ensures that these two processors access the memory operands in the proper sequence, just as they would be accessed by a single processor with no concurrency. Data synchronization is not a concern when the CPU and NPX are using different memory operands during the course of one numeric instruction. The two cases where data synchronization might be a concern are 1. 
The 80286 CPU reads or alters a memory operand first, then invokes the 80287 to load or alter the same operand. 2. The 80287 is invoked to load or alter a memory operand, after which the 80286 CPU reads or alters the same location. Due to the instruction synchronization of the NPX interface, data synchronization is automatically provided for the first case──the 80286 will always complete its operation before invoking the 80287. For the second case, data synchronization is not always automatic. In general, there is no guarantee that the 80287 will have finished its processing and accessed the memory operand before the 80286 accesses the same location. Figure 2-9 shows examples of the two possible cases of the CPU and NPX sharing a memory value. In the examples of the first case, the CPU will finish with the operand before the 80287 can reference it. The NPX interface guarantees this. In the examples of the second case, the CPU must wait for the 80287 to finish with the memory operand before proceeding to reuse it. The FWAIT instructions shown in these examples are required in order to ensure this data synchronization. There are several NPX control instructions where automatic data synchronization is provided; however, the FSTSW/FNSTSW, FSTCW/FNSTCW, FLDCW, FRSTOR, and FLDENV instructions are all guaranteed to finish their execution before the CPU can read or alter the referenced memory locations. The 80287 provides data synchronization for these instructions by making a request on the Processor Extension Data Channel before the CPU executes its next instruction. Since the NPX data transfers occur before the CPU regains control of the local bus, the CPU cannot change a memory value before the NPX has had a chance to reference it. In the case of the FSTSW AX instruction, the 80286 AX register is explicitly updated before the CPU continues execution of the next instruction. 
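As an illustration of the automatic data synchronization described for the status-word store, the sketch below tests the result of a comparison in two ways. STATUS_287 and the label ST_IS_LESS are hypothetical names introduced for this example; STATUS_287 is assumed to be a word variable (DW) in the current data segment.

```asm
; Hypothetical fragment: reading the status word without an explicit FWAIT.
; Assumes STATUS_287 is declared elsewhere as:  status_287  dw  ?

; Variant 1: store to memory, then test from the CPU.
        fcom    st(1)             ; compare ST with ST(1)
        fstsw   status_287        ; store the status word to memory
        test    byte ptr status_287+1, 01h
                                  ; C0 is bit 8 of the status word, which is
                                  ; bit 0 of the upper byte; set if ST < ST(1)
        jnz     st_is_less

; Variant 2: store directly to AX and move the condition codes to the flags.
        fcom    st(1)
        fstsw   ax                ; AX is updated before the next instruction
        sahf                      ; C0 -> CF, C2 -> PF, C3 -> ZF
        jb      st_is_less        ; CF = 1 means ST < ST(1)

st_is_less:
```

In both variants the CPU reads the stored value on the very next instruction, which is safe only because these control instructions complete their data transfer before the CPU continues. The same pattern applied to FIST or FSTP would need an FWAIT first.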
For the numeric instructions not listed above, the assembly-language programmer must remain aware of synchronization and recognize cases requiring explicit data synchronization. Data synchronization can be provided either by programming an explicit FWAIT instruction, or by initiating a subsequent numeric instruction before accessing the operands or results of a previous instruction. After the subsequent numeric instruction has started execution, all memory references in earlier numeric instructions are complete. Reaching the next host instruction after the synchronizing numeric instruction indicates that previous numeric operands in memory are available.

The data-synchronization function of any FWAIT or numeric instruction must be well-documented, as shown in figure 2-10. Otherwise, a change to the program at a later time may remove the synchronizing numeric instruction and cause program failure.

High-level languages automatically establish data synchronization and manage it, but there may be applications where a high-level language may not be appropriate.

For assembly-language programmers, automatic data synchronization can be obtained using the assembler, although concurrency of execution is lost as a result. To perform automatic data synchronization, the assembler can be changed to always place a WAIT instruction after the ESCAPE instruction. Figure 2-11 shows an example of how to change the ASM286 Code Macro for the FIST instruction to automatically place a WAIT instruction after the ESCAPE instruction. This Code Macro is included in the ASM286 source module. The price paid for this automatic data synchronization is the lack of any possible concurrency between the CPU and NPX.

Figure 2-9. Synchronizing References to Shared Data

Case 1:                 Case 2:

    MOV   I, 1              FILD  I
    FILD  I                 FWAIT
                            MOV   I, 5

    MOV   AX, I             FISTP I
    FISTP I                 FWAIT
                            MOV   AX, I

Figure 2-10. Documenting Data Synchronization

    FISTP I
    FMUL            ; I is updated before FMUL is executed
    MOV   AX, I     ; I is now safe to use

Figure 2-11. Nonconcurrent FIST Instruction Code Macro

;
; This is an ASM286 code macro to redefine the FIST
; instruction to prevent any concurrency
; while the instruction runs. A wait
; instruction is placed immediately after the
; escape to ensure the store is done
; before the program may continue.

CodeMacro FIST memop: Mw
RfixM 111B, memop
ModRM 010B, memop
RWfix
EndM

Error Synchronization

Almost any numeric instruction can, under the wrong circumstances, produce a numeric error. Concurrent execution of the CPU and NPX requires synchronization for these errors just as it does for data references and numeric instructions. In fact, the synchronization required for data and instructions automatically provides error synchronization.

However, incorrect data or instruction synchronization may not be discovered until a numeric error occurs. A further complication is that a programmer may not expect his numeric program to cause numeric errors, but in some systems, they may regularly happen. To better understand these points, let's look at what can happen when the NPX detects an error.

The NPX can perform one of two things when a numeric exception occurs:

■ The NPX can provide a default fix-up for selected numeric errors. Programs can mask individual error types to indicate that the NPX should generate a safe, reasonable result whenever that error occurs. The default error fix-up activity is treated by the NPX as part of the instruction causing the error; no external indication of the error is given. When errors are detected, a flag is set in the numeric status register, but no information regarding where or when is available. If the NPX performs its default action for all errors, then error synchronization is never exercised. This is no reason to ignore error synchronization, however.
■ As an alternative to the NPX default fix-up of numeric errors, the 80286 CPU can be notified whenever an exception occurs. The CPU can then implement any sort of recovery procedures desired, for any numeric error detectable by the NPX. When a numeric error is unmasked and the error occurs, the NPX stops further execution of the numeric instruction and signals this event to the CPU. On the next occurrence of an ESC or WAIT instruction, the CPU traps to a software exception handler. Some ESC instructions do not check for errors. These are the nonwaited forms FNINIT, FNSTENV, FNSAVE, FNSTSW, FNSTCW, and FNCLEX. When the NPX signals an unmasked exception condition, it is requesting help. The fact that the error was unmasked indicates that further numeric program execution under the arithmetic and programming rules of the NPX is unreasonable. If concurrent execution is allowed, the state of the CPU when it recognizes the exception is undefined. The CPU may have changed many of its internal registers and be executing a totally different program by the time the exception occurs. To handle this situation, the NPX has special registers updated at the start of each numeric instruction to describe the state of the numeric program when the failed instruction was attempted. Error synchronization ensures that the NPX is in a well-defined state after an unmasked numeric error occurs. Without a well-defined state, it would be impossible for exception recovery routines to figure out why the numeric error occurred, or to recover successfully from the error. Incorrect Error Synchronization An example of how some instructions written without error synchronization will work initially, but fail when moved into a new environment is shown in figure 2-12. In figure 2-12, three instructions are shown to load an integer, calculate its square root, then increment the integer. 
The NPX interface and synchronous execution of the NPX emulator will allow this program to execute correctly when no errors occur on the FILD instruction.

This situation changes if the 80287 numeric register stack is extended to memory. To extend the NPX stack to memory, the invalid error is unmasked. A push to a full register or pop from an empty register will cause an invalid error. The recovery routine for the error must recognize this situation, fix up the stack, then perform the original operation.

The recovery routine will not work correctly in the first example shown in the figure. The problem is that the value of COUNT is incremented before the NPX can signal the exception to the CPU. Because COUNT is incremented before the exception handler is invoked, the recovery routine will load an incorrect value of COUNT, causing the program to fail or behave unreliably.

Figure 2-12. Error Synchronization Examples

INCORRECT ERROR SYNCHRONIZATION

    FILD  COUNT     ; NPX instruction
    INC   COUNT     ; CPU instruction alters operand
    FSQRT           ; subsequent NPX instruction -- error from
                    ; previous NPX instruction detected here

PROPER ERROR SYNCHRONIZATION

    FILD  COUNT     ; NPX instruction
    FSQRT           ; subsequent NPX instruction -- error from
                    ; previous NPX instruction detected here
    INC   COUNT     ; CPU instruction alters operand

Proper Error Synchronization

Error synchronization relies on the WAIT instructions required by instruction and data synchronization and the BUSY and ERROR signals of the 80287. When an unmasked error occurs in the 80287, it asserts the ERROR signal, signalling to the CPU that a numeric error has occurred. The next time the CPU encounters an error-checking ESC or WAIT instruction, the CPU acknowledges the ERROR signal by trapping automatically to Interrupt #16, the Processor Extension Error vector. If the following ESC or WAIT instruction is properly placed, the CPU will not yet have disturbed any information vital to recovery from the error.
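Because a WAIT instruction checks for a pending error just as an error-checking ESC instruction does, an explicit FWAIT is another way to place the synchronization point before the CPU disturbs an operand. The fragment below is a minimal sketch of that alternative, reusing the COUNT variable from figure 2-12; it assumes that the invalid-operation exception is unmasked and that a handler is installed for interrupt 16.

```asm
; Hypothetical fragment: forcing error synchronization with FWAIT.
; Assumes COUNT is a word integer and an interrupt 16 handler exists.
        fild    count       ; NPX instruction; may signal an unmasked error
        fwait               ; CPU waits for FILD to finish; a pending error
                            ; traps to interrupt 16 here, with COUNT intact
        inc     count       ; CPU alters the operand only after FILD is done
        fsqrt               ; next NPX instruction
```

The cost of this ordering is that the CPU idles during FILD rather than overlapping useful work, so reordering the instructions as in the proper example of figure 2-12 is preferable where the program logic allows it.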
Chapter 3 System-Level Numeric Programming ─────────────────────────────────────────────────────────────────────────── System programming for 80287 systems requires a more detailed understanding of the 80287 NPX than does application programming. Such things as emulation, initialization, exception handling, and data and error synchronization are all the responsibility of the systems programmer. These topics are covered in detail in the sections that follow. 80287 Architecture On a software level, the 80287 NPX appears as an extension of the 80286 CPU. On the hardware level, however, the mechanisms by which the 80286 and 80287 interact are a bit more complex. This section describes how the 80287 NPX and 80286 CPU interact and points out features of this interaction that are of interest to systems programmers. Processor Extension Data Channel All transfers of operands between the 80287 and system memory are performed by the 80286's internal Processor Extension Data Channel. This independent, DMA-like data channel permits all operand transfers of the 80287 to come under the supervision of the 80286 memory-management and protection mechanisms. The operation of this data channel is completely transparent to software. Because the 80286 actually performs all transfers between the 80287 and memory, no additional bus drivers, controllers, or other components are necessary to interface the 80287 NPX to the local bus. Any memory accessible to the 80286 CPU is accessible by the 80287. The Processor Extension Data Channel is described in more detail in Chapter Six of the 80286 Hardware Reference Manual. Real-Address Mode and Protected Virtual-Address Mode Like the 80286 CPU, the 80287 NPX can operate in both Real-Address mode and in Protected mode. Following a hardware RESET, the 80287 is initially activated in Real-Address mode. A single, privileged instruction (FSETPM) is necessary to set the 80287 into Protected mode. 
As an extension to the 80286 CPU, the 80287 can access any memory location accessible by the task currently executing on the 80286. When operating in Protected mode, all memory references by the 80287 are automatically verified by the 80286's memory management and protection mechanisms as for any other memory references by the currently-executing task. Protection violations associated with NPX instructions automatically cause the 80286 to trap to an appropriate exception handler. To the programmer, these two 80287 operating modes differ only in the manner in which the NPX instruction and data pointers are represented in memory following an FSAVE or FSTENV instruction. When the 80287 operates in Protected mode, its NPX instruction and data pointers are each represented in memory as a 16-bit segment selector and a 16-bit offset. When the 80287 operates in Real-Address mode, these same instruction and data pointers are represented simply as the 20-bit physical addresses of the operands in question (see figure 1-7 in Chapter One). Dedicated and Reserved I/O Locations The 80287 NPX does not require that any memory addresses be set aside for special purposes. The 80287 does make use of I/O port addresses in the range 00F8H through 00FFH, although these I/O operations are completely transparent to the 80286 software. 80286 programs must not reference these reserved I/O addresses directly. To prevent any accidental misuse or other tampering with numeric instructions in the 80287, the 80286's I/O Privilege Level (IOPL) should be used in multiuser reprogrammable environments to restrict application program access to the I/O address space and so guarantee the integrity of 80287 computations. Chapter Eight of the 80286 Operating System Writer's Guide contains more details regarding the use of the I/O Privilege Level. 
Processor Initialization and Control

One of the principal responsibilities of systems software is the initialization, monitoring, and control of the hardware and software resources of the system, including the 80287 NPX. In this section, issues related to system initialization and control are described, including recognition of the NPX, emulation of the 80287 NPX in software if the hardware is not available, and the handling of exceptions that may occur during the execution of the 80287.

System Initialization

During initialization of an 80286 system, systems software must

■ Recognize the presence or absence of the NPX
■ Set flags in the 80286 MSW to reflect the state of the numeric environment

If an 80287 NPX is present in the system, the NPX must be

■ Initialized
■ Switched into Protected mode (if desired)

All of these activities can be quickly and easily performed as part of the overall system initialization.

Recognizing the 80287 NPX

Figure 3-1 shows an example of a recognition routine that determines whether an NPX is present, and distinguishes between the 80387 and the 8087/80287. This routine can be executed on any 80386, 80286, or 8086 hardware configuration that has an NPX socket.

The example guards against the possibility of accidentally reading an expected value from a floating data bus when no NPX is present. Data read from a floating bus is undefined. By expecting to read a specific bit pattern from the NPX, the routine protects itself from the indeterminate state of the bus. The example also avoids depending on any values in reserved bits, thereby maintaining compatibility with future numerics coprocessors.

Figure 3-1. Software Routine to Recognize the 80287

; The following algorithm detects the presence of the 8087 as well as the
; 80287 in a system. This will make it easier for ISVs to port their 8086-87
; software to 286-287 systems.
;
cc_cr   equ     0DH                     ; carriage return
cc_lf   equ     0AH                     ; line feed
        assume  cs:code, ds:data
;
code    segment public
start:
        mov     ax,data                 ; set data segment
        mov     ds,ax
;
; Test if 8087 is present in PC or PC/XT, or 80287 is in PC/AT
;
        fninit                          ; initialize coprocessor
        xor     ah,ah                   ; zero ah register and memory byte
        mov     byte ptr control + 1,ah
        fnstcw  control                 ; store coprocessor's control word in
                                        ; memory
        mov     ah,byte ptr control+1
        cmp     ah,03h                  ; upper byte of control word will be
                                        ; 03 if 8087 or 80287 coprocessor
                                        ; is present
        jne     no_coproc
;
coproc:
        mov     ah,09h                  ; print string-coprocessor present
        mov     dx,offset msg_yes
        int     21h
        jmp     done
;
no_coproc:
        mov     ah,09h                  ; print string-coprocessor not
                                        ; present
        mov     dx,offset msg_no
        int     21h
;
done:
        mov     ah,4CH                  ; terminate program
        int     21h
code    ends

data    segment public
control dw      00
msg_yes db      cc_cr,cc_lf
        db      'System has an 8087 or 80287',cc_cr, cc_lf, '$'
msg_no  db      cc_cr,cc_lf
        db      'System does not have an 8087 or 80287',cc_cr, cc_lf, '$'
data    ends
        end     start                   ; start is the entry point

Configuring the Numerics Environment

Once the 80286 CPU has determined the presence or absence of the 80287 NPX, the 80286 must set either the MP or the EM bit in its own machine status word accordingly. The initialization routine can either

■ Set the MP bit in the 80286 MSW to allow numeric instructions to be executed directly by the 80287 NPX component
■ Set the EM bit in the 80286 MSW to permit software emulation of the 80287 numeric instructions

The Math Present (MP) flag of the 80286 machine status word indicates to the CPU whether an 80287 NPX is physically available in the system. The MP flag controls the function of the WAIT instruction. When executing a WAIT instruction, the 80286 tests only the Task Switched (TS) bit if MP is set; if it finds TS set under these conditions, the CPU traps to exception #7.

The Emulation Mode (EM) bit of the 80286 machine status word indicates to the CPU whether NPX functions are to be emulated.
If the CPU finds EM set when it executes an ESC instruction, program control is automatically trapped to exception #7, giving the exception handler the opportunity to emulate the functions of an 80287. The 80286 EM flag can be changed only by using the LMSW (load machine status word) instruction (legal only at privilege level 0) and examined with the aid of the SMSW (store machine status word) instruction (legal at any privilege level).

The EM bit also controls the function of the WAIT instruction. If the CPU finds EM set while executing a WAIT, the CPU does not check the ERROR pin for an error indication.

For correct 80286 operation, the EM bit must never be set concurrently with MP. The EM and MP bits of the 80286 are described in more detail in the 80286 Operating System Writer's Guide. More information on software emulation for the 80287 NPX is given in the "80287 Emulation" section later in this chapter. In any case, if ESC instructions are to be executed, either the MP or EM bit must be set, but not both.

Initializing the 80287

Initializing the 80287 NPX simply means placing the NPX in a known state unaffected by any activity performed earlier. The example software routine to recognize the 80287 (figure 3-1) performs this initialization using a single FNINIT instruction. This instruction initializes the NPX in the same way as the hardware RESET signal to the 80287. All the error masks are set, all registers are tagged empty, the stack top (ST) field is set to zero, and default rounding, precision, and infinity controls are set. Table 3-1 shows the state of the 80287 NPX following initialization.

Following a hardware RESET signal, such as after initial power-up, the 80287 is initialized in Real-Address mode. Once the 80287 has been switched to Protected mode (using the FSETPM instruction), only another hardware RESET can switch the 80287 back to Real-Address mode. The FNINIT instruction does not switch the operating state of the 80287.
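The MP/EM rules given under "Configuring the Numerics Environment" can be sketched as a small host-side C model. This is only an illustration, not code from the manual: the function names are hypothetical, and the bit positions (MP = bit 1, EM = bit 2, TS = bit 3 of the MSW) are assumed from the 80286 architecture rather than stated in this section. On real hardware the MSW image would be read with SMSW and written with LMSW at privilege level 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MSW bit positions (80286 architecture, not given in this section). */
#define MSW_MP 0x0002u  /* Math Present    */
#define MSW_EM 0x0004u  /* Emulation Mode  */
#define MSW_TS 0x0008u  /* Task Switched   */

/* Hypothetical helper: returns the MSW image to load after NPX recognition.
   Exactly one of MP and EM is set; all other bits are preserved. */
uint16_t configure_msw(uint16_t msw, int npx_present)
{
    msw = (uint16_t)(msw & ~(MSW_MP | MSW_EM));
    msw = (uint16_t)(msw | (npx_present ? MSW_MP : MSW_EM));
    /* The text requires that EM never be set concurrently with MP. */
    assert(!((msw & MSW_MP) && (msw & MSW_EM)));
    return msw;
}

/* WAIT traps to exception #7 only when MP and TS are both set. */
int wait_traps_to_exception_7(uint16_t msw)
{
    return (msw & MSW_MP) != 0 && (msw & MSW_TS) != 0;
}

/* With EM set, an ESC instruction traps to exception #7 for emulation. */
int esc_is_emulated(uint16_t msw)
{
    return (msw & MSW_EM) != 0;
}
```

For example, under these assumptions `configure_msw(0x0000, 1)` yields an image with only MP set, while `configure_msw(0x0006, 0)` repairs an illegal MP+EM image into one with only EM set.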
80287 Emulation

If it is determined that no 80287 NPX is available in the system, systems software may decide to emulate ESC instructions in software. This emulation is easily supported by the 80286 hardware, because the 80286 can be configured to trap to a software emulation routine whenever it encounters an ESC instruction in its instruction stream.

As described previously, whenever the 80286 CPU encounters an ESC instruction, and its MP and EM status bits are set appropriately (MP = 0, EM = 1), the 80286 will automatically trap to interrupt #7, the Processor Extension Not Available exception. The return link stored on the stack points to the first byte of the ESC instruction, including the prefix byte(s), if any. The exception handler can use this return link to examine the ESC instruction and proceed to emulate the numeric instruction in software.

The emulator must step the return pointer so that, upon return from the exception handler, execution can resume at the first instruction following the ESC instruction.

To an application program, execution on an 80286 system with 80287 emulation is almost indistinguishable from execution on an 80287 system, except for the difference in execution speeds.

There are several important considerations when using emulation on an 80286 system:

■ When operating in Protected-Address mode, numeric applications using the emulator must be executed in execute-readable code segments. Numeric software cannot be emulated if it is executed in execute-only code segments. This is because the emulator must be able to examine the particular numeric instruction that caused the Emulation trap.
■ Only privileged tasks can place the 80286 in emulation mode. The instructions necessary to place the 80286 in Emulation mode are privileged instructions, and are not typically accessible to an application.

An emulator package (E80287) that runs on 80286 systems is available from Intel in the 8086 Software Toolbox, Order Number 122203.
This emulation package operates in both Real and Protected mode, providing a complete functional equivalent for the 80287 emulated in software.

When using the E80287 emulator, writers of numeric exception handlers should be aware of one slight difference between the emulated 80287 and the 80287 hardware:

■ On the 80287 hardware, exception handlers are invoked by the 80286 at the first WAIT or ESC instruction following the instruction causing the exception. The return link, stored on the 80286 stack, points to this second WAIT or ESC instruction where execution will resume following a return from the exception handler.
■ Using the E80287 emulator, numeric exception handlers are invoked from within the emulator itself. The return link stored on the stack when the exception handler is invoked will therefore point back to the E80287 emulator, rather than to the program code actually being executed (emulated). An IRET return from the exception handler returns to the emulator, which then returns immediately to the emulated program.

This added layer of indirection should not cause confusion, however, because the instruction causing the exception can always be identified from the 80287's instruction and data pointers.

Table 3-1. NPX Processor State Following Initialization

Field                       Value     Interpretation

Control Word
  Infinity Control          0         Projective
  Rounding Control          00        Round to nearest
  Precision Control         11        64 bits
  Interrupt-Enable Mask     1         Interrupts disabled
  Exception Masks           111111    All exceptions masked
Status Word
  Busy                      0         Not busy
  Condition Code            ????      (Indeterminate)
  Stack Top                 000       Empty stack
  Interrupt Request         0         No interrupt
  Exception Flags           000000    No exceptions
Tag Word
  Tags                      11        Empty
Registers                   N.C.      Not changed
Exception Pointers
  Instruction Code          N.C.      Not changed
  Instruction Address       N.C.      Not changed
  Operand Address           N.C.      Not changed

Handling Numeric Processing Exceptions

Once the 80287 has been initialized and normal execution of applications has commenced, the 80287 NPX may occasionally require attention in order to recover from numeric processing errors. This section provides details for writing software exception handlers for numeric exceptions. Numeric processing exceptions have already been introduced in previous sections of this manual.

As discussed previously, the 80287 NPX can take one of two actions when it recognizes a numeric exception:

■ If the exception is masked, the NPX will automatically perform its own masked exception response, correcting the exception condition according to fixed rules, and then continuing with its instruction execution.
■ If the exception is unmasked, the NPX signals the exception to the 80286 CPU using the ERROR status line between the two processors. Each time the 80286 encounters an ESC or WAIT instruction in its instruction stream, the CPU checks the condition of this ERROR status line. If ERROR is active, the CPU automatically traps to Interrupt vector #16, the Processor Extension Error trap.

Interrupt vector #16 typically points to a software exception handler, which may or may not be a part of systems software. This exception handler takes the form of an 80286 interrupt procedure.

When handling numeric errors, the CPU has two responsibilities:

■ The CPU must not disturb the numeric context when an error is detected.
■ The CPU must clear the error and attempt recovery from the error.

Although the manner in which programmers may treat these responsibilities varies from one implementation to the next, most exception handlers will include these basic steps:

■ Store the NPX environment (control, status, and tag words, operand and instruction pointers) as it existed at the time of the exception.
■ Clear the exception bits in the status word.
■ Enable interrupts on the CPU.
■ Identify the exception by examining the status and control words in the saved environment.
■ Take some system-dependent action to rectify the exception.
■ Return to the interrupted program and resume normal execution.

It should be noted that the NPX exception pointers contained in the stored NPX environment will take different forms, depending on whether the NPX is operating in Real-Address mode or in Protected mode. The earlier discussion of Real versus Protected mode details how this information is presented in each of the two operating modes.

Simultaneous Exception Response

In cases where multiple exceptions arise simultaneously, the 80287 signals one exception according to the precedence sequence shown in table 3-2. This means, for example, that zero divided by zero will result in an invalid operation, and not a zero divide exception.

Exception Recovery Examples

Recovery routines for NPX exceptions can take a variety of forms. They can change the arithmetic and programming rules of the NPX. These changes may redefine the default fix-up for an error, change the appearance of the NPX to the programmer, or change how arithmetic is defined on the NPX.

A change to an error response might be to automatically normalize all denormals loaded from memory. A change in appearance might be extending the register stack into memory to provide an "infinite" number of numeric registers. The arithmetic of the NPX can be changed to automatically extend the precision and range of variables when exceeded. All these functions can be implemented on the NPX via numeric errors and associated recovery routines in a manner transparent to the application programmer.
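The identification step listed among the basic handler steps (examining the saved status and control words) can be sketched as a host-side C model. This is an illustrative assumption, not the manual's code: the function and macro names are hypothetical, and the layout assumed here places the six exception flags in bits 0-5 of the status word with the matching mask bits in bits 0-5 of the control word (a mask bit of 1 means masked).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit assignments, identical in status and control words. */
#define NPX_IE 0x01u  /* invalid operation     */
#define NPX_DE 0x02u  /* denormalized operand  */
#define NPX_ZE 0x04u  /* zero divide           */
#define NPX_OE 0x08u  /* overflow              */
#define NPX_UE 0x10u  /* underflow             */
#define NPX_PE 0x20u  /* precision             */
#define NPX_EXC_BITS 0x3Fu

/* Hypothetical helper: returns the exception flags that are set in the
   saved status word and NOT masked in the saved control word. */
unsigned npx_unmasked_pending(uint16_t status_word, uint16_t control_word)
{
    return (unsigned)status_word & ~(unsigned)control_word & NPX_EXC_BITS;
}

/* Small usage example with made-up register images. */
void npx_pending_example(void)
{
    /* All six exceptions masked: nothing is reported as unmasked. */
    assert(npx_unmasked_pending(0x0004, 0x037F) == 0);
    /* Invalid and zero divide unmasked, both flags set. */
    assert(npx_unmasked_pending(0x0005, 0x037A) == (NPX_IE | NPX_ZE));
}
```

A handler body could test the returned bits in whatever order suits the application; the model makes no claim about which single exception the NPX itself signals first.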
Some other possible system-dependent actions, mentioned previously, may include:

■ Incrementing an exception counter for later display or printing
■ Printing or displaying diagnostic information (e.g., the 80287 environment and registers)
■ Aborting further execution
■ Storing a diagnostic value (a NaN) in the result and continuing with the computation

Notice that an exception may or may not constitute an error, depending on the implementation. Once the exception handler corrects the error condition causing the exception, the floating-point instruction that caused the exception can be restarted, if appropriate. This cannot be accomplished using the IRET instruction, however, because the trap occurs at the ESC or WAIT instruction following the offending ESC instruction. The exception handler must obtain from the NPX the address of the offending instruction in the task that initiated it, make a copy of it, execute the copy in the context of the offending task, and then return via IRET to the current CPU instruction stream.

In order to correct the condition causing the numeric exception, exception handlers must recognize the precise state of the NPX at the time the exception handler was invoked, and be able to reconstruct the state of the NPX when the exception initially occurred. To reconstruct the state of the NPX, programmers must understand when, during the execution of an NPX instruction, exceptions are actually recognized.

Invalid operation, zero divide, and denormalized exceptions are detected before an operation begins, whereas overflow, underflow, and precision exceptions are not raised until a true result has been computed. When a "before" exception is detected, the NPX register stack and memory have not yet been updated, and appear as if the offending instruction has not been executed. When an "after" exception is detected, the register stack and memory appear as if the instruction has run to completion; i.e., they may be updated.
(However, in a store or store-and-pop operation, unmasked over/underflow is handled like a before exception; memory is not updated and the stack is not popped.) The programming examples contained in Chapter Four include an outline of several exception handlers to process numeric exceptions for the 80287.

Table 3-2. Precedence of NPX Exceptions

Signaled First:   Denormalized operand (if unmasked)
                  Invalid operation
                  Zero divide
                  Denormalized (if masked)
                  Over/Underflow
Signaled Last:    Precision

Chapter 4  Numeric Programming Examples
───────────────────────────────────────────────────────────────────────────

The following sections contain examples of numeric programs for the 80287 NPX written in ASM286. These examples are intended to illustrate some of the techniques for programming the 80287 computing system for numeric applications.

Conditional Branching Examples

As discussed in Chapter Two, several numeric instructions post their results to the condition code bits of the 80287 status word. Although there are many ways to implement conditional branching following a comparison, the basic approach is as follows:

■ Execute the comparison.
■ Store the status word. (80287 allows storing status directly into AX register.)
■ Inspect the condition code bits.
■ Jump on the result.

Figure 4-1 is a code fragment that illustrates how two memory-resident long real numbers might be compared (similar code could be used with the FTST instruction). The numbers are called A and B, and the comparison is A to B. The comparison itself requires loading A onto the top of the 80287 register stack and then comparing it to B, while popping the stack with the same instruction. The status word is then written into the 80286 AX register. A and B have four possible orderings, and bits C3, C2, and C0 of the condition code indicate which ordering holds.
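As a rough host-side illustration of how the four orderings are encoded, the following C sketch decodes a stored status word in the same order that Figure 4-1 tests the flags (C2, then C0, then C3). The names are hypothetical, and the bit positions (C0 = bit 8, C2 = bit 10, C3 = bit 14) are assumed from the 80287 status word layout rather than taken from this section.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed status word bit positions of the condition code bits. */
#define SW_C0 0x0100u
#define SW_C2 0x0400u
#define SW_C3 0x4000u

enum npx_order { NPX_GREATER, NPX_LESS, NPX_EQUAL, NPX_UNORDERED };

/* Hypothetical helper: classify the result of comparing A (in ST) to B. */
enum npx_order npx_compare_result(uint16_t status_word)
{
    if (status_word & SW_C2) return NPX_UNORDERED; /* corresponds to JP */
    if (status_word & SW_C0) return NPX_LESS;      /* corresponds to JB */
    if (status_word & SW_C3) return NPX_EQUAL;     /* corresponds to JE */
    return NPX_GREATER;                            /* C3 = C2 = C0 = 0  */
}

/* Small usage example with made-up status word images. */
void npx_compare_example(void)
{
    assert(npx_compare_result(0x0000) == NPX_GREATER);
    assert(npx_compare_result(0x0100) == NPX_LESS);
    assert(npx_compare_result(0x4000) == NPX_EQUAL);
    assert(npx_compare_result(0x4500) == NPX_UNORDERED);
}
```

Testing C2 first matters in this sketch because an unordered result sets C0 and C3 as well, so the later tests would otherwise misclassify it.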
These bits are positioned in the upper byte of the NPX status word so as to correspond to the CPU's zero, parity, and carry flags (ZF, PF, and CF), when the byte is written into the flags. The code fragment sets ZF, PF, and CF of the CPU status word to the values of C3, C2, and C0 of the NPX status word, and then uses the CPU conditional jump instructions to test the flags. The resulting code is extremely compact, requiring only seven instructions.

The FXAM instruction updates all four condition code bits. Figure 4-2 shows how a jump table can be used to determine the characteristics of the value examined. The jump table (FXAM_TBL) is initialized to contain the 16-bit displacement of 16 labels, one for each possible condition code setting. Note that four of the table entries contain the same value, because four condition code settings correspond to "empty."

The program fragment performs the FXAM and stores the status word. It then manipulates the condition code bits to finally produce a number in register BX that equals the condition code times 2. This involves zeroing the unused bits in the byte that contains the code, shifting C3 to the right so that it is adjacent to C2, and then shifting the code to multiply it by 2. The resulting value is used as an index that selects one of the displacements from FXAM_TBL (the multiplication of the condition code is required because of the 2-byte length of each value in FXAM_TBL). The unconditional JMP instruction effectively vectors through the jump table to the labelled routine that contains code (not shown in the example) to process each possible result of the FXAM instruction.

Figure 4-1. Conditional Branching for Compares

        .
        .
        .
A       DQ      ?
B       DQ      ?
        .
        .
        .
        FLD     A               ; LOAD A ONTO TOP OF 287 STACK
        FCOMP   B               ; COMPARE A:B, POP A
        FSTSW   AX              ; STORE RESULT TO CPU AX REGISTER
;
; CPU AX REGISTER CONTAINS CONDITION CODES (RESULTS OF
; COMPARE)
; LOAD CONDITION CODES INTO CPU FLAGS
        SAHF
;
; USE CONDITIONAL JUMPS TO DETERMINE ORDERING OF A TO B
;
        JP      A_B_UNORDERED   ; TEST C2 (PF)
        JB      A_LESS          ; TEST C0 (CF)
        JE      A_EQUAL         ; TEST C3 (ZF)
A_GREATER:                      ; C0 (CF) = 0, C3 (ZF) = 0
        .
        .
A_EQUAL:                        ; C0 (CF) = 0, C3 (ZF) = 1
        .
        .
A_LESS:                         ; C0 (CF) = 1, C3 (ZF) = 0
        .
        .
A_B_UNORDERED:                  ; C2 (PF) = 1
        .
        .

Figure 4-2. Conditional Branching for FXAM

; JUMP TABLE FOR EXAMINE ROUTINE
;
FXAM_TBL DW     POS_UNNORM, POS_NAN, NEG_UNNORM, NEG_NAN,
&               POS_NORM, POS_INFINITY, NEG_NORM,
&               NEG_INFINITY, POS_ZERO, EMPTY, NEG_ZERO,
&               EMPTY, POS_DENORM, EMPTY, NEG_DENORM, EMPTY
        .
        .
; EXAMINE ST AND STORE RESULT (CONDITION CODES)
        FXAM
        FSTSW   AX
;
; CALCULATE OFFSET INTO JUMP TABLE
        MOV     BH,0            ; CLEAR UPPER HALF OF BX,
        MOV     BL,AH           ; LOAD CONDITION CODE INTO BL
        AND     BL,00000111B    ; CLEAR ALL BITS EXCEPT C2-C0
        AND     AH,01000000B    ; CLEAR ALL BITS EXCEPT C3
        SHR     AH,2            ; SHIFT C3 TWO PLACES RIGHT
        SAL     BX,1            ; SHIFT C2-C0 1 PLACE LEFT (MULTIPLY
                                ; BY 2)
        OR      BL,AH           ; DROP C3 BACK IN ADJACENT TO C2
                                ; (000XXXX0)
;
; JUMP TO THE ROUTINE 'ADDRESSED' BY CONDITION CODE
        JMP     FXAM_TBL[BX]
;
; HERE ARE THE JUMP TARGETS, ONE TO HANDLE
; EACH POSSIBLE RESULT OF FXAM
POS_UNNORM:
        .
POS_NAN:
        .
NEG_UNNORM:
        .
NEG_NAN:
        .
POS_NORM:
        .
POS_INFINITY:
        .
NEG_NORM:
        .
NEG_INFINITY:
        .
POS_ZERO:
        .
EMPTY:
        .
NEG_ZERO:
        .
POS_DENORM:
        .
NEG_DENORM:

Exception Handling Examples

There are many approaches to writing exception handlers. One useful technique is to consider the exception handler procedure as consisting of "prologue," "body," and "epilogue" sections of code. (For compatibility with the 80287 emulators, this procedure should be invoked by interrupt pointer (vector) number 16.)

At the beginning of the prologue, CPU interrupts have been disabled.
The prologue performs all functions that must be protected from possible interruption by higher-priority sources. Typically, this will involve saving CPU registers and transferring diagnostic information from the 80287 to memory. When the critical processing has been completed, the prologue may enable CPU interrupts to allow higher-priority interrupt handlers to preempt the exception handler.

The exception handler body examines the diagnostic information and makes a response that is necessarily application-dependent. This response may range from halting execution, to displaying a message, to attempting to repair the problem and proceed with normal execution.

The epilogue essentially reverses the actions of the prologue, restoring the CPU and the NPX so that normal execution can be resumed. The epilogue must not load an unmasked exception flag into the 80287 or another exception will be requested immediately.

Figures 4-3, 4-4, and 4-5 show the ASM286 coding of three skeleton exception handlers. They show how prologues and epilogues can be written for various situations, but provide comments indicating only where the application-dependent exception handling body should be placed.

Figures 4-3 and 4-4 are very similar; their only substantial difference is their choice of instructions to save and restore the 80287. The tradeoff here is between the increased diagnostic information provided by FNSAVE and the faster execution of FNSTENV. For applications that are sensitive to interrupt latency or that do not need to examine register contents, FNSTENV reduces the duration of the "critical region," during which the CPU will not recognize another interrupt request (unless it is a nonmaskable interrupt).

After the exception handler body, the epilogues prepare the CPU and the NPX to resume execution from the point of interruption (i.e., the instruction following the one that generated the unmasked exception).
Notice that the exception flags in the memory image that is loaded into the 80287 are cleared to zero prior to reloading (in fact, in these examples, the entire low byte of the status word image, which holds the exception flags, is cleared).

The examples in figures 4-3 and 4-4 assume that the exception handler itself will not cause an unmasked exception. Where this is a possibility, the general approach shown in figure 4-5 can be employed. The basic technique is to save the full 80287 state and then to load a new control word in the prologue. Note that considerable care should be taken when designing an exception handler of this type to prevent the handler from being reentered endlessly.

Figure 4-3. Full-State Exception Handler

SAVE_ALL PROC
;
; SAVE CPU REGISTERS, ALLOCATE STACK SPACE FOR 80287 STATE IMAGE
        PUSH    BP
        MOV     BP,SP
        SUB     SP,94
; SAVE FULL 80287 STATE, WAIT FOR COMPLETION, ENABLE CPU INTERRUPTS
        FNSAVE  [BP-94]
        FWAIT
        STI
;
; APPLICATION-DEPENDENT EXCEPTION HANDLING CODE GOES HERE
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD, RESTORE MODIFIED STATE IMAGE
        MOV     BYTE PTR [BP-92], 0H
        FRSTOR  [BP-94]
; DE-ALLOCATE STACK SPACE, RESTORE CPU REGISTERS
        MOV     SP,BP
        .
        .
        POP     BP
;
; RETURN TO INTERRUPTED CALCULATION
        IRET
SAVE_ALL ENDP

Figure 4-4. Reduced-Latency Exception Handler

SAVE_ENVIRONMENT PROC
;
; SAVE CPU REGISTERS, ALLOCATE STACK SPACE FOR 80287 ENVIRONMENT
        PUSH    BP
        .
        MOV     BP,SP
        SUB     SP,14
; SAVE ENVIRONMENT, WAIT FOR COMPLETION, ENABLE CPU INTERRUPTS
        FNSTENV [BP-14]
        FWAIT
        STI
;
; APPLICATION EXCEPTION-HANDLING CODE GOES HERE
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD, RESTORE MODIFIED
; ENVIRONMENT IMAGE
        MOV     BYTE PTR [BP-12], 0H
        FLDENV  [BP-14]
; DE-ALLOCATE STACK SPACE, RESTORE CPU REGISTERS
        MOV     SP,BP
        POP     BP
;
; RETURN TO INTERRUPTED CALCULATION
        IRET
SAVE_ENVIRONMENT ENDP

Figure 4-5. Reentrant Exception Handler

        .
        .
        .
LOCAL_CONTROL DW ?              ; ASSUME INITIALIZED
        .
        .
        .
REENTRANT PROC
;
; SAVE CPU REGISTERS, ALLOCATE STACK SPACE FOR
; 80287 STATE IMAGE
        PUSH    BP
        .
        .
        .
        MOV     BP,SP
        SUB     SP,94
; SAVE STATE, LOAD NEW CONTROL WORD, ENABLE CPU INTERRUPTS
        FNSAVE  [BP-94]
        FLDCW   LOCAL_CONTROL
        STI
        .
        .
        .
; APPLICATION EXCEPTION HANDLING CODE GOES HERE.
; AN UNMASKED EXCEPTION GENERATED HERE WILL CAUSE THE EXCEPTION
; HANDLER TO BE REENTERED.
; IF LOCAL STORAGE IS NEEDED, IT MUST BE ALLOCATED ON THE CPU STACK.
        .
        .
        .
; CLEAR EXCEPTION FLAGS IN STATUS WORD, RESTORE MODIFIED STATE IMAGE
        MOV     BYTE PTR [BP-92], 0H
        FRSTOR  [BP-94]
; DE-ALLOCATE STACK SPACE, RESTORE CPU REGISTERS
        MOV     SP,BP
        .
        .
        .
        POP     BP
; RETURN TO POINT OF INTERRUPTION
        IRET
REENTRANT ENDP

Floating-Point to ASCII Conversion Examples

Numeric programs must typically format their results at some point for presentation and inspection by the program user. In many cases, numeric results are formatted as ASCII strings for printing or display. This example shows how floating-point values can be converted to decimal ASCII character strings. The function shown in figure 4-6 can be invoked from PL/M-286, Pascal-286, FORTRAN-286, or ASM286 routines.

Shortness, speed, and accuracy were chosen rather than providing the maximum number of significant digits possible. An attempt is made to keep integers in their own domain to avoid unnecessary conversion errors.

Using the extended precision real number format, this routine achieves a worst case accuracy of three units in the 16th decimal position for a noninteger value or integers greater than 10^(18). This is double precision accuracy. With values having decimal exponents less than 100 in magnitude, the accuracy is one unit in the 17th decimal position.

Higher precision can be achieved with greater care in programming, larger program size, and lower performance.

Figure 4-6. Floating-Point to ASCII Conversion Routine

iAPX286 MACRO ASSEMBLER   80287 Floating-Point to 18-Digit ASCII Conversion   10:12:38 09/25/83 PAGE 1

SERIES-III iAPX286 MACRO ASSEMBLER X108 ASSEMBLY OF MODULE FLOATING_TO_ASCII
OBJECT MODULE PLACED IN :F3:FPASC.OBJ
ASSEMBLER INVOKED BY: ASM286.86 :F3:FPASC.AP2

$title("80287 Floating-Point to 18-Digit ASCII Conversion")

        name    floating_to_ascii

        public  floating_to_ascii
        extrn   get_power_10:near, tos_status:near
;
; This subroutine will convert the floating point number in the
; top of the 80287 stack to an ASCII string and separate power of 10
; scaling value (in binary). The maximum width of the ASCII string
; formed is controlled by a parameter which must be > 1. Unnormal values,
; denormal values, and pseudo zeroes will be correctly converted.
; A returned value will indicate how many binary bits of
; precision were lost in an unnormal or denormal value. The magnitude
; (in terms of binary power) of a pseudo zero will also be indicated.
; Integers less than 10**18 in magnitude are accurately converted if the
; destination ASCII string field is wide enough to hold all the
; digits. Otherwise the value is converted to scientific notation.
;
; The status of the conversion is identified by the return value,
; it can be:
;
;       0       conversion complete, string_size is defined
;       1       invalid arguments
;       2       exact integer conversion, string_size is defined
;       3       indefinite
;       4       + NAN (Not A Number)
;       5       - NAN
;       6       + Infinity
;       7       - Infinity
;       8       pseudo zero found, string_size is defined
;
; The PLM/286 calling convention is:
;
;floating_to_ascii:
;       procedure (number, denormal_ptr, string_ptr, size_ptr, field_size,
;       power_ptr) word external;
;       declare (denormal_ptr, string_ptr, power_ptr, size_ptr) pointer;
;       declare field_size word, string_size based size_ptr word;
;       declare number real;
;       declare denormal integer based denormal_ptr;
;       declare power integer based power_ptr;
;       end floating_to_ascii;
;
; The floating point value is expected to be on the top of the NPX
; stack. This subroutine expects 3 free entries on the NPX stack and
; will pop the passed value off when done. The generated ASCII string
; will have a leading character either '-' or '+' indicating the sign
; of the value. The ASCII decimal digits will immediately follow.
; The numeric value of the ASCII string is (ASCII STRING)*10**POWER.
; If the given number was zero, the ASCII string will contain a sign
; and a single zero character. The value string_size indicates the total
; length of the ASCII string including the sign character. String(0) will
; always hold the sign. It is possible for string_size to be less than
; field_size. This occurs for zeroes or integer values. A pseudo zero
; will return a special return code. The denormal count will indicate
; the power of two originally associated with the value. The power of
; ten and ASCII string will be as if the value was an ordinary zero.
;
; The subroutine is accurate up to a maximum of 18 decimal digits for
; integers. Integer values will have a decimal power of zero associated
; with them. For non-integers, the result will be accurate to within 2
; decimal digits of the 16th decimal place (double precision). The
; exponentiate instruction is also used for scaling the value into the
; range acceptable for the BCD data type. The rounding mode in effect
; on entry to the subroutine is used for the conversion.
;
; The following registers are not transparent:
;
;       ax bx cx dx si di flags
;
$eject
;
; Define the stack layout.
;
bp_save         equ     word ptr [bp]
es_save         equ     bp_save + size bp_save
return_ptr      equ     es_save + size es_save
power_ptr       equ     return_ptr + size return_ptr
field_size      equ     power_ptr + size power_ptr
size_ptr        equ     field_size + size field_size
string_ptr      equ     size_ptr + size size_ptr
denormal_ptr    equ     string_ptr + size string_ptr

parms_size      equ     size power_ptr + size field_size + size size_ptr +
&                       size string_ptr + size denormal_ptr
;
; Define constants used.
;
BCD_DIGITS      equ     18      ; Number of digits in bcd_value
WORD_SIZE       equ     2
BCD_SIZE        equ     10
MINUS           equ     1       ; Define return values
NAN             equ     4       ; The exact values chosen here are
INFINITY        equ     6       ; important. They must correspond to
INDEFINITE      equ     3       ; the possible return values and be in
PSUEDO_ZERO     equ     8       ; the same numeric order as tested by
INVALID         equ     -2      ; the program.
ZERO            equ     -4
DENORMAL        equ     -6
UNNORMAL        equ     -8
NORMAL          equ     0
EXACT           equ     2
;
; Define layout of temporary storage area.
;
status          equ     word ptr [bp-WORD_SIZE]
power_two       equ     status - WORD_SIZE
power_ten       equ     power_two - WORD_SIZE
bcd_value       equ     tbyte ptr power_ten - BCD_SIZE
bcd_byte        equ     byte ptr bcd_value
fraction        equ     bcd_value

local_size      equ     size status + size power_two + size power_ten
&                       + size bcd_value

stack   stackseg (local_size+6) ; Allocate stack space for locals
$eject
code    segment er public
        extrn   power_table:qword
;
; Constants used by this function.
;
        even                    ; Optimize for 16 bits
const10 dw      10              ; Adjustment value for too big BCD
;
; Convert the C3,C2,C1,C0 encoding from tos_status into meaningful bit
; flags and values.
;
status_table    db      UNNORMAL, NAN, UNNORMAL + MINUS, NAN + MINUS,
&                       NORMAL, INFINITY, NORMAL + MINUS, INFINITY + MINUS,
&                       ZERO, INVALID, ZERO + MINUS, INVALID,
&                       DENORMAL, INVALID, DENORMAL + MINUS, INVALID

floating_to_ascii proc

        call    tos_status      ; Look at status of ST(0)
        mov     bx,ax           ; Get descriptor from table
        mov     al,status_table[bx]
        cmp     al,INVALID      ; Look for empty ST(0)
        jne     not_empty
;
; ST(0) is empty! Return the status value.
;
        ret     parms_size
;
; Remove infinity from stack and exit.
;
found_infinity:

        fstp    st(0)           ; OK to leave fstp running
        jmp     short exit_proc
;
; String space is too small!
; Return invalid code.
;
small_string:

        mov     al,INVALID

exit_proc:

        leave                   ; Restore stack
        pop     es
        ret     parms_size
;
; ST(0) is NAN or indefinite. Store the value in memory and look
; at the fraction field to separate indefinite from an ordinary NAN.
;
NAN_or_indefinite:

        fstp    fraction        ; Remove value from stack for examination
        test    al,MINUS        ; Look at sign bit
        fwait                   ; Ensure store is done
        jz      exit_proc       ; Can't be indefinite if positive

        mov     bx,0C000H       ; Match against upper 16 bits of fraction
        sub     bx,word ptr fraction+6  ; Compare bits 63-48
        or      bx,word ptr fraction+4  ; Bits 47-32 must be zero
        or      bx,word ptr fraction+2  ; Bits 31-16 must be zero
        or      bx,word ptr fraction    ; Bits 15-0 must be zero
        jnz     exit_proc

        mov     al,INDEFINITE   ; Set return value for indefinite value
        jmp     exit_proc
;
; Allocate stack space for local variables and establish parameter
; addressability.
;
not_empty:

        push    es              ; Save working register
        enter   local_size,0    ; Format stack

        mov     cx,field_size   ; Check for enough string space
        cmp     cx,2
        jl      small_string

        dec     cx              ; Adjust for sign character
        cmp     cx,BCD_DIGITS   ; See if string is too large for BCD
        jbe     size_ok

        mov     cx,BCD_DIGITS   ; Else set maximum string size

size_ok:

        cmp     al,INFINITY     ; Look for infinity
        jge     found_infinity  ; Return status value for + or - inf

        cmp     al,NAN          ; Look for NAN or INDEFINITE
        jge     NAN_or_indefinite
;
; Set default return values and check that the number is normalized.
;
        fabs                    ; Use positive value only
                                ; sign bit in al has true sign of value
        mov     dx,ax           ; Save return value for later
        xor     ax,ax           ; Form 0 constant
        mov     di,denormal_ptr ; Zero denormal count
        mov     word ptr [di],ax
        mov     bx,power_ptr    ; Zero power of ten value
        mov     word ptr [bx],ax
        cmp     dl,ZERO         ; Test for zero
        jae     real_zero       ; Skip power code if value is zero

        cmp     dl,DENORMAL     ; Look for a denormal value
        jae     found_denormal  ; Handle it specially

        fxtract                 ; Separate exponent from significand
        cmp     dl,UNNORMAL     ; Test for unnormal value
        jb      normal_value

        sub     dl,UNNORMAL-NORMAL ; Return normal status with correct sign
;
; Normalize the fraction, adjust the power of two in ST(1) and set
; the denormal count value.
;
; Assert 0 <= ST(0) < 1.0
;
        fld1                    ; Load constant to normalize fraction

normalize_fraction:

        fadd    st(1),st        ; Set integer bit in fraction
        fsub                    ; Form normalized fraction in ST(0)
        fxtract
                                        ; Power of two field will be negative
                                        ; of denormal count
        fxch                            ; Put denormal count in ST(0)
        fist    word ptr [di]           ; Put negative of denormal count in memory
        faddp   st(2),st                ; Form correct power of two in st(1)
                                        ; OK to use word ptr [di] now
        neg     word ptr [di]           ; Form positive denormal count
        jnz     not_pseudo_zero
;
; A pseudo zero will appear as an unnormal number. When attempting
; to normalize it, the resultant fraction field will be zero. Performing
; an fxtract on zero will yield a zero exponent value.
;
        fxch                            ; Put power of two value in st(0)
        fistp   word ptr [di]           ; Set denormal count to power of two value.
                                        ; Word ptr [di] is not used by convert
                                        ; integer, OK to leave running
        sub     dl,NORMAL-PSUEDO_ZERO   ; Set return value saving the sign bit
        jmp     convert_integer         ; Put zero value into memory
;
; The number is a real zero, set the return value and setup for
; conversion to BCD.
;
real_zero:

        sub     dl,ZERO-NORMAL          ; Convert status to normal value
        jmp     convert_integer         ; Treat the zero as an integer
;
; The number is a denormal. FXTRACT will not work correctly in this
; case. To correctly separate the exponent and fraction, add a fixed
; constant to the exponent to guarantee the result is not a denormal.
;
found_denormal:

        fld1                            ; Prepare to bump exponent
        fxch
        fprem                           ; Force denormal to smallest representable
                                        ; extended real format exponent
        fxtract                         ; This will work correctly now
;
; The power of the original denormal value has been safely isolated.
; Check if the fraction value is an unnormal.
;
        fxam                            ; See if the fraction is an unnormal
        fstsw   ax                      ; Save 80287 status in CPU AX reg for later
        fxch                            ; Put exponent in ST(0)
        fxch    st(2)                   ; Put 1.0 into ST(0), exponent in ST(2)
        sub     dl,DENORMAL-NORMAL      ; Return normal status with correct sign
        test    ax,4400H                ; See if C3=C2=0 implying unnormal or NAN
        jz      normalize_fraction      ; Jump if fraction is an unnormal

        fstp    st(0)                   ; Remove unnecessary 1.0 from st(0)
;
; Calculate the decimal magnitude associated with this number to
; within one order. This error will always be inevitable due to
; rounding and lost precision. As a result, we will deliberately fail
; to consider the LOG10 of the fraction value in calculating the order.
; Since the fraction will always be 1 <= F < 2, its LOG10 will not change
; the basic accuracy of the function. To get the decimal order of magnitude,
; simply multiply the power of two by LOG10(2) and truncate the result to
; an integer.
;
normal_value:
not_pseudo_zero:

        fstp    fraction                ; Save the fraction field for later use
        fist    power_two               ; Save power of two
        fldlg2                          ; Get LOG10(2)
                                        ; Power_two is now safe to use
        fmul                            ; Form LOG10(of exponent of number)
        fistp   power_ten               ; Any rounding mode will work here
;
; Check if the magnitude of the number rules out treating it as
; an integer.
;
; CX has the maximum number of decimal digits allowed.
;
        fwait                           ; Wait for power_ten to be valid
        mov     ax,power_ten            ; Get power of ten of value
        sub     ax,cx                   ; Form scaling factor necessary in ax
        ja      adjust_result           ; Jump if number will not fit
;
; The number is between 1 and 10**(field_size).
; Test if it is an integer.
;
        fild    power_two               ; Restore original number
        mov     si,dx                   ; Save return value
        sub     dl,NORMAL-EXACT         ; Convert to exact return value
        fld     fraction
        fscale                          ; Form full value, this is safe here
        fst     st(1)                   ; Copy value for compare
        frndint                         ; Test if it is an integer
        fcomp                           ; Compare values
        fstsw   status                  ; Save status
        test    status,4000H            ; C3=1 implies it was an integer
        jnz     convert_integer

        fstp    st(0)                   ; Remove non integer value
        mov     dx,si                   ; Restore original return value
;
; Scale the number to within the range allowed by the BCD format.
; The scaling operation should produce a number within one decimal order
; of magnitude of the largest decimal number representable within the
; given string width.
;
; The scaling power of ten value is in ax.
;
adjust_result:

        mov     word ptr [bx],ax        ; Set initial power of ten return value
        neg     ax                      ; Subtract one for each order
                                        ; of magnitude the value is scaled by
        call    get_power_10            ; Scaling factor is returned as exponent
                                        ; and fraction
        fld     fraction                ; Get fraction
        fmul                            ; Combine fractions
        mov     si,cx                   ; Form power of ten of the maximum
        shl     si,1                    ; BCD value to fit in the string
        shl     si,1                    ; Index in si
        shl     si,1
        fild    power_two               ; Combine powers of two
        faddp   st(2),st
        fscale                          ; Form full value, exponent was safe
        fstp    st(1)                   ; Remove exponent
;
; Test the adjusted value against a table of exact powers of ten.
; The combined errors of the magnitude estimate and power function can
; result in a value one order of magnitude too small or too large to fit
; correctly in the BCD field.
; To handle this problem, pretest the
; adjusted value, if it is too small or large, then adjust it by ten and
; adjust the power of ten value.
;
test_power:

        fcom    power_table[si]+type power_table ; Compare against exact power
                                        ; entry. Use the next entry since cx
                                        ; has been decremented by one.
        fstsw   ax                      ; No wait is necessary
        test    ax,4100H                ; If C3 = C0 = 0 then too big
        jnz     test_for_small

        fidiv   const10                 ; Else adjust value
        and     dl,not EXACT            ; Remove exact flag
        inc     word ptr [bx]           ; Adjust power of ten value
        jmp     short in_range          ; Convert the value to a BCD integer

test_for_small:

        fcom    power_table[si]         ; Test relative size
        fstsw   ax                      ; No wait is necessary
        test    ax,100H                 ; If C0 = 0 then st(0) >= lower bound
        jz      in_range                ; Convert the value to a BCD integer

        fimul   const10                 ; Adjust value into range
        dec     word ptr [bx]           ; Adjust power of ten value

in_range:

        frndint                         ; Form integer value
;
; Assert: 0 <= TOS <= 999,999,999,999,999,999
; The TOS number will be exactly representable in 18 digit BCD format
;
convert_integer:

        fbstp   bcd_value               ; Store as BCD format number
;
; While the store BCD runs, setup registers for the conversion to
; ASCII.
;
        mov     si,BCD_SIZE-2           ; Initial BCD index value
        mov     cx,0f04h                ; Set shift count and mask
        mov     bx,1                    ; Set initial size of ASCII field for sign
        mov     di,string_ptr           ; Get address of start of ASCII string
        mov     ax,ds                   ; Copy ds to es
        mov     es,ax
        cld                             ; Set autoincrement mode
        mov     al,'+'                  ; Clear sign field
        test    dl,MINUS                ; Look for negative value
        jz      positive_result

        mov     al,'-'

positive_result:

        stosb                           ; Bump string pointer past sign
        and     dl,not MINUS            ; Turn off sign bit
        fwait                           ; Wait for fbstp to finish
;
; Register usage:
;       ah:     BCD byte value in use
;       al:     ASCII character value
;       dx:     Return value
;       ch:     BCD mask = 0fh
;       cl:     BCD shift count = 4
;       bx:     ASCII string field width
;       si:     BCD field index
;       di:     ASCII string field pointer
;       ds,es:  ASCII string segment base
;
; Remove leading zeroes from the number.
;
skip_leading_zeroes:

        mov     ah,bcd_byte[si]         ; Get BCD byte
        mov     al,ah                   ; Copy value
        shr     al,cl                   ; Get high order digit
        and     al,ch                   ; Set zero flag
        jnz     enter_odd               ; Enter loop if leading non zero found

        mov     al,ah                   ; Get BCD byte again
        and     al,ch                   ; Get low order digit
        jnz     enter_even

        dec     si                      ; Decrement BCD index
        jns     skip_leading_zeroes
;
; The significand was all zeroes
;
        mov     al,'0'                  ; Set initial zero
        stosb
        inc     bx                      ; Bump string length
        jmp     short exit_with_value
;
; Now expand the BCD string into digit per byte values 0-9
;
digit_loop:

        mov     ah,bcd_byte[si]         ; Get BCD byte
        mov     al,ah
        shr     al,cl                   ; Get high order digit

enter_odd:

        add     al,'0'                  ; Convert to ASCII
        stosb                           ; Put digit into ASCII string area
        mov     al,ah                   ; Get low order digit
        and     al,ch
        inc     bx                      ; Bump field size counter

enter_even:

        add     al,'0'                  ; Convert to ASCII
        stosb                           ; Put digit into ASCII area
        inc     bx                      ; Bump field size counter
        dec     si                      ; Go to next BCD byte
        jns     digit_loop
;
; Conversion complete.
; Set the string size and remainder
;
exit_with_value:

        mov     di,size_ptr
        mov     word ptr [di],bx
        mov     ax,dx                   ; Set return value
        jmp     exit_proc

floating_to_ascii endp
code    ends
        end

ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS

iAPX286 MACRO ASSEMBLER    Calculate the value of 10**ax    12:11:08  09/25/83  PAGE 1

SERIES-III iAPX286 MACRO ASSEMBLER X108 ASSEMBLY OF MODULE GET_POWER_10
OBJECT MODULE PLACED IN :F3:POW10.OBJ
ASSEMBLER INVOKED BY: ASM286.86 :F3:POW10.AP2

$title("Calculate the value of 10**ax")
;
; This subroutine will calculate the value of 10**ax.
; For values of 0 <= ax < 19, the result will be exact.
; All 80286 registers are transparent and the value is returned on
; the TOS as two numbers, exponent in ST(1) and fraction in ST(0).
; The exponent value can be larger than the largest exponent of an
; extended real format number. Three stack entries are used.
;
        name    get_power_10

        public  get_power_10,power_table

stack   stackseg 8

code    segment er public
;
; Use exact values from 1.0 to 1e18
;
        even                            ; Optimize 16 bit access
power_table dq  1.0,1e1,1e2,1e3
        dq      1e4,1e5,1e6,1e7
        dq      1e8,1e9,1e10,1e11
        dq      1e12,1e13,1e14,1e15
        dq      1e16,1e17,1e18

get_power_10 proc

        cmp     ax,18                   ; Test for 0 <= ax < 19
        ja      out_of_range

        push    bx                      ; Get working index register
        mov     bx,ax                   ; Form table index
        shl     bx,3
        fld     power_table[bx]         ; Get exact value
        pop     bx                      ; Restore register value
        fxtract                         ; Separate power and fraction
        ret                             ; OK to leave fxtract running
;
; Calculate the value using the exponentiate instruction.
; The following relations are used:
;       10**x = 2**(log2(10)*x)
;       2**(I+F) = 2**I * 2**F
;       if st(1) = I and st(0) = 2**F then fscale produces 2**(I+F)
;
out_of_range:

        fldl2t                          ; TOS = LOG2(10)
        enter   4,0                     ; Format stack
        mov     [bp-2],ax               ; Save power of 10 value
        fimul   word ptr [bp-2]         ; TOS, X = LOG2(10)*P = LOG2(10**P)
        fstcw   word ptr [bp-4]         ; Get current control word
        mov     ax,word ptr [bp-4]      ; Get control word, no wait necessary
        and     ax,not 0C00H            ; Mask off current rounding field
        or      ax,0400H                ; Set round to negative infinity
        xchg    ax,word ptr [bp-4]      ; Put new control word in memory
                                        ; old control word is in ax
        fld1                            ; Set TOS = -1.0
        fchs
        fld     st(1)                   ; Copy power value in base two
        fldcw   word ptr [bp-4]         ; Set new control word value
        frndint                         ; TOS = I: -inf < I <= X, I is an integer
        mov     word ptr [bp-4],ax      ; Restore original rounding control
        fldcw   word ptr [bp-4]
        fxch    st(2)                   ; TOS = X, ST(1) = -1.0, ST(2) = I
        fsub    st,st(2)                ; TOS = F = X-I: 0 <= TOS < 1.0
        mov     ax,[bp-2]               ; Restore power of ten
        fscale                          ; TOS = F/2: 0 <= TOS < 0.5
        f2xm1                           ; TOS = 2**(F/2) - 1.0
        leave                           ; Restore stack
        fsubr                           ; Form 2**(F/2)
        fmul    st,st(0)                ; Form 2**F
        ret                             ; OK to leave fmul running

get_power_10 endp
code    ends
        end

ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS

iAPX286 MACRO ASSEMBLER    Determine TOS register contents    12:12:13  09/25/83  PAGE 1

SERIES-III iAPX286 MACRO ASSEMBLER X108
ASSEMBLY OF MODULE TOS_STATUS
OBJECT MODULE PLACED IN :F3:TOSST.OBJ
ASSEMBLER INVOKED BY: ASM286.86 :F3:TOSST.AP2

$title("Determine TOS register contents")
;
; This subroutine will return a value from 0-15 in AX corresponding
; to the contents of 80287 TOS. All registers are transparent and no
; errors are possible. The return value corresponds to c3,c2,c1,c0
; of FXAM instruction.
;
        name    tos_status

        public  tos_status

stack   stackseg 6                      ; Allocate space on the stack

code    segment er public

tos_status proc

        fxam                            ; Get register contents status
        fstsw   ax                      ; Get status
        mov     al,ah                   ; Put bit 10-8 into bits 2-0
        and     ax,4007h                ; Mask out bits c3,c2,c1,c0
        shr     ah,3                    ; Put bit c3 into bit 11
        or      al,ah                   ; Put c3 into bit 3
        mov     ah,0                    ; Clear return value
        ret

tos_status endp
code    ends
        end

ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS

Function Partitioning

Three separate modules implement the conversion. Most of the work of the conversion is done in the module FLOATING_TO_ASCII. The other modules are provided separately, because they have a more general use. One of them, GET_POWER_10, is also used by the ASCII to floating-point conversion routine. The other small module, TOS_STATUS, will identify what, if anything, is in the top of the numeric register stack.

Exception Considerations

Care is taken inside the function to avoid generating exceptions. Any possible numeric value will be accepted. The only exceptions possible would occur if insufficient space exists on the numeric register stack.

The value passed in the numeric stack is checked for existence, type (NaN or infinity), and status (unnormal, denormal, zero, sign). The string size is tested for a minimum and maximum value.
If the top of the register stack is empty, or the string size is too small, the function will return with an error code.

Overflow and underflow are avoided inside the function for very large or very small numbers.

Special Instructions

The functions demonstrate the operation of several numeric instructions, different data types, and precision control. Shown are instructions for automatic conversion to BCD, calculating the value of 10 raised to an integer value, establishing and maintaining concurrency, data synchronization, and use of directed rounding on the NPX.

Without the extended precision data type and built-in exponential function, the double precision accuracy of this function could not be attained with the size and speed of the example shown.

The function relies on the numeric BCD data type for conversion from binary floating-point to decimal. It is not difficult to unpack the BCD digits into separate ASCII decimal digits. The major work involves scaling the floating-point value to the comparatively limited range of BCD values. Printing a 9-digit result requires accurately scaling the given value to an integer between 10^(8) and 10^(9). For example, the number +0.123456789 requires a scaling factor of 10^(9) to produce the value +123456789.0, which can be stored in 9 BCD digits. The scale factor must be an exact power of 10 to avoid changing any of the printed digit values.

These routines should exactly convert all values exactly representable in decimal in the field size given. Integer values that fit in the given string size will not be scaled, but directly stored into the BCD form. Noninteger values exactly representable in decimal within the string size limits will also be exactly converted. For example, 0.125 is exactly representable in binary or decimal. To convert this floating-point value to decimal, the scaling factor will be 1000, resulting in 125.
When scaling a value, the function must keep track of where the decimal point lies in the final decimal value.

Description of Operation

Converting a floating-point number to decimal ASCII takes three major steps: identifying the magnitude of the number, scaling it for the BCD data type, and converting the BCD data type to a decimal ASCII string.

Identifying the magnitude of the result requires finding the value X such that the number is represented by I*10^(X), where 1.0 <= I < 10.0. Scaling the number requires multiplying it by a scaling factor 10^(S), so that the result is an integer requiring no more decimal digits than provided for in the ASCII string.

Once scaled, the numeric rounding modes and BCD conversion put the number in a form easy to convert to decimal ASCII by host software.

Implementing each of these three steps requires attention to detail. To begin with, not all floating-point values have a numeric meaning. Values such as infinity, indefinite, or Not a Number (NaN) may be encountered by the conversion routine. The conversion routine should recognize these values and identify them uniquely.

Special cases of numeric values also exist. Denormals, unnormals, and pseudo zero all have a numeric value but should be recognized, because all of them indicate that precision was lost during some earlier calculations.

Once it has been determined that the number has a numeric value, and it has been normalized with the appropriate unnormal flags set, the value must be scaled to the BCD range.

Scaling the Value

To scale the number, its magnitude must be determined. It is sufficient to calculate the magnitude to an accuracy of 1 unit, or within a factor of 10 of the given value. After scaling the number, a check will be made to see if the result falls in the range expected. If not, the result can be adjusted one decimal order of magnitude up or down. The adjustment test after the scaling is necessary due to inevitable inaccuracies in the scaling value.
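The estimate, scale, check, and adjust sequence described above can be modeled in a few lines of Python. This is only an illustrative sketch of the arithmetic, not a transcription of FLOATING_TO_ASCII: the name scale_to_digits is hypothetical, exact rational arithmetic (fractions.Fraction) stands in for the NPX extended format, and the sketch always fills the whole digit field instead of leaving exact integers unscaled.

```python
import math
from fractions import Fraction


def scale_to_digits(value, digits):
    """Return (sign, digit_string, power) so that value ~= digits * 10**power.

    Hypothetical helper for illustration; handles finite values only.
    """
    if not math.isfinite(value):
        raise ValueError("only finite values are handled in this sketch")
    sign = "-" if math.copysign(1.0, value) < 0 else "+"
    if value == 0.0:
        return sign, "0", 0

    exact = Fraction(abs(value))            # exact binary value, no rounding

    # Step 1: estimate the decimal magnitude from the binary exponent alone.
    _, exp2 = math.frexp(abs(value))        # abs(value) = m * 2**exp2, 0.5 <= m < 1
    power_ten = round((exp2 - 1) * math.log10(2))

    # Step 2: scale so the integer part has at most `digits` decimal digits.
    power = power_ten - (digits - 1)
    scaled = exact / Fraction(10) ** power

    # Post-scaling test: the estimate may be off by one order of magnitude.
    if scaled >= 10 ** digits:              # too large: divide by ten
        scaled /= 10
        power += 1
    elif scaled < 10 ** (digits - 1):       # too small: multiply by ten
        scaled *= 10
        power -= 1

    # Step 3: round to an integer; its decimal digits are the output string.
    n = round(scaled)
    if n == 10 ** digits:                   # rounding carried into a new digit
        n //= 10
        power += 1
    return sign, str(n), power
```

For example, scale_to_digits(0.125, 3) returns ("+", "125", -3): the value was scaled by 1000, and the power value of -3 records that the decimal point lies three places to the left of the last digit.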
Because the magnitude estimate need only be close, a fast technique is used. The magnitude is estimated by multiplying the power of 2 associated with the number (its unbiased floating-point exponent) by log10(2). Rounding the result to an integer will produce an estimate of sufficient accuracy. Ignoring the fraction value can introduce a maximum error of 0.32 in the result.

Using the magnitude of the value and size of the number string, the scaling factor can be calculated. Calculating the scaling factor is the most inaccurate operation of the conversion process. The relation 10^(X) = 2^(X * log2(10)) is used for this function. The exponentiate instruction (F2XM1) will be used.

Due to restrictions on the range of values allowed by the F2XM1 instruction, the power of 2 value will be split into integer and fraction components. The relation 2^(I + F) = 2^(I) * 2^(F) allows using the FSCALE instruction to recombine the 2^(F) value, calculated through F2XM1, and the 2^(I) part.

Inaccuracy in Scaling

The inaccuracy of these operations arises because of the trailing zeros placed into the fraction value when stripping off the integer valued bits. For each integer valued bit in the power of 2 value separated from the fraction bits, one bit of precision is lost in the fraction field due to the zero fill occurring in the least significant bits.

Up to 14 bits may be lost in the fraction because the largest allowed floating point exponent value is 2^(14) - 1.

Avoiding Underflow and Overflow

The fraction and exponent fields of the number are separated to avoid underflow and overflow in calculating the scaling values. For example, to scale 10^(-4932) to 10^(8) requires a scaling factor of 10^(4940), which cannot be represented by the NPX.

By separating the exponent and fraction, the scaling operation involves adding the exponents separately from multiplying the fractions. The exponent arithmetic will involve small integers, all easily represented by the NPX.
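The two relations used above, the log10(2) magnitude estimate and the split of a power of ten into a separate fraction and power of two, can be checked with a short Python sketch. The function names are hypothetical, ordinary double precision stands in for the NPX extended format, and 2.0 ** f stands in for the F2XM1 and FSCALE sequence, so the sketch illustrates the arithmetic only, not the accuracy of GET_POWER_10.

```python
import math


def estimate_power_of_ten(value):
    """Estimate the decimal magnitude of a nonzero finite value from its
    binary exponent alone, ignoring the fraction (hypothetical helper)."""
    _, exp2 = math.frexp(abs(value))        # abs(value) = m * 2**exp2, 0.5 <= m < 1
    unbiased = exp2 - 1                     # exponent for a fraction 1 <= f < 2
    return round(unbiased * math.log10(2))


def power_of_ten_split(p):
    """Return (fraction, exponent) with 10**p ~= fraction * 2**exponent and
    1 <= fraction < 2, using 10**p = 2**(p*log2(10)) and 2**(I+F) = 2**I * 2**F."""
    x = p * math.log2(10)                   # power of two equivalent to 10**p
    i = math.floor(x)                       # integer part I (round toward -inf)
    f = x - i                               # fraction part F, 0 <= F < 1
    return 2.0 ** f, i                      # 2**F and I are kept separate
```

Because the exponent is returned as a plain integer, a call such as power_of_ten_split(5000) succeeds even though 10^(5000) itself overflows every floating-point format discussed here; only the small integer exponents are ever added together.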
Final Adjustments

It is possible that the power function (Get_Power_10) could produce a scaling value such that it forms a scaled result larger than the ASCII field could allow. For example, scaling 9.9999999999999999 * 10^(4900) by 1.00000000000000010 * 10^(-4883) would produce 1.00000000000000009 * 10^(18). The scale factor is within the accuracy of the NPX and the result is within the conversion accuracy, but it cannot be represented in BCD format. This is why there is a post-scaling test on the magnitude of the result. The result can be multiplied or divided by 10, depending on whether the result was too small or too large, respectively.

Output Format

For maximum flexibility in output formats, the position of the decimal point is indicated by a binary integer called the power value. If the power value is zero, then the decimal point is assumed to be at the right of the rightmost digit. Power values greater than zero indicate how many trailing zeros are not shown. For each unit below zero, move the decimal point to the left in the string.

The last step of the conversion is storing the result in BCD and indicating where the decimal point lies. The BCD string is then unpacked into ASCII decimal characters. The ASCII sign is set corresponding to the sign of the original value.

Trigonometric Calculation Examples

The 80287 instruction set does not provide a complete set of trigonometric functions that can be used directly in calculations. Rather, the basic building blocks for implementing trigonometric functions are provided by the FPTAN and FPREM instructions. The example in figure 4-7 shows how three trigonometric functions (sine, cosine, and tangent) can be implemented using the 80287. All three functions accept a valid angle argument between -2^(62) and +2^(62). These functions may be called from PL/M-286, Pascal-286, FORTRAN-286, or ASM286 routines.
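As a rough model of how a remainder operation and a restricted-range tangent combine into general sine and cosine functions, the following Python sketch follows the octant decision table given in the comments of the figure 4-7 listing. It is a sketch under stated assumptions, not the 80287 code: divmod stands in for FPREM, math.tan restricted to the range 0 to PI/4 stands in for FPTAN (with X fixed at 1.0), and the helper name _from_octant is hypothetical.

```python
import math

PI_QUARTER = math.pi / 4


def _from_octant(octant, r):
    """Evaluate the relation selected by the octant for 0 <= r < PI/4."""
    if octant & 1:                          # octants 1,3,5,7: reverse the angle
        r = PI_QUARTER - r
    y, x = math.tan(r), 1.0                 # stand-in for FPTAN: tan(r) = Y/X
    # sin(r) = Y/sqrt(X*X + Y*Y), cos(r) = X/sqrt(X*X + Y*Y)
    numerator = y if octant in (0, 3, 4, 7) else x
    value = numerator / math.hypot(x, y)
    return -value if octant >= 4 else value  # second half revolution is negative


def sine(angle):
    q, r = divmod(abs(angle), PI_QUARTER)   # stand-in for FPREM: octant and remainder
    return math.copysign(1.0, angle) * _from_octant(int(q) % 8, r)


def cosine(angle):
    # cos(A) = sin(|A| + PI/2): add a quarter revolution (two octants)
    # to the octant number instead of to the argument itself.
    q, r = divmod(abs(angle), PI_QUARTER)
    return _from_octant((int(q) + 2) % 8, r)
```

The cosine entry never adds PI/2 to the angle; it shifts the octant number by two, which avoids any rounding of the argument and mirrors the way the listing adjusts its control values rather than the operand.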
These trigonometric functions use the partial tangent instruction together with trigonometric identities to calculate the result. They are accurate to within 16 units of the low 4 bits of an extended precision value. The functions are coded for speed and small size, with tradeoffs available for greater accuracy.

Figure 4-7. Calculating Trigonometric Functions

iAPX286 MACRO ASSEMBLER    80287 Trigonometric functions    10:13:51  09/25/83  PAGE 1

SERIES-III iAPX286 MACRO ASSEMBLER X108 ASSEMBLY OF MODULE TRIG_FUNCTIONS
OBJECT MODULE PLACED IN :F3:TRIG.OBJ
ASSEMBLER INVOKED BY: ASM286.86 :F3:TRIG.AP2

$title("80287 Trigonometric Functions")

        name    trig_functions
        public  sine,cosine,tangent

stack   stackseg 6                      ; Reserve local space

sw_287  record  res1:1,cond3:1,top:3,cond2:1,cond1:1,cond0:1,res2:8

code    segment er public
;
; Define local constants
;
        even
pi_quarter dt   3FFEC90FDAA22168C235R   ; PI/4
indefinite dd   0FFC00000R              ; Indefinite special value
$eject
;
; This subroutine calculates the sine or cosine of the angle, given in
; radians. The angle is in ST(0), the returned value will be in ST(0).
; The result is accurate to within 7 units of the least significant three
; bits of the NPX extended real format. The PLM/86 definition is:
;
;       sine: procedure (angle) real external;
;               declare angle real;
;       end sine;
;
;       cosine: procedure (angle) real external;
;               declare angle real;
;       end cosine;
;
; Three stack registers are required.
; The result of the function is
; defined as follows for the following arguments:
;
;       angle                                           result
;
;       valid or unnormal less than 2**62 in magnitude  correct value
;       zero                                            0 or 1
;       denormal                                        correct denormal
;       valid or unnormal greater than 2**62            indefinite
;       infinity                                        indefinite
;       NAN                                             NAN
;       empty                                           empty
$eject
;
; This function is based on the NPX fptan instruction. The fptan
; instruction will only work with an angle of from 0 to PI/4. With this
; instruction, the sine or cosine of angles from 0 to PI/4 can be accurately
; calculated. The technique used by this routine can calculate a general
; sine or cosine by using one of four possible operations:
;
; Let R = |angle mod PI/4|
;     S = -1 or 1, according to the sign of the angle
;
; 1) sin(R)  2) cos(R)  3) sin(PI/4-R)  4) cos(PI/4-R)
;
; The choice of the relation and the sign of the result follows the
; decision table shown below based on the octant the angle falls in:
;
;       octant          sine            cosine
;
;         0             s*1             2
;         1             s*4             3
;         2             s*2             -1*1
;         3             s*3             -1*4
;         4             -s*1            -1*2
;         5             -s*4            -1*3
;         6             -s*2            1
;         7             -s*3            4
;
$eject
;
; Angle to sine function is a zero or unnormal
;
sine_zero_unnormal:

        fstp    st(1)                   ; Remove PI/4
        jnz     enter_sine_normalize    ; Jump if angle is unnormal
;
; Angle is a zero.
;
        ret
;
; Angle is an unnormal
;
enter_sine_normalize:

        call    normalize_value
        jmp     short enter_sine

cosine  proc                            ; Entry point to cosine

        fxam                            ; Look at the value
        fstsw   ax                      ; Store status value
        fld     pi_quarter              ; Setup for angle reduce
        mov     cl,1                    ; Signal cosine function
        sahf                            ; ZF = C3, PF = C2, CF = C0
        jc      funny_parameter         ; Jump if parameter is
                                        ; empty, NAN, or infinity
;
; Angle is unnormal, normal, zero, denormal.
;
        fxch                            ; st(0) = angle, st(1) = PI/4
        jpe     enter_sine              ; Jump if normal or denormal
;
; Angle is an unnormal or zero
;
        fstp    st(1)                   ; Remove PI/4
        jnz     enter_sine_normalize
;
; Angle is a zero, cos(0) = 1.0
;
        fstp    st(0)                   ; Remove 0
        fld1                            ; Return 1
        ret
;
; All work is done as a sine function. By adding PI/2 to the angle
; a cosine is converted to a sine. Of course the angle addition is not
; done to the argument but rather to the program logic control values.
;
sine:                                   ; Entry point for sine function

        fxam                            ; Look at the parameter
        fstsw   ax                      ; Look at fxam status
        fld     pi_quarter              ; Get PI/4 value
        sahf                            ; CF = C0, PF = C2, ZF = C3
        jc      funny_parameter         ; Jump if empty, NAN, or infinity
;
; Angle is unnormal, normal, zero, or denormal
;
        fxch                            ; ST(1) = PI/4, st(0) = angle
        mov     cl,0                    ; Signal sine
        jpo     sine_zero_unnormal      ; Jump if zero or unnormal
;
; ST(0) is either a normal or denormal value. Both will work.
; Use the fprem instruction to accurately reduce the range of the given
; angle to within 0 and PI/4 in magnitude. If fprem cannot reduce the
; angle in one shot, the angle is too big to be meaningful, > 2**62
; radians. Any roundoff error in the calculation of the angle given
; could completely change the result of this function. It is safest to
; call this very rare case an error.
;
enter_sine:

        fprem                           ; Reduce angle
                                        ; Note that fprem will force a
                                        ; denormal to a very small unnormal
                                        ; Fptan of a very small unnormal
                                        ; will be the same very small
                                        ; unnormal, which is correct.
        xchg    ax,bx                   ; Save old status in BX
        fstsw   ax                      ; Check if reduction was complete
                                        ; Quotient in C0,C3,C1
        xchg    ax,bx                   ; Put new status in bx
        test    bh,high(mask cond2)     ; sin(2*N*PI+x) = sin(x)
        jnz     angle_too_big
;
; Set sign flags and test for which eighth of the revolution the
; angle fell into.
;
; Assert -PI/4 < st(0) < PI/4
;
        fabs                            ; Force the argument positive
                                        ; cond1 bit in ax holds the sign.
        or      cl,cl                   ; Test for sine or cosine function
        jz      sine_select             ; Jump if sine function
;
; This is a cosine function. Ignore the original sign of the angle
; and add a quarter revolution to the octant id from the fprem instruction.
; cos(A) = sin(A+PI/2) and cos(|A|) = cos(A)
;
        and     ah,not high(mask cond1) ; Turn off sign of argument
        or      bh,80H                  ; Prepare to add 010 to C0,C3,C1
                                        ; status value in ax
                                        ; Set busy bit so carry out from
        add     bh,high(mask cond3)     ; C3 will go into the carry flag
        mov     al,0                    ; Extract carry flag
        rcl     al,1                    ; Put carry flag in low bit
        xor     bh,al                   ; Add carry to C0 not changing
                                        ; C1 flag
;
; See if the argument should be reversed, depending on the octant in
; which the argument fell during fprem.
;
sine_select:

        test    bh,high(mask cond1)     ; Reverse angle if C1 = 1
        jz      no_sine_reverse
;
; Angle was in octants 1,3,5,7.
;
        fsub                            ; Invert sense of rotation
        jmp     short do_sine_fptan     ; 0 < arg <= PI/4
;
; Angle was in octants 0,2,4,6
; Test for a zero argument since fptan will not work if st(0) = 0
;
no_sine_reverse:

        ftst                            ; Test for zero angle
        xchg    ax,cx
        fstsw   ax                      ; cond3 = 1 if st(0) = 0
        xchg    ax,cx
        fstp    st(1)                   ; Remove PI/4
        test    ch,high(mask cond3)     ; If C3=1, argument is zero
        jnz     sine_argument_zero
;
; Assert: 0 < st(0) <= PI/4
;
do_sine_fptan:

        fptan                           ; TAN ST(0) = ST(1)/ST(0) = Y/X

after_sine_fptan:

        test    bh,high(mask cond3 + mask cond1) ; Look at octant angle fell into
        jpo     x_numerator             ; Calculate cosine for octants
                                        ; 1,2,5,6
;
; Calculate the sine of the argument
; sin(A) = tan(A)/sqrt(1+tan(A)**2)  if tan(A) = Y/X then
; sin(A) = Y/sqrt(X*X + Y*Y)
;
        fld     st(1)                   ; Copy Y value
        jmp     short finish_sine       ; Put Y value in numerator
;
; The top of the stack is either NAN, infinity, or empty
;
funny_parameter:

        fstp    st(0)                   ; Remove PI/4
        jz      return_empty            ; Return empty if no parm

        jpo     return_NAN              ; Jump if st(0) is NAN
;
; st(0) is infinity. Return an indefinite value.
;
        fprem                           ; ST(1) can be anything

return_NAN:
return_empty:

        ret                             ; OK to leave fprem running
;
; Simulate fptan with st(0) = 0
;
sine_argument_zero:

        fld1                            ; Simulate tan(0)
        jmp     after_sine_fptan        ; Return the zero value
;
; The angle was too large. Remove the modulus and dividend from the
; stack and return an indefinite result.
                    ;
0097                angle_too_big:

0097  DED9                  fcompp                          ; Pop two values from the stack
0099  2ED9060A00 R          fld     indefinite              ; Return indefinite
009E  9B                    fwait                           ; Wait for load to finish
009F  C3                    ret
                    ;
                    ; Calculate the cosine of the argument
                    ; cos(A) = 1/sqrt(1+tan(A)**2) if tan(A) = Y/X then
                    ; cos(A) = X/sqrt(X*X + Y*Y)
                    ;
00A0                X_numerator:

00A0  D9C0                  fld     st(0)                   ; Copy X value
00A2  D9CA                  fxch    st(2)                   ; Put X in numerator

00A4                finish_sine:

00A4  DCC8                  fmul    st,st(0)                ; Form X*X + Y*Y
00A6  D9C9                  fxch
00A8  DCC8                  fmul    st,st(0)
00AA  DEC1                  fadd                            ; st(0) = X*X + Y*Y
00AC  D9FA                  fsqrt                           ; st(0) = sqrt(X*X + Y*Y)
                    ;
                    ; Form the sign of the result. The two conditions are the C1 flag from
                    ; FXAM in ah and the C0 flag from fprem in bh
                    ;
00AE  80E701                and     bh,high(mask cond0)     ; Look at the fprem C0 flag
00B1  80E402                and     ah,high(mask cond1)     ; Look at the fxam C1 flag
00B4  0AFC                  or      bh,ah                   ; Even number of flags cancel
00B6  7A02                  jpe     positive_sine           ; Two negatives make a positive

00B8  D9E0                  fchs                            ; Force result negative

00BA                positive_sine:

00BA  DEF9                  fdiv                            ; Form final result
00BC  C3                    ret                             ; Ok to leave fdiv running

                    cosine  endp
+1 $eject
                    ;
                    ; This function will calculate the tangent of an angle.
                    ; The angle, in radians, is passed in ST(0); the tangent is returned
                    ; in ST(0). The tangent is calculated to an accuracy of 4 units in the
                    ; three least significant bits of an extended real format number. The
                    ; PLM/86 calling format is:
                    ;
                    ;       tangent procedure (angle) real external;
                    ;       declare angle real;
                    ;       end tangent;
                    ;
                    ; Two stack registers are used.
                    ; The result of the tangent function is
                    ; defined for the following cases:
                    ;
                    ;       angle                                   result
                    ;
                    ;       valid or unnormal < 2**62 in magnitude  correct value
                    ;       0                                       0
                    ;       denormal                                correct denormal
                    ;       valid or unnormal > 2**62 in magnitude  indefinite
                    ;       NAN                                     NAN
                    ;       infinity                                indefinite
                    ;       empty                                   empty
                    ;
                    ; The tangent function uses the fptan instruction. Four possible
                    ; relations are used:
                    ;
                    ;       Let R = |angle MOD PI/4|
                    ;           S = -1 or 1 depending on the sign of the angle
                    ;
                    ;       1) tan(R)   2) tan(PI/4-R)   3) 1/tan(R)   4) 1/tan(PI/4-R)
                    ;
                    ; The following table is used to decide which relation to use depending
                    ; on in which octant the angle fell.
                    ;
                    ;       octant  relation
                    ;
                    ;         0       s*1
                    ;         1       s*4
                    ;         2      -s*3
                    ;         3      -s*2
                    ;         4       s*1
                    ;         5       s*4
                    ;         6      -s*3
                    ;         7      -s*2
                    ;
00BD                tangent proc

00BD  D9E5                  fxam                            ; Look at the parameter
00BF  9BDFE0                fstsw   ax                      ; Get fxam status
00C2  2EDB2E0000 R          fld     pi_quarter              ; Get PI/4
00C7  9E                    sahf                            ; CF = C0, PF = C2, ZF = C3
00C8  72C0                  jc      funny_parameter
                    ;
                    ; Angle is unnormal, normal, zero, or denormal.
                    ;
00CA  D9C9                  fxch                            ; st(0) = angle, st(1) = PI/4
00CC  7A17                  jpe     tan_zero_unnormal
                    ;
                    ; Angle is either a normal or denormal.
                    ; Reduce the angle to the range -PI/4 < result < PI/4
                    ; If fprem cannot perform this operation in one try, the magnitude of the
                    ; angle must be > 2**62. Such an angle is so large that any rounding
                    ; errors could make a very large difference in the reduced angle.
                    ; It is safest to call this very rare case an error.
                    ;
00CE                tan_normal:

00CE  D9F8                  fprem                           ; Quotient in C0,C3,C1
                                                            ; Convert denormals into unnormals
00D0  93                    xchg    ax,bx
00D1  9BDFE0                fstsw   ax                      ; Quotient identifies octant
                                                            ; original angle fell into
00D4  93                    xchg    ax,bx
00D5  F6C704                test    bh,high(mask cond2)     ; Test for complete reduction
00D8  75BD                  jnz     angle_too_big           ; Exit if angle was too big
                    ;
                    ; See if the angle must be reversed.
                    ;
                    ; Assert -PI/4 < st(0) < PI/4
                    ;
00DA  D9E1                  fabs                            ; 0 <= st(0) < PI/4
                                                            ; C3 in bx has the sign flag
00DC  F6C702                test    bh,high(mask cond1)     ; must be reversed
00DF  740E                  jz      no_tan_reverse
                    ;
                    ; Angle fell in octants 1,3,5,7. Reverse it, subtract it from PI/4.
                    ;
00E1  DEE9                  fsub                            ; Reverse angle
00E3  EB18                  jmp     short do_tangent
                    ;
                    ; Angle is either zero or an unnormal
                    ;
00E5                tan_zero_unnormal:

00E5  DDD9                  fstp    st(1)                   ; Remove PI/4
00E7  7405                  jz      tan_angle_zero
                    ;
                    ; Angle is an unnormal.
                    ;
00E9  E83300                call    normalize_value
00EC  EBE0                  jmp     tan_normal

00EE                tan_angle_zero:

00EE  C3                    ret
                    ;
                    ; Angle fell in octants 0,2,4,6. Test for st(0) = 0, fptan won't work
                    ;
00EF                no_tan_reverse:

00EF  D9E4                  ftst                            ; Test for zero angle
00F1  91                    xchg    ax,cx
00F2  9BDFE0                fstsw   ax                      ; C3 = 1 if st(0) = 0
00F5  91                    xchg    ax,cx
00F6  DDD9                  fstp    st(1)
00F8  F6C540                test    ch,high(mask cond3)
00FB  7515                  jnz     tan_zero

00FD                do_tangent:

00FD  D9F2                  fptan                           ; tan ST(0) = ST(1)/ST(0)

00FF                after_tangent:
                    ;
                    ; Decide on the order of the operands and their sign for the divide
                    ; operation while the fptan instruction is working.
                    ;
00FF  8AC7                  mov     al,bh                   ; Get a copy of fprem C3 flag
0101  254002                and     ax,mask cond1 + high(mask cond3) ; Examine fprem C3 flag and
                                                            ; FXAM C1 flag
0104  F6C742                test    bh,high(mask cond1 + mask cond3) ; Use reverse divide if in
                                                            ; octants 1,2,5,6
0107  7B0D                  jpo     reverse_divide          ; Note: parity works on low
                                                            ; 8 bits only!
                    ;
                    ; Angle was in octants 0,3,4,7
                    ; Test for the sign of the result. Two negatives cancel.
                    ;
0109  0AC4                  or      al,ah
010B  7A02                  jpe     positive_divide

010D  D9E0                  fchs                            ; Force result negative

010F                positive_divide:

010F  DEF9                  fdiv                            ; Form result
0111  C3                    ret                             ; Ok to leave fdiv running

0112                tan_zero:

0112  D9E8                  fld1                            ; Force 1/0 = tan(PI/2)
0114  EBE9                  jmp     after_tangent
                    ;
                    ; Angle was in octants 1,2,5,6
                    ; Set the correct sign of the result
                    ;
0116                reverse_divide:

0116  0AC4                  or      al,ah
0118  7A02                  jpe     positive_r_divide

011A  D9E0                  fchs                            ; Force result negative

011C                positive_r_divide:

011C  DEF1                  fdivr                           ; Form reciprocal of result
011E  C3                    ret                             ; Ok to leave fdiv running

                    tangent endp
                    ;
                    ; This function will normalize the value in st(0)
                    ; Then PI/4 is placed into st(1).
                    ;
011F                normalize_value:

011F  D9E1                  fabs                            ; Force value positive
0121  D9F4                  fxtract                         ; 0 <= st(0) < 1
0123  D9E8                  fld1                            ; Get normalize bit
0125  DCC1                  fadd    st(1),st                ; Normalize fraction
0127  DEE9                  fsub                            ; Restore original value
0129  D9FD                  fscale                          ; Form original normalized value
012B  DDD9                  fstp    st(1)                   ; Remove scale factor
012D  2EDB2E0000 R          fld     pi_quarter              ; Get PI/4
0132  D9C9                  fxch
0134  C3                    ret

                    code    ends
                            end

ASSEMBLY COMPLETE, NO WARNINGS, NO ERRORS

FPTAN and FPREM

These trigonometric functions use the FPTAN instruction of the NPX. FPTAN requires that the angle argument be between 0 and π/4 radians, 0 to 45 degrees. The FPREM instruction is used to reduce the argument down to this range. The low three quotient bits set by FPREM identify which octant the original angle was in.

One FPREM instruction iteration can reduce angles of 10^(18) radians or less in magnitude to within π/4.
Larger values can be reduced, but the meaning of the result is questionable, because any errors in the least significant bits of that value represent changes of 45 degrees or more in the reduced angle.

Cosine Uses Sine Code

To save code space, the cosine function uses most of the sine function code. The relation sin(│A│ + π/2) = cos(A) is used to convert the cosine argument into a sine argument. Adding π/2 to the angle is performed by adding 010B to the FPREM quotient bits identifying the argument's octant. Adding π/2 directly to the cosine argument would be very inaccurate if the argument's magnitude were very different from π/2.

Depending on which octant the argument falls in, a different relation will be used in the sine and tangent functions. The program listings show which relations are used. For the tangent function, the ratio produced by FPTAN will be directly evaluated. The sine function will use either a sine or cosine relation depending on which octant the angle fell into. On exit, these functions will normally leave a divide instruction in progress to maintain concurrency.

If the input angles are of a restricted range, such as from 0 to 45 degrees, then considerable optimization is possible since full angle reduction and octant identification are not necessary.

All three functions begin by looking at the value given to them. Not a Number (NaN), infinity, or empty registers must be specially treated. Unnormals need to be converted to normal values before the FPTAN instruction will work correctly. Denormals will be converted to very small unnormals that do work correctly for the FPTAN instruction. The sign of the angle is saved to control the sign of the result.

Within the functions, close attention was paid to maintaining concurrent execution of the 80287 and host. The concurrent execution will effectively hide the execution time of the decision logic used in the program.
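The octant selection described above can be modelled in a few lines of Python. This is only an illustrative sketch, not a transcription of the listing: the names PI_QUARTER, _sine_from_octant, sine, and cosine are hypothetical, divmod stands in for FPREM, and math.tan with X fixed at 1 stands in for the Y/X ratio that FPTAN produces. Exception cases (NaN, infinity, empty, oversized angles) are not modelled.

```python
import math

PI_QUARTER = math.pi / 4


def _sine_from_octant(octant, r, negative):
    """Sine of an angle given its octant (0..7) and remainder 0 <= r < PI/4."""
    # Odd octants: reverse the reduced angle (the listing subtracts it from PI/4).
    if octant & 1:
        r = PI_QUARTER - r
    # Stand-in for FPTAN: tan(r) = y/x. Fixing x at 1 is an assumption of this sketch.
    y, x = math.tan(r), 1.0
    hyp = math.hypot(x, y)
    # Octants 1,2,5,6 use the cosine relation X/sqrt(X*X + Y*Y);
    # octants 0,3,4,7 use the sine relation Y/sqrt(X*X + Y*Y).
    numerator = x if octant in (1, 2, 5, 6) else y
    # Octants 4..7 are the second half revolution, where the sine changes sign.
    if octant & 4:
        negative = not negative
    result = numerator / hyp
    return -result if negative else result


def sine(angle):
    # divmod models FPREM: the quotient's low three bits give the octant.
    q, r = divmod(abs(angle), PI_QUARTER)
    return _sine_from_octant(int(q) & 7, r, angle < 0)


def cosine(angle):
    # cos(A) = sin(|A| + PI/2): ignore the sign and add 010B to the octant.
    q, r = divmod(abs(angle), PI_QUARTER)
    return _sine_from_octant((int(q) + 2) & 7, r, False)
```

Note that the cosine path never adds π/2 to the angle itself; it only advances the octant number by two, which is the point the text makes about accuracy.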
Appendix A  Machine Instruction Encoding and Decoding
───────────────────────────────────────────────────────────────────────────

Machine instructions for the 80287 come in one of five different forms as shown in table A-1. In all cases, the instructions are at least two bytes long and begin with the bit pattern 11011B, which identifies the ESCAPE class of instructions. Instructions that reference memory operands are encoded much like similar CPU instructions, because all of the CPU memory-addressing modes may be used with ESCAPE instructions.

Note that several of the processor control instructions (see table 2-11 in Chapter Two) may be preceded by an assembler-generated CPU WAIT instruction (encoding: 10011011B) if they are programmed using the WAIT form of their mnemonics. The ASM286 assembler inserts a WAIT instruction only before these specific processor control instructions──all of the numeric instructions are automatically synchronized by the 80286 CPU and an explicit WAIT instruction, though allowed, is not necessary.

Table A-2 lists all 80287 machine instructions in binary sequence. This table may be used to "disassemble" instructions in unformatted memory dumps or instructions monitored from the data bus. Users writing exception handlers may also find this information useful to identify the offending instruction.

Table A-1.
80287 Instruction Encoding

    ┌───────────────────────┬─────────────────────────────┬───────────────────┐
    │ Lower-Addressed Byte  │    Higher-Addressed Byte    │ 0, 1, or 2 bytes  │
    ├───────────────┬───────┼───┬───────┬───┬─────┬───────┼───────────────────┤
 1  │ 1  1  0  1  1 │ OP-A  │ 1 │  MOD  │ 1 │OP-B │  R/M  │   DISPLACEMENT    │
    ├───────────────┼───────┼───┼───────┼───┴─────┼───────┼───────────────────┤
 2  │ 1  1  0  1  1 │FORMAT │OP-A  MOD  │  OP-B   │  R/M  │   DISPLACEMENT    │
    ├───────────────┼───┬───┼───┼───┬───┼─────────┼───────┼───────────────────┘
 3  │ 1  1  0  1  1 │ R │ P │OP-A 1 │ 1 │  OP-B   │  REG  │
    ├───────────────┼───┼───┼───┼───┼───┼───┬─────┴───────┤
 4  │ 1  1  0  1  1 │ 0 │ 0 │ 1 │ 1 │ 1 │ 1 │     OP      │
    ├───────────────┼───┼───┼───┼───┼───┼───┼─────────────┤
 5  │ 1  1  0  1  1 │ 0 │ 1 │ 1 │ 1 │ 1 │ 1 │     OP      │
    └───────────────┴───┴───┴───┴───┴───┴───┴─────────────┘
      7  6  5  4  3   2   1   0   7   6   5    4 3 2 1 0

 1  Memory transfers, including applicable processor control instructions; 0, 1, or 2 displacement bytes may follow.
 2  Memory arithmetic and comparison instructions; 0, 1, or 2 displacement bytes may follow.
 3  Stack arithmetic and comparison instructions.
 4  Constant, transcendental, some arithmetic instructions.
 5  Processor control instructions that do not reference memory.

───────────────────────────────────────────────────────────────────────────
NOTES

OP, OP-A, OP-B:  Instruction opcode, possibly split into two fields.

MOD:     Same as 80286 CPU mode field.

R/M:     Same as 80286 CPU register/memory field.

FORMAT:  Defines memory operand
           00 = short real
           01 = short integer
           10 = long real
           11 = word integer

R:       0 = return result to stack top
         1 = return result to other register

P:       0 = do not pop stack
         1 = pop stack after operation

REG:     register stack element
           000 = stack top
           001 = next on stack
           010 = third stack element, etc.
───────────────────────────────────────────────────────────────────────────

Table A-2.
Machine Instruction Decoding Guide

┌─1st Byte──┐                                    ASM286 Instruction
Hex  Binary     2nd Byte    Bytes 3, 4           Format

D8   1101 1000  MOD00 0R/M  (disp-lo),(disp-hi)  FADD    short-real
D8   1101 1000  MOD00 1R/M  (disp-lo),(disp-hi)  FMUL    short-real
D8   1101 1000  MOD01 0R/M  (disp-lo),(disp-hi)  FCOM    short-real
D8   1101 1000  MOD01 1R/M  (disp-lo),(disp-hi)  FCOMP   short-real
D8   1101 1000  MOD10 0R/M  (disp-lo),(disp-hi)  FSUB    short-real
D8   1101 1000  MOD10 1R/M  (disp-lo),(disp-hi)  FSUBR   short-real
D8   1101 1000  MOD11 0R/M  (disp-lo),(disp-hi)  FDIV    short-real
D8   1101 1000  MOD11 1R/M  (disp-lo),(disp-hi)  FDIVR   short-real
D8   1101 1000  1100 0REG                        FADD    ST,ST(i)
D8   1101 1000  1100 1REG                        FMUL    ST,ST(i)
D8   1101 1000  1101 0REG                        FCOM    ST(i)
D8   1101 1000  1101 1REG                        FCOMP   ST(i)
D8   1101 1000  1110 0REG                        FSUB    ST,ST(i)
D8   1101 1000  1110 1REG                        FSUBR   ST,ST(i)
D8   1101 1000  1111 0REG                        FDIV    ST,ST(i)
D8   1101 1000  1111 1REG                        FDIVR   ST,ST(i)
D9   1101 1001  MOD00 0R/M  (disp-lo),(disp-hi)  FLD     short-real
D9   1101 1001  MOD00 1R/M                       reserved
D9   1101 1001  MOD01 0R/M  (disp-lo),(disp-hi)  FST     short-real
D9   1101 1001  MOD01 1R/M  (disp-lo),(disp-hi)  FSTP    short-real
D9   1101 1001  MOD10 0R/M  (disp-lo),(disp-hi)  FLDENV  14-bytes
D9   1101 1001  MOD10 1R/M  (disp-lo),(disp-hi)  FLDCW   2-bytes
D9   1101 1001  MOD11 0R/M  (disp-lo),(disp-hi)  FSTENV  14-bytes
D9   1101 1001  MOD11 1R/M  (disp-lo),(disp-hi)  FSTCW   2-bytes
D9   1101 1001  1100 0REG                        FLD     ST(i)
D9   1101 1001  1100 1REG                        FXCH    ST(i)
D9   1101 1001  1101 0000                        FNOP
D9   1101 1001  1101 0001                        reserved
D9   1101 1001  1101 001-                        reserved
D9   1101 1001  1101 01--                        reserved
D9   1101 1001  1101 1REG                        (1)
D9   1101 1001  1110 0000                        FCHS
D9   1101 1001  1110 0001                        FABS
D9   1101 1001  1110 001-                        reserved
D9   1101 1001  1110 0100                        FTST
D9   1101 1001  1110 0101                        FXAM
D9   1101 1001  1110 011-                        reserved
D9   1101 1001  1110 1000                        FLD1
D9   1101 1001  1110 1001                        FLDL2T
D9   1101 1001  1110 1010                        FLDL2E
D9   1101 1001  1110 1011                        FLDPI
D9   1101 1001  1110 1100                        FLDLG2
D9   1101 1001  1110 1101                        FLDLN2
D9   1101 1001  1110 1110                        FLDZ
D9   1101 1001  1110 1111                        reserved
D9   1101 1001  1111 0000                        F2XM1
D9   1101 1001  1111 0001                        FYL2X
D9   1101 1001  1111 0010                        FPTAN
D9   1101 1001  1111 0011                        FPATAN
D9   1101 1001  1111 0100                        FXTRACT
D9   1101 1001  1111 0101                        reserved
D9   1101 1001  1111 0110                        FDECSTP
D9   1101 1001  1111 0111                        FINCSTP
D9   1101 1001  1111 1000                        FPREM
D9   1101 1001  1111 1001                        FYL2XP1
D9   1101 1001  1111 1010                        FSQRT
D9   1101 1001  1111 1011                        reserved
D9   1101 1001  1111 1100                        FRNDINT
D9   1101 1001  1111 1101                        FSCALE
D9   1101 1001  1111 111-                        reserved
DA   1101 1010  MOD00 0R/M  (disp-lo),(disp-hi)  FIADD   short-integer
DA   1101 1010  MOD00 1R/M  (disp-lo),(disp-hi)  FIMUL   short-integer
DA   1101 1010  MOD01 0R/M  (disp-lo),(disp-hi)  FICOM   short-integer
DA   1101 1010  MOD01 1R/M  (disp-lo),(disp-hi)  FICOMP  short-integer
DA   1101 1010  MOD10 0R/M  (disp-lo),(disp-hi)  FISUB   short-integer
DA   1101 1010  MOD10 1R/M  (disp-lo),(disp-hi)  FISUBR  short-integer
DA   1101 1010  MOD11 0R/M  (disp-lo),(disp-hi)  FIDIV   short-integer
DA   1101 1010  MOD11 1R/M  (disp-lo),(disp-hi)  FIDIVR  short-integer
DA   1101 1010  11-- ----                        reserved
DB   1101 1011  MOD00 0R/M  (disp-lo),(disp-hi)  FILD    short-integer
DB   1101 1011  MOD00 1R/M  (disp-lo),(disp-hi)  reserved
DB   1101 1011  MOD01 0R/M  (disp-lo),(disp-hi)  FIST    short-integer
DB   1101 1011  MOD01 1R/M  (disp-lo),(disp-hi)  FISTP   short-integer
DB   1101 1011  MOD10 0R/M  (disp-lo),(disp-hi)  reserved
DB   1101 1011  MOD10 1R/M  (disp-lo),(disp-hi)  FLD     temp-real
DB   1101 1011  MOD11 0R/M  (disp-lo),(disp-hi)  reserved
DB   1101 1011  MOD11 1R/M  (disp-lo),(disp-hi)  FSTP    temp-real
DB   1101 1011  110- ----                        reserved
DB   1101 1011  1110 0000                        reserved (8087 FENI)
DB   1101 1011  1110 0001                        reserved (8087 FDISI)
DB   1101 1011  1110 0010                        FCLEX
DB   1101 1011  1110 0011                        FINIT
DB   1101 1011  1110 0100                        FSETPM
DB   1101 1011  1110 1---                        reserved
DB   1101 1011  1111 ----                        reserved
DC   1101 1100  MOD00 0R/M  (disp-lo),(disp-hi)  FADD    long-real
DC   1101 1100  MOD00 1R/M  (disp-lo),(disp-hi)  FMUL    long-real
DC   1101 1100  MOD01 0R/M  (disp-lo),(disp-hi)  FCOM    long-real
DC   1101 1100  MOD01 1R/M  (disp-lo),(disp-hi)  FCOMP   long-real
DC   1101 1100  MOD10 0R/M  (disp-lo),(disp-hi)  FSUB    long-real
DC   1101 1100  MOD10 1R/M  (disp-lo),(disp-hi)  FSUBR   long-real
DC   1101 1100  MOD11 0R/M  (disp-lo),(disp-hi)  FDIV    long-real
DC   1101 1100  MOD11 1R/M  (disp-lo),(disp-hi)  FDIVR   long-real
DC   1101 1100  1100 0REG                        FADD    ST(i),ST
DC   1101 1100  1100 1REG                        FMUL    ST(i),ST
DC   1101 1100  1101 0REG                        (2)
DC   1101 1100  1101 1REG                        (3)
DC   1101 1100  1110 0REG                        FSUB    ST(i),ST
DC   1101 1100  1110 1REG                        FSUBR   ST(i),ST
DC   1101 1100  1111 0REG                        FDIV    ST(i),ST
DC   1101 1100  1111 1REG                        FDIVR   ST(i),ST
DD   1101 1101  MOD00 0R/M  (disp-lo),(disp-hi)  FLD     long-real
DD   1101 1101  MOD00 1R/M                       reserved
DD   1101 1101  MOD01 0R/M  (disp-lo),(disp-hi)  FST     long-real
DD   1101 1101  MOD01 1R/M  (disp-lo),(disp-hi)  FSTP    long-real
DD   1101 1101  MOD10 0R/M  (disp-lo),(disp-hi)  FRSTOR  94-bytes
DD   1101 1101  MOD10 1R/M  (disp-lo),(disp-hi)  reserved
DD   1101 1101  MOD11 0R/M  (disp-lo),(disp-hi)  FSAVE   94-bytes
DD   1101 1101  MOD11 1R/M  (disp-lo),(disp-hi)  FSTSW   2-bytes
DD   1101 1101  1100 0REG                        FFREE   ST(i)
DD   1101 1101  1100 1REG                        (4)
DD   1101 1101  1101 0REG                        FST     ST(i)
DD   1101 1101  1101 1REG                        FSTP    ST(i)
DD   1101 1101  111- ----                        reserved
DE   1101 1110  MOD00 0R/M  (disp-lo),(disp-hi)  FIADD   word-integer
DE   1101 1110  MOD00 1R/M  (disp-lo),(disp-hi)  FIMUL   word-integer
DE   1101 1110  MOD01 0R/M  (disp-lo),(disp-hi)  FICOM   word-integer
DE   1101 1110  MOD01 1R/M  (disp-lo),(disp-hi)  FICOMP  word-integer
DE   1101 1110  MOD10 0R/M  (disp-lo),(disp-hi)  FISUB   word-integer
DE   1101 1110  MOD10 1R/M  (disp-lo),(disp-hi)  FISUBR  word-integer
DE   1101 1110  MOD11 0R/M  (disp-lo),(disp-hi)  FIDIV   word-integer
DE   1101 1110  MOD11 1R/M  (disp-lo),(disp-hi)  FIDIVR  word-integer
DE   1101 1110  1100 0REG                        FADDP   ST(i),ST
DE   1101 1110  1100 1REG                        FMULP   ST(i),ST
DE   1101 1110  1101 0---                        (5)
DE   1101 1110  1101 1000                        reserved
DE   1101 1110  1101 1001                        FCOMPP
DE   1101 1110  1101 101-                        reserved
DE   1101 1110  1101 11--                        reserved
DE   1101 1110  1110 0REG                        FSUBP   ST(i),ST
DE   1101 1110  1110 1REG                        FSUBRP  ST(i),ST
DE   1101 1110  1111 0REG                        FDIVP   ST(i),ST
DE   1101 1110  1111 1REG                        FDIVRP  ST(i),ST
DF   1101 1111  MOD00 0R/M  (disp-lo),(disp-hi)  FILD    word-integer
DF   1101 1111  MOD00 1R/M  (disp-lo),(disp-hi)  reserved
DF   1101 1111  MOD01 0R/M  (disp-lo),(disp-hi)  FIST    word-integer
DF   1101 1111  MOD01 1R/M  (disp-lo),(disp-hi)  FISTP   word-integer
DF   1101 1111  MOD10 0R/M  (disp-lo),(disp-hi)  FBLD    packed-decimal
DF   1101 1111  MOD10 1R/M  (disp-lo),(disp-hi)  FILD    long-integer
DF   1101 1111  MOD11 0R/M  (disp-lo),(disp-hi)  FBSTP   packed-decimal
DF   1101 1111  MOD11 1R/M  (disp-lo),(disp-hi)  FISTP   long-integer
DF   1101 1111  1100 0REG                        (6)
DF   1101 1111  1100 1REG                        (7)
DF   1101 1111  1101 0REG                        (8)
DF   1101 1111  1101 1REG                        (9)
DF   1101 1111  1110 0000                        FSTSW   AX
DF   1101 1111  1111 XXXX                        reserved

NOTE
The marked encodings are not generated by the language translators. If, however, the 80287 encounters one of these encodings in the instruction stream, it will execute it as follows:

(1) FSTP  ST(i)
(2) FCOM  ST(i)
(3) FCOMP ST(i)
(4) FXCH  ST(i)
(5) FCOMP ST(i)
(6) FFREE ST(i) and pop stack
(7) FXCH  ST(i)
(8) FSTP  ST(i)
(9) FSTP  ST(i)

Appendix B  Compatibility Between the 80287 NPX and the 8087
───────────────────────────────────────────────────────────────────────────

The 80286/80287 operating in Real-Address mode will execute 8087 programs without major modification. However, because of differences in the handling of numeric exceptions by the 80287 NPX and the 8087 NPX, exception-handling routines may need to be changed.

This appendix summarizes the differences between the 80287 NPX and the 8087 NPX, and provides details showing how 8087 programs can be ported to the 80287.

1. The 80287 signals exceptions through a dedicated ERROR line to the 80286. The 80287 error signal does not pass through an interrupt controller (the 8087 INT signal does). Therefore, any interrupt-controller-oriented instructions in numeric exception handlers for the 8087 should be deleted.

2. The 8087 instructions FENI/FNENI and FDISI/FNDISI perform no useful function in the 80287. If the 80287 encounters one of these opcodes in its instruction stream, the instruction will effectively be ignored──none of the 80287 internal states will be updated.
While 8087 code containing these instructions may be executed on the 80287, it is unlikely that the exception-handling routines containing these instructions will be completely portable to the 80287.

3. Interrupt vector 16 must point to the numeric exception handling routine.

4. The ESC instruction address saved in the 80287 includes any leading prefixes before the ESC opcode. The corresponding address saved in the 8087 does not include leading prefixes.

5. In Protected-Address mode, the format of the 80287's saved instruction and address pointers is different than for the 8087. The instruction opcode is not saved in Protected mode──exception handlers will have to retrieve the opcode from memory if needed.

6. Interrupt 7 will occur in the 80286 when executing ESC instructions with either TS (task switched) or EM (emulation) of the 80286 MSW set (TS = 1 or EM = 1). If TS is set, then a WAIT instruction will also cause interrupt 7. An exception handler should be included in 80287 code to handle these situations.

7. Interrupt 9 will occur if the second or subsequent words of a floating-point operand fall outside a segment's size. Interrupt 13 will occur if the starting address of a numeric operand falls outside a segment's size. An exception handler should be included in 80287 code to report these programming errors.

8. Except for the processor control instructions, all of the 80287 numeric instructions are automatically synchronized by the 80286 CPU──the 80286 automatically tests the BUSY line from the 80287 to ensure that the 80287 has completed its previous instruction before executing the next ESC instruction. No explicit WAIT instructions are required to assure this synchronization. For the 8087 used with 8086 and 8088 processors, explicit WAITs are required before each numeric instruction to ensure synchronization.
Although 8087 programs having explicit WAIT instructions will execute perfectly on the 80287 without reassembly, these WAIT instructions are unnecessary.

9. Since the 80287 does not require WAIT instructions before each numeric instruction, the ASM286 assembler does not automatically generate these WAIT instructions. The ASM86 assembler, however, automatically precedes every ESC instruction with a WAIT instruction. Although numeric routines generated using the ASM86 assembler will generally execute correctly on the 80286/80287, reassembly using ASM286 may result in a more compact code image.

The processor control instructions for the 80287 may be coded using either a WAIT or No-WAIT form of mnemonic. The WAIT forms of these instructions cause ASM286 to precede the ESC instruction with a CPU WAIT instruction, in the identical manner as does ASM86.

10. A recommended way to detect the presence of an 80287 in an 80286 system (or an 8087 in an 8086 system) is shown below. It assumes that the system hardware causes the data bus to be high if no 80287 is present to drive the data lines during the FSTSW (Store 80287 Status Word) instruction.

    FND_287:
            FNINIT                  ; initialize numeric processor.
            FSTSW   STAT            ; store status word into location
                                    ; STAT.
            MOV     AX,STAT
            OR      AL,AL           ; Zero Flag reflects result of OR.
            JZ      GOT_287         ; Zero in AL means 80287 is present.
    ;
    ;       No 80287 Present
    ;
            SMSW    AX
            OR      AX,0004H        ; set EM bit in Machine Status Word.
            LMSW    AX              ; to enable software emulation of 287.
            JMP     CONTINUE
    ;
    ;       80287 is present in system
    ;
    GOT_287:
            SMSW    AX
            OR      AX,0002H        ; set MP bit in Machine Status Word
            LMSW    AX              ; to permit normal 80287 operation
    ;
    ;       Continue . . .
    ;
    CONTINUE:                       ; and off we go

An 80286/80287 design must place a pullup resistor on one of the low eight data bus bits of the 80286 to be sure it is read as a high when no 80287 is present.
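The decision made by the detection routine in item 10 can be restated as a small Python model. This is a sketch for illustration only: configure_msw, EM_BIT, and MP_BIT are hypothetical names, and the function merely mirrors the OR AL,AL / JZ test and the two OR AX,... constants from the listing. It does not touch any hardware.

```python
EM_BIT = 0x0004  # Machine Status Word emulation bit (the listing's OR AX,0004H)
MP_BIT = 0x0002  # Machine Status Word math-present bit (the listing's OR AX,0002H)


def configure_msw(status_word, msw):
    """Return the updated MSW, given the word read back after FNINIT/FSTSW.

    A real 80287 stores a status word whose low byte is zero after
    initialization; with no 80287 present, the pulled-up data bus leaves
    at least one of the low eight bits set.
    """
    if status_word & 0x00FF == 0:   # OR AL,AL / JZ GOT_287
        return msw | MP_BIT         # 80287 present: permit normal operation
    return msw | EM_BIT             # no 80287: enable software emulation
```

For example, configure_msw(0x0000, 0) selects the MP bit, while a floating bus value such as 0xFFFF selects the EM bit.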
Appendix C  Implementing the IEEE P754 Standard
───────────────────────────────────────────────────────────────────────────

The 80287 NPX, together with standard support library software, provides an implementation of the IEEE "A Proposed Standard for Binary Floating-Point Arithmetic," Draft 10.0, Task P754, of December 2, 1982. The 80287 Support Library, described in 80287 Support Library Reference Manual, Order Number 122129, is an example of such a support library.

This appendix describes the relationship between the 80287 NPX and the IEEE Standard. Where the Standard has options, Intel's choices in implementing the 80287 are described. Where portions of the Standard are implemented through software, this appendix indicates which modules of the 80287 Support Library implement the Standard. Where special software in addition to the Support Library may be required by your application, this appendix indicates how to write this software.

This appendix contains many terms with precise technical meanings, specified in the 754 Standard. Where these terms are used, they have been capitalized to emphasize the precision of their meanings. The Glossary provides the definitions for all capitalized phrases in this appendix.

Options Implemented in the 80287

The 80287 SHORT_REAL and LONG_REAL formats conform precisely to the Standard's Single and Double Floating-Point Numbers, respectively. The 80287 TEMP_REAL format is the same as the Standard's Double Extended format. The Standard allows a choice of Bias in representing the exponent; the 80287 uses the Bias 16383 decimal.

For the Double Extended format, the Standard contains an option for the meaning of the minimum exponent combined with a nonzero significand. The Bias for this special case can be either 16383, as in all the other cases, or 16382, making the smallest exponent equivalent to the second-smallest exponent. The 80287 uses the Bias 16382 for this case.
This allows the 80287 to distinguish between Denormal numbers (integer part is zero, fraction is nonzero, Biased Exponent is 0) and Unnormal numbers of the same value (same as the Denormal except the Biased Exponent is 1).

The Standard allows flexibility in specifying which NaNs are trapping and which are nontrapping. The EH287.LIB module of the 80287 Support Library provides a software implementation of nontrapping NaNs, and defines one distinction between trapping and nontrapping NaNs: If the most significant bit of the fractional part of a NaN is 1, the NaN is nontrapping. If it is 0, the NaN is trapping.

When a masked Invalid Operation error involves two NaN inputs, the Standard allows flexibility in choosing which NaN is output. The 80287 selects the NaN whose absolute value is greatest.

Areas of the Standard Implemented in Software

There are five areas of the Standard that are not implemented directly in the 80287 hardware; these areas are instead implemented in software as part of the 80287 Support Library.

1. The Standard requires that a Normalizing Mode be provided, in which any nonnormal operands to functions are automatically normalized before the function is performed. The NPX provides a "Denormal operand" exception for this case, allowing the exception handler the opportunity to perform the normalization specified by the Standard. The Denormal operand exception handler provided by EH287.LIB implements the Standard's Normalizing Mode completely for Single- and Double-precision arguments. Normalizing mode for Double Extended operands is implemented in EH287.LIB with one non-Standard feature, discussed in the next section.

2. The Standard specifies that in comparing two operands whose relationship is "unordered," the equality test yield an answer of FALSE, with no errors or exceptions. The 80287 FCOM and FTST instructions themselves issue an Invalid Operation exception in this case.
The error handler EH287.LIB filters out this Invalid Operation error using the following convention: Whenever an FCOM or FTST instruction is followed by a MOV AX,AX instruction (8BC0 Hex), and neither argument is a trapping NaN, the error handler will assume that a Standard equality comparison was intended, and return the correct answer with the Invalid Operation exception flag erased. Note that the Invalid Operation exception must be unmasked for this action to occur.

3. The Standard requires that two kinds of NaNs be provided: trapping and nontrapping. Nontrapping NaNs will not cause further Invalid Operation errors when they occur as operands to calculations. The NPX hardware directly supports only trapping NaNs; the EH287.LIB software implements nontrapping NaNs by returning the correct answer with the Invalid Operation exception flag erased. Note that the Invalid Operation exception must be unmasked for this action to occur.

4. The Standard requires that all functions that convert real numbers to integer formats automatically normalize the inputs if necessary. The integer conversion functions contained in CEL287.LIB fully meet the Standard in this respect; the 80287 FIST instruction alone does not perform this normalization.

5. The Standard specifies the remainder function which is provided by mqerRMD in CEL287.LIB. The 80287 FPREM instruction returns answers within a different range.

Additional Software to Meet the Standard

There are two cases in which additional software is required in conjunction with the 80287 Support Library in order to meet the Standard. The 80287 Support Library does not provide this software in the interest of saving space and because the vast majority of applications will never encounter these cases.

1. When the Invalid Operation exception is masked, Nontrapping NaNs are not implemented fully. Likewise, the Standard's equality test for "unordered" operands is not implemented when the Invalid Operation exception is masked.
Programmers can simulate the Standard notion of a masked Invalid Operation exception by unmasking the 80287 Invalid Operation exception, and providing an Invalid Operation exception handler that supports nontrapping NaNs and the equality test, but otherwise acts just as if the Invalid Operation exception were masked. The 80287 Support Library Reference Manual contains examples for programming this handler in both ASM286 and PL/M-286.

2. In Normalizing Mode, Denormal operands in the TEMP_REAL format are converted to 0 by EH287.LIB, giving sharp Underflow to 0. The Standard specifies that the operation be performed on the real numbers represented by the denormals, giving gradual underflow. To correctly perform such arithmetic while in Normalizing Mode, programmers would have to normalize the operands into a format identical to TEMP_REAL except for two extra exponent bits, then perform the operation on those numbers. Thus, software must be written to handle the 17-bit exponent explicitly.

In designing the EH287.LIB, it was felt that it would be a disadvantage to most users to increase the size of the Normalizing routine by the amount necessary to provide this expanded arithmetic. Because the TEMP_REAL exponent field is so much larger than the LONG_REAL exponent field, it is extremely unlikely that TEMP_REAL underflow will be encountered in most applications.

If meeting the Standard is a more important criterion for your application than the choice between Normalizing and warning modes, then you can select warning mode (Denormal operand exceptions masked), which fully meets the Standard.

If you do wish to implement the Normalization of denormal operands in TEMP_REAL format using extra exponent bits, the list below gives some useful pointers about handling Denormal operand exceptions:

1. TEMP_REAL numbers are considered Denormal by the NPX whenever the Biased Exponent is 0 (minimum exponent). This is true even if the explicit integer bit of the significand is 1.
Such numbers can occur as the result of Underflow.

2. The 80287 FLD instruction can cause a Denormal Operand error if a number is being loaded from memory. It will not cause this exception if the number is being loaded from elsewhere in the 80287 stack.

3. The 80287 FCOM and FTST instructions will cause a Denormal Operand exception for unnormal operands as well as for denormal operands.

4. In cases where both the Denormal Operand and Invalid Operation exceptions occur, you will want to know which is signalled first. When a comparison instruction operates between a nonexistent stack element and a denormal number in 80286 memory, the D and I exceptions are issued simultaneously. In all other situations, a Denormal Operand exception takes precedence over a nonstack Invalid Operation exception, while a stack Invalid Operation exception takes precedence over a Denormal Operand exception.

Glossary of 80287 and Floating-Point Terminology
───────────────────────────────────────────────────────────────────────────

This glossary defines many terms that have precise technical meanings as specified in the IEEE 754 Standard. Where these terms are used, they have been capitalized to emphasize the precision of their meanings.

Affine Mode: a state of the 80287, selected in the 80287 Control Word, in which infinities are treated as having a sign. Thus, the values +INFINITY and -INFINITY are considered different; they can be compared with finite numbers and with each other.

Base: (1) a term used in logarithms and exponentials. In both contexts, it is a number that is being raised to a power. The two equations (y = log base b of x) and (b^(y) = x) are the same.

Base: (2) a number that defines the representation being used for a string of digits. Base 2 is the binary representation; Base 10 is the decimal representation; Base 16 is the hexadecimal representation. In each case, the Base is the factor of increased significance for each succeeding digit (working up from the bottom).
Bias: the difference between the unsigned Integer that appears in the Exponent field of a Floating-Point Number and the true Exponent that it represents. To obtain the true Exponent, you must subtract the Bias from the given Exponent. For example, the Short Real format has a Bias of 127 whenever the given Exponent is nonzero. If the 8-bit Exponent field contains 10000011, which is 131, the true Exponent is 131-127, or +4.

Biased Exponent: the Exponent as it appears in a Floating-Point Number, interpreted as an unsigned, positive number. In the above example, 131 is the Biased Exponent.

Binary Coded Decimal: a method of storing numbers that retains a base 10 representation. Each decimal digit occupies 4 full bits (one hexadecimal digit). The hex values A through F (1010 through 1111) are not used. The 80287 supports a Packed Decimal format that consists of 9 bytes of Binary Coded Decimal (18 decimal digits) and one sign byte.

Binary Point: an entity just like a decimal point, except that it exists in binary numbers. Each binary digit to the right of the Binary Point is multiplied by an increasing negative power of two.

C3-C0: the four "condition code" bits of the 80287 Status Word. These bits are set to certain values by the compare, test, examine, and remainder functions of the 80287.

Characteristic: a term used for some non-Intel computers, meaning the Exponent field of a Floating-Point Number.

Chop: to set the fractional part of a real number to zero, yielding the nearest integer in the direction of zero.

Control Word: a 16-bit 80287 register that the user can set, to determine the modes of computation the 80287 will use, and the error interrupts that will be enabled.

Denormal: a special form of Floating-Point Number, produced when an Underflow occurs. On the 80287, a Denormal is defined as a number with a Biased Exponent that is zero.
By providing a Significand with leading zeros, the range of possible negative Exponents can be extended by the number of bits in the Significand. Each leading zero is a bit of lost accuracy, so the extended Exponent range is obtained by reducing significance.

Double Extended: the Standard's term for the 80287 Temporary Real format, with more Exponent and Significand bits than the Double (Long Real) format, and an explicit Integer bit in the Significand.

Double Floating-Point Number: the Standard's term for the 80287's 64-bit Long Real format.

Environment: the 14 bytes of 80287 registers affected by the FSTENV and FLDENV instructions. It encompasses the entire state of the 80287, except for the 8 Temporary Real numbers of the 80287 stack. Included are the Control Word, Status Word, Tag Word, and the instruction, opcode, and operand information provided by interrupts.

Exception: any of the six error conditions (I, D, O, U, Z, P) signalled by the 80287.

Exponent: (1) any power to which a number is raised by an exponential function. For example, the operand to the function mqerEXP is an Exponent. The Integer operand to mqerYI2 is an Exponent.

Exponent: (2) the field of a Floating-Point Number that indicates the magnitude of the number. This would fall under the above more general definition (1), except that a Bias sometimes needs to be subtracted to obtain the correct power.

Floating-Point Number: a sequence of data bytes that, when interpreted in a standardized way, represents a Real number. Floating-Point Numbers are more versatile than Integer representations in two ways. First, they include fractions. Second, their Exponent parts allow a much wider range of magnitude than possible with fixed-length Integer representations.

Gradual Underflow: a method of handling the Underflow error condition that minimizes the loss of accuracy in the result. If there is a Denormal number that represents the correct result, that Denormal is returned.
Thus, digits are lost only to the extent of denormalization. Most computers return zero when Underflow occurs, losing all significant digits.

Implicit Integer Bit: a part of the Significand in the Short Real and Long Real formats that is not explicitly given. In these formats, the entire given Significand is considered to be to the right of the Binary Point. A single Implicit Integer Bit to the left of the Binary Point is always 1, except in one case. When the Exponent is the minimum (Biased Exponent is 0), the Implicit Integer Bit is 0.

Indefinite: a special value that is returned by functions when the inputs are such that no other sensible answer is possible. For each Floating-Point format there exists one Nontrapping NaN that is designated as the Indefinite value. For binary Integer formats, the negative number furthest from zero is often considered the Indefinite value. For the 80287 Packed Decimal format, the Indefinite value contains all 1's in the sign byte and the uppermost digits byte.

Infinity: a value that has greater magnitude than any Integer or any Real number. The existence of Infinity is subject to heated philosophical debate. However, it is often useful to consider Infinity as another number, subject to special rules of arithmetic. All three Intel Floating-Point formats provide representations for +INFINITY and -INFINITY. They support two ways of dealing with Infinity: Projective (unsigned) and Affine (signed).

Integer: a number (positive, negative, or zero) that is finite and has no fractional part. Integer can also mean the computer representation for such a number: a sequence of data bytes, interpreted in a standard way. It is perfectly reasonable for Integers to be represented in a Floating-Point format; this is what the 80287 does whenever an Integer is pushed onto the 80287 stack.

Invalid Operation: the error condition for the 80287 that covers all cases not covered by other errors.
Included are 80287 stack overflow and underflow, NaN inputs, illegal infinite inputs, out-of-range inputs, and illegal unnormal inputs.

Long Integer: an Integer format supported by the 80287 that consists of a 64-bit Two's Complement quantity.

Long Real: a Floating-Point Format supported by the 80287 that consists of a sign, an 11-bit Biased Exponent, an Implicit Integer Bit, and a 52-bit Significand, for a total of 64 explicit bits.

Mantissa: a term used for some non-Intel computers, meaning the Significand of a Floating-Point Number.

Masked: a term that applies to each of the six 80287 Exceptions I, D, Z, O, U, P. An exception is Masked if a corresponding bit in the 80287 Control Word is set to 1. If an exception is Masked, the 80287 will not generate an interrupt when the error condition occurs; it will instead provide its own error recovery.

NaN: an abbreviation for Not a Number; a Floating-Point quantity that does not represent any numeric or infinite quantity. NaNs should be returned by functions that encounter serious errors. If created during a sequence of calculations, they are transmitted to the final answer and can contain information about where the error occurred.

Nontrapping NaN: a NaN in which the most significant bit of the fractional part of the Significand is 1. By convention, these NaNs can undergo certain operations without visible error. Nontrapping NaNs are implemented for the 80287 via the software in EH287.LIB.

Normal: the representation of a number in a Floating-Point format in which the Significand has an Integer bit 1 (either explicit or Implicit).

Normalizing Mode: a state in which nonnormal inputs are automatically converted to normal inputs whenever they are used in arithmetic. Normalizing Mode is implemented for the 80287 via the software in EH287.LIB.

NPX: Numeric Processor Extension. This is the 80287.

Overflow: an error condition in which the correct answer is finite, but has magnitude too great to be represented in the destination format.
Packed Decimal: an Integer format supported by the 80287. A Packed Decimal number is a 10-byte quantity, with nine bytes of 18 Binary Coded Decimal digits, and one byte for the sign.

Pop: to remove from a stack the last item that was placed on the stack.

Precision Control: an option, programmed through the 80287 Control Word, that allows all 80287 arithmetic to be performed with reduced precision. Because no speed advantage results from this option, its only use is for strict compatibility with the IEEE Standard, and with other computer systems.

Precision Exception: an 80287 error condition that results when a calculation does not return an exact answer. This exception is usually Masked and ignored; it is used only in extremely critical applications, when the user must know if the results are exact.

Projective Mode: a state of the 80287, selected in the 80287 Control Word, in which infinities are treated as not having a sign. Thus the values +INFINITY and -INFINITY are considered the same. Certain operations, such as comparison to finite numbers, are illegal in Projective Mode but legal in Affine Mode. Thus Projective Mode gives you a greater degree of error control over infinite inputs.

Pseudo Zero: a special value of the Temporary Real format. It is a number with a zero significand and an Exponent that is neither all zeros nor all ones. Pseudo zeros can come about as the result of multiplication of two Unnormal numbers, but they are very rare.

Real: any finite value (negative, positive, or zero) that can be represented by a decimal expansion. The fractional part of the decimal expansion can contain an infinite number of digits. Reals can be represented as the points of a line marked off like a ruler. The term Real can also refer to a Floating-Point Number that represents a Real value.

Short Integer: an Integer format supported by the 80287 that consists of a 32-bit Two's Complement quantity.
Short Integer is not the shortest 80287 Integer format; the 16-bit Word Integer is.

Short Real: a Floating-Point Format supported by the 80287, which consists of a sign, an 8-bit Biased Exponent, an Implicit Integer Bit, and a 23-bit Significand, for a total of 32 explicit bits.

Significand: the part of a Floating-Point Number that consists of the most significant nonzero bits of the number, if the number were written out in an unlimited binary format. The Significand alone is considered to have a Binary Point after the first (possibly Implicit) bit; the Binary Point is then moved according to the value of the Exponent.

Single Extended: a Floating-Point format, required by the Standard, that provides greater precision than Single; it also provides an explicit Integer Significand bit. The 80287's Temporary Real format meets the Single Extended requirement as well as the Double Extended requirement.

Single Floating-Point Number: the Standard's term for the 80287's 32-bit Short Real format.

Standard: "a Proposed Standard for Binary Floating-Point Arithmetic," Draft 10.0 of IEEE Task P754, December 2, 1982.

Status Word: a 16-bit 80287 register that can be manually set, but which is usually controlled by side effects of 80287 instructions. It contains condition codes, the 80287 stack pointer, busy and interrupt bits, and error flags.

Tag Word: a 16-bit 80287 register that is automatically maintained by the 80287. For each space in the 80287 stack, it tells if the space is occupied by a number; if so, it gives information about what kind of number.

Temporary Real: the main Floating-Point Format used by the 80287. It consists of a sign, a 15-bit Biased Exponent, and a Significand with an explicit Integer bit and 63 fractional-part bits.

Transcendental: one of a class of functions for which polynomial formulas are always approximate, never exact for more than isolated values. The 80287 supports trigonometric, exponential, and logarithmic functions; all are Transcendental.
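The glossary's definitions of Normal, Denormal, Unnormal, and Pseudo Zero for the Temporary Real format can be read as a decision procedure on the Biased Exponent and Significand fields. The Python sketch below is illustrative only: the function name classify_temp_real and the constant names are hypothetical, and the handling of an all-ones Exponent (Infinity versus NaN decided on the fraction bits alone, ignoring the Integer bit) is a simplifying assumption rather than a statement of the 80287's exact encodings.

```python
EXP_MAX = 0x7FFF            # 15-bit Biased Exponent with all bits set
INT_BIT = 1 << 63           # explicit Integer bit of the 64-bit Significand
FRACTION_MASK = INT_BIT - 1 # the 63 fractional-part bits


def classify_temp_real(biased_exponent: int, significand: int) -> str:
    """Classify a Temporary Real value from its exponent and significand fields.

    The sign bit does not affect the class, so it is not a parameter.
    """
    if not 0 <= biased_exponent <= EXP_MAX:
        raise ValueError("Biased Exponent must fit in 15 bits")
    if not 0 <= significand < (1 << 64):
        raise ValueError("Significand must fit in 64 bits")

    fraction = significand & FRACTION_MASK

    if biased_exponent == EXP_MAX:
        # Assumption: decide on the fraction only, ignoring the Integer bit.
        if fraction == 0:
            return "Infinity"
        # Glossary: a Nontrapping NaN has the top fraction bit set.
        return "Nontrapping NaN" if (fraction >> 62) & 1 else "Trapping NaN"

    if biased_exponent == 0:
        # Glossary: Biased Exponent 0 means Denormal, even if the
        # explicit Integer bit is 1. All-zero fields are a true zero.
        return "Zero" if significand == 0 else "Denormal"

    if significand == 0:
        # Zero significand, exponent neither all zeros nor all ones.
        return "Pseudo Zero"

    if (significand & INT_BIT) == 0:
        # Integer bit 0 with a nonzero exponent.
        return "Unnormal"

    return "Normal"


print(classify_temp_real(0x3FFF, INT_BIT))   # Normal (the value 1.0)
print(classify_temp_real(0, INT_BIT | 1))    # Denormal
print(classify_temp_real(0x3FFF, 1))         # Unnormal
print(classify_temp_real(0x3FFF, 0))         # Pseudo Zero
```

A handler that normalizes Denormal operands in software could use a test of this kind to decide which operands need attention before the operation is retried.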
Trapping NaN: a NaN that causes an I error whenever it enters into a calculation or comparison, even a nonordered comparison.

Two's Complement: a method of representing Integers. If the uppermost bit is 0, the number is considered positive, with the value given by the rest of the bits. If the uppermost bit is 1, the number is negative, with the value obtained by subtracting (2^(bit count)) from the unsigned value of all the given bits. For example, the 8-bit number 11111100 is -4, obtained by subtracting 2^(8) from 252.

Unbiased Exponent: the true value that tells how far and in which direction to move the Binary Point of the Significand of a Floating-Point Number. For example, if a Short Real Exponent is 131, we subtract the Bias 127 to obtain the Unbiased Exponent +4. Thus, the Real number being represented is the Significand with the Binary Point shifted 4 bits to the right.

Underflow: an error condition in which the correct answer is nonzero, but has a magnitude too small to be represented as a Normal number in the destination Floating-Point format. The Standard specifies that an attempt be made to represent the number as a Denormal.

Unmasked: a term that applies to each of the six 80287 Exceptions: I, D, Z, O, U, P. An exception is Unmasked if a corresponding bit in the 80287 Control Word is set to 0. If an exception is Unmasked, the 80287 will generate an interrupt when the error condition occurs. You can provide an interrupt routine that customizes your error recovery.

Unnormal: a Temporary Real representation in which the explicit Integer bit of the Significand is zero, and the exponent is nonzero. We consider Unnormal numbers distinct from Denormal numbers.

Word Integer: an Integer format supported by both the 80286 and the 80287 that consists of a 16-bit Two's Complement quantity.

Zero divide: an error condition in which the inputs are finite, but the correct answer, even with an unlimited exponent, has infinite magnitude.
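The two worked examples in the glossary (11111100 as a Two's Complement Integer, and a Short Real Biased Exponent of 131) can be checked with a short sketch. The Python below is illustrative only; the function names twos_complement and short_real_fields are hypothetical and belong to no Intel library. It assumes the host's 32-bit IEEE single format has the same layout as the 80287 Short Real format (sign, 8-bit Biased Exponent, 23-bit Significand).

```python
import struct

SHORT_REAL_BIAS = 127  # Bias of the Short Real format for a nonzero Exponent


def twos_complement(bits: str) -> int:
    """Interpret a string of '0'/'1' characters as a Two's Complement Integer."""
    value = int(bits, 2)           # unsigned value of all the given bits
    if bits[0] == "1":             # uppermost bit set: the number is negative
        value -= 1 << len(bits)    # subtract 2^(bit count)
    return value


def short_real_fields(x: float):
    """Return (sign, biased_exponent, unbiased_exponent, fraction) for x.

    x is first rounded to a 32-bit Short Real. The unbiased value is only
    meaningful when the Biased Exponent is nonzero, as the glossary notes.
    """
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31
    biased = (raw >> 23) & 0xFF
    fraction = raw & 0x7FFFFF      # the 23 explicit Significand bits
    return sign, biased, biased - SHORT_REAL_BIAS, fraction


print(twos_complement("11111100"))   # -4
print(short_real_fields(16.0))       # (0, 131, 4, 0)
```

The value 16.0 is used because it equals 1.0 with the Binary Point shifted 4 bits to the right, so its Biased Exponent is the 131 of the glossary example and its explicit Significand bits are all zero.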
Index ─────────────────────────────────────────────────────────────────────────── A ─────────────────────────────────────────────────────────────────────────── Address Modes Architecture Arithmetic Instructions ASM 286 Automatic Exception Handling B ─────────────────────────────────────────────────────────────────────────── Binary Integers C ─────────────────────────────────────────────────────────────────────────── Comparison Instructions Compatibility Between the 80287 and 8087 Computation Fundamentals Concurrent (80286 and 80287) Processing Condition Codes Interpretation Constant Instructions Control Word D ─────────────────────────────────────────────────────────────────────────── Data Synchronization Data Transfer Instructions Data Types and Formats Binary Integers Decimal Integers Encoding of Data Type Infinity Control Precision Control Real Numbers Rounding Control Decimal Integers Denormalization Denormalized Operand Denormals Destination Operands E ─────────────────────────────────────────────────────────────────────────── EM (Emulation Mode) Bit in 80286 Emulation of 80287 Encoding of Data Types Error Synchronization Exception Handling Examples Exception Handling, Numeric Processing Exceptions, Numeric Automatic Exception Handling Handling Numeric Errors Inexact Result Invalid Operation Masked Response Numeric Overflow and Underflow Software Exception Handling Zero Divisor Exponent Field F ─────────────────────────────────────────────────────────────────────────── F2XM1 (Exponentiation) FADD (Add Real) FADDP (Add Real and POP) FABS (Absolute Value) FBLD (Packed Decimal──BCD──Load) FBSTP (Packed Decimal──BCD──Store and Pop) FCHS (Change Signs) FCLEX/FNCLEX (Clear Exceptions) FCOM (Compare Real) FCOMP (Compare Real and Pop) FCOMPP (Compare Real and Pop Twice) FDECSTP (Decrement Stack Pointer) FDISI/FNDISI FDIV (Divide Real) FDIV DWORD PTR (Division, Single Precision) FDIVP (Divide Real and Pop) FDIVR (Divide Real Reversed) FDIVRP (Divide Real Reversed and 
Pop) FENI/FNENI FFREE (Free Register) FIADD (Integer Add) FICOM (Integer Compare) FICOMP (Integer Compare and Pop) FIDIV (Integer Divide) FIDIVR (Integer Divide Reversed) FILD (Integer Load) FIMUL (Integer Multiply) FINCSTP (Increment Stack Pointer) FINIT/FNINIT (Initialize Processor) FIST (Integer Store) FISTP (Integer Store and Pop) FISUB (Integer Subtract) FISUBR (Integer Subtract Reversed) FLD (Load Real) FLD1 (Load One) FLDCW (Load Control Word) FLDENV (Load Environment) FLDL2E (Load Log Base 2 of e) FLDL2T (Load Log Base 2 of 10) FLDLG2 (Load Log Base 10 of 2) FLDLN2 (Load Log Base e of 2) FLDPI (Load PI) FLDZ (Load Zero) FMUL (Multiply Real) FMULP (Multiply Real and Pop) FNOP (No Operation) FPATAN (Partial Arctangent) FPREM (Partial Remainder) FPTAN (Partial Tangent) FRNDINT (Round to Integer) FRSTOR (Restore State) FSAVE, FNSAVE (Save State) FSCALE (Scale) FSETPM (Set Protected Mode) FSQRT (Square Root) FST (Store Real) FSTCW/FNSTCW (Store Control Word) FSTENV/FNSTENV (Store Environment) FSTP (Store Real and Pop) FSTSW/FNSTSW (Store Status Word) FSTSW AX, FNSTSW AX (Store Status Word in AX) FSUB (Subtract Real) FSUBP (Subtract Real and Pop) FSUBR (Subtract Real Reversed) FSUBRP (Subtract Real Reversed and Pop) FTST (Test) FWAIT (CPU Wait) FXAM (Examine) FXCH (Exchange Registers) FXTRACT (Extract Exponent and Significand) FYL2X (Logarithm──of x) FYL2XP1 (Logarithm──of x+1) G ─────────────────────────────────────────────────────────────────────────── GET$REAL$ERROR (Store, then Clear, Exception Flags) H ─────────────────────────────────────────────────────────────────────────── Handling Numeric Errors Hardware Interface I ─────────────────────────────────────────────────────────────────────────── I/O Locations (Dedicated and Reserved) IEEE P754 Standard, Implementation Indefinite Inexact Result Infinity Infinity Control INIT$REAL$MATH$UNIT (Initialize Processor Procedure) Initialization and Control Instruction Coding and Decoding Instruction Execution Times
Instruction Length Integer Bit Introduction to Numeric Processor 80287 Invalid Operation L ─────────────────────────────────────────────────────────────────────────── Long Integer Format Long Real Format M ─────────────────────────────────────────────────────────────────────────── Machine Instruction Encoding and Decoding Masked Response MP (Math Present) Flag N ─────────────────────────────────────────────────────────────────────────── NaN (Not a Number) NO-WAIT Form Nonnormal Real Numbers Number System Numeric Exceptions Numeric Operands Numeric Overflow and Underflow Numeric Processor Overview O ─────────────────────────────────────────────────────────────────────────── Output Format Overflow P ─────────────────────────────────────────────────────────────────────────── Packed Decimal Notation Precision Control PL/M-286 Pointers (Instruction/Data) Processor Control Instructions Programming Examples, Comparative Conditional Branching Exception Handling Floating Point to ASCII Conversion Function Partitioning Special Instructions Programming Interface Pseudo zeros and zeros R ─────────────────────────────────────────────────────────────────────────── Real Number Range Real Numbers Recognizing the 80287 Register Stack RESTORE$REAL$STATUS (Restore Processor State) Rounding Control S ─────────────────────────────────────────────────────────────────────────── SAVE$REAL$STATUS (Save Processor State) Scaling SET$REAL$MODE (Set Exception Masks, Rounding Precision, and Infinity Controls) Short Integer Format Short Real Format Significand Software Exception Handling Source Operands Status Word T ─────────────────────────────────────────────────────────────────────────── Tag Word Temporary Real Format Transcendental Instructions Trigonometric Calculation Examples U ─────────────────────────────────────────────────────────────────────────── Underflow Unnormals Upgradability W ─────────────────────────────────────────────────────────────────────────── WAIT Form Word Integer
Format Z ─────────────────────────────────────────────────────────────────────────── Zero Divisor