Fun_People Archive
21 Nov
Pentium Floating Point Division Bug


Date: Mon, 21 Nov 94 20:12:47 PST
To: Fun_People
Subject: Pentium Floating Point Division Bug

Forwarded-by: bostic@bellcore.bellcore.com@CS.Berkeley.EDU (Keith Bostic)
Forwarded-by: Elan Amir <elan@mercenary.CS.Berkeley.EDU>
Forwarded-by: Nick Kralevich <nickkral@po.EECS.Berkeley.EDU>

From: moler@mathworks.com (Cleve Moler)
Newsgroups: comp.sys.intel
Subject: MATLAB and the FDIV bug
Date: 15 Nov 1994 23:30:22 -0500
              
              Pentium Floating Point Division Bug

In the last few days there has been a flurry of activity in the
Internet newsgroup comp.sys.intel that should interest MATLAB
users.  A serious design flaw has been discovered in the floating
point unit on Intel's Pentium chip.  Double precision divisions
involving operands with certain bit patterns can produce incorrect
results.

The most dramatic example seen so far can be extracted from a
posting last night by Tim Coe of Vitesse Semiconductor.  In MATLAB,
his example becomes

    x = 4195835
    y = 3145727
    z = x - (x/y)*y

With exact computation, z would be zero.  In fact, we get zero on
most machines, including those using Intel 286, 386 and 486 chips.
Even with roundoff error, z should not be much larger than eps*x,
which is about 9.3e-10.  But, on the Pentium,

    z = 256

The relative error, z/x, is about 2^(-14) or 6.1e-5.  The computed
quotient, x/y, is accurate to only 14 bits.
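Coe's example can be checked on any machine whose double precision arithmetic works correctly. The sketch below is a Python translation of the MATLAB lines above, assuming Python floats are IEEE 754 doubles like MATLAB's. It does not emulate the faulty Pentium divider. It only recomputes z on a correct machine and turns the reported Pentium value, z = 256, into a relative error and a count of correct bits.

```python
import math
import sys

# Assumption: Python floats are IEEE 754 doubles, the same format
# MATLAB uses by default, so this mirrors a correctly working machine.
x = 4195835
y = 3145727

z = x - (x / y) * y            # zero in exact arithmetic
eps = sys.float_info.epsilon   # 2**-52, equal to MATLAB's eps
bound = eps * x                # roundoff scale, about 9.3e-10

print("z =", z)                # prints z = 0.0 on IEEE hardware
print("eps*x =", bound)

# Rework the Pentium value reported in the post (z = 256).
# This does NOT emulate the faulty divider. It only converts the
# reported residual into a relative error and a count of good bits.
z_pentium = 256
rel_err = z_pentium / x                 # about 6.1e-5, close to 2**-14
good_bits = -math.log2(rel_err)         # about 14
print("z/x =", rel_err, "bits =", good_bits)
```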

An article in last week's edition of Electronic Engineering Times
credits Prof. Thomas Nicely, a mathematics professor at Lynchburg
College in Virginia, with the first public announcement of the
Pentium division bug.  One of Nicely's examples involves

    p = 824633702441

With exact computation

    q = 1 - (1/p)*p

would be zero.  With floating point computation, q should be on
the order of eps.  On most machines, we find that

    q = eps/2 = 2^(-53) ~= 1.11e-16

But on the Pentium

    q = 2^(-28) ~= 3.72e-09

This is roughly single precision accuracy and is typical of
most of the examples that had been posted before Coe's analysis.
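The same check works for Nicely's example. This Python sketch is again a translation that assumes IEEE double floats. It computes q on a correct machine and compares it with the Pentium value reported above. Both residuals are expressed as -log2(q), which is roughly the number of bits that agree: about 53 for correct double precision, about 28 for the Pentium.

```python
import math

# Assumption: Python floats are IEEE 754 doubles, as in MATLAB.
p = 824633702441

q = 1 - (1 / p) * p       # zero in exact arithmetic
print("q =", q, "equals 2**-53:", q == 2**-53)   # True on IEEE hardware

# The Pentium residual reported in the post, for comparison only.
q_pentium = 2**-28        # about 3.72e-09

# -log2 of the residual is roughly the number of bits that agree.
print("IEEE double:", -math.log2(q))
print("Pentium:    ", -math.log2(q_pentium))
```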

The bit patterns of the operands involved in these examples 
are very special.  The denominator in Coe's example is

    y = 3*2^20 - 1

Nicely's research involves a theorem about sums of reciprocals
of prime numbers.  His example involves a prime of the form

    p = 3*2^38 - 18391
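Writing these operands in binary shows how special they are. Both begin with the bits 10 followed by a long run of 1s. The Python sketch below builds y and p from the formulas above, prints their binary expansions, and confirms that p is prime. The primality check uses plain trial division only because it is the simplest test to write. It is not meant to represent anything from Nicely's own computations.

```python
import math

y = 3 * 2**20 - 1          # Coe's denominator
p = 3 * 2**38 - 18391      # Nicely's prime

def is_prime(n: int) -> bool:
    """Trial division by odd numbers up to sqrt(n).

    Adequate for n near 10**12 (about 450,000 candidate divisors).
    """
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

# '0b10' followed by twenty 1 bits
print(y, bin(y))
# '0b10' followed by twenty-three 1 bits, then the low bits of 2**15 - 18391
print(p, bin(p))
print("p is prime:", is_prime(p))
```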

We're not sure yet how many operands cause the Pentium's floating
point division to fail, or even what operands produce the largest
relative error.  It is certainly true that failures are very rare.
But, as far as we are concerned, the real difficulty is having to
worry about this at all.  With so many other things that can go
wrong with computer hardware and software, we ought at least to be
able to rely on the basic arithmetic.

The bug is definitely in the Pentium chip.  It occurs at all clock
rates.  The bug does not affect other arithmetic operations, or the
built-in transcendental functions.  Intel has recently made changes
to the on-chip programmable logic array (PLA) that fix the bug, and
the company is now believed to be producing error-free CPUs.  It remains to be seen
how long it will take for these to reach users.  

An unnamed Intel spokesman is quoted in the EE Times article as
saying "If customers are concerned, they can call and we'll replace
any of the parts that contain the bug."  But although we at The
MathWorks have our own friends and contacts at Intel, we have been
unable to confirm this policy.  We'll let you know when we hear anything
more definite.  In the meantime, the phone number for Customer
Service at Intel is 800-628-8686.

   -- Cleve Moler
   Chairman and Chief Scientist
   The MathWorks, Inc.



© 1994 Peter Langston