computer organisation and architecture: Floating Point

Floating point describes a method of representing real numbers in a way that can support a wide range of values.
Numbers are, in general, represented approximately to a fixed number of significant digits and scaled using an exponent.
The base for the scaling is normally 2, 10 or 16.
The typical number that can be represented exactly is of the form:

significant digits × base^exponent

Examples: 2.3005 × 10⁵, -6.134 × 10⁻⁵, etc.

In binary form: ±1.xxxxxxx₂ × 2^yyyy

(where each x is a binary digit, 0 or 1, and yyyy is an integer exponent)
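As a rough illustration of the binary form above, the sketch below splits an ordinary Python float into a sign, a significand between 1 and 2, and an integer power of two. The helper name `to_binary_scientific` is made up for this example; `math.frexp` is the standard-library call that does the real work.

```python
import math

def to_binary_scientific(value):
    """Split value into (sign, significand, exponent) so that
    value == sign * significand * 2**exponent and 1.0 <= significand < 2.0."""
    if value == 0 or not math.isfinite(value):
        raise ValueError("only nonzero finite values have a normalized form")
    sign = -1 if value < 0 else 1
    # math.frexp returns m in [0.5, 1) and e such that abs(value) == m * 2**e.
    mantissa, exp = math.frexp(abs(value))
    # Shift one binary place so the significand lies in [1, 2).
    return sign, mantissa * 2, exp - 1

print(to_binary_scientific(6.5))       # (1, 1.625, 2), i.e. 1.625 × 2^2
print(to_binary_scientific(-0.15625))  # (-1, 1.25, -3), i.e. -1.25 × 2^-3
```

Here 1.625 is 1.101 in binary, so 6.5 is 1.101₂ × 2^2.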

To store a number in floating-point form in a program, declare it with a floating-point type such as float or double (as in C) rather than an integer type.

Floating Point Standard

Floating point is defined by IEEE Standard 754-1985.
It was developed in response to diverging representations of very large and very small numbers, which caused portability issues for scientific code.
It is now almost universally adopted.
It defines two representations:
- Single precision (32-bit data)
- Double precision (64-bit data)

IEEE Floating Point Format

Single (32 bits): 1 sign bit, 8 exponent bits, 23 fraction bits
Double (64 bits): 1 sign bit, 11 exponent bits, 52 fraction bits

x = (-1)^s × (1 + fraction) × 2^(Exponent - Bias)

where s is the sign bit (0 = positive, 1 = negative).

For a normalized number, the significand (1 + fraction) satisfies 1.0 ≤ significand < 2.0.

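The significand holds the significant digits.

A normalized significand always has a leading 1 before the binary point, so that bit does not need to be stored explicitly (it is known as the hidden bit).
This means the effective precision is 23 + 1 = 24 bits for single and 52 + 1 = 53 bits for double.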
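To see the formula and the hidden bit at work, here is a minimal sketch that pulls the three fields out of a single-precision number and then rebuilds the value from them. The function names `decode_single` and `rebuild_single` are invented for this example, and the rebuild step only handles normalized numbers.

```python
import struct

def decode_single(value):
    """Return (sign, stored exponent, fraction) of value as a 32-bit float."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30-23
    fraction = bits & 0x7FFFFF        # bits 22-0
    return sign, exponent, fraction

def rebuild_single(sign, exponent, fraction):
    """Apply x = (-1)^s × (1 + fraction) × 2^(Exponent - Bias), normalized only."""
    significand = 1 + fraction / 2**23   # the "1 +" is the hidden bit
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

fields = decode_single(-6.25)
print(fields)                   # (1, 129, 4718592)
print(rebuild_single(*fields))  # -6.25
```

The stored fraction 4718592 / 2^23 is 0.5625, so the significand is 1.5625 and the value is -1.5625 × 2^(129 - 127) = -6.25.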

Exponent

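The exponent field represents both positive and negative exponents.
A bias is added to the actual exponent to get the stored exponent.
In single precision, actual exponents of -127 (stored field all 0s) and +128 (stored field all 1s) are reserved for special values: zero and denormalized numbers, and infinity and NaN, respectively. In double precision the reserved exponents are -1023 and +1024.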
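The bias and the reserved exponent patterns can be observed directly. The sketch below, with a made-up helper `stored_exponent`, prints the stored exponent field of a few single-precision values next to the actual exponent obtained by subtracting the bias of 127.

```python
import math
import struct

SINGLE_BIAS = 127

def stored_exponent(value):
    """Return the 8-bit stored exponent field of value as a 32-bit float."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return (bits >> 23) & 0xFF

for value in (0.5, 1.0, 8.0):
    stored = stored_exponent(value)
    # stored exponent = actual exponent + bias
    print(value, stored, stored - SINGLE_BIAS)

# The reserved patterns: all 0s for zero, all 1s for infinity and NaN.
print(stored_exponent(0.0))       # 0
print(stored_exponent(math.inf))  # 255
print(stored_exponent(math.nan))  # 255
```

The loop shows stored exponents of 126, 127 and 130 for actual exponents of -1, 0 and 3.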

	Sign	Exponent	Fraction	Bias
Single Precision	1 [31]	8 [30-23]	23 [22-00]	127
Double Precision	1 [63]	11 [62-52]	52 [51-00]	1023
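The bit positions in the table translate directly into shifts and masks. As a sketch for the double-precision row, the hypothetical helper `decode_double` below extracts the sign from bit 63, the exponent from bits 62-52 and the fraction from bits 51-0.

```python
import struct

DOUBLE_BIAS = 1023

def decode_double(value):
    """Return (sign, stored exponent, fraction) of value as a 64-bit float."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    sign = bits >> 63                   # 1 bit, position 63
    exponent = (bits >> 52) & 0x7FF     # 11 bits, positions 62-52
    fraction = bits & ((1 << 52) - 1)   # 52 bits, positions 51-0
    return sign, exponent, fraction

sign, exponent, fraction = decode_double(-0.75)
print(sign)                    # 1
print(exponent - DOUBLE_BIAS)  # -1
print(fraction / 2**52)        # 0.5
```

So -0.75 is stored as -(1 + 0.5) × 2^-1.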

Range of Single & Double Precision

	Largest magnitude (binary)	Decimal
Single	± (2 - 2⁻²³) × 2¹²⁷	≈ ± 3.4 × 10³⁸
Double	± (2 - 2⁻⁵²) × 2¹⁰²³	≈ ± 1.8 × 10³⁰⁸
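These limits can be checked numerically. The sketch below builds the largest finite single-precision value from its bit pattern (exponent field 254, fraction all 1s) and takes the largest double from `sys.float_info.max`, assuming as usual that Python floats are IEEE 754 doubles.

```python
import struct
import sys

# Largest finite single: sign 0, exponent 1111 1110, fraction all 1s.
single_max = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]
double_max = sys.float_info.max

print(f"{single_max:.1e}")  # 3.4e+38
print(f"{double_max:.1e}")  # 1.8e+308

# Both agree with (2 - 2^-fraction_bits) × 2^max_exponent.
print(single_max == (2 - 2**-23) * 2.0**127)   # True
print(double_max == (2 - 2**-52) * 2.0**1023)  # True
```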

Friday, October 19, 2012
