Friday, October 19, 2012

Floating Point

Floating point is a method of representing real numbers that supports a wide range of values.
Numbers are, in general, represented approximately, to a fixed number of significant digits, and scaled using an exponent.
The base for the scaling is normally 2, 10 or 16.
A number that can be represented exactly has the form:

significant digits × base^exponent

Examples: 2.3005 × 10^5, -6.134 × 10^-5, etc.


In binary form:  ±1.xxxxxxx_2 × 2^yyyy
  (where each x is a 0 or a 1 and yyyy is an integer exponent)
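
As a rough illustration of the binary form above, the following Python sketch splits a value into a significand in the range [1, 2) and an integer power of two. The helper name `to_binary_scientific` is made up for this example; the standard-library call `math.frexp` does the actual work.

```python
import math

def to_binary_scientific(value):
    """Return (sign, significand, exponent) with 1.0 <= significand < 2.0.

    Hypothetical helper for illustration; assumes value is finite and non-zero.
    """
    sign = -1 if value < 0 else 1
    # math.frexp returns m in [0.5, 1) and e such that abs(value) == m * 2**e
    mantissa, exponent = math.frexp(abs(value))
    # Shift by one bit so the significand has the form 1.xxxx
    return sign, mantissa * 2, exponent - 1

sign, significand, exponent = to_binary_scientific(-0.75)
print(sign, significand, exponent)  # -1 1.5 -1
```

Here -0.75 comes out as -1.5 × 2^-1, and 1.5 is 1.1 in binary.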

 To store or display a floating-point number in a program, declare it with the type float or double instead of an integer type.
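
The difference between the two types can be seen with a small Python sketch. Python's own `float` is double precision, so this example assumes that the `"f"` format of the standard `struct` module is an acceptable stand-in for a 32-bit `float`. The helper `as_single` is a name invented here.

```python
import struct

def as_single(value):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

x = 0.1
print(x == as_single(x))      # False: 0.1 is rounded differently in 32 and 64 bits
print(0.5 == as_single(0.5))  # True: 0.5 is exactly representable in both
```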

Floating Point Standard

  • Floating point is defined by IEEE Standard 754-1985
  • Developed in response to diverging representations of very large and very small numbers
  • This divergence caused portability issues for scientific code
  • Now almost universally adopted 
  • Two representations
    • Single precision (32-bit data)
    • Double precision (64-bit data) 
     

 IEEE Floating Point Format

  | Sign | Exponent | Fraction |

  Sign:      1 bit
  Exponent:  8 bits (single, 32 bits total), 11 bits (double, 64 bits total)
  Fraction:  23 bits (single), 52 bits (double)


x = (-1)^s × (1 + fraction) × 2^(Exponent - Bias)

where s is the sign bit (0 = positive, 1 = negative)

For a normalized number, the significand (1 + fraction) satisfies 1.0 ≤ significand < 2.0
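
To make the formula concrete, here is a minimal Python sketch that evaluates it directly from the three fields. The function name `ieee_value` and the choice to pass the fraction as a string of bits are assumptions made for this example. The default bias of 127 is the single-precision value.

```python
def ieee_value(sign, fraction_bits, stored_exponent, bias=127):
    """Evaluate (-1)^s * (1 + fraction) * 2^(exponent - bias) for a normalized number.

    fraction_bits is a string of 0s and 1s, most significant bit first
    (hypothetical input format chosen for this sketch; trailing zeros may be omitted).
    """
    # The i-th fraction bit (counting from 1) contributes 2^-i
    fraction = sum(int(bit) * 2.0 ** -(i + 1) for i, bit in enumerate(fraction_bits))
    return (-1) ** sign * (1 + fraction) * 2.0 ** (stored_exponent - bias)

# s = 1, fraction = .1000..., stored exponent = 126 (actual exponent -1)
print(ieee_value(1, "1", 126))  # -0.75
```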


significand = significant digits
  • Always has a leading 1 bit before the binary point, so there is no need to represent it explicitly (a.k.a. the hidden bit)
  • This means the actual precision is 23 + 1 = 24 bits for single and 52 + 1 = 53 bits for double
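
One way to see the effect of the 24-bit and 53-bit significands is to look for the first integer that no longer fits. The sketch below assumes that round-tripping through the `"f"` format of Python's `struct` module is a fair model of single precision. `as_single` is a helper name invented for this example.

```python
import struct

def as_single(value):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# 2^24 - 1 fits in 24 significant bits, so it survives the round trip
print(as_single(2.0 ** 24 - 1) == 2.0 ** 24 - 1)  # True
# 2^24 + 1 needs 25 significant bits, so single precision rounds it to 2^24
print(as_single(2.0 ** 24 + 1) == 2.0 ** 24)      # True
# The same happens in double precision at 2^53 + 1
print(2.0 ** 53 + 1 == 2.0 ** 53)                 # True
```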

 Exponent 

  • Represents both positive and negative exponents
  • The bias is added to the actual exponent to get the stored exponent
  • In single precision, actual exponents of -127 (stored as all 0s) and +128 (stored as all 1s) are reserved for special numbers such as zero, infinity and NaN.
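
The reserved exponent patterns can be inspected with a short Python sketch. It reads the 8-bit exponent field of a single-precision value using the standard `struct` module. The function name `stored_exponent_single` is made up for this example.

```python
import struct

def stored_exponent_single(value):
    """Return the 8-bit stored exponent field of value as a single-precision number."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return (bits >> 23) & 0xFF

print(stored_exponent_single(0.0))           # 0   (all 0s)
print(stored_exponent_single(float("inf")))  # 255 (all 1s)
print(stored_exponent_single(float("nan")))  # 255 (all 1s)
print(stored_exponent_single(1.0))           # 127 (actual exponent 0 plus bias 127)
```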
 


                    Sign      Exponent      Fraction      Bias
Single Precision    1 [31]    8 [30-23]     23 [22-00]    127
Double Precision    1 [63]    11 [62-52]    52 [51-00]    1023
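
Using the field widths and biases listed above, a value can be split into its stored fields with a few shifts and masks. This is only a sketch: the `FORMATS` table and the `split_fields` function are names invented here, and the bit pattern is obtained through Python's standard `struct` module.

```python
import struct

# precision -> (float format, unsigned int format, exponent bits, fraction bits, bias)
FORMATS = {
    "single": ("<f", "<I", 8, 23, 127),
    "double": ("<d", "<Q", 11, 52, 1023),
}

def split_fields(value, precision="single"):
    """Return (sign, stored exponent, fraction field, actual exponent)."""
    float_fmt, int_fmt, exp_bits, frac_bits, bias = FORMATS[precision]
    # Reinterpret the floating-point bytes as one unsigned integer
    (bits,) = struct.unpack(int_fmt, struct.pack(float_fmt, value))
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction, exponent - bias

print(split_fields(-0.75, "single"))  # (1, 126, 4194304, -1)
print(split_fields(-0.75, "double"))  # (1, 1022, 2251799813685248, -1)
```

In both formats the fraction field has only its top bit set, which corresponds to the binary significand 1.1.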
 

Range of Single & Double Precision




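Normalized values, smallest to largest magnitude:

          Binary                                 Decimal (approximate)
Single    ±1 × 2^-126 to ±(2 - 2^-23) × 2^127    ±1.2 × 10^-38 to ±3.4 × 10^38
Double    ±1 × 2^-1022 to ±(2 - 2^-52) × 2^1023  ±2.2 × 10^-308 to ±1.8 × 10^308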
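
The limits of the two formats can be checked numerically. The sketch below builds the largest finite and the smallest normalized single-precision values from their bit patterns with Python's `struct` module, and reads the double-precision limits from `sys.float_info`. The variable names are chosen for this example only.

```python
import struct
import sys

# Largest finite single: exponent field 254, fraction all 1s -> 0x7F7FFFFF
(single_max,) = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))
# Smallest positive normalized single: exponent field 1, fraction 0 -> 0x00800000
(single_min,) = struct.unpack("<f", struct.pack("<I", 0x00800000))

print(single_max == (2 - 2.0 ** -23) * 2.0 ** 127)            # True, about 3.4e38
print(single_min == 2.0 ** -126)                              # True, about 1.2e-38
print(sys.float_info.max == (2 - 2.0 ** -52) * 2.0 ** 1023)   # True, about 1.8e308
print(sys.float_info.min == 2.0 ** -1022)                     # True, about 2.2e-308
```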








