Numbers are, in general, represented approximately to a fixed number of significant digits and scaled using an exponent.
The base for the scaling is normally 2, 10 or 16.
The typical number that can be represented exactly is of the form:
$\text{significant digits} \times \text{base}^{\text{exponent}}$
Examples: $2.3005 \times 10^{5}$, $-6.134 \times 10^{-5}$, etc.
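A minimal Python sketch of the decimal case (Python is a choice made here; the post names no language). It assumes Python's built-in "e" format does the rounding, and decimal_scientific is a hypothetical helper, not something from the post. It rounds a value to a fixed number of significant digits and writes it as significant digits × 10^exponent:

```python
def decimal_scientific(x: float, sig_digits: int) -> str:
    """Round x to sig_digits significant digits and write it as d.ddd x 10^e."""
    # The "e" format keeps one digit before the point, so ask for sig_digits - 1 after it.
    mantissa, exponent = f"{x:.{sig_digits - 1}e}".split("e")
    return f"{mantissa} x 10^{int(exponent)}"


print(decimal_scientific(230050, 5))       # 2.3005 x 10^5
print(decimal_scientific(-0.00006134, 4))  # -6.134 x 10^-5
print(decimal_scientific(1 / 3, 4))        # 3.333 x 10^-1 (only an approximation of 1/3)
```

The last line shows why the representation is approximate: 1/3 has no finite expansion, so only the first few significant digits survive.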
In binary form: $\pm 1.xxxxxxx_2 \times 2^{yyyy}$
(where each x is a binary digit, 0 or 1, and the exponent yyyy is an integer)
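The binary form can be produced with a short Python sketch. It assumes math.frexp as the normalization step (it returns a mantissa in [0.5, 1), which is doubled here to get the 1.xxx form); binary_scientific is a hypothetical helper, and it truncates rather than rounds the fraction bits. Note that the exponent it reports is always an integer:

```python
import math


def binary_scientific(x: float, frac_bits: int = 8) -> str:
    """Write x as +/-1.bbbb... x 2^e, truncating the fraction to frac_bits bits."""
    if x == 0:
        return "0"  # zero has no normalized 1.xxx form
    sign = "-" if x < 0 else "+"
    m, e = math.frexp(abs(x))  # abs(x) == m * 2**e with 0.5 <= m < 1
    frac = m * 2 - 1           # significand 2m lies in [1, 2); keep only its fraction
    exponent = e - 1           # compensate for doubling m
    bits = []
    for _ in range(frac_bits):
        frac *= 2
        bit = int(frac)        # next binary digit after the point
        bits.append(str(bit))
        frac -= bit
    return f"{sign}1.{''.join(bits)} x 2^{exponent}"


print(binary_scientific(6.25))   # +1.10010000 x 2^2
print(binary_scientific(-0.75))  # -1.10000000 x 2^-1
```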
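To store a number with a fractional part in a program, declare the variable with a floating-point type such as float (single precision) or double (double precision) instead of an integer type; an integer variable would discard the fraction.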
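Python has no type declarations, so this sketch only approximates the float/double/int choice: it assumes Python's built-in float as the double-precision type and emulates single precision by round-tripping through the struct module's "f" format (to_float32 is a hypothetical helper name). It shows the fraction lost by an integer and the extra rounding error of single precision:

```python
import struct


def to_float32(x: float) -> float:
    """Round x to IEEE 754 single precision by packing it into 4 bytes and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 2.75
as_int = int(value)          # an integer type discards the fraction
as_double = 0.1              # Python's float is an IEEE 754 double
as_single = to_float32(0.1)  # nearest single-precision value to 0.1

print(as_int)                # 2
print(repr(as_double))       # 0.1
print(repr(as_single))       # 0.10000000149011612
```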
Floating Point Standard
- Floating point is defined by IEEE Standard 754-1985
- Developed in response to the divergent ways different computers represented very large and very small numbers
- Those differences caused portability problems for scientific code
- Now almost universally adopted
- Two representations:
    ◦ Single precision (32-bit data)
    ◦ Double precision (64-bit data)
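The two sizes can be checked with a small Python sketch, assuming the struct module's standard-size format codes "f" (single) and "d" (double) as stand-ins for the two IEEE 754 formats; storage_bits and bit_pattern are hypothetical helpers. It prints each width and the bit pattern of 1.0:

```python
import struct


def storage_bits(code: str) -> int:
    """Size in bits of a struct format code, using standard sizes ('<')."""
    return struct.calcsize("<" + code) * 8


def bit_pattern(value: float, code: str) -> str:
    """Big-endian hex of value encoded with format code 'f' (single) or 'd' (double)."""
    return struct.pack(">" + code, value).hex()


for name, code in (("single", "f"), ("double", "d")):
    print(name, storage_bits(code), bit_pattern(1.0, code))
# single 32 3f800000
# double 64 3ff0000000000000
```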
IEEE Floating Point Format
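Bit layout (sign | exponent | fraction): single precision (32 bits) uses 1 sign bit, an 8-bit exponent and a 23-bit fraction; double precision (64 bits) uses 1 sign bit, an 11-bit exponent and a 52-bit fraction.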
$x = (-1)^{s} \times (1 + \text{fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$
where s is the sign bit (0 = positive, 1 = negative)
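As a worked example of the formula, here is the value -6.25 (chosen only for illustration) in single precision, where Bias = 127:

$$
\begin{aligned}
-6.25 &= (-1)^{1} \times 1.5625 \times 2^{2} \\
s &= 1, \quad \text{fraction} = 0.5625 = 0.1001_2, \quad \text{Exponent} = 2 + 127 = 129 = 10000001_2 \\
x &= (-1)^{1} \times (1 + 0.5625) \times 2^{(129 - 127)} = -1.5625 \times 4 = -6.25
\end{aligned}
$$

The stored bits are 1 | 10000001 | 10010000000000000000000, i.e. 0xC0C80000.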
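For a normalized number, the significand $1 + \text{fraction}$ satisfies $1.0 \le \text{significand} < 2.0$. The significand holds the significant digits.
- A normalized significand always has a leading 1 before the binary point, so that bit does not need to be stored explicitly (a.k.a. the hidden bit)
- The effective precision is therefore 24 bits (23 + 1) in single precision and 53 bits (52 + 1) in double precision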
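The hidden bit's effect on precision can be seen in a Python sketch, again assuming struct's "f" format to round to single precision and Python's float as double precision (to_float32 is a hypothetical helper). Integers up to 2^24 fit in 23 stored bits plus the hidden bit; one past that does not:

```python
import struct


def to_float32(x: float) -> float:
    """Round x to IEEE 754 single precision (24-bit significand) via struct."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# 2**24 - 1 needs exactly 24 significant bits: 23 stored + 1 hidden.
print(to_float32(2.0**24 - 1) == 2**24 - 1)  # True
# 2**24 + 1 would need 25 bits, so it rounds (ties to even) to 2**24.
print(to_float32(2.0**24 + 1) == 2**24)      # True
# Double precision keeps 53 bits, so the same thing happens at 2**53 + 1.
print(float(2**53 + 1) == 2**53)             # True
```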
Exponent
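- Represents both positive and negative exponents
- A bias is added to the actual exponent to get the stored exponent, so the stored field is always non-negative
- In single precision, stored exponents of all 0s (actual exponent -127) and all 1s (actual exponent +128) are reserved for special numbers: zero and subnormals, and infinities and NaN, respectively. Normalized exponents therefore range from -126 to +127.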
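A Python sketch of the biased exponent, assuming struct's "f" format for single precision; exponent_field is a hypothetical helper that reads bits 30-23 and classifies the value by the reserved patterns described above:

```python
import struct


def exponent_field(x: float) -> int:
    """Stored (biased) 8-bit exponent of x encoded in single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF


for value in (1.0, 0.5, 2.0**127, 2.0**-126, 2.0**-149, 0.0, float("inf"), float("nan")):
    field = exponent_field(value)
    if field == 0:
        kind = "reserved: zero or subnormal"
    elif field == 0xFF:
        kind = "reserved: infinity or NaN"
    else:
        kind = f"normalized, actual exponent {field - 127}"  # subtract the bias
    print(value, field, kind)
```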
| | Sign | Exponent | Fraction | Bias |
|---|---|---|---|---|
| Single precision | 1 bit [31] | 8 bits [30-23] | 23 bits [22-00] | 127 |
| Double precision | 1 bit [63] | 11 bits [62-52] | 52 bits [51-00] | 1023 |
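The bit positions in the table can be exercised with a Python sketch that splits a value into its fields for either width. It assumes struct's "f"/"d" formats for the two precisions and reads the raw bits through a same-width unsigned integer; FORMATS and fields are hypothetical names:

```python
import struct

# (float format, same-width integer format, exponent bits, fraction bits, bias),
# taken from the table above.
FORMATS = {
    "single": (">f", ">I", 8, 23, 127),
    "double": (">d", ">Q", 11, 52, 1023),
}


def fields(value: float, precision: str):
    """Split value into (sign, stored exponent, fraction bits, actual exponent).

    The actual exponent is only meaningful for normalized numbers.
    """
    float_fmt, int_fmt, exp_bits, frac_bits, bias = FORMATS[precision]
    (bits,) = struct.unpack(int_fmt, struct.pack(float_fmt, value))
    sign = bits >> (exp_bits + frac_bits)                   # bit 31 or bit 63
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)  # bits 30-23 or 62-52
    fraction = bits & ((1 << frac_bits) - 1)                # bits 22-00 or 51-00
    return sign, exponent, fraction, exponent - bias


print(fields(-6.25, "single"))  # (1, 129, 4718592, 2)
print(fields(-6.25, "double"))  # (1, 1025, 2533274790395904, 2)
```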
Range of Single & Double Precision
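| | Largest magnitude (binary) | Largest magnitude (decimal) | Smallest normalized magnitude |
|---|---|---|---|
| Single | $\pm(2 - 2^{-23}) \times 2^{127}$ | $\approx \pm 3.4 \times 10^{38}$ | $2^{-126} \approx 1.2 \times 10^{-38}$ |
| Double | $\pm(2 - 2^{-52}) \times 2^{1023}$ | $\approx \pm 1.8 \times 10^{308}$ | $2^{-1022} \approx 2.2 \times 10^{-308}$ |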
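These limits can be reproduced in Python, assuming struct to decode hand-built single-precision bit patterns and sys.float_info for double precision (Python's float is a double); float32_from_bits is a hypothetical helper:

```python
import struct
import sys


def float32_from_bits(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# Largest finite single: exponent field 254, fraction all 1s.
max_single = float32_from_bits(0x7F7FFFFF)
# Smallest normalized single: exponent field 1, fraction 0.
min_single = float32_from_bits(0x00800000)

print(max_single, max_single == (2 - 2**-23) * 2**127)  # 3.4028234663852886e+38 True
print(min_single, min_single == 2.0**-126)              # 1.1754943508222875e-38 True
# sys.float_info describes Python's float, i.e. double precision.
print(sys.float_info.max, sys.float_info.max == (2 - 2**-52) * 2**1023)
print(sys.float_info.min, sys.float_info.min == 2.0**-1022)
```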