# CS 211 - Lecture 10 - Fixed and Floating Point Representations

Bernhard Firner

2025-10-02

---

## Review

* Fixed Point
* Floating Point

---

## Fixed Point Numbers

* Split a plain number into three sections:
  * Sign
  * Value
  * Fraction
* Have to handle the sign separately now
* Also need to shift right after multiplication, shift left before division

---

## Fixed Point Math

```C
#include <stdio.h>
#include <stdlib.h>

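typedef struct FixedFourStorage FixedFourStorage;
// Bit-field layout is implementation-defined. This assumes fields are
// allocated starting from the least significant bit (as GCC does on x86),
// so fraction is the low 4 bits of raw and sign is the top bit.
struct FixedFourStorage {
    unsigned int fraction : 4;
    unsigned int value : 27;
    unsigned int sign : 1;
};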

typedef union FixedFour FixedFour;
union FixedFour {
    FixedFourStorage storage;
    // Used for math
    unsigned int raw;
};

FixedFour FixedFourAdd(FixedFour a, FixedFour b) {
    FixedFour result;
    // Raw addition is only correct when both operands are non-negative,
    // because the sign bit is stored separately from the magnitude
    result.raw = a.raw + b.raw;
    return result;
}

FixedFour FixedFourMultiply(FixedFour a, FixedFour b) {
    FixedFour result;
    // Mask out the sign bit
    unsigned int mask = 0x7FFFFFFF;
    // Multiply in 64 bits so the product cannot overflow before the shift
    unsigned long long product = (unsigned long long)(a.raw&mask) * (b.raw&mask);
    // The product has 8 fraction bits, shift right to get back to 4
    result.raw = (unsigned int)(product >> 4) & mask;
    // Use xor to get the sign
    result.storage.sign = a.storage.sign ^ b.storage.sign;
    return result;
}

FixedFour FixedFourDivide(FixedFour a, FixedFour b) {
    FixedFour result;
    // Mask out the sign bit
    unsigned int mask = 0x7FFFFFFF;
    // We are going to lose precision, so shift before dividing
    // Shift in 64 bits so the high bits of the value are not lost
    unsigned long long numerator = (unsigned long long)(a.raw&mask) << 4;
    result.raw = (unsigned int)(numerator / (mask&b.raw)) & mask;
    // Use xor to get the sign
    result.storage.sign = a.storage.sign ^ b.storage.sign;
    return result;
}

int getFFValue(FixedFour a) {
    if (a.storage.sign) {
        return -1 * a.storage.value;
    }
    else {
        return a.storage.value;
    }
}

unsigned int getFFFraction(FixedFour a) {
    // If binary 1000 is 0.5, then binary 0001 must be 0.0625
    // The result is in ten-thousandths, so always print with width 4 when using printf
    const unsigned int ffour_fraction = 625;
    return a.storage.fraction * ffour_fraction;
}

int main(void) {
    FixedFour ff = {.storage.value = 10, .storage.fraction = 1};

    // Print with %04u means pad with leading zeros to width 4
    printf("ff value is %i.%04u\n", getFFValue(ff), getFFFraction(ff));

    FixedFour other = {.storage.value = 10, .storage.fraction = 1};
    ff = FixedFourAdd(ff, other);
    printf("Adding with another value.\n");
    printf("ff value is %i.%04u\n", getFFValue(ff), getFFFraction(ff));

    printf("Tripling.\n");
    other.storage.value = 3;
    other.storage.fraction = 0;
    ff = FixedFourMultiply(ff, other);
    printf("ff value is %i.%04u\n", getFFValue(ff), getFFFraction(ff));

    printf("Dividing by 7.\n");
    other.storage.value = 7;
    other.storage.fraction = 0;
    ff = FixedFourDivide(ff, other);
    printf("ff value is %i.%04u\n", getFFValue(ff), getFFFraction(ff));

    printf("Multiplying by -1.\n");
    other.storage.sign = 1;
    other.storage.value = 1;
    other.storage.fraction = 0;
    ff = FixedFourMultiply(ff, other);
    printf("ff value is %i.%04u\n", getFFValue(ff), getFFFraction(ff));

    printf("Dividing by -0.5\n");
    other.storage.sign = 1;
    other.storage.value = 0;
    other.storage.fraction = 0x1<<3;
    ff = FixedFourDivide(ff, other);
    printf("ff value is %i.%04u\n", getFFValue(ff), getFFFraction(ff));

    return 0;
}
```

---

## Pros and Cons

* Simple, not much more work than regular ints
* But has a reduced range
* Didn't gain that much in fractional representation
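
The trade-off can be put into numbers. This is a small sketch that assumes the `FixedFour` layout from the earlier slide (27 value bits, 4 fraction bits); the macro and function names are made up for illustration.

```c
// Hypothetical helpers describing the FixedFour format.
#define FF_VALUE_BITS 27
#define FF_FRACTION_BITS 4

// Smallest step between two representable numbers: 1/16.
double fixed_four_step(void) {
    return 1.0 / (1u << FF_FRACTION_BITS);
}

// Largest magnitude: all value bits and all fraction bits set.
double fixed_four_max(void) {
    unsigned int max_value = (1u << FF_VALUE_BITS) - 1;
    unsigned int max_fraction = (1u << FF_FRACTION_BITS) - 1;
    return max_value + max_fraction * fixed_four_step();
}
```

A 32-bit `int` reaches 2147483647, so the four fraction bits cost a factor of 16 in range and only buy steps of 0.0625.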

---

## Floating Point Numbers

* Three sections
  * Sign: 0 or 1
  * Exponent
  * Significand (also called mantissa)
* 32 bit (single precision) and 64 bit (double precision) most common

<table>
<tr><td>Sign</td><td colspan="8">Exponent</td><td colspan="23">Significand</td></tr>
<tr><td>31</td><td>30</td><td colspan="6">...</td><td>23</td><td>22</td><td colspan="21">...</td><td>0</td></tr>
</table>

---

## Floating Point Values

* Can divide floating point numbers into different types
  * Two special cases
    * NaN (not a number)
    * Infinity
  * normalized
  * subnormal
    * Including 0, when all memory is set to 0

---

## NaN and Infinity

* Positive and negative infinity
  * All exponent bits set to 1
  * All significand bits set to 0
* NaN
  * All exponent bits set to 1
  * Any significand bit set

---

## Exponent

* Always subtract bias from the exponent
  * bias = $2^{k-1}$ without NaN and inf values, $2^{k-1} - 1$ with them
  * k is the number of exponent bits
  * For 32 bit floats, the normal exponent range is $2^{1-127}$ to $2^{254-127}$
    * Exponent field 0 is used for subnormals, 255 for NaN and inf
* That translates to scale factors of $2^{-126}$ to $2^{127}$
  * Or 1.1754943508222875e-38
  * To 170141183460469231731687303715884105728
  * Good enough
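
The bias arithmetic is easy to check in code. A minimal sketch, assuming the IEEE-style convention where the all-zeros exponent field is used for subnormals and the all-ones field for NaN and inf; the function names are hypothetical.

```c
// Bias for a format with k exponent bits when the all-ones
// exponent field is reserved for NaN and infinity.
int exponent_bias(int k) {
    return (1 << (k - 1)) - 1;
}

// Smallest power-of-two exponent of a normal number (exponent field 1).
int min_normal_exponent(int k) {
    return 1 - exponent_bias(k);
}

// Largest power-of-two exponent of a normal number
// (exponent field one below all-ones).
int max_normal_exponent(int k) {
    int max_field = (1 << k) - 2;
    return max_field - exponent_bias(k);
}
```

With `k = 8` this gives a bias of 127 and exponents from -126 to 127. With `k = 11` (doubles) the bias is 1023.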

---

## Limits

* All of those values are stored in float.h
  * FLT_MIN, FLT_MAX
* FLT_RADIX
  * The base that is raised to the exponent. 2 for us.
* FLT_ROUNDS and FLT_EVAL_METHOD
  * Report the rounding mode and evaluation precision
  * Will revisit later
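
The limits in `float.h` can be rebuilt from the format itself. This sketch assumes 32-bit IEEE 754 floats; `two_to_the`, `smallest_normal_float`, and `largest_float` are hypothetical helper names.

```c
#include <float.h>

// 2 raised to an integer power, by repeated doubling or halving.
// Every intermediate value here is exactly representable in a double.
double two_to_the(int e) {
    double result = 1.0;
    for (; e > 0; --e) result *= 2.0;
    for (; e < 0; ++e) result /= 2.0;
    return result;
}

// Smallest positive normal float: exponent field 1, significand 0.
double smallest_normal_float(void) {
    return two_to_the(1 - 127);
}

// Largest finite float: exponent field 254, significand all ones.
double largest_float(void) {
    return two_to_the(254 - 127) * (2.0 - two_to_the(-23));
}
```

On a system with IEEE 754 floats these should match `FLT_MIN` and `FLT_MAX`.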

---

## Why Bias?

* The bias allows the exponent field to act like it's part of a regular number
* The sign+exponent+significand can be compared directly to another float
  * If it were 2's complement, it would require special processing
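
A small sketch of the idea, restricted to non-negative floats that are not NaN. Negative values need extra handling because the sign is stored separately from the magnitude. The function names are hypothetical and 32-bit IEEE 754 floats are assumed.

```c
#include <stdint.h>
#include <string.h>

// Return the raw bit pattern of a float.
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Compare two non-negative, non-NaN floats using only an integer comparison.
// The biased exponent sits above the significand, so a larger exponent
// always produces a larger integer.
int less_than_by_bits(float a, float b) {
    return float_bits(a) < float_bits(b);
}
```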

---

## Subnormal Numbers

* When the exponent bits are 0, this is like a fixed point format
  * You would expect the exponent to be $0 - bias$
  * However, the equation is different than for normal numbers
* Each step in the significand is a step of the minimum increment
* value = $(-1)^{sign} \times 2^{1-bias} \times (\frac{Significand}{2^{23}})$

---

## Normalized Numbers

* We don't want to waste space representing anything in the subnormal range
  * Where exponent bits = 0
* So when exponent bits > 0, we are in the normal number range
  * Begin with an implied 1 in the equation
* value = $(-1)^{sign} \times 2^{exponent - 0x7F} \times (1+\frac{Significand}{2^{23}})$
  * This makes the first normal number $2^{1 - 0x7F} \times \frac{1}{2^{23}}$ greater than the largest subnormal, the same step size as inside the subnormal range
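
Both equations fit in one decoding function. This is a sketch for 32-bit patterns that skips NaN and infinity; `two_to_the` and `decode_float_bits` are hypothetical names.

```c
#include <stdint.h>

// 2 raised to an integer power, by repeated doubling or halving (exact in a double).
static double two_to_the(int e) {
    double result = 1.0;
    for (; e > 0; --e) result *= 2.0;
    for (; e < 0; ++e) result /= 2.0;
    return result;
}

// Decode a 32-bit pattern into its value.
// Exponent field 0xFF (NaN and infinity) is not handled in this sketch.
double decode_float_bits(uint32_t bits) {
    int sign = bits >> 31;
    int exponent = (bits >> 23) & 0xFF;
    double fraction = (bits & 0x7FFFFF) / 8388608.0;  // Significand / 2^23
    double magnitude;
    if (exponent == 0) {
        // Subnormal: fixed exponent of 1 - bias, no implied 1
        magnitude = two_to_the(1 - 0x7F) * fraction;
    } else {
        // Normalized: implied leading 1
        magnitude = two_to_the(exponent - 0x7F) * (1.0 + fraction);
    }
    return sign ? -magnitude : magnitude;
}
```

For example, `decode_float_bits(0x3F800000)` is 1.0 and `decode_float_bits(0x00000001)` is the smallest subnormal, $2^{-149}$.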

---

## Why Both?

* We want to have the smallest increments available around 0
* From 0.5 to 1 in steps of $2^{-1} \times \frac{1}{2^{23}}$
  * Steps of 5.960464477539063e-08
* From 0.25 to 0.5 in steps of 2.9802322387695312e-08
* Gets tiny close to 0
  * Very important to prevent math errors where dividing or multiplying by small numbers is wildly inaccurate
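
The step sizes can be measured directly by moving to the next bit pattern. A minimal sketch, assuming 32-bit IEEE 754 floats and a positive finite input; `step_above` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

// Gap between a non-negative finite float and the next larger float,
// found by adding 1 to the bit pattern.
double step_above(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += 1;                          // next representable value
    float next;
    memcpy(&next, &bits, sizeof next);
    return (double)next - (double)f;
}
```

`step_above(0.5f)` is $2^{-24}$ and `step_above(0.25f)` is $2^{-25}$, matching the numbers on the slide.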

---

## Sane increments

* Incrementing a float's bit pattern by 1 is sane
  * Keep increasing the significand and eventually it overflows into the exponent
  * This works out
  * $2^2(1+\frac{2^{23}-1}{2^{23}})$ is the largest value less than 8
    * 0x7F+2 in the exponent, 0x7FFFFF in the significand
  * Incrementing the bit pattern by 1 yields 8
    * $2^3(1+\frac{0}{2^{23}})$ is 8 exactly

---

## Proof

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct FloatBits FloatBits;
struct FloatBits {
    unsigned int significand : 23;
    unsigned int exponent : 8;
    unsigned int sign : 1;
};

typedef union FloatIntBits FloatIntBits;
union FloatIntBits {
    float the_float;
    int the_int;
    FloatBits the_bits;
};

int main(void) {
    FloatIntBits fib = {.the_bits.exponent = 0x7F+2, .the_bits.significand = 0x7FFFFF};
    printf("The float is %.20f\n", fib.the_float);

    fib.the_int++;
    printf("The float is %f\n", fib.the_float);
    return 0;
}
```

---

## Output

<pre>
The float is 7.99999952316284179688
The float is 8.000000
</pre>

---

## Arithmetic

* Incrementing and decrementing are sane
* Does comparison work the same way as integers?
  * Almost
  * Comparing infinity values works
    * They use the highest value of the exponent
    * Sign bit still means negative
  * Need to check for NaN values

---

## Addition and Subtraction

* Obviously these involve matching up the exponents somehow
  * Need to convert the numbers to have the same range
  * Then handle rounding
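
Those steps can be sketched in software. This is an illustration only, not how any particular FPU is built. It assumes two positive normalized 32-bit floats, ignores overflow to infinity, and rounds to nearest with ties going to the even significand. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint32_t to_bits(float f) { uint32_t b; memcpy(&b, &f, sizeof b); return b; }
static float from_bits(uint32_t b) { float f; memcpy(&f, &b, sizeof f); return f; }

// Add two positive normalized floats using integer operations only.
float add_positive_floats(float x, float y) {
    uint32_t a = to_bits(x), b = to_bits(y);
    if (a < b) { uint32_t t = a; a = b; b = t; }   // make a the larger value
    int exponent = a >> 23;
    int shift = exponent - (int)(b >> 23);
    // Significands with the implied 1, plus 3 extra low bits for rounding
    uint64_t sig_a = (uint64_t)((a & 0x7FFFFF) | 0x800000) << 3;
    uint64_t sig_b = (uint64_t)((b & 0x7FFFFF) | 0x800000) << 3;

    // Step 1: shift the smaller number right so the exponents match.
    // Any 1 bits shifted out are remembered in the lowest ("sticky") bit.
    if (shift > 27) shift = 27;
    uint64_t lost = sig_b & ((1ull << shift) - 1);
    sig_b = (sig_b >> shift) | (lost != 0);

    // Step 2: add, then renormalize if the sum reached 2.0 or more
    uint64_t sum = sig_a + sig_b;
    if (sum >> 27) {
        sum = (sum >> 1) | (sum & 1);
        exponent++;
    }

    // Step 3: round to nearest, ties to even, using the 3 extra bits
    uint32_t extra = sum & 7;
    uint32_t result = (uint32_t)(sum >> 3);   // 24 bits including the implied 1
    if (extra > 4 || (extra == 4 && (result & 1))) result++;

    // Drop the implied 1. A carry from rounding spills into the exponent.
    return from_bits(((uint32_t)exponent << 23) + (result - 0x800000));
}
```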

---

## Rounding and Addition

```c
#include <stdio.h>
#include <float.h>

int main(void) {
    printf("Float rounding method is %i and the eval precision is %i\n", FLT_ROUNDS, FLT_EVAL_METHOD);
    float a = 3.14;
    float b = 1e10;
    float c = -1e10;
    printf("a, b, and c are %f, %f, and %f\n", a, b, c);
    printf("a + b + c is %f\n", a + b + c);
    printf("c + b + a is %f\n", c + b + a);
    return 0;
}
```

---

## Outputs

<pre>
Float rounding method is 1 and the eval precision is 0
a, b, and c are 3.140000, 10000000000.000000, and -10000000000.000000
a + b + c is 0.000000
c + b + a is 3.140000
</pre>

---

## What does this mean?

* FLT_ROUNDS == 1 means rounding to the nearest value when there is imprecision
* FLT_EVAL_METHOD == 0 means that operations are evaluated in the precision of their own type, so float math is done with 32 bits
* When we add 3.14 and 10000000000, what are the bits?
  * The exponents of the two numbers are 1 and 33
  * Their minimum increments are 2.384185791015625e-07 and 1024
    * We can't represent 3.14 in the second float
    * Also, in float math, 512.0 + 1e10 - 1e10 == 1024

---

## Rounding

* So any operation on floating point numbers will probably involve rounding
* Why round to nearest?
  * Statistical bias
  * If always up or down then calculations would be consistently large or small
  * This way they are wrong, but not in a biased way

---

## Multiplication and Division

* Multiplication involves adding the exponents and multiplying the significands
  * So does it take the time of one multiplication and one addition?
* And how long does division take?
* What even is multiplication? A bunch of additions?
* What even is division on a computer? Is it successive subtractions?

---

## Addition

* Addition is done one bit at a time

---

## Handling Carry

* Need to also handle carry in from the previous bit

---

## Ripple Carry Adder

* Just cascade the one bit adders together to an arbitrary precision
* To complete a 32-bit addition, we need to wait for 32 layers of 1-bit additions
  * Not really, there is something called a "carry lookahead"
    * We'll talk about it later
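
The cascade can be modelled in software. This is a sketch of the logic, not of real hardware timing; the function names are hypothetical.

```c
#include <stdint.h>

// One-bit full adder: adds bits a and b plus a carry in.
// Returns the sum bit and writes the carry out.
static unsigned full_adder(unsigned a, unsigned b, unsigned carry_in, unsigned *carry_out) {
    *carry_out = (a & b) | (carry_in & (a ^ b));
    return a ^ b ^ carry_in;
}

// 32-bit ripple carry adder: each bit position has to wait
// for the carry from the position below it.
uint32_t ripple_carry_add(uint32_t x, uint32_t y) {
    uint32_t result = 0;
    unsigned carry = 0;
    for (int bit = 0; bit < 32; ++bit) {
        unsigned a = (x >> bit) & 1u;
        unsigned b = (y >> bit) & 1u;
        unsigned sum = full_adder(a, b, carry, &carry);
        result |= (uint32_t)sum << bit;
    }
    // The final carry out is discarded, like unsigned overflow in C
    return result;
}
```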

---

## Multiplication

* Booth's algorithm, compressors, and hardware techniques
* Complicated
  * But parts can be made parallel
  * That just means more wires
  * We are good at tiny wires
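
To see where the parallelism comes from, here is plain shift-and-add multiplication. This is not Booth's algorithm, just the simplest version of the same idea; `shift_add_multiply` is a hypothetical name.

```c
#include <stdint.h>

// Multiply by adding a shifted copy of x for every 1 bit in y.
// The 32 partial products do not depend on each other, so hardware
// can form them all at once and only has to sum them afterwards.
uint64_t shift_add_multiply(uint32_t x, uint32_t y) {
    uint64_t product = 0;
    for (int bit = 0; bit < 32; ++bit) {
        if ((y >> bit) & 1u) {
            product += (uint64_t)x << bit;   // partial product for this bit
        }
    }
    return product;
}
```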

---

## Timing: integers

```c
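/*
 * Test the time it takes to do an operation on some number of ints.
 * Compile without optimization (e.g. -O0), or the compiler may remove
 * the loops because the results are never used.
 */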

#include <stdio.h>
#include <stdlib.h>

const int randints[] = {
    71501472,-406930519,167198576,-506476827,677574771,
    -216019958,-896939258,587778791,-921524727,-979857950,
    937022794,-1000917168,697685341,857223902,222195556,
    257300523,-44621541,-798482123,-436348977,-634093188,
    -642097513,-92061249,-801071892,-484722028,-177026294,
    181769787,-25895578,-788014379,-846172028,-302776229,
    868440015,-643364568,-163554712,-1025734220,-109980683,
    -489093616,-74256562,759298958,949504440,-262565140};

void add(int repeats) {
    // Do a bunch of additions.
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        int result;
        result = randints[0] + randints[10];
        result = randints[1] + randints[11];
        result = randints[2] + randints[12];
        result = randints[3] + randints[13];
        result = randints[4] + randints[14];
        result = randints[5] + randints[15];
        result = randints[6] + randints[16];
        result = randints[7] + randints[17];
        result = randints[8] + randints[18];
        result = randints[9] + randints[19];
    }
}

void subtract(int repeats) {
    // Do a bunch of subtractions.
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        int result;
        result = randints[0] - randints[10];
        result = randints[1] - randints[11];
        result = randints[2] - randints[12];
        result = randints[3] - randints[13];
        result = randints[4] - randints[14];
        result = randints[5] - randints[15];
        result = randints[6] - randints[16];
        result = randints[7] - randints[17];
        result = randints[8] - randints[18];
        result = randints[9] - randints[19];
    }
}

void multiply(int repeats) {
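    // Do a bunch of multiplications.
    // Some of these products overflow int, which is undefined behavior in C.
    // The results are discarded, only the timing matters here.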
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        int result;
        result = randints[0] * randints[10];
        result = randints[1] * randints[11];
        result = randints[2] * randints[12];
        result = randints[3] * randints[13];
        result = randints[4] * randints[14];
        result = randints[5] * randints[15];
        result = randints[6] * randints[16];
        result = randints[7] * randints[17];
        result = randints[8] * randints[18];
        result = randints[9] * randints[19];
    }
}

void divide(int repeats) {
    // Do a bunch of divisions.
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        int result;
        result = randints[0] / randints[10];
        result = randints[1] / randints[11];
        result = randints[2] / randints[12];
        result = randints[3] / randints[13];
        result = randints[4] / randints[14];
        result = randints[5] / randints[15];
        result = randints[6] / randints[16];
        result = randints[7] / randints[17];
        result = randints[8] / randints[18];
        result = randints[9] / randints[19];
    }
}

void gt(int repeats) {
    // Do a bunch of greater-than comparisons.
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        int result;
        result = randints[0] > randints[10];
        result = randints[1] > randints[11];
        result = randints[2] > randints[12];
        result = randints[3] > randints[13];
        result = randints[4] > randints[14];
        result = randints[5] > randints[15];
        result = randints[6] > randints[16];
        result = randints[7] > randints[17];
        result = randints[8] > randints[18];
        result = randints[9] > randints[19];
    }
}

void equality(int repeats) {
    // Do a bunch of equality comparisons.
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        int result;
        result = randints[0] == randints[10];
        result = randints[1] == randints[11];
        result = randints[2] == randints[12];
        result = randints[3] == randints[13];
        result = randints[4] == randints[14];
        result = randints[5] == randints[15];
        result = randints[6] == randints[16];
        result = randints[7] == randints[17];
        result = randints[8] == randints[18];
        result = randints[9] == randints[19];
    }
}

int main(int argc, char** argv) {
    if (argc < 3) {
        printf("Usage: %s <op> <count>\n\tOp is one of + - x / > =\n\tcount is the number of repetitions\n", argv[0]);
        return 1;
    }

    int count = atoi(argv[2]);
    char op = argv[1][0];
    switch(op) {
        case('+'):
            add(count);
            break;
        case('-'):
            subtract(count);
            break;
        case('x'):
            multiply(count);
            break;
        case('/'):
            divide(count);
            break;
        case('>'):
            gt(count);
            break;
        case('='):
            equality(count);
            break;
        default:
            break;
    }
    return 0;
}
```

---

## Timing: floats

```c
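/*
 * Test the time it takes to do an operation on some number of floats.
 * Compile without optimization (e.g. -O0), or the compiler may remove
 * the loops because the results are never used.
 */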

#include <stdio.h>
#include <stdlib.h>

const float randfloats[] = {
    17089592.442231685,42650262.76743865,-53497819.84984845,-116691867.6590221,24809873.623072624,
    108600008.32102972,125398338.0416575,-120378736.75773332,116731913.37504315,77312670.38473937,
    -71530409.76935619,80182138.68649063,-47645384.879797846,-88263059.38595527,48201775.93114355,
    92453382.04082373,15758869.336459607,92874705.66463462,-114804327.49872798,6557254.501784831,
    -101537030.2437717,-92741851.55613247,-34323721.76140547,-86950365.81892607,133378638.29638937,
    18108168.7427693,-128727443.24053451,-71597764.37060487,-38187532.21805462,31284076.22326246,
    101780333.4527517,123052312.94178456,94335178.55253151,33028173.831077695,34771174.536762565,
    76130639.93920115,-60319246.15152177,-90615915.30101988,78392121.87562653,-59348261.877902895
};

void add(int repeats) {
    // Do a bunch of additions.
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        float result;
        result = randfloats[0] + randfloats[10];
        result = randfloats[1] + randfloats[11];
        result = randfloats[2] + randfloats[12];
        result = randfloats[3] + randfloats[13];
        result = randfloats[4] + randfloats[14];
        result = randfloats[5] + randfloats[15];
        result = randfloats[6] + randfloats[16];
        result = randfloats[7] + randfloats[17];
        result = randfloats[8] + randfloats[18];
        result = randfloats[9] + randfloats[19];
    }
}

void subtract(int repeats) {
    // Do a bunch of subtractions.
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        float result;
        result = randfloats[0] - randfloats[10];
        result = randfloats[1] - randfloats[11];
        result = randfloats[2] - randfloats[12];
        result = randfloats[3] - randfloats[13];
        result = randfloats[4] - randfloats[14];
        result = randfloats[5] - randfloats[15];
        result = randfloats[6] - randfloats[16];
        result = randfloats[7] - randfloats[17];
        result = randfloats[8] - randfloats[18];
        result = randfloats[9] - randfloats[19];
    }
}

void multiply(int repeats) {
    // Do a bunch of multiplications.
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        float result;
        result = randfloats[0] * randfloats[10];
        result = randfloats[1] * randfloats[11];
        result = randfloats[2] * randfloats[12];
        result = randfloats[3] * randfloats[13];
        result = randfloats[4] * randfloats[14];
        result = randfloats[5] * randfloats[15];
        result = randfloats[6] * randfloats[16];
        result = randfloats[7] * randfloats[17];
        result = randfloats[8] * randfloats[18];
        result = randfloats[9] * randfloats[19];
    }
}

void divide(int repeats) {
    // Do a bunch of divisions.
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        float result;
        result = randfloats[0] / randfloats[10];
        result = randfloats[1] / randfloats[11];
        result = randfloats[2] / randfloats[12];
        result = randfloats[3] / randfloats[13];
        result = randfloats[4] / randfloats[14];
        result = randfloats[5] / randfloats[15];
        result = randfloats[6] / randfloats[16];
        result = randfloats[7] / randfloats[17];
        result = randfloats[8] / randfloats[18];
        result = randfloats[9] / randfloats[19];
    }
}

void gt(int repeats) {
    // Do a bunch of greater-than comparisons.
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        int result;
        result = randfloats[0] > randfloats[10];
        result = randfloats[1] > randfloats[11];
        result = randfloats[2] > randfloats[12];
        result = randfloats[3] > randfloats[13];
        result = randfloats[4] > randfloats[14];
        result = randfloats[5] > randfloats[15];
        result = randfloats[6] > randfloats[16];
        result = randfloats[7] > randfloats[17];
        result = randfloats[8] > randfloats[18];
        result = randfloats[9] > randfloats[19];
    }
}

void equality(int repeats) {
    // Do a bunch of equality comparisons.
    // Use different random numbers, just in case some ops take
    // different times with different values.
    int left_idx = 0;
    int right_idx = 0;
    for (;repeats > 0; --repeats) {
        int result;
        result = randfloats[0] == randfloats[10];
        result = randfloats[1] == randfloats[11];
        result = randfloats[2] == randfloats[12];
        result = randfloats[3] == randfloats[13];
        result = randfloats[4] == randfloats[14];
        result = randfloats[5] == randfloats[15];
        result = randfloats[6] == randfloats[16];
        result = randfloats[7] == randfloats[17];
        result = randfloats[8] == randfloats[18];
        result = randfloats[9] == randfloats[19];
    }
}

int main(int argc, char** argv) {
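    if (argc < 3) {
        printf("Usage: %s <op> <count>\n\tOp is one of + - x / > =\n\tcount is the number of repetitions\n", argv[0]);
        return 1;
    }

    // Same dispatch as the integer version
    int count = atoi(argv[2]);
    char op = argv[1][0];
    switch(op) {
        case('+'):
            add(count);
            break;
        case('-'):
            subtract(count);
            break;
        case('x'):
            multiply(count);
            break;
        case('/'):
            divide(count);
            break;
        case('>'):
            gt(count);
            break;
        case('='):
            equality(count);
            break;
        default:
            break;
    }
    return 0;
}
```
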
---

## Note on Timing

* The `time` command measures how long something runs
  * Gives `real`, `user`, and `sys` times
  * **real**: wall clock time
  * **user**: time this took in "userspace"
    * Doesn't count time the OS took doing things, like opening files
    * Also doesn't count time the process isn't running
  * **sys**: Time during operating system calls
* We can just look at user
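
Timing can also be done from inside a program. This is a sketch using `clock()` from standard C, which reports processor time used by the program rather than wall clock time. `cpu_seconds` and `busy_add` are hypothetical names, and the workload is only an example.

```c
#include <time.h>

// Example workload: volatile keeps the compiler from deleting the loop.
void busy_add(int repeats) {
    volatile int result = 0;
    for (; repeats > 0; --repeats) {
        result = result + 1;
    }
}

// Return the processor time in seconds spent running work(repeats).
double cpu_seconds(void (*work)(int), int repeats) {
    clock_t start = clock();
    work(repeats);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A call such as `cpu_seconds(busy_add, 1000000000)` gives a number that plays a similar role to the `user` line from `time`.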

---

## Differences

<pre>
$ time ./timing_ints "+" 1000000000

real	0m2.833s
user	0m2.831s
sys	0m0.001s
$ time ./timing_ints "-" 1000000000

real	0m2.841s
user	0m2.840s
sys	0m0.001s
$ time ./timing_floats "+" 1000000000

real	0m3.165s
user	0m3.163s
sys	0m0.002s
$ time ./timing_floats "-" 1000000000

real	0m3.164s
user	0m3.162s
sys	0m0.002s
</pre>

---

## Multiplication and division

<pre>
$ time ./timing_ints "x" 1000000000

real	0m2.812s
user	0m2.810s
sys	0m0.002s
$ time ./timing_ints "/" 1000000000

real	0m34.541s
user	0m34.538s
sys	0m0.001s
$ time ./timing_floats "x" 1000000000

real	0m3.199s
user	0m3.197s
sys	0m0.002s
$ time ./timing_floats "/" 1000000000

real	0m34.273s
user	0m34.270s
sys	0m0.001s
</pre>

---

## Comparisons

<pre>
$ time ./timing_ints ">" 1000000000

real	0m4.017s
user	0m4.015s
sys	0m0.001s
$ time ./timing_ints "=" 1000000000

real	0m3.995s
user	0m3.994s
sys	0m0.002s
$ time ./timing_floats ">" 1000000000

real	0m4.021s
user	0m4.019s
sys	0m0.002s
$ time ./timing_floats "=" 1000000000

real	0m3.997s
user	0m3.992s
sys	0m0.005s
</pre>

---

## Magic?

* The floating point operations seem like they should be slower than the integer ones
* More steps, more fields, special cases, etc
* So why not?

---

## ALUs and FPUs

* That wasn't (and isn't) always the case
* An ALU is an Arithmetic Logic Unit
  * Piece of hardware that does integer calculations
  * It does the adding from before, plus other operations
  * Cannot handle floating point operations though

---

## FPU

* There is an ALU equivalent for floating point operations
  * The FPU
  * Floating Point Unit
* Integer operations and floating point are too different, so they are different pieces of hardware
* When we couldn't cram so many transistors into the same place, the FPUs were either not present, or only present in a co-processor
  * Like a GPU or external sound card
  * The GPU is just about the only co-processor that remains in general computers

---

## Emulating Floating Point

* If we time our fixed point implementation, we'll find it to be horribly slow
  * Why? No hardware support.
  * Steps are done separately, the total time is the sum of those steps.
* An actual FPU would be faster

---

## Hardware Vs Software

<pre>
$ time ./timing_fixed "+" 1000000000

real	0m9.988s
user	0m9.982s
sys	0m0.002s
$ time ./timing_fixed "x" 1000000000

real	1m0.209s
user	1m0.201s
sys	0m0.002s
</pre>

---

## FPU

* The floating point unit combines the steps required for floating point math into one unit
* Anything that can be done in parallel is
  * e.g. adding exponents and multiplying significands don't need to be sequential

---

## The same speed?

* Why would they be the same speed though?
  * There is a minimum unit of time in your system
  * The clock tick
* This is the speed of a CPU
  * 2200 MHz is 2,200,000,000 clock ticks per second
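
The unit conversions are simple enough to write down. A small sketch with hypothetical function names:

```c
// Clock ticks per second for a clock speed given in MHz.
unsigned long long ticks_per_second(unsigned int mhz) {
    return (unsigned long long)mhz * 1000000ull;
}

// Length of one clock tick in nanoseconds.
double nanoseconds_per_tick(unsigned int mhz) {
    return 1000.0 / mhz;
}

// Time in nanoseconds for an operation with the given latency in cycles.
double latency_nanoseconds(unsigned int cycles, unsigned int mhz) {
    return cycles * nanoseconds_per_tick(mhz);
}
```

At 2200 MHz one tick lasts a little under half a nanosecond, and an operation cannot complete in less than one tick no matter how simple it is.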

---

## Practical Limits

* We could try to make the time for a `char` addition the base unit, but why?
  * Other operations would need to be chopped up, increasing complexity
* Also, on 64-bit systems memory operations will need to be 64-bit
  * And we want them to be 1 clock cycle
  * Gives us a lower bound on the length of a clock cycle

---

## Silicon to Speed

* As technology has improved, engineers have squeezed more stuff into your CPU
  * This includes ALUs and FPUs
* Improvements have also decreased clock cycle latency for operations

---

## Real Example

* On Zen5 (2024), integer addition and subtraction take 1 clock cycle
  * And multiple can be done in parallel per core!
* Integer multiplication has a latency of ~3, and division is ~11-14
  * On the AMD K7 (1999) division latency was 40 cycles for 32 bit operations
* Although CPU clock speeds have been stable, practical execution times have improved

---

## Max Speed

* Most of the integer operations are already at maximum speed
* Do they have their result before the next clock cycle ticks?
  * Possibly. But to use the result, the clock would have to be faster
  * If you've ever overclocked something, you know going faster isn't always stable
  * We'll get into why later
* So the clock speed is limited by the least stable parts
  * If the FPU can finish its ops in less time, they'll still take 1 cycle

---

## Actual Costs

* The slowest floating point operation is actually FBSTP
  * Converting the float to a binary coded decimal
* Other high latency operations are FSIN, FCOS, FPATAN, etc
  * Yes, your CPU is the thing doing the trig functions

---

## What is Supported?

* How do we know what the hardware supports?
* By looking at the opcodes of our instruction set

---

## For Next Time

* Don't want to start diving into instruction sets today
  * So take some time to do homework 2 and study
* Next class we'll begin looking at common instructions and how they execute on a CPU