# CS 211 - Lecture 27

## Computer Architecture

### Review

Bernhard Firner

2026-05-04

---

## Review

* Let's review the problems that gave you the most trouble this semester
* The final is cumulative
  * And most of the concepts we have encountered are cumulative
* That means you shouldn't forget how to program in C

---

## Architecture Stack

* C
* Assembly
* Digital Logic
  * Synchronous
  * Combinational

---

## Abstraction Light

* C is only a light abstraction over assembly
  * Compiler optimizations are the most opaque part
* Now that you know about assembly and hardware, C should make even more sense
  * You saw plenty of pointers as memory addresses in the bomb lab

---

## Static and Dynamic Memory

* The C memory model should make sense now
* Here are some questions that should feel intuitive
  * Why can't we return variable length arrays from a function?
  * Does dynamic memory allocation occur on the stack or the heap?
  * Which direction does stack memory grow? And heap memory?
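
For the first question, recall that a local array lives in the function's stack frame, which is released on return. A minimal sketch of the usual workaround, returning heap memory instead (the name `make_squares` is made up for illustration):

```C
#include <stdlib.h>

/* Hypothetical helper: returns n squares in heap memory.
 * A local array `int squares[n]` would live in this function's stack
 * frame and become invalid as soon as the function returns. */
int *make_squares(size_t n) {
    int *squares = malloc(n * sizeof *squares);
    if (squares == NULL) {
        return NULL;  /* allocation failed */
    }
    for (size_t i = 0; i < n; i++) {
        squares[i] = (int)(i * i);
    }
    return squares;  /* the caller must free() this */
}
```

The heap block outlives the call, so ownership passes to the caller.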

---

## Stack and Heap Memory

* Stack memory generally grows by *subtracting* from %rsp
  * So the stack grows down
* The heap is global, and grows upwards

-v-

## Memory Example

```C
/*
 * Stack Vs Heap example
 */

#include <stdio.h>
#include <stdlib.h>

void a() {
    unsigned char stack_memory[16];
    unsigned char *heap_memory = calloc(16, sizeof(unsigned char));

    printf("a stack memory is %p\n", stack_memory);
    printf("a heap memory is %p\n", heap_memory);
    free(heap_memory);

    return;
}

void b() {
    unsigned char stack_memory[16];
    unsigned char *heap_memory = calloc(16, sizeof(unsigned char));

    printf("b stack memory is %p\n", stack_memory);
    printf("b heap memory is %p\n", heap_memory);

    a();

    free(heap_memory);

    return;
}

void c() {
    unsigned char stack_memory[16];
    unsigned char *heap_memory = calloc(16, sizeof(unsigned char));

    printf("c stack memory is %p\n", stack_memory);
    printf("c heap memory is %p\n", heap_memory);

    b();

    free(heap_memory);

    return;
}

void d() {
    unsigned char stack_memory[16];
    unsigned char *heap_memory = calloc(16, sizeof(unsigned char));

    printf("d stack memory is %p\n", stack_memory);
    printf("d heap memory is %p\n", heap_memory);

    c();

    free(heap_memory);

    return;
}

int main(void) {
    d();
    return 0;
}
```

-v-

## Example Output

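* The exact addresses are not guaranteed, but the direction is consistent
  * Stack addresses decrease with each call, heap addresses increase with each allocation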

<pre>
$ ./a.out
d stack memory is 0x7ffc2eceb250
d heap memory is 0x55d2ddab12a0
c stack memory is 0x7ffc2eceb210
c heap memory is 0x55d2ddab16d0
b stack memory is 0x7ffc2eceb1d0
b heap memory is 0x55d2ddab16f0
a stack memory is 0x7ffc2eceb190
a heap memory is 0x55d2ddab1710
</pre>

---

## The Operating System

* Memory allocated on the stack is easy to understand
  * Adjust %rsp and that's it
  * %rsp is set back to %rbp and memory is returned
* Memory with longer lifetimes is more complicated
  * This memory is managed by your operating system
  * Or, on an embedded device, you set aside a memory pool
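
To make the memory pool idea concrete, here is a minimal sketch of a bump allocator over a fixed array. The names `pool`, `pool_alloc`, and `pool_reset`, the 1024-byte size, and the 16-byte rounding are all assumptions chosen for illustration, not a real embedded API.

```C
#include <stddef.h>

#define POOL_SIZE 1024

/* Hypothetical fixed pool, set aside at build time. */
static _Alignas(16) unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;

/* Bump allocator: hands out the next free bytes, never frees individually. */
void *pool_alloc(size_t size) {
    /* Round up to a multiple of 16 so every block stays aligned. */
    size_t rounded = (size + 15) & ~(size_t)15;
    if (rounded < size || rounded > POOL_SIZE - pool_used) {
        return NULL;  /* size overflowed or the pool is exhausted */
    }
    void *block = &pool[pool_used];
    pool_used += rounded;
    return block;
}

/* Release everything at once, much like popping a stack frame. */
void pool_reset(void) {
    pool_used = 0;
}
```

A general-purpose `malloc` has to do much more bookkeeping than this, since blocks can be freed in any order.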

---

## More OS Stuff

* This isn't an OS course, but the OS is important
  * Take CS 214 and CS 416 to dive more into systems programming
* In addition to memory, the OS also handles exceptions and I/O
  * That includes networking I/O which is pretty important
    * See CS 352

---

## Pointers

* Pointer questions are tricky if you aren't comfortable with memory
* But remember that all pointers are just a value in a register

```C
#include <stdio.h>

int main(int argc, char** argv) {
    printf("%p, %p\n", argv, argv+1);
    return 0;
}
```

* What can we say about the two values printed in the above code?

---

## Pointers

```C
#include <stdio.h>

int main(int argc, char** argv) {
    printf("%p, %p\n", argv, argv+1);
    return 0;
}
```

* The values are both pointers
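* The difference between the two values is the size of a pointer
  * Why? argv is a char\*\*, so each element it points to is a char\*
  * Adding 1 advances by sizeof(char\*), which is 8 bytes on x86-64
  * argv and argv+1 are the same as &argv[0] and &argv[1]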
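
A small sketch of how to measure that step in bytes. The helper name `step_in_bytes` is made up for illustration.

```C
#include <stddef.h>

/* Number of bytes between p and p + 1 for a char** such as argv. */
size_t step_in_bytes(char **p) {
    /* Casting to char* measures the distance in bytes
     * instead of in elements. */
    return (size_t)((char *)(p + 1) - (char *)p);
}
```

Calling `step_in_bytes(argv)` from `main` should give `sizeof(char *)`.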

---

## Hex

* Values in memory are often awkward to write
  * So we use hex to keep things simple
* Writing hex and binary should (at some point) become natural to you
* Let's do some examples

---

## Hex/Binary Examples

* What is 0x200 in decimal?

---

## Hex/Binary Examples

* What is 0x200 in decimal?
  * $0\text{x}100 = 2^8 = 256$
  * $2\times 0\text{x}100 = 2 \times 256 = 512$

---

## Hex/Binary Examples

* What is 0b111100001100 in hex?

---

## Hex/Binary Examples

* What is 0b111100001100 in hex?
  * Every 4 binary digits make 1 hex digit
  * Each group has an 8's place, a 4's place, a 2's place, and a 1's place
  * $0b1100 = 8 + 4 = 0\text{x}C$
  * $0b1111 = 8 + 4 + 2 + 1 = 0\text{x}F$
  * $0b111100001100 = 0\text{x}F0C$
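
The same place-value idea can be sketched in C: shift the running value left one place for every new bit. The function name `binary_to_value` is an assumption for this example.

```C
/* Convert a string of '0'/'1' characters to its value.
 * Each step shifts the running value left one place and adds the new bit. */
unsigned long binary_to_value(const char *bits) {
    unsigned long value = 0;
    for (; *bits == '0' || *bits == '1'; bits++) {
        value = (value << 1) | (unsigned long)(*bits - '0');
    }
    return value;
}
```

Printing `binary_to_value("111100001100")` with the `%lX` format gives `F0C`.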

---

## Memory and Unions

* Unions allow us to easily treat a single part of memory as multiple different types
* We could accomplish the same thing with pointers, but it would be more tedious
  * What number is printed?

```C
#include <stdio.h>

union together {
  int number;
  unsigned char chars[4];
};

int main(int argc, char** argv) {
  // With a union (all bytes start at zero)
  union together forever = {0};
  forever.chars[1] = 0x1;
  // With pointers
  int a_number = 0;
  unsigned char* some_chars = (unsigned char*)(&a_number);
  some_chars[1] = 0x1;
  printf("%i, %i\n", forever.number, a_number);
  return 0;
}
```

---

## Endianness

* The value printed by the previous code is 256
  * Why? x86 and x86-64 are little endian
  * The least significant byte is first
  * The second byte is the second least significant
* $0b00000000000000000000000100000000 = 0\text{x}00000100 = 256$
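
A minimal sketch for checking byte order at run time and for repeating the byte-setting experiment with `memcpy`. The helper names `is_little_endian` and `int_with_byte` are assumptions for this example.

```C
#include <string.h>

/* Returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void) {
    unsigned int one = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &one, 1);  /* look at the lowest-addressed byte */
    return first_byte == 1;
}

/* Set byte `index` of a zeroed unsigned int and return the result. */
unsigned int int_with_byte(size_t index, unsigned char byte) {
    unsigned int value = 0;
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);
    bytes[index] = byte;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

On x86-64, `int_with_byte(1, 0x1)` gives 256. On a big-endian machine, the same byte would land in a more significant position.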

---

## Casting

* We cast an int* to an unsigned char* there
* What happens when we cast?

```c
#include <stdio.h>

int main(int argc, char** argv) {
    int a = 260;
    char b = (char)a;

    printf("a and b are %i and %i\n", a, b);

    return 0;
}
```

* What prints out?

---

## Loss of Precision

* Converting an int (32 bits) into a char (8 bits) means we lose the upper 24 bits
  * 260 is 0x104, from 256 + 4
  * The char cannot hold the 256, so that is discarded
  * b is left with the value 4
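
The truncation can be written out explicitly with a mask. This sketch uses `unsigned char` because converting an out-of-range value to a signed `char` is implementation-defined in C, while the masked version is always well-defined. The name `low_byte` is made up for illustration.

```C
/* Keep only the low 8 bits of value, which is what the cast does on x86-64. */
unsigned char low_byte(int value) {
    return (unsigned char)(value & 0xFF);
}
```

`low_byte(260)` is 4, matching the value of b above.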

---

## ASM

* Can you read this code?

```asm
	.file	"check.c"
	.text
	.section	.rodata
.LC0:
	.string	"a and b are %i and %i\n"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	endbr64
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$32, %rsp
	movl	%edi, -20(%rbp)
	movq	%rsi, -32(%rbp)
	movl	$260, -4(%rbp)
	movl	-4(%rbp), %eax
	movb	%al, -5(%rbp)
	movsbl	-5(%rbp), %edx
	movl	-4(%rbp), %eax
	movl	%eax, %esi
	leaq	.LC0(%rip), %rax
	movq	%rax, %rdi
	movl	$0, %eax
	call	printf@PLT
	movl	$0, %eax
	leave
	ret
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0"
	.section	.note.GNU-stack,"",@progbits
	.section	.note.gnu.property,"a"
	.align 8
	.long	1f - 0f
	.long	4f - 1f
	.long	5
0:
	.string	"GNU"
1:
	.align 8
	.long	0xc0000002
	.long	3f - 2f
2:
	.long	0x3
3:
	.align 8
4:
```

---

## ASM

* Familiar from the bomb lab, right?
* This loads the immediate 260 value:
  * `movl	$260, -4(%rbp)`
* This moves the value into %eax
  * `movl	-4(%rbp), %eax`
* This moves the lower byte of that register into a stack variable
  * `movb	%al, -5(%rbp)`

---

## Addressing Types

* You remember these, right?
  * Immediate
  * Register
  * Direct
  * Indirect

---

## Immediate Addressing

* We can use constant values with registers
  * e.g. `$5` refers to the constant value 5
* This sets the destination to 5
  * `movq $5, dst`

---

## Register Addressing

* More common to work with register values
* `movq %rax, %rdi`
  * Moves the value in %rax into %rdi

---

## Direct and Indirect

* These are pointer equivalents
* Direct uses a hard-coded address
  * `movq 0x40abcd, dst`
  * Would move `memory[0x40abcd]` to destination
* `movq (%rax), dst`
  * Would move `memory[%rax]` to destination
* The first is `direct`, the second is `indirect`

---

## Displacement and Scaling

* Arrays are very common, so syntax supports them
  * `8(%rax)` is `memory[%rax + 8]`
  * like `char_ptr[8]`
* But most array elements are larger than one byte, so the index needs a scaling factor
  * E.g. ints have a scale of 4 bytes
  * `8(%rax, %rcx, 4)` is `memory[%rax + 8 + 4*%rcx]`
* Can skip the displacement for familiar array access
  * `(%rax, %rcx, 4)` is `memory[%rax + 4*%rcx]`
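
The address arithmetic can be sketched in C. The function name `effective_address` is an assumption, and the sketch assumes a flat address space where pointers convert to `uintptr_t` linearly, as they do on x86-64.

```C
#include <stdint.h>

/* Effective address for the operand disp(base, index, scale). */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            uintptr_t scale, intptr_t disp) {
    return base + (uintptr_t)disp + scale * index;
}
```

For an `int values[8]`, `effective_address((uintptr_t)values, 3, sizeof(int), 0)` should match `(uintptr_t)&values[3]`.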

---

## Floating Point Numbers

* These seem to trouble you

<div style="text-align: left;">
Consider a 4-bit floating point value with 1 bit of sign, 2 bits of exponent, and 1 bit of significand. Bias = 2. Infinity and NaN are not used in this format. What is the smallest (largest negative) value that this format can represent?
</div>

---

## Floating Point Numbers

* Slight change.

<div style="text-align: left;">
Consider a 4-bit floating point value with 1 bit of sign, 1 bit of exponent, and 2 bits of significand. Bias = 1. Infinity and NaN are not used in this format. What is the smallest (largest negative) value that this format can represent?
</div>

---

## Float Values

* Often support special patterns for inf and NaN
  * Generally with all exponent bits set
* For fp32
  * value = $(-1)^{sign} \times 2^{exponent - bias} \times (1+\frac{Significand}{2^{23}})$
* If exponent bits all 0
  * value = $(-1)^{sign} \times 2^{1-bias} \times (\frac{Significand}{2^{23}})$
  * These are subnormal numbers
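
The same two formulas can be applied to narrower formats. This is a sketch under assumptions: the field widths and bias are parameters, $2^{23}$ is replaced by $2^{\text{significand bits}}$, and no bit patterns are reserved for inf or NaN. The names `pow2` and `decode_small_float` are made up for illustration.

```C
/* 2 raised to an integer power, computed without <math.h>. */
static double pow2(int exponent) {
    double result = 1.0;
    for (; exponent > 0; exponent--) result *= 2.0;
    for (; exponent < 0; exponent++) result /= 2.0;
    return result;
}

/* Decode a small float laid out as [sign | exponent | significand].
 * Assumes no inf/NaN patterns, and subnormals when the exponent is 0. */
double decode_small_float(unsigned bits, int exp_bits, int sig_bits, int bias) {
    unsigned sig = bits & ((1u << sig_bits) - 1);
    unsigned exp = (bits >> sig_bits) & ((1u << exp_bits) - 1);
    unsigned sign = (bits >> (sig_bits + exp_bits)) & 1u;
    double fraction = sig / pow2(sig_bits);
    double magnitude = (exp == 0)
        ? pow2(1 - bias) * fraction                 /* subnormal */
        : pow2((int)exp - bias) * (1.0 + fraction); /* normal */
    return sign ? -magnitude : magnitude;
}
```

Under these assumptions, the all-ones pattern `0xF` decodes to -3 with 2 exponent bits, 1 significand bit, and bias 2, and to -1.75 with 1 exponent bit, 2 significand bits, and bias 1.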

---

## Rounding

* How do floats round?
  * What if we always chose up or down?
    * Then we get biased results
* So we could round up half the time and down the other half
  * For example, round towards even

---

## Stochastic Rounding

* For machine learning, that isn't good enough
* Rounding to nearest even has bias with some distributions
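* Instead, round up or down at random, with the closer neighbor proportionally more likely
  * e.g. 0.1 has to round to 0 or 0.5
    * it rounds to 0 with p = 0.8,
    * it rounds to 0.5 with p = 0.2
  * The expected value is $0.8 \times 0 + 0.2 \times 0.5 = 0.1$, so there is no bias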
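
A minimal sketch of stochastic rounding to a grid with spacing `step`. The caller supplies the uniform random draw `u` in [0, 1), an assumption that keeps the sketch deterministic. The name `stochastic_round` is made up for illustration.

```C
/* Stochastically round x to a multiple of step.
 * u is a uniform random draw in [0, 1) supplied by the caller. */
double stochastic_round(double x, double step, double u) {
    double scaled = x / step;
    long lower = (long)scaled;               /* truncates toward zero */
    if ((double)lower > scaled) {
        lower--;                             /* turn truncation into floor for negatives */
    }
    double fraction = scaled - (double)lower;  /* distance past the lower neighbor, in [0, 1) */
    /* Round up with probability equal to that fraction. */
    return (double)(u < fraction ? lower + 1 : lower) * step;
}
```

For x = 0.1 and step = 0.5 the fraction is 0.2, so a uniform `u` rounds up to 0.5 about 20% of the time and down to 0 otherwise.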

---

## Digital Math

* All math happens in the CPU
  * Integer math in ALUs
    * Arithmetic logic units
  * Floating point math in FPUs
    * Floating Point Units

---

## Idyllic Architecture

* Real hardware is complex, so we made up a simple system

---

## Pipelines

* Here's our idyllic version of a 5-stage CPU pipeline
* Each gray bar is a buffer, called a `pipeline register`
* The pipeline stages increase throughput roughly 5x
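
A back-of-the-envelope sketch of where "roughly 5x" comes from. It assumes every stage takes one cycle and there are no stalls. Both function names are made up for illustration.

```C
/* Cycles to finish n instructions when each one runs through every stage alone. */
unsigned long unpipelined_cycles(unsigned long n, unsigned long stages) {
    return n * stages;
}

/* Cycles with an ideal pipeline: fill it once, then finish one instruction per cycle. */
unsigned long pipelined_cycles(unsigned long n, unsigned long stages) {
    return n == 0 ? 0 : stages + (n - 1);
}
```

For 1000 instructions and 5 stages that is 5000 cycles against 1004, a speedup just under 5x.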


---

## Pipeline Stages$^*$

1. Instruction fetch (IF)
2. Instruction decode/register fetch (ID)
3. Execute (math or address calculation) (EX)
4. Memory access (MA)
5. Write back (WB)

<p style='font-size:25pt'>$^*$ For our super-simplified example pipeline</p>

---

## Where is the Cache?

* Notice that there are multiple pieces with memory
  * Instruction cache in the IF stage
  * Registers in ID
  * L1 cache in MA
    * And other memory if there is a miss in L1
* So our idyllic version of the pipeline is hiding many details

---

## Dependencies

* Our idyllic pipeline is most useful for revealing dependencies
  * If we need to load data into a register for some math, that is a data dependency

<div style="text-align: left;">
Q: A load instruction issued on cycle 0 completes at the end of cycle 6. The next instruction requires that load result in the EX stage. How many no-op instructions are issued to insert a bubble in the pipeline until the data is ready? Assume that we can forward directly from the MA stage to the EX stage.
</div>

---

## Dependency

* If the first instruction is issued on cycle 0, the next is issued on cycle 1
* EX is the third stage, so EX for the second instruction occurs on cycle 3
  * But the data is not ready until cycle 6
  * If the result is forwarded directly to the EX stage, we still need to stall for 6 - 3 = 3 cycles
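
The counting above can be sketched as a small function. It follows this slide's convention of comparing the cycle in which the stage would run against the cycle in which the data is ready. The name `stall_cycles` is made up for illustration.

```C
/* Stall cycles for an instruction that needs a result in a given stage.
 * stage_index counts from 0 (IF = 0, ID = 1, EX = 2, ...). */
int stall_cycles(int issue_cycle, int stage_index, int data_ready_cycle) {
    int wanted_cycle = issue_cycle + stage_index;  /* when the stage would run unstalled */
    int stalls = data_ready_cycle - wanted_cycle;
    return stalls > 0 ? stalls : 0;
}
```

With the numbers from the question, `stall_cycles(1, 2, 6)` is 3.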

---

## Dealing with Dependencies

* Data hazards
  * Data forwarding
  * Out of order execution
* Control hazards
  * Use speculative execution with branch prediction
* Instructions are prefetched so that different branch instructions are available

---

## Prefetching

* It takes time to load data into a cache
  * Whether the instruction cache, or the data cache
* So modern CPUs predict what data you want
* They do this by guessing the data access pattern
  * Access patterns are predictable because of what we call locality
  * Good locality leads to better prefetching, fewer misses, and more efficient data usage

---

## Locality

* Programs use data near what they are currently using
* `Temporal locality`: programs keep using the data they've got
* `Spatial locality`: when programs read new data, it is near what they recently addressed
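
A classic sketch of spatial locality: two loops that compute the same sum but walk memory in different orders. The grid size and function names are assumptions for this example.

```C
#define ROWS 64
#define COLS 64

/* Row-major walk: consecutive accesses are adjacent in memory (good spatial locality). */
long sum_row_major(int grid[ROWS][COLS]) {
    long total = 0;  /* reused on every iteration: temporal locality */
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major walk: each access jumps COLS * sizeof(int) bytes ahead. */
long sum_col_major(int grid[ROWS][COLS]) {
    long total = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Both functions return the same total. On large arrays the row-major version tends to be faster because each cache line is used in full before moving on.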

---

## Locality