<!--
Abstract:

CS 211
Computer Architecture
Lecture 03: Pointers and Memory
-->

# CS 211 - Lecture 03 - Pointers and Memory

Bernhard Firner

2025-09-09

---

## Review!

* `argc` and `argv`
* arrays and strings
* pointers

---

## Looking at argc and argv

```C
/*
 * This is an example program!
 * To compile code:
 * gcc example.c -o example
 */

#include <stdio.h>

int main(int argc, char** argv) {
	// Your program goes here.

// Say hello.
    printf("Hello world!\n");
    // Print out argc, the count of the program arguments
    printf("There are %d arguments.\n", argc);
    // What's the first argument?
    printf("The first argument is %s.\n", argv[0]);
    // Return something.
    return 123;
}
```

---

## What Are argc and argv?

* The main function has two arguments
* `argc` is the argument count, the number of arguments to your program
* `argv` is the argument vector, the string representations of the arguments
* These are not known at `compile time`, only at `run time`.
* argv is an `array`
  * a block of memory with space for a fixed number of elements

---

## What is a string in C?

* The argument vector is an array of strings
* A string is an array of characters
  * Each string ends with a special, null character
    * Value of 0
    * Written `\0`
* Use `man ascii` on linux to see all ASCII characters

---

## Array Access

* Pointers are numbers to memory locations
  * e.g. 0x7fff82006545
* Those memory locations are arbitrary
  * The OS chooses an arbitrary range for a program
  * This allows for virtual memory, swap, virtual machines, etc
  * Not an OS course, so we'll leave it at that

---

## Diving into arrays

* argv looks something like this:

<table>
<tr>
<td><b>index</b></td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>...</td>
<tr>
<td><b>content</b></td>
<td>pointer1</td>
<td>pointer2</td>
<td>pointer3</td>
<td>...</td>
</tr>
</table>

<br/>

* And `argv[0]` points to something like this:

<table>
<tr>
<td><b>index</b></td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<tr>
<tr>
<td><b>character</b></td>
<td>a</td>
<td>.</td>
<td>o</td>
<td>u</td>
<td>t</td>
<td>\0</td>
</tr>
</table>

---

## Arrays

* To access elements in an array we specify an index inside of square brackets
  * These things: `[ ]`
* The first index is 0

```C
//Access the first element
argv[0]
//Access the third element
argv[2]
```

---

## Arrays Vs Pointers

* Arrays are pointers.
  * Associated with a C type
  * That means `char*` is a different type  than `int*`
    * Why?

---

## More Pointer Math

* `argv` is a `char**`
  * That means it stores 8 byte numbers, since this is a 64-bit system
  * Adding 1 to a `char**` increases the memory location by 8
  * Subtracting 1 would decrease it by 8
* `argv[0]` is a `char*`
  * A `char` is 8 bits
  * Adding 1 will increase the pointer value by 1

---

## Data sizes

* It is easy to check the size of a type or variable
* Use the `sizeof` operator
* `sizeof(argv)` will return 8
* `sizeof(char**)` will return 8
* `sizeof(char*)` will return 8
* `sizeof(char)` will return 1
* [https://cppreference.com/w/c/language/sizeof.html](https://cppreference.com/w/c/language/sizeof.html)
* Self explanatory

---

## Access Operators

* We've used the array operator `[]`
  * But it's just syntactic sugar for `pointer dereference`
  * `argv[x]` is equivalent to `*(argv + x)`
* There is also a reverse operator
  * Called the `address of` operator
  * `&`
* [https://cppreference.com/w/c/language/operator_member_access.html](https://cppreference.com/w/c/language/operator_member_access.html)

```C
int a = 6;
// b points to a
int *b = &a;
// a is now 2
b[0] = 2;
// a is now 4
*b = 4;
```

---

## Casting Tricks

* We can tell the compiler to treat a variable of one type as another type
* This is especially useful with pointers

```C
int a = 256;
// This casts &a, which is an int*, as a char*.
unsigned char* b = (char*)&a;
// Now we can examine each byte of a.
for (int i = 0; i < sizeof(a); ++i) {
    // Notice that we print the char values with %i, so we see number.
    printf("a part %i of %i is %i\n", i, sizeof(a), b[i]);
}
```

---

## Thinking in Hexedecimal

* Usually we prefer to look at byte data in hex
* Use the `x` (or `X` for capital letters) format character in `printf`
  * Works with unsigned values, but you can print any data

---

## Full Example

```C
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    if (argc < 2) {
        printf("This program expects a number as an argument!\n");
        return 1;
    }

int a = atoi(argv[1]);
    printf("%i is 0x%x\n", a, a);
    unsigned char* bytes = (char*)&a;
    for (int i = 0; i < sizeof(a); ++i) {
        printf("Byte %i is 0x%x\n", i, bytes[i]);
    }
    return 0;
}
```

---

## Numeric Representations

* What is 1?

---

## Endianness

* The `least significant byte` is first
  * Meaning that `1` is stored at index 0
  * `bytes[0]`
* A system that stores the least significant byte first is called `little-endian`
* The opposite would be `big-endian`
  * `most significant byte` first

---

## More on Endianness

* The internet generally stores things in big-endian format
  * This leads to headaches when sending data online
* Some systems also store data in mixed-endian formats
  * Thankfully not on the ilab machines

---

## More Numbers

* What is $2^{31} - 1$?
* This is `INT_MAX`, from limits.h

<pre>
$ ./a.out 2147483647
2147483647 is 0x7fffffff
Byte 0 is 0xff
Byte 1 is 0xff
Byte 2 is 0xff
Byte 3 is 0x7f
</pre>

---

## More! Bigger!

---

## Overflow

* Why?
* An `int`, using 4 bytes, cannot store 2147483648
  * Range is from $-2^{31}$ to $2^{31}-1$
* An operation outside of the range makes it negative
  * By why is 0x80000000 the largest negative number?

---

## Negative Numbers

<pre>
$ ./a.out -1
-1 is 0xffffffff
Byte 0 is 0xff
Byte 1 is 0xff
Byte 2 is 0xff
Byte 3 is 0xff
</pre>

* Weird!
* 0x80000000 is more negative than 0xffffffff?

---

## Two's Complement

* Remember modulo arithmetic?
* Two's complement allows for elegant numeric representation
* 2147483647+1 does not exist, so it wraps to the end
  * 2147483647 is 0xffffff7f
  * 2147483647 + 1 is 0x00000080
    * Your hardware likes this!

---

## Remember Endianness!

* Adding 1 to 0xffffff7f begins at the left
  * Since this is a little endian system
  * 0xff + 1 is 0x00 with a bit overflowing to the adjacent byte
  * That overlow reaches 0x7f, turning it to 0x80

---

## More Two's Complement

* 0x80000000 + 1 is 0x80000001
* That continues until -1, which is 0xffffffff
* Any number with the highest bit set is negative
* Any negative number plus its positive counterpart has value 0
  * 0xffffffff + 0x00000001 = 0x00000000
  * So elegant!

---

## Converting to Two's Complement

* Two find a negative number, start with the positive value
  * E.g. 1 = 0x00000001
  * The highest bit represents the sign, limiting the range of values
* Flip every bit
  * 0xfffffffe
* Add 1
  * 0xffffffff

---

## Example Negation

```C
    a = 0x80000000 + (a ^ 0x7fffffff) + 1;
    printf("%i is 0x%x\n", a, a);
```

* `^` is xor, or `exclusive or`
  * It means a or b, but not both
  * 1 ^ 1 = 0
  * 1 ^ 0 = 1
  * 0 ^ 0 = 0

---

## Implications

* Two's complement simplifies hardware
* Easy to check for negative values as well
  * Just see if the most significat bit is set
  * In C, you can use the `binary and` (`&`) operator to check for a bit
  * `a & 0x80000000`
* `Addition, subtraction, and multiplication are the same for signed and unsigned numbers`

---

## Why not simpler?

* Why not just use the most significant bit for the sign?
  * The rest of the number would stay the same
* First, it's wasteful
  * There would be two values for 0
* Second, it requires more complicated hardware
  * Two's complement has been dominant since the 1960s
    * Since IBM System/360 and PDP computers

---

## Summing Examples

```C
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    if (argc < 3) {
        printf("This program expects 2 numbers as arguments!\n");
        return 1;
    }

// This reads a long long.
    // The second argument is used for error checking.
    // NULL means we aren't checking for errors and the function ignores it.
    // The last argument, 16, says that we are reading values in base 16.
    long long a = strtoll(argv[1], NULL, 16);
    long long b = strtoll(argv[2], NULL, 16);
    printf("0x%llx + 0x%llx = 0x%llx\n", a, b, a+b);
    printf("%lli + %lli = %lli\n", a, b, a+b);
    return 0;
}
```

---

## Some Sums

<pre>
$ ./a.out 7fffffffffffffff 1
0x7fffffffffffffff + 0x1 = 0x8000000000000000
9223372036854775807 + 1 = -9223372036854775808
</pre>
<pre>
$ ./a.out -1 1
0xffffffffffffffff + 0x1 = 0x0
-1 + 1 = 0
</pre>
<pre>
$ ./a.out -ffffffffffffffff -1
0x8000000000000000 + 0xffffffffffffffff = 0x7fffffffffffffff
-9223372036854775808 + -1 = 9223372036854775807
</pre>

---

## Pause

* Any questions or confusion so far?

---

## Let's Write an Algorithm

* We've covered just about enough topics to do something interesting
  * So let's find some prime numbers!
  * With an algorithm called the `Sieve of Eratosthenes`

---

## The Sieve of Eratosthenes

* Finds all of the primes up to some number
* Algorithm:
  1. Mark all numbers from 2 to N as being possibly prime
  2. Start at 2 (the first prime)
  3. This number is prime; mark all of its multiples as not prime
  4. Go to the next number still marked as possibly prime
  5. Repeat steps 3 and 4 until reaching N

---

## Storing information

* We could make variables for each number

```C
bool two_is_prime = false;
bool three_is_prime = false;
bool four_is_prime = false;
```

* That would be silly
* Instead, we should use an array

---

## Initializing arrays

* [https://cppreference.com/w/c/language/array.html](https://cppreference.com/w/c/language/array.html)
* Three types:
  * Known constant size
  * Variable length
  * Unknown size
* Let's just look at the first for now
  * Save some fun for next class!

---

## Constant size arrays

```C
#include <stdio.h>
#include <stdbool.h>

void print_array(bool array[10]) {
    for (int i = 0; i < 10; ++i) {
        printf("%i: %i\n", i, array[i]);
    }
}

int main(int argc, char** argv) {
    {
        // This has a fixed size
        bool bool_array[10];
        print_array(bool_array);
    }
    {
        // We could initialize with a set of values
        bool bool_array[] = {false, false, false, false, false,
                             true, true, true, true, true};
        print_array(bool_array);
    }
    {
        // Or we could just set some values
        bool bool_array[10] = {false, false, true};
        print_array(bool_array);
    }
    return 0;
}
```

---

## Always Initialize Arrays

```
0: 0
1: 0
2: 112
3: 84
4: 50
5: 106
6: 252
7: 127
8: 0
9: 0
0: 0
1: 0
2: 0
3: 0
4: 0
5: 1
6: 1
7: 1
8: 1
9: 1
0: 0
1: 0
2: 1
3: 0
4: 0
5: 0
6: 0
7: 0
8: 0
9: 0
```

---

## Initialization

* Any initialization will force C to initialize all elements with a default
* For non-default values, use `memset` from `<string.h>`

---

## Proof Through Assembly

> $ gcc -g -c example12.c -o example12.o
>
> $ objdump -d -M intel -S example12.o

```
example12.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <print_array>:
#include <stdio.h>
#include <stdbool.h>

void print_array(bool array[10]) {
   0:	f3 0f 1e fa          	endbr64
   4:	55                   	push   rbp
   5:	48 89 e5             	mov    rbp,rsp
   8:	48 83 ec 20          	sub    rsp,0x20
   c:	48 89 7d e8          	mov    QWORD PTR [rbp-0x18],rdi
    for (int i = 0; i < 10; ++i) {
  10:	c7 45 fc 00 00 00 00 	mov    DWORD PTR [rbp-0x4],0x0
  17:	eb 30                	jmp    49 <print_array+0x49>
        printf("%i: %i\n", i, array[i]);
  19:	8b 45 fc             	mov    eax,DWORD PTR [rbp-0x4]
  1c:	48 63 d0             	movsxd rdx,eax
  1f:	48 8b 45 e8          	mov    rax,QWORD PTR [rbp-0x18]
  23:	48 01 d0             	add    rax,rdx
  26:	0f b6 00             	movzx  eax,BYTE PTR [rax]
  29:	0f b6 d0             	movzx  edx,al
  2c:	8b 45 fc             	mov    eax,DWORD PTR [rbp-0x4]
  2f:	89 c6                	mov    esi,eax
  31:	48 8d 05 00 00 00 00 	lea    rax,[rip+0x0]        # 38 <print_array+0x38>
  38:	48 89 c7             	mov    rdi,rax
  3b:	b8 00 00 00 00       	mov    eax,0x0
  40:	e8 00 00 00 00       	call   45 <print_array+0x45>
    for (int i = 0; i < 10; ++i) {
  45:	83 45 fc 01          	add    DWORD PTR [rbp-0x4],0x1
  49:	83 7d fc 09          	cmp    DWORD PTR [rbp-0x4],0x9
  4d:	7e ca                	jle    19 <print_array+0x19>
    }
}
  4f:	90                   	nop
  50:	90                   	nop
  51:	c9                   	leave
  52:	c3                   	ret

0000000000000053 <main>:

int main(int argc, char** argv) {
  53:	f3 0f 1e fa          	endbr64
  57:	55                   	push   rbp
  58:	48 89 e5             	mov    rbp,rsp
  5b:	48 83 ec 30          	sub    rsp,0x30
  5f:	89 7d dc             	mov    DWORD PTR [rbp-0x24],edi
  62:	48 89 75 d0          	mov    QWORD PTR [rbp-0x30],rsi
  66:	64 48 8b 04 25 28 00 	mov    rax,QWORD PTR fs:0x28
  6d:	00 00
  6f:	48 89 45 f8          	mov    QWORD PTR [rbp-0x8],rax
  73:	31 c0                	xor    eax,eax
    {
        // This has a fixed size
        bool bool_array[10];
        print_array(bool_array);
  75:	48 8d 45 ee          	lea    rax,[rbp-0x12]
  79:	48 89 c7             	mov    rdi,rax
  7c:	e8 00 00 00 00       	call   81 <main+0x2e>
    }
    {
        // We could initialize with a set of values
        bool bool_array[] = {false, false, false, false, false,
  81:	c6 45 ee 00          	mov    BYTE PTR [rbp-0x12],0x0
  85:	c6 45 ef 00          	mov    BYTE PTR [rbp-0x11],0x0
  89:	c6 45 f0 00          	mov    BYTE PTR [rbp-0x10],0x0
  8d:	c6 45 f1 00          	mov    BYTE PTR [rbp-0xf],0x0
  91:	c6 45 f2 00          	mov    BYTE PTR [rbp-0xe],0x0
  95:	c6 45 f3 01          	mov    BYTE PTR [rbp-0xd],0x1
  99:	c6 45 f4 01          	mov    BYTE PTR [rbp-0xc],0x1
  9d:	c6 45 f5 01          	mov    BYTE PTR [rbp-0xb],0x1
  a1:	c6 45 f6 01          	mov    BYTE PTR [rbp-0xa],0x1
  a5:	c6 45 f7 01          	mov    BYTE PTR [rbp-0x9],0x1
                             true, true, true, true, true};
        print_array(bool_array);
  a9:	48 8d 45 ee          	lea    rax,[rbp-0x12]
  ad:	48 89 c7             	mov    rdi,rax
  b0:	e8 00 00 00 00       	call   b5 <main+0x62>
    }
    {
        // Or we could just set some values
        bool bool_array[10] = {false, false, true};
  b5:	48 c7 45 ee 00 00 00 	mov    QWORD PTR [rbp-0x12],0x0
  bc:	00
  bd:	66 c7 45 f6 00 00    	mov    WORD PTR [rbp-0xa],0x0
  c3:	c6 45 f0 01          	mov    BYTE PTR [rbp-0x10],0x1
        print_array(bool_array);
  c7:	48 8d 45 ee          	lea    rax,[rbp-0x12]
  cb:	48 89 c7             	mov    rdi,rax
  ce:	e8 00 00 00 00       	call   d3 <main+0x80>
    }
    return 0;
  d3:	b8 00 00 00 00       	mov    eax,0x0
}
  d8:	48 8b 55 f8          	mov    rdx,QWORD PTR [rbp-0x8]
  dc:	64 48 2b 14 25 28 00 	sub    rdx,QWORD PTR fs:0x28
  e3:	00 00
  e5:	74 05                	je     ec <main+0x99>
  e7:	e8 00 00 00 00       	call   ec <main+0x99>
  ec:	c9                   	leave
  ed:	c3                   	ret
```

---

## Important Notes

* The compiler reuses the same memory multiple times
  * An unitialized array created at the end would see the third arrays values
* Array access is going from a smaller address to a larger address
  * rbp-0x12 to rbp-0x9
    * Look at instructions starting at 0x81

---

## Extra Notes

* Our `i < 10` is converted to `i <= 9`
  * At 0x4d, `jle` is jump if less than equal
* The `for` loop jumps to the comparison before doing anything
  * At 0x17 it `jmp` (jumps) to 0x49
  * We know this is a waste because `i` begins at `0` and is less than `10`
  * Compile again with the `-O1` flag, and the compiler optimizes this away

---

## Sieve to 100

```C
#include <stdio.h>
#include <stdbool.h>
#include <string.h>

int main(int argc, char** argv) {
    // Find and print the primes to 100
    bool maybe_prime[101] = {false, false};
    // Fill the next 99 values with true
    memset(maybe_prime+2, true, 99);

// Loop over all numbers to 100
    for (int i = 2; i <= 100; ++i) {
        // Loop over all multiples of any prime numbers
        if (maybe_prime[i]) {
            // Mark all multiples as not prime
            for (int j = i+1; j <= 100; ++j) {
                // A remainder of 0 means i evenly divides into j
                if (j % i == 0) {
                    maybe_prime[j] = false;
                }
            }
        }
    }

// Print the primes
    for (int i = 0; i <= 100; ++i) {
        if (maybe_prime[i]) {
            printf("%i is prime.\n", i);
        }
    }

return 0;
}
```

---

## Primes

```
2 is prime.
3 is prime.
5 is prime.
7 is prime.
11 is prime.
13 is prime.
17 is prime.
19 is prime.
23 is prime.
29 is prime.
31 is prime.
37 is prime.
41 is prime.
43 is prime.
47 is prime.
53 is prime.
59 is prime.
61 is prime.
67 is prime.
71 is prime.
73 is prime.
79 is prime.
83 is prime.
89 is prime.
97 is prime.
```

---

## Next Time:

* Variable size arrays
* Memory management

<!--
## Compiler Options

Note:
-Wall
-Werror
-fsanitize=address
-std=c99 (do we want to do this?)
-lm
-->