# CS 211 - Lecture 04 - Memory and Variable Size Arrays Bernhard Firner 2025-09-11 --- ## Review! * Pointers * Fixed-size arrays * Variable length arrays --- ## Arrays * [https://cppreference.com/w/c/language/array.html](https://cppreference.com/w/c/language/array.html) * Three types of arrays: * Known constant size * Variable length * Unknown size --- ## Known-Constant Size * Set at `compile time` * In function arguments: * `void print_array(bool array[10]) {` * In variable declaration: * `bool bool_array[10];` --- ## `sizeof` * The `sizeof` operator returns the size of a variable or type * Can be used to get the number of elements of an array with known size ```C #include
int main(void) { float a[10]; // floats are 4 bytes, and there are 10 of them printf("sizeof(a) is %li\n", sizeof(a)); printf("sizeof(*a) is %li\n", sizeof(*a)); printf("There are %li elements\n", sizeof(a)/sizeof(*a)); return 0; } ``` --- ## Sieve of Eratosthenes * Finds all of the primes up to some number * Algorithm: 1. Mark all numbers from 2 to N as being possibly prime 2. Start at 2 (the first prime) 3. This number is prime; mark all of its multiples as not prime 4. Go to the next number still marked as possibly prime 5. Repeat steps 3 and 4 until reaching N --- ## Sieve to 100 ```C #include
#include
#include
int main(void) { // Find and print the primes to 100 bool maybe_prime[101] = {false, false}; // Fill the next 99 values with true memset(maybe_prime+2, true, 99); // Loop over all numbers to 100 for (int i = 2; i <= 100; ++i) { // Loop over all multiples of any prime numbers if (maybe_prime[i]) { // Mark all multiples as not prime for (int j = i+1; j <= 100; ++j) { // A remainder of 0 means i evenly divides into j if (j % i == 0) { maybe_prime[j] = false; } } } } // Print the primes for (int i = 0; i <= 100; ++i) { if (maybe_prime[i]) { printf("%i is prime.\n", i); } } return 0; } ``` --- ## Variable-Length Arrays * We can't know everything ahead of time * For example, how many primes does a user need? * Have to get the input from them, at `run time` * Means a shift in responsibilities * From the compiler to us, the programmers --- ## Variable-Length Declaration * Just put the length in the `[]` * If the desired size is in variable x: * `int a[x];` * But now we cannot initialize as before * `int a[x] = {0, 1, 2, 3};` * Because the compiler does not know if there will be four elements. --- ## `sizeof` and Variable-Length Arrays * `sizeof` still works ```C #include
#include
int main(int argc, char** argv) { float b[atoi(argv[1])]; // floats are 4 bytes, and there are ? of them printf("sizeof(b) is %li\n", sizeof(b)); printf("sizeof(*b) is %li\n", sizeof(*b)); printf("There are %li elements\n", sizeof(b)/sizeof(*b)); return 0; } ``` * But only if the size is known within the current scope --- ## Cooler Sieve ```C #include
#include
#include
#include
int main(int argc, char** argv) { int size = atoi(argv[1]); bool maybe_prime[size]; memset(maybe_prime, true, size-3); maybe_prime[0] = false; maybe_prime[1] = false; for (int i = 2; i < size; ++i) { // Mark multiples as not prime // Go through every multiple of i, starting with 2*i // Then 2*i+i, 3*i+i, etc if (maybe_prime[i]) { for (int j = 2*i; j < size; j += i) { maybe_prime[j] = false; } } } for (int i = 0; i < size; ++i) { if (maybe_prime[i]) { printf("%i is prime.\n", i); } } return 0; } ``` --- ## Improvements Example * Let's write a program that sums numbers * We should validate our inputs * Check `argc` and `argv` before calling `atoi` * We'll pass the array to a function for practice --- ## Some Details * An additional complication: * We cannot use `sizeof` to get the size of the array * The array is passed into a function that handles arbitrary sizes * `int sum(int values[]) {` * vs * `int sum(int values[10]) {` * The array is just a pointer, it doesn't have a size * Remember `int values[]` is the same as `int* values` --- ## Summing numbers ```C #include
#include
int sum(int values[], int length) { int total = 0; for (int i = 0; i < length; ++i) { total += values[i]; } return total; } int main(int argc, char** argv) { // Exit early if we are not given anything to do if (argc < 2) { printf("Please provide at least one number to sum.\n"); return 0; } int numbers[argc-1]; // Remember to begin at i = 1 to skip the program name. for (int i = 1; i < argc; ++i) { numbers[i-1] = atoi(argv[i]); } printf("The sum is %i\n", sum(numbers, argc-1)); return 0; } ``` --- ## Improved Sieve * Let's revisit the sieve and move it into a function --- ## New Sieve ```C #include
#include
#include
#include
int main(int argc, char** argv) { // Return early if there are any problems. if (argc < 2) { printf("This program requires the maximum number to check as an argument.\n"); return 0; } int max = atoi(argv[1]); if (max < 2) { printf("Checking max < 2 is meaningless. %i requested.\n", max); return 0; } // Find and print the primes to max // This initialization is no longer allowed because the compiler // is not certain that there are two values. // bool maybe_prime[max+1] = {false, false}; bool maybe_prime[max+1]; // It is the programmer's responsibility to be sure this is safe maybe_prime[0] = false; maybe_prime[1] = false; // Fill the rest of the values with true. memset(maybe_prime+2, true, max-2); // Loop over all numbers to max for (int i = 2; i <= max; ++i) { // Loop over all multiples of any prime numbers if (maybe_prime[i]) { // Mark all multiples as not prime for (int j = i+1; j <= max; ++j) { // A remainder of 0 means i evenly divides into j if (j % i == 0) { maybe_prime[j] = false; } } } } // Print the primes for (int i = 0; i <= max; ++i) { if (maybe_prime[i]) { printf("%i is prime.\n", i); } } return 0; } ``` --- ## Making Useful Code * Notice how many libraries we are using? * They are saving us a lot of time * What if we wanted our sieve to be like that? * First step: take the code out of main * Turn it into its own function * Return the primes --- ## Returning an Array * We haven't talked about `ownership` * Any array we've used is owned by its local scope * When a variable goes `out of scope`, the compiler `reclaims` that memory * Remember how the assembly code re-used previous memory for consecutive arrays? * If we want to return an array, we'll need to change that --- ## Memory Ownership * In the summation example, `main` owns `numbers` * The `sum` function can change values, but cannot delete `numbers` * The compiler `reclaims` the memory from `numbers` when it goes `out of scope` ```C #include
#include
int sum(int values[], int length) { int total = 0; for (int i = 0; i < length; ++i) { total += values[i]; } return total; } int main(int argc, char** argv) { // Exit early if we are not given anything to do if (argc < 2) { printf("Please provide at least one number to sum.\n"); return 0; } int numbers[argc-1]; // Remember to begin at i = 1 to skip the program name. for (int i = 1; i < argc; ++i) { numbers[i-1] = atoi(argv[i]); } printf("The sum is %i\n", sum(numbers, argc-1)); return 0; } ``` --- ## Dynamic Memory Allocation * The fastest way to shoot yourself in the foot --- ## Dynamic Memory Allocation * When passing dynamic arrays * Keep track of array sizes or else * Memory corruption and segfaults * Security vulnerability * Range check all usage or else * Program crashes * Buffer overflow exploits --- ## Dynamic Memory Allocation * When cleaning up * Remember to delete them when they are no longer needed * If not? Memory leaks and system crashes * Make sure no code acts like they aren't deleted or * Memory corruption and segfaults * Security vulnerability --- ## Dynamic Memory Allocation * This is why people kept making new languages --- ## But Why Not... ```python #include
#include
// The wrong way int* doubled(int values[], int length) { int double_values[length]; for (int i = 0; i < length; ++i) { double_values[i] = values[i]*2; } return double_values; } int main(int argc, char** argv) { // Exit early if we are not given anything to do if (argc < 2) { printf("Please provide at least one number to double.\n"); return 0; } int numbers[argc-1]; // Remember to begin at i = 1 to skip the program name. for (int i = 1; i < argc; ++i) { numbers[i-1] = atoi(argv[i]); } int* doubled_values = doubled(numbers, argc-1); for (int i = 1; i < argc; ++i) { printf("%i doubled to %i\n", numbers[i], doubled_values[i]); } return 0; } ``` --- ## ``` example17.c: In function ‘doubled’: example17.c:10:12: warning: function returns address of local variable [-Wreturn-local-addr] 10 | return double_values; ``` * Returning function-local memory will fail * In this case, the function `doubled` owns the array * The memory inside of that function is reclaimed when the function ends --- ## Memory Reclamation * Remember the assembly code we don't like seeing? * When a function is called its variables are initialized * Where does the memory come from? * Usually the `stack` --- ## Stack Memory * The compiler cannot know what memory you've allocated * Needs a deterministic way to set aside memory for functions, etc * Solution: * Function memory grows from one side of the address space * The `stack` * User allocated memory grows from the other side * The `heap`
stack
$\longrightarrow$
$\longleftarrow$
heap
0
1
2
...
...
--- ## Unwinding * When a function returns, its memory is released * Called unwinding because there can be multiple functions that called functions that called... * Fast, efficient, and automatic * `Stack memory is a first in, first out stack` * Each function's memory "on top of" the last one * We can unwind without checking anything but the current function --- ## Stack Overflows * If you call enough functions, and they use enough memory, you can run out * This is called a `stack overflow` * Think of the `stack` as an array of fixed size that the compiler sets aside * Normally sufficient * Strange function calling patterns will cause problems --- ## Example ```C /* * Demonstrate a stack overflow. */ void bad() { int a[1000]; bad(); } int main(void) { bad(); return 0; } ``` --- ## Pause * Before going into Heap Memory, is everything clear? --- ## Heap Memory * When you manually allocate memory it goes onto the `heap` * Why do dynamic memory allocation? * Otherwise you can't return memory from your functions * We want our sieve function to be useful * it needs to return an array of numbers of unknown length --- ## Variable Length Arrays * Variable length arrays (VLAs) are a special case * They may either go onto the stack, or onto the heap * It depends upon their size, and is a compiler-specific decision * Always cleaned up when they go out of scope though --- ## Side Note on `VLAs` * Require special consideration * On embedded devices * or if you want precise control over memory consumption * In those cases, choose a maximum "worst case" size and make them fixed size --- ## `malloc`, `calloc`, and `free` * `Dynamically allocate` memory on the `heap` * This memory does *not* go out of scope * In `stdlib.h` * Read the `man` pages * Joined by `realloc` and `reallocarray` * Which you should avoid unless you know what you are doing ```C void *malloc(size_t size); void free(void *_Nullable ptr); void *calloc(size_t nmemb, size_t size); ``` --- ## `malloc` ```C void *malloc(size_t size); ``` * Allocates `size` bytes * You need to do the math * Need 100 `floats`? * `malloc(100 * sizeof(float))` * Forgot the `sizeof`? Welcome to buglandia. * If you're lucky, your program will crash * Returns a pointer to the beginning of those bytes * Memory is "as-is", uninitialized * Please initialize or use `calloc` --- ## `calloc` ```C void *calloc(size_t nmemb, size_t size); ``` * Like `malloc`, but splits the number of elements and their size into two arguments * Also sets the memory to 0 * This is what you probably want to use * Unless you need to be super duper fast and skip that pesky initialization * Not in this class, pal! --- ## Failure * Calloc and malloc can fail!!!! * Return `NULL` * Set `errno` * Often unchecked--how could allocating 100 values fail? * Very well-written code checks for and handles failure * The rest of our software crashes --- ## `free` ```C void free(void *_Nullable ptr); ``` * Gives the memory back * Basically, tells the OS that you are done with it --- ## Limited Memory * Remember, the pointer addresses you are using are fake * The OS translates those to real address under the hood * This makes the memory look like it's all yours, but it is shared * The address space is $2^{64}$ on our machines * But there isn't that much memory * It's a finite resource * Lesson learned whenever you leave your web browser open long enough --- ## `free` Rules ```C void free(void *_Nullable ptr); ``` * Only call `free` on memory that you allocate * e.g. from `malloc` and `calloc` * Don't call it on the same memory twice * Your program will crash if you are lucky * A lot of pointer documentation mentions "undefined behavior" * Just means that anything could happen --- ## The right way ```python #include
#include
int* doubled(int values[], int length) { // Allocate new memory int* double_values = calloc(length, sizeof(int)); for (int i = 0; i < length; ++i) { double_values[i] = values[i]*2; } // Return the memory. It is now owned by the caller. return double_values; } int main(int argc, char** argv) { // Exit early if we are not given anything to do if (argc < 2) { printf("Please provide at least one number to double.\n"); return 0; } int numbers[argc-1]; // Remember to begin at i = 1 to skip the program name. for (int i = 1; i < argc; ++i) { numbers[i-1] = atoi(argv[i]); } // Get the new memory, filled with doubled numbers. int* doubled_values = doubled(numbers, argc-1); for (int i = 0; i < argc-1; ++i) { printf("%i doubled to %i\n", numbers[i], doubled_values[i]); } // Remember to free the memory free(doubled_values); return 0; } ``` --- ## Better Sieve ```C #include
#include
#include
// This is for size_t, which is also defined in stdio.h and stdlib.h #include
#include
unsigned long* getPrimes(unsigned long max, size_t* num_found) { bool maybe_prime[max]; memset(maybe_prime, true, max-3); maybe_prime[0] = false; maybe_prime[1] = false; // Never trust the user *num_found = 0; for (size_t i = 2; i < max; ++i) { // Mark multiples as not prime // Go through every multiple of i, starting with 2*i // Then 2*i+i, 3*i+i, etc if (maybe_prime[i]) { // Increment the number of discovered primes *num_found += 1; for (size_t j = 2*i; j < max; j += i) { maybe_prime[j] = false; } } } unsigned long* primes = calloc(*num_found, sizeof(unsigned long)); // Use this to track the current prime position // We could use another number, but this feels cleaner. unsigned long* cur_prime = primes; for (unsigned long i = 0; i < max; ++i) { if (maybe_prime[i]) { *cur_prime = i; ++cur_prime; } } return primes; } int main(int argc, char** argv) { if (argc < 2) { printf("Please provide a maximum value to search.\n"); return 1; } unsigned long max = strtoul(argv[1], NULL, 0); if (max < 3) { printf("2 is the first prime number. %lu is too small.\n", max); return 1; } printf("Finding primes to %lu\n", max); // This is going to be equivalent to an int // Technically, any kind of counting in loops or sizing of memory should be // done with size_t. size_t size = 0; unsigned long* primes = getPrimes(max, &size); for (unsigned long* p = primes; p != primes+size; ++p) { printf("%lu is prime.\n", *p); } // Remember to free primes. free(primes); return 0; } ``` --- ## Passing By Reference * C++ has true reference types * C uses pointers to single objects ```C unsigned long* getPrimes(unsigned long max, size_t* num_found); ``` * `num_found` can be modified by getPrimes * Allows the function to return a second value * We've already seen this with the `strtol` family of functions --- ## Iterating ```C for (unsigned long* p = primes; p != primes+size; ++p) { printf("%lu is prime.\n", *p); } ``` * We could do this with a counting variable * Isn't this cleaner? * May be up to debate * Specifying a beginning and end pointer is an alternative to counting --- ## Next Classes * So many exciting topics! * Debugging! * 2D arrays! * Floating point numbers! * Structs!