# CS 530 - Lecture 03

## Planning Under Non-Determinism

Bernhard Firner

2026-01-29

---

## Review: Agents

* DAVE (the DARPA Autonomous VEhicle) showcased a simple, robust reflex agent
  * This isn't quite a full system, but it looks like part of one
* Relied upon a DNN trained from pixels to produce actions
  * Called end-to-end in the paper; nowadays it might be called imitation learning instead

---

## DNNs and Noisy Data

* DNNs make evaluation metrics difficult to interpret
  * Sometimes they give good results in spite of bad data
* In the case of DAVE, the training data was doubtless noisy
  * So the trained model didn't match training data, so what?
  * The training signal was raw and unprocessed

---

## DNNs and Noisy Data

* In fact, noisy labels, if they are unbiased, are largely averaged out by modern ML training

---

## DNNs In Practice

* The end result is that we don't know how a DNN will perform in the wild
  * And yet we build systems on top of them
* How? Usually through a technique called calibration
* We'll get into the details later; first, let's talk about agents and uncertainty
* Now is a good moment to discuss the [Chinese Room](https://en.wikipedia.org/wiki/Chinese_room)

---

## Review: Agents

* Agents interact with an environment (and perhaps each other)
* DAVE, the autonomous obstacle avoider, simplified things by being purely reflexive
* If we don't know if our action will succeed, and want to do something when it fails, we need planning
  * Start with section 11.5.3 for more background

---

## Planning and Execution

* Planning is a key component of any non-reflexive agent
* Enables self-evaluation, required for robust systems
  * Remember the Sphex wasp
* But what is planning? How does an agent monitor itself?

---

## Execution Monitoring

* A plan is pointless unless you check your progress against it
* An agent with a plan must therefore apply **execution monitoring**, in some fashion
* Monitoring should occur before taking an action, to verify that the action still makes sense

---

## Action Monitoring

* This is a simple checklist
  * For example, before pressing the gas, verify that the vehicle is on and the parking brake is off
* Basically, if there is a set of circumstances where the action would be a bad idea, this stops a disaster
  * So we could put safety checks here
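
A minimal sketch of action monitoring as a precondition checklist, using the gas pedal example above. The `VehicleState` struct, its fields, and the function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical vehicle state; the field names are assumptions for illustration.
struct VehicleState {
    bool engine_on = false;
    bool parking_brake_engaged = true;
};

// Returns every violated precondition for "press the gas".
// An empty list means the action still makes sense.
std::vector<std::string> gasPedalViolations(const VehicleState& s) {
    std::vector<std::string> violations;
    if (!s.engine_on) violations.push_back("vehicle is off");
    if (s.parking_brake_engaged) violations.push_back("parking brake is on");
    return violations;
}

// Action monitoring: the checklist must pass before the action is taken.
bool canPressGas(const VehicleState& s) {
    return gasPedalViolations(s).empty();
}

// Applies the requested throttle only if the checklist passes, otherwise 0.
double pressGas(const VehicleState& s, double requested_throttle) {
    assert(requested_throttle >= 0.0 && requested_throttle <= 1.0);
    return canPressGas(s) ? requested_throttle : 0.0;
}
```

Returning the list of violations, rather than a bare boolean, gives the agent something to log or to replan around when an action is refused.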

---

## Plan Monitoring

* Before taking an action, verify that the current plan still makes sense
  * If we begin to change lanes to avoid a slow moving vehicle, we can stop if that vehicle moves into an exit lane
* It is important to remember the original cause of an agent's action
  * Continuing with a plan after the original cause is gone makes an agent look particularly silly
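
One way to sketch plan monitoring is to store the cause of a plan alongside the plan and re-check it before every step. The lane-change types below (`WorldEstimate`, `LaneChangePlan`, `PlanStatus`) are hypothetical names, and the step count is a stand-in for real maneuver progress.

```cpp
#include <cassert>

// Hypothetical world estimate; the field name is an assumption for illustration.
struct WorldEstimate {
    bool slow_vehicle_ahead = false;  // is the slow vehicle still in our lane?
};

enum class PlanStatus { Continue, Abort, Done };

// A lane-change plan that remembers why it was started.
struct LaneChangePlan {
    int steps_remaining;

    // Plan monitoring: verify the original cause before taking the next step.
    PlanStatus step(const WorldEstimate& world) {
        assert(steps_remaining >= 0);
        if (steps_remaining == 0) return PlanStatus::Done;
        if (!world.slow_vehicle_ahead) return PlanStatus::Abort;  // cause is gone
        --steps_remaining;
        return steps_remaining == 0 ? PlanStatus::Done : PlanStatus::Continue;
    }
};
```

If the slow vehicle takes an exit halfway through the maneuver, `step` returns `PlanStatus::Abort` and leaves the remaining steps untouched, so the caller can decide how to return to the original lane.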

---

## Goal Monitoring

* Let's say you drop your friend off at the store and look for a parking spot
  * The lot is quite full, so you begin to circle
* If you see your friend exit the store, you can abandon the search for parking
  * The end goal was to accompany your friend, not find a parking spot
  * Parking was just a step in your plan that is no longer required

---

## Uncertainty and Decisions

* The environment could change, which changes the best current action
* But it is just as likely that your *estimate* of the environment updates
  * For example, you think that a parking spot is empty and begin to pull in
  * Then you see a motorcycle parked in the spot
* Now you must back up, remember the bike is there, and make a new plan

---

## Uncertainty and Replanning

* Uncertainty means that many plans will be partially executed and aborted
* It makes planning more difficult
  * It can also favor systems that have exploratory behavior
* Humans plan under uncertainty every day; how do we do it?

---

## Getting Here on Time

* We all chose some plan to get here (approximately) on time
* I could imagine traffic and parking that take an hour to get through
  * But I am unlikely to leave an hour early
  * Why not?

---

## Preferences

* I, as a human being with great predictive powers, can imagine several outcomes
  * Outcome 1: I am early. Now I have to sit around somewhere, wasting my time.
  * Outcome 2: I am just in time, making optimal use of my time.
  * Outcome 3: I am late, forcing me to run up the stairs and break a sweat. I will feel shame at doing a poor job.
* I can also imagine how good, or bad, any of those outcomes would feel

---

## Utility Theory

* If I can quantify my preferences for those outcomes, I can weigh them against the extra time I get by leaving later
  * If I have literally nothing to do, for example, then I may as well get to class an hour early
    * There is no difference in utility between sitting in my office staring at a wall and coming here to stare at a wall

---

## Probability

* If travel time were deterministic, then I could always leave at $t_{\text{class}} - 15\,\text{min}$
  * It isn't!
* Each minute earlier increases the probability of outcomes 1 and 2, and decreases that of outcome 3
* I can multiply the utility of each outcome by its probability, sum the results, and thus score every departure time
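
The scoring described above can be sketched as an expected-utility calculation. Everything numeric here is an assumption for illustration: the utility values, the 5-minute "just in time" window, and the discrete travel-time distribution are all made up, and the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One possible travel time and its probability (hypothetical numbers only).
struct TravelOutcome {
    double minutes;
    double probability;
};

// Hypothetical utility of arriving `slack` minutes before class starts.
double utility(double slack) {
    if (slack < 0.0) return -10.0;   // Outcome 3: late
    if (slack <= 5.0) return 1.0;    // Outcome 2: just in time
    return -0.1 * slack;             // Outcome 1: early, waiting wastes time
}

// Expected utility of leaving `lead` minutes before class:
// sum over travel outcomes of probability times utility.
double expectedUtility(double lead, const std::vector<TravelOutcome>& travel) {
    double total = 0.0;
    for (const TravelOutcome& t : travel) {
        total += t.probability * utility(lead - t.minutes);
    }
    return total;
}

// Scores every candidate lead time and returns the one with the highest score.
double bestLeadTime(const std::vector<double>& candidates,
                    const std::vector<TravelOutcome>& travel) {
    assert(!candidates.empty());
    double best = candidates.front();
    for (double lead : candidates) {
        if (expectedUtility(lead, travel) > expectedUtility(best, travel)) best = lead;
    }
    return best;
}
```

With a made-up distribution of 15 minutes (70%), 30 minutes (25%), and 60 minutes (5%), leaving 30 minutes early scores higher than leaving 60 minutes early: the rare hour-long trip is not worth the time wasted on every other day.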

---

## Decision Theory

* Combining probability and utility like that leads to decision theory
  * Governs everything from your cleaning robot making a second pass to your emergency braking system
* In a perfect world, we know all utility values and probabilities
  * But that obviously isn't the case

---

## Discovering Probability

* Probabilities are discovered from data
* These are the machine learning models that you learned about
  * Bayes' rule, regression, decision trees, SVMs, neural networks
* They all use past data to predict the future
  * Some are probability-based, some require calibration

---

## Utility

* We predict the utility of an outcome, but we could be wrong
  * For example, we may believe that we want to be in the left lane
    * But upon getting there, we discover that it is blocked by an accident
* Simple agents may still hard-code actions
  * But advanced ones predict utility

---

## Simulation

* Combining different probabilities is a challenge
* But, if we can accurately model a system, we can use that model itself to train our behavior
  * This is the insight behind reinforcement learning
  * We'll talk more about this later

---

## Example System

* Let's look at another system
  * This one is more complicated than DAVE
  * Supports multiple behaviors, simple planning

---

## Project Details

* I also want to talk about the class project details
  * This research project can serve as a good example

---

## AI Case Study

* [Toward Low-Flying Autonomous MAV Trail Navigation using Deep Neural Networks for Environmental Awareness](https://arxiv.org/abs/1705.02550)
* [Video](https://youtu.be/USYlt9t0lZY?si=2CXPqOkOz--Wp3Aw) on the first author's YouTube channel
* Code: [https://github.com/NVIDIA-AI-IOT/redtail](https://github.com/NVIDIA-AI-IOT/redtail)
* Project code is stale now, but would not be complicated to replicate

---

## System Components

* Drone egomotion system
  * Lidar Lite V3
  * PX4FLOW optical flow sensor with 6mm wide-angle lens
  * IMU
* Fairly standard part of any quadcopter flight hardware

---

## System Components

* Inference Camera
  * Microsoft HD Lifecam HD5000
  * 720p, 30fps
  * $70^\circ$ FOV (?)

---

## Learned Components

<div class="col">
* Learned components are fused by a controller into waypoint instructions
  * Trail following DNN
  * Object Detection DNN
  * Obstacle Detector
</div>

---

## Timing

* System runs at two different time scales
  * Real-time control of rotors to provide stability and execute long-term paths
  * Longer-term execution of waypoint navigation or hold commands from DNN fusion

---

## Drone Egomotion

* Quadcopters rely upon a fairly standard control loop
* Egomotion is a module that estimates an agent's motion
  * Here, lidar is used to track height
  * Optical flow (from a camera) is used to estimate motion and changes in pose
  * Those are combined with tilt and gyro measurements from an IMU

---

## Filtering

* Egomotion estimates are filtered with an extended Kalman filter
  * We will talk about filtering later
* This is a way to strip out the noise and uncertainty of the egomotion estimates
* Because the system is autocorrecting, as long as egomotion isn't *terrible*, the drone will follow a path
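
To give a feel for what filtering does, here is a minimal sketch of a scalar, linear Kalman filter, for example for a single height reading. This is not the extended Kalman filter used on the drone, which tracks a full multi-dimensional state; the struct name and the noise variances are assumptions for illustration.

```cpp
#include <cassert>

// Scalar Kalman filter tracking one quantity that is modeled as roughly constant.
struct ScalarKalman {
    double estimate;     // current state estimate
    double variance;     // uncertainty of the estimate
    double process_var;  // how much the true value may drift per step
    double measure_var;  // sensor noise variance

    // Predict: the model is "the value stays the same", so only uncertainty grows.
    void predict() { variance += process_var; }

    // Update: blend in a measurement, weighted by the Kalman gain.
    void update(double measurement) {
        assert(variance + measure_var > 0.0);
        const double gain = variance / (variance + measure_var);
        estimate += gain * (measurement - estimate);
        variance *= (1.0 - gain);
    }
};
```

Each `update` moves the estimate only part of the way toward the new measurement. The noisier the sensor is relative to the current uncertainty, the smaller the step, which is how single bad readings get smoothed away.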

---

## Control

* The drone can be commanded to move to a given coordinate (or turn to a new pose)
* A "plan" is sent to the Pixhawk, which breaks it into a series of steps
  * Feedback from egomotion is used to adjust actions
  * Probably with a PID controller
    * "Proportional-integral-derivative" controller
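
For reference, a textbook discrete PID controller can be sketched as follows. This is not taken from the Pixhawk firmware; the class name, gains, and interface are hypothetical.

```cpp
#include <cassert>

// Textbook discrete PID controller (illustrative, not the flight controller's code).
class Pid {
public:
    Pid(double kp, double ki, double kd) : kp_(kp), ki_(ki), kd_(kd) {}

    // Returns the control output given the target, the measured value,
    // and the time step dt in seconds.
    double step(double target, double measured, double dt) {
        assert(dt > 0.0);
        const double error = target - measured;
        integral_ += error * dt;  // accumulated error (I term)
        // No derivative on the first call, since there is no previous error yet.
        const double derivative = has_prev_ ? (error - prev_error_) / dt : 0.0;
        prev_error_ = error;
        has_prev_ = true;
        return kp_ * error + ki_ * integral_ + kd_ * derivative;
    }

private:
    double kp_, ki_, kd_;
    double integral_ = 0.0;
    double prev_error_ = 0.0;
    bool has_prev_ = false;
};
```

In a control loop, `measured` would come from the filtered egomotion estimate and the return value would be sent to the actuators, once per time step.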

---

## Commands

* Where do commands come from?
  * There are three ML sources
  * Trail following, which adjusts the yaw and position of the vehicle
  * Object detection, which tells it to stop when a vulnerable human is sighted
  * Obstacle detection

---

## Prediction to Action

* Predicted trail angle and offset are combined (not in a robust way)
  * The result is a single "steering" angle for the drone
* This is what is known as a "pure pursuit" controller
  * Updates to new angle at each step and pursues the target
  * Looks stable if the predictions are good and turn increments are small

---

## Prediction to Action

```C++
void PX4Controller::computeDNNControl(const float class_probabilities[6], float& linear_control_val, float& angular_control_val)
{
    // Normalize probabilities just in case. We have 6 classes, they are disjoint - 1st 3 are rotations and 2nd 3 are translations
    float prob_sum = class_probabilities[0] + class_probabilities[1] + class_probabilities[2];
    assert(prob_sum!=0);
    float left_view_p   = class_probabilities[0] / prob_sum;
    float right_view_p  = class_probabilities[2] / prob_sum;

    prob_sum = class_probabilities[3] + class_probabilities[4] + class_probabilities[5];
    assert(prob_sum!=0);
    float left_side_p   = class_probabilities[3] / prob_sum;
    float right_side_p  = class_probabilities[5] / prob_sum;

    // Compute turn angle from probabilities. Positive angle - turn left, negative - turn right, 0 - go straight
    float current_turn_angle_deg =  dnn_turn_angle_*(right_view_p - left_view_p) + dnn_lateralcorr_angle_*(right_side_p - left_side_p);

    // Do sanity check and convert to radians
    current_turn_angle_deg = std::max(-90.0f, std::min(current_turn_angle_deg, 90.0f));   // just in case to avoid bad control
    float current_turn_angle_rad = (float)angles::from_degrees((float)current_turn_angle_deg);

    // Filter computed turning angle with the exponential filter
    turn_angle_ = turn_angle_*(1-direction_filter_innov_coeff_) + current_turn_angle_rad*direction_filter_innov_coeff_; // TODO: should this protected by a lock?
    float turn_angle_rad = turn_angle_;
    // end of turning angle filtering

    ROS_INFO("DNN turn angle: %4.2f deg.", (float)angles::to_degrees(turn_angle_rad));

    // Create control values that lie on a unit circle to mimic max joystick control values that are on a unit circle
    linear_control_val  = cosf(turn_angle_rad);
    angular_control_val = sinf(turn_angle_rad);
}
```
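
The listing above depends on ROS and on class members, so it cannot be run on its own. The following is a standalone sketch of the same computation for experimenting with the numbers. The `TurnParams` struct and both function names are hypothetical, and the parameter values stand in for the node's members (`dnn_turn_angle_`, `dnn_lateralcorr_angle_`, `direction_filter_innov_coeff_`), whose real values are not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical tuning values standing in for the controller's member variables.
struct TurnParams {
    float turn_angle_deg;    // stands in for dnn_turn_angle_
    float lateral_corr_deg;  // stands in for dnn_lateralcorr_angle_
    float filter_coeff;      // stands in for direction_filter_innov_coeff_
};

// Returns the exponentially filtered turn angle in radians.
// p holds 6 class probabilities: 3 view (rotation) classes, then 3 side (offset) classes.
float filteredTurnAngle(const float p[6], const TurnParams& params, float prev_angle_rad) {
    const float view_sum = p[0] + p[1] + p[2];
    const float side_sum = p[3] + p[4] + p[5];
    assert(view_sum > 0.0f && side_sum > 0.0f);
    const float view_diff = (p[2] - p[0]) / view_sum;  // right minus left
    const float side_diff = (p[5] - p[3]) / side_sum;  // right minus left
    float angle_deg = params.turn_angle_deg * view_diff + params.lateral_corr_deg * side_diff;
    angle_deg = std::max(-90.0f, std::min(angle_deg, 90.0f));  // clamp, as in the listing
    const float kPi = 3.14159265358979f;
    const float angle_rad = angle_deg * kPi / 180.0f;
    // Exponential filter: blend the new angle with the previous filtered angle.
    return prev_angle_rad * (1.0f - params.filter_coeff) + angle_rad * params.filter_coeff;
}

// Maps the angle to joystick-like control values on the unit circle.
void toControls(float angle_rad, float& linear, float& angular) {
    linear = std::cos(angle_rad);
    angular = std::sin(angle_rad);
}
```

Feeding in symmetric probabilities gives an angle of zero, which maps to full forward motion and no rotation. A small `filter_coeff` makes the drone respond slowly to any single prediction.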

---

## Trail Data

* Some of the data came from earlier work
  * [https://ieeexplore.ieee.org/abstract/document/7358076](https://ieeexplore.ieee.org/abstract/document/7358076)

---

## Collection Rig

* Around 17k training frames, 7k testing

---

## New Data Collection

* Prior data lacked off-center correction
* New data had positions taken over a meter of space
  * At 30fps, 3 hours of video is about 324k frames per camera, or around 1 million frames across a three-camera rig

---

## Objects

* Objects, for this project, mainly mean humans
* When a human is detected, the drone stops and hovers in place for safety

---

## Object Callback

```C++
void PX4Controller::objDnnCallback(const sensor_msgs::Image::ConstPtr& msg)
{
    if (obj_det_limit_==-1.0f)
    {
        // If disabled
        return;
    }

    std::string expected_encoding("32FC1"); // 1 channel float array
    const unsigned int elem_count = 6;

    if(msg->width!=elem_count || expected_encoding.compare(msg->encoding)!=0)
    {
        ROS_INFO("OBJ DNN CALLBACK ERROR: This node expects to receive width=6,encoding=%s", expected_encoding.c_str());
        assert(false);
        return;
    }

    const float* objects = (const float*)(msg->data.data());
    unsigned int obj_count = msg->height;

    // Do not use DNN outputs until operator presses "enable DNN" button (usually A).
    if (!use_dnn_data_)
    {
        return;
    }

    int obj_class = -1;
    float obj_prob = -1;
    int obj_x = -1;
    int obj_y = -1;
    int obj_height = 0;
    int obj_width = 0;
    bool should_stop = false;
    for(unsigned int i = 0; i<obj_count; i++)
    {
        float class_id  = objects[i*elem_count + 0];
        float prob      = objects[i*elem_count + 1];
        float x         = objects[i*elem_count + 2];
        float y         = objects[i*elem_count + 3];
        float w         = objects[i*elem_count + 4];
        float h         = objects[i*elem_count + 5];

        // Stop if object's height more than some limit relative to dnn frame height
        // for correct class and sufficient probability
        if( (int)class_id==CLASS_OBJ_STOP &&
            prob >= obj_det_limit_ &&
            h/float(DNN_FRAME_HEIGHT)>OBJ_STOP_HEIGHT_RATIO
        )
        {
            should_stop = true;
            obj_class = (int)class_id;
            obj_prob = prob;
            obj_x = (int)x;
            obj_y = (int)y;
            obj_height = (int)h;
            obj_width = (int)w;
            break;
        }
    }

    if(should_stop)
    {
        use_dnn_data_ = false;  // Turn OFF AI control and stop
        linear_control_val_ = 0;
        angular_control_val_ = 0;
        ROS_INFO("OBJ DNN STOP DETECTED: class=%d, prob=%4.2f, x=%d, y=%d, width=%d, height=%d", obj_class, obj_prob, obj_x, obj_y, obj_width, obj_height);
        ROS_INFO("DNN control is de-activated!");
    }
}
```

---

## Object Data

* Simply reused the PASCAL VOC dataset
  * Same dataset used to train original YOLO

---

## Obstacle Detection

* Meant to be a reflexive measure to avoid collisions
* Used a monocular approach that computes 3D maps
  * direct sparse odometry (DSO)
  * [https://youtu.be/H7Ym3DMSGms?si=PlgShWSHoFf1AhXj&t=121](https://youtu.be/H7Ym3DMSGms?si=PlgShWSHoFf1AhXj&t=121)

---

## Domain Adaptation

* So how much data do the DNNs require?
  * [https://youtu.be/ZKF5N8xUxfw?si=_55raoOEVJze5tVa](https://youtu.be/ZKF5N8xUxfw?si=_55raoOEVJze5tVa)
  * Reported 30 minutes of data to fly 2km on a railroad
* Similar to DAVE, no human labelling is required

---

## Course Project

* Imagine you want to make a product or research project using an AI agent
  * Let's say a drone burrito delivery robot
* Your project will be to describe what you need to build the agent
  * Hardware
  * Data
  * Algorithms

---

## Hardware

* This is the most straightforward part
  * But I want you to justify decisions
* If you want to deliver warm burritos, maybe you'll need to fly at a particular speed
  * That means that you need a good enough camera to dodge pedestrians
    * Justify it!

---

## Data

* How much data will you need? What are the labels?
  * Find related work to justify the amounts and approach
* Collect or simulate some data
* Mock up a labelling tool
  * Report how long it takes to label, and label enough to report label accuracy

---

## Data Augmentation

* Data augmentation can drastically reduce the amount of data required
  * Can you augment your data?
  * How will you know that the augmentation confers real benefits?
* Not talking about simple color perturbations

---

## Algorithms

* What learning algorithms will you use?
* What control algorithms?
  * We haven't talked about Kalman filters and PID controllers, but those choices go here
* Simulate your agent making choices with different amounts of noise
  * How good do the predictions need to be?
    * For comfort? To prevent spilling the queso? Whatever is relevant to you.
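
As a starting point for the noise simulation, here is a minimal Monte Carlo sketch. It assumes a hypothetical agent that acts when a predicted probability crosses a threshold, and it measures how often Gaussian prediction noise flips that decision. The function name, the threshold rule, and the Gaussian noise model are all assumptions; substitute whatever matches your agent.

```cpp
#include <cassert>
#include <random>

// Fraction of trials in which a noisy prediction lands on the wrong side
// of the decision threshold. A fixed seed keeps runs repeatable.
double wrongDecisionRate(double true_prob, double threshold, double noise_std,
                         int trials, unsigned seed) {
    assert(trials > 0 && noise_std >= 0.0);
    std::mt19937 rng(seed);
    // Unit-variance noise scaled by noise_std, so noise_std == 0 is allowed.
    std::normal_distribution<double> noise(0.0, 1.0);
    const bool correct = true_prob >= threshold;
    int wrong = 0;
    for (int i = 0; i < trials; ++i) {
        const double predicted = true_prob + noise_std * noise(rng);
        if ((predicted >= threshold) != correct) ++wrong;
    }
    return static_cast<double>(wrong) / trials;
}
```

Sweeping `noise_std` and plotting the returned rate shows how good the predictions need to be before the decision becomes reliable enough for your application.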