# CS 530 - Lecture 06

## Noise and Fusion in State Space Models

Bernhard Firner

2026-02-17

---

## State Space Models

* Last time we talked about continuous space prediction and filtering
  * Using a Kalman filter, which assumes everything is gaussian
* The math is similar to HMMs
  * HMMs have hidden states
  * Continuous models have "hidden" set of means and covariances

---

## Discrete Sampling

* There is an added difficulty in the continuous space
* Our sampling is discrete, but the process is continuous
  * This means that we miss some (possibly large) changes of state
  * A bit like playing "red light, green light" with the world

---

## Filtering Vs Smoothing

* With HMMs we used the Baum-Welch algorithm to fit our parameters to observations
  * Using forward-backward as part of an expectation-maximization algorithm
* We can also do smoothing in the continuous domain
  * again by going backward with the predictions and samples from the forward pass

---

## Forward Pass

* Previously we only cared about the estimate of the internal state, $x$
  * Filtered mean at time t, given the data to t: $\mu_{t|t}$
* The backward pass will check how that differed from our predictions
  * Predicted mean at t+t, given the data to t: $\mu_{t+1|t}$
* The filtered and predicted variances are also required

---

## Forward Pass Code

```python
predicted_data = []
predicted_covariances = []
filtered_data = []
filtered_covariances = []
# Subtract the bias, which gets rid of gravity
for idx, datum in enumerate((accel_data - imu_mean)):
    kfilter.predict()
    # Predictions to be used in smoothing
    predicted_data.append(kfilter.x.flatten())
    predicted_covariances.append(kfilter.P)

# Normal kalman update
    kfilter.update(datum)

# The current estimates of the means and covariance matrix
    filtered_data.append(kfilter.x.flatten())
    filtered_covariances.append(kfilter.P)
```

---

## Kalman Smoothing

* Initialize
  * $\mu_{t|T}$ and $\Sigma_{t|T}$ come from the last state of filtering
* The backward pass begins with computing a Kalman gain matrix:
  * $J_t \triangleq \Sigma_{t|t}A^T\Sigma_{t+1|t}^{-1}$
* This matrix projects the difference between the backwards estimate and the forward estimate to correct the current esimate

---

## The Math

* Update the smoothed estimates
  * $\mu_{t|T} = \mu_{t|t} + J_t(\mu_{t+1|T} - \mu_{t+1|t})$
  * $\Sigma_{t|T} = \Sigma_{t|t} + J_t(\Sigma_{t+1|T} - \Sigma_{t+1|t})J_t^T$
* Then repeat for $t-1$

---

## The Problems

* Matrix inversion often leads to trouble
  * $J_t \triangleq \Sigma_{t|t}A^T\Sigma{t+1|t}^{-1}$
  * It is costly to compute, and often unstable
* There are multiple ways to attempt a fix, but let's pause here and discuss smoothing vs filtering

---

## Smoothing Alternatives

* Look up smoothing and you'll find a world of approaches
  * Continuous spaces have myriad assumptions, so we end up with a plethora of techniques
* Many are mathematically justified, but often assumptions don't hold
  * Extended Kalman filter allow nonlinear system but still assumes gaussian noise
  * Unscented Kalman filter is similar (but doesn't stink)

---

## Choosing

* Each domain has their own set of preferred filters
  * Wireless communication, different robots, astronomy, etc
* I'm not an expert in everything, so I can't really dig into each one
* But let's pivot back to filtering

---

## Filtering and Information

* Smoothing is generally superior to filtering because we have more information
  * "seeing" the future, basically
* Some algorithms try to incorporate more information into filtering
  * The particle filter attempts to match observations to a known environment

---

## Importance of Filtering

* Offline solutions are often obtainable, even with brute force
  * Add in different mixes of assumptions and extra information until something works
* Any interactive agent operating in the real world must use filtering
  * That means being somehow robust to noise

---

## Noise and Filtering

* If there were no noise, we wouldn't even talk about filtering or smoothing
* Let's simulate a bit of data to illustrate this

---

## Simulated IMU

```python
imu_mean = [0., 0., 9.8]
imu_var = numpy.array([6.47079693e-05, 3.39856070e-04, 2.72198300e-04])
# The IMU follows the right hand rule with Z pointing up and Y pointing forward

# 100 samples per second
samples_per_second = 100
delta_t = 1.0 / samples_per_second

# Human walking speed at 15 minutes/mile in m/s (with 1 mile approximately 1500m)
# This is 5/3 m/s
walking_speed = 1500 / (15 * 60)

# Generate some samples
imu_samples = []

waypoint_times = [4*samples_per_second, 8*samples_per_second, 12*samples_per_second]
waypoints = numpy.array([[0., 5.5, 0.],
                         [-1.45, 6.75, 0.],
                         [-1.45, 0.2, 0.]])

# Accelerate to walking speed over one second
for i in range(samples_per_second):
    # Accelerate at a constant rate until we reach walking speed
    # Dividing by one so we remember this means we accelerate to walking speed in 1 second
    imu_samples.append([0., walking_speed/1, 0.])
# Keep moving forward for three seconds
for i in range(3*samples_per_second):
    imu_samples.append([0., 0., 0.])
# Stop in 1 second
for i in range(samples_per_second):
    # Deccelerate at a constant rate until we reach 0 velocity
    imu_samples.append([0., -walking_speed/1, 0.])
# Accelerate left until reaching 1/2 walking speed
for i in range(samples_per_second):
    # Accelerate at a constant rate until we reach 1/2 walking speed at 1 second
    imu_samples.append([-0.5*walking_speed, 0., 0.])
# Continue left for 1 second
for i in range(samples_per_second):
    imu_samples.append([0., 0., 0.])
# Stop in 1/2 second
for i in range(int(samples_per_second/2)):
    # Deccelerate at a constant rate until we reach 0 velocity
    # We are stopping in 1/2 second from 1/2 walking speed, so the rate is walking_speed/second
    imu_samples.append([walking_speed, 0., 0.])
# Reverse the above action to return to the starting location
imu_samples = numpy.array(imu_samples)
imu_samples = numpy.concatenate((imu_samples, (imu_samples * [-1, -1, 1])))

# Make times for the samples
times = numpy.arange(len(imu_samples)) * delta_t

# Add noise
accel_data = imu_samples + numpy.random.normal(imu_mean, imu_var, imu_samples.shape)
```

---

## Outputs

* Let's run this data through our filter

```python
kfilter = KalmanFilter(delta_t, system_noise = system_noise, measurement_noise=imu_var)
filtered_data = []
# Subtract the bias, which gets rid of gravity
for idx, datum in enumerate((accel_data - imu_mean)):
    kfilter.predict()

# Normal kalman update
    kfilter.update(datum)

filtered_data.append(kfilter.x.flatten())
filtered_data = numpy.array(filtered_data)
```

---

## Results: Acceleration

---

## Results: Velocity

---

## Results: XYZ Position

---

## Results: Map Position

---

## Noise

* The noise I'm using comes from the resting variance of my phone IMU
* When in motion, the variance is orders of magnitude greater
  * Contains both sensor noise and noise from me
* So let's generate the regenerate those figures with noise included

---

## Results: Acceleration

---

## Reasonable?

* Gravity is $9.8m/s^2$
* Do we accelerate as much to move?
* Think how we bounce when we walk

</div>
<div class="col">

</div>
</div>

---

## Results: Velocity

---

# So Bad!

* Mistakes in acceleration accumulate in velocity
* And those then accumulate in position

</div>
<div class="col">

</div>
</div>

---

## Results: XYZ Position

* Velocities never come back to 0
* Position drifts off

</div>
<div class="col">

</div>
</div>

---

## Results: Map Position

---

## Doomed?

* So it looks like we'll never be able to use an IMU in real-time
  * Smoothing or particle filtering give more information
  * But noise ruins our estimates
* But what if we combine the IMU with other information?

---

## Fusion

* Remember the full update equation:
  * $z_t = A_tz_{t-1} + B_tu_t + \epsilon_t$
* We can inject information through $Bu$
* Or, alternatively, we can introduce new values for $A$ with different observations

---

## Fusing What?

* The obvious answer is location information
* But our problem stems from residual errors in the velocity
  * So let's begin with that
* Whenever the system comes to a stop, we'll inform the Kalman filter

---

## Control Update

```python
    def predict(self):
        # Update mean and variance predictions using the transition matrix
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q

def control(self, B, u):
        # Call after predict to update with a control input
        # If the control matrix exists we follow the full equation:
        #  \mu_{t|t-1} \triangleeq A_t\mu{t-1} + B_t u_t
        self.x += B @ u

def _internal_update(self, C, y, measure_noise):
        # Compare the prediction to the observed state
        y_hat = C @ self.x
        residual = y - y_hat
        # Update the variance
        S = C @ self.P @ C.T + measure_noise
        # Kalman gain matrix
        K = self.P @ C.T @ numpy.linalg.inv(S)

# Update from the mean and covariance. 18.31-18.32 in Murphy's book.
        self.x = self.x + K @ residual
        self.P = (numpy.eye(len(self.P)) - K @ C) @ self.P

def update(self, y):
        """
        Arguments:
            y: y is the observed state.
        """
        y = y.reshape(-1, 1)
        self._internal_update(self.C, y, self.R)
```

---

## Control

* This is described as the "heavy handed" approach

```python
for idx, datum in enumerate((accel_data - imu_mean)):
    kfilter.predict()
    if use_stops and idx in stop_times:
        # If we were using a control input we would do this:
        kfilter.control(B, kfilter.x)
        datum = datum * 0
        kfilter.update(datum)
    else:
        # Normal kalman update
        kfilter.update(datum)
    filtered_data.append(kfilter.x.flatten())
filtered_data = numpy.array(filtered_data)
```

---

## Control Acceleration

* We forced acceleration and velocity 0 at several points
* It is instantaneous, so not obvious in acceleration

</div>
<div class="col">

</div>
</div>

---

## Control Velocity

---

## Control XYZ Position

* Now our velocities go to 0, so there is no drifting
* Graph is ugly

</div>
<div class="col">

</div>
</div>

---

## Control Map Position

* But the final result is much improved

</div>
<div class="col">

</div>
</div>

---

## Gentler Solution

* The gentler solution is to introduce a new observation through a observation matrix
* Tugs the means back to 0 without blowing away the statistics

```python
    def zero_velocity_update(self):
        # Make an observation matrix that contains the velocity for 0 velocity updates.
        self.zeroC = numpy.array([[1, 0, 0, 0, 0, 0, 0, 0, 0],
                                  [0, 1, 0, 0, 0, 0, 0, 0, 0],
                                  [0, 0, 1, 0, 0, 0, 0, 0, 0],
                                  [0, 0, 0, 1, 0, 0, 0, 0, 0],
                                  [0, 0, 0, 0, 1, 0, 0, 0, 0],
                                  [0, 0, 0, 0, 0, 1, 0, 0, 0]])
        # As an alternative to the control input, we can also expand the
        # observation to include velocity, and indicate that it is 0.
        # This doesn't interrupt the statistics of the system, so it could be preferable.
        y = numpy.zeros((6, 1))
        self._internal_update(self.zeroC, y, self.jitter)
```

---

## Zero Update Velocity

---

## Zero Update XYZ Position

* We don't abruptly change our running estimates
* So the graph remains smooth

</div>
<div class="col">

</div>
</div>

---

## Zero Update Map Position

* The shape of the path is correct
* Still room to improve

</div>
<div class="col">

</div>
</div>

---

## Fusing Location

* Location information is like cheating
  * But we often have GPS, landmarks, or some other intermittent localization
* Let's add 3 locations, at 4, 8, and 12 seconds

```python
waypoint_times = [4*samples_per_second, 8*samples_per_second, 12*samples_per_second]
waypoints = numpy.array([[0., 5.5, 0.],
                         [-1.45, 6.75, 0.],
                         [-1.45, 0.2, 0.]])
```

---

## Location Observation Update

* We'll need to add another observation matrix for location updates

```python
    def fused_location_update(self, y_accel, y_location):
        # Make an observation matrix that contains the location for location updates.
        self.fuseLocation = numpy.array([[1, 0, 0, 0, 0, 0, 0, 0, 0],
                                         [0, 1, 0, 0, 0, 0, 0, 0, 0],
                                         [0, 0, 1, 0, 0, 0, 0, 0, 0],
                                         [0, 0, 0, 1, 0, 0, 0, 0, 0],
                                         [0, 0, 0, 0, 1, 0, 0, 0, 0],
                                         [0, 0, 0, 0, 0, 1, 0, 0, 0],
                                         [0, 0, 0, 0, 0, 0, 1, 0, 0],
                                         [0, 0, 0, 0, 0, 0, 0, 1, 0],
                                         [0, 0, 0, 0, 0, 0, 0, 0, 1]])
        # As an alternative to the control input, we can also expand the
        # observation to include location.
        # This doesn't interrupt the statistics of the system, so it could be preferable.
        y = numpy.concatenate((y_accel.reshape(-1, 1), self.x[3:6], y_location.reshape(-1, 1)), axis=0)
        self._internal_update(self.fuseLocation, y, self.fusion_jitter)
```

---

## Location+Stop Velocity

---

## Location+Stop XYZ Position

</div>
<div class="col">

</div>
</div>

---

## Location+Stop Map Position

</div>
<div class="col">

</div>
</div>

---

# Map Comparison

</div>

<img style="width: 50%" class="r-stretch" src="./figures/08_kexample_noise_10000.0_stops_position.svg" />
<img style="width: 50%" class="r-stretch" src="./figures/08_kexample_noise_10000.0_stops_waypoints_position.svg" />

</div>
</div>

---

## Real World Fusion

* Navigation with IMU only is common
* GPS + IMU is common
* Optical flow + IMU is used on drones

---

## Dead Reckoning

* Path integration, or dead reckoning, is the ability to move "directly" to a target location purely based upon current location and pose estimates
* This is a highly desirable ability
  * And even more desirable is if it works with a simple IMU fused with something robust
* Let's talk about IMU + Landmarks + Sun Azimuth Angle

---

## Robust Navigation

* IMUs are an internal sensor
* External information is required to make them robust
* GPS is possible, but landmarks work in almost any environment and give roughly equivalent localization
* In large, open areas with no landmarks we could use a clock and the angle of the sun
  * And uses the same image system as the landmark detector!

---

## Required Information

* We want this system to be able to move to a location using only two pieces of information
  * Distance
  * Angle to move relative to the sun
* What system are we describing?

---

## Robust Navigating Agent

---

## Biological Models

* Biological system are robust
  * So, if we understand how they work, imitating them is often a good plan
* Do we know that bees navigate using sensor fusion?

---

## Evidence for Fusion

* [Honeybee Long-distance Orientation in a Controlled Environment](https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1439-0310.1995.tb01093.x)
* [The Influences of Landmarks on Distance Estimation of Honey Bees](https://www.sciencedirect.com/science/article/pii/S0003347285702175)

---

## Summary

* Bees use sun angle and a sense of movement to navigate
  * But they use landmarks to adjust their location estimates
* If the landmark is grossly out of place, it is rejected
  * So this implies a probability model
    * "What is the probability that I moved 270m given that I feel like I've moved 180m?"

---

---

## Lessons

* Bee sensor fusion is a wonderful lesson in robust systems
  * They are complicated!
* The bee is making multiple complex decisions
  * Do I accept this landmark?
  * How much noise is there in my distance estimation?
  * How accurate was information from my sister's waggle dance?

---

## Parallels

* When we combine a noisy IMU with a navigation map through a particle filter, that is similar to sensor fusion
  * The constraints exist on the location estimates
* We can also learn how to do landmark-based navigation
  * With one-shot learning of landmarks!

---

## AI Navigation

* Plenty of current AI systems use maps
  * But that is cheating! What if we need to learn our environment "on the fly"?
* "dead reckoning" should be possible
  * [lagr project](https://cs.nyu.edu/~yann/research/lagr/index.html)
  * [Fast Incremental Learning for Off-Road Robot Navigation](https://arxiv.org/abs/1606.08057)

---

## Next Topics

* We want to get to learning, but let's work through a bit more statistics
  * Then we can do all the learning systems and algorithms at once
* First, let's finish decision-making
  * How should an agent evaluate decisions?
  * How are outcomes scored and compared?