# CS 530 - Lecture 06 ## Noise and Fusion in State Space Models Bernhard Firner 2026-02-17 --- ## State Space Models * Last time we talked about continuous space prediction and filtering * Using a Kalman filter, which assumes everything is gaussian * The math is similar to HMMs * HMMs have hidden states * Continuous models have "hidden" set of means and covariances --- ## Discrete Sampling * There is an added difficulty in the continuous space * Our sampling is discrete, but the process is continuous * This means that we miss some (possibly large) changes of state * A bit like playing "red light, green light" with the world --- ## Filtering Vs Smoothing * With HMMs we used the Baum-Welch algorithm to fit our parameters to observations * Using forward-backward as part of an expectation-maximization algorithm * We can also do smoothing in the continuous domain * again by going backward with the predictions and samples from the forward pass --- ## Forward Pass * Previously we only cared about the estimate of the internal state, $x$ * Filtered mean at time t, given the data to t: $\mu_{t|t}$ * The backward pass will check how that differed from our predictions * Predicted mean at t+t, given the data to t: $\mu_{t+1|t}$ * The filtered and predicted variances are also required --- ## Forward Pass Code ```python predicted_data = [] predicted_covariances = [] filtered_data = [] filtered_covariances = [] # Subtract the bias, which gets rid of gravity for idx, datum in enumerate((accel_data - imu_mean)): kfilter.predict() # Predictions to be used in smoothing predicted_data.append(kfilter.x.flatten()) predicted_covariances.append(kfilter.P) # Normal kalman update kfilter.update(datum) # The current estimates of the means and covariance matrix filtered_data.append(kfilter.x.flatten()) filtered_covariances.append(kfilter.P) ``` --- ## Kalman Smoothing * Initialize * $\mu_{t|T}$ and $\Sigma_{t|T}$ come from the last state of filtering * The backward pass begins with computing a Kalman gain matrix: * $J_t \triangleq \Sigma_{t|t}A^T\Sigma_{t+1|t}^{-1}$ * This matrix projects the difference between the backwards estimate and the forward estimate to correct the current esimate --- ## The Math * Update the smoothed estimates * $\mu_{t|T} = \mu_{t|t} + J_t(\mu_{t+1|T} - \mu_{t+1|t})$ * $\Sigma_{t|T} = \Sigma_{t|t} + J_t(\Sigma_{t+1|T} - \Sigma_{t+1|t})J_t^T$ * Then repeat for $t-1$ --- ## The Problems * Matrix inversion often leads to trouble * $J_t \triangleq \Sigma_{t|t}A^T\Sigma{t+1|t}^{-1}$ * It is costly to compute, and often unstable * There are multiple ways to attempt a fix, but let's pause here and discuss smoothing vs filtering --- ## Smoothing Alternatives * Look up smoothing and you'll find a world of approaches * Continuous spaces have myriad assumptions, so we end up with a plethora of techniques * Many are mathematically justified, but often assumptions don't hold * Extended Kalman filter allow nonlinear system but still assumes gaussian noise * Unscented Kalman filter is similar (but doesn't stink) --- ## Choosing * Each domain has their own set of preferred filters * Wireless communication, different robots, astronomy, etc * I'm not an expert in everything, so I can't really dig into each one * But let's pivot back to filtering --- ## Filtering and Information * Smoothing is generally superior to filtering because we have more information * "seeing" the future, basically * Some algorithms try to incorporate more information into filtering * The particle filter attempts to match observations to a known environment --- ## Importance of Filtering * Offline solutions are often obtainable, even with brute force * Add in different mixes of assumptions and extra information until something works * Any interactive agent operating in the real world must use filtering * That means being somehow robust to noise --- ## Noise and Filtering * If there were no noise, we wouldn't even talk about filtering or smoothing * Let's simulate a bit of data to illustrate this --- ## Simulated IMU ```python imu_mean = [0., 0., 9.8] imu_var = numpy.array([6.47079693e-05, 3.39856070e-04, 2.72198300e-04]) # The IMU follows the right hand rule with Z pointing up and Y pointing forward # 100 samples per second samples_per_second = 100 delta_t = 1.0 / samples_per_second # Human walking speed at 15 minutes/mile in m/s (with 1 mile approximately 1500m) # This is 5/3 m/s walking_speed = 1500 / (15 * 60) # Generate some samples imu_samples = [] waypoint_times = [4*samples_per_second, 8*samples_per_second, 12*samples_per_second] waypoints = numpy.array([[0., 5.5, 0.], [-1.45, 6.75, 0.], [-1.45, 0.2, 0.]]) # Accelerate to walking speed over one second for i in range(samples_per_second): # Accelerate at a constant rate until we reach walking speed # Dividing by one so we remember this means we accelerate to walking speed in 1 second imu_samples.append([0., walking_speed/1, 0.]) # Keep moving forward for three seconds for i in range(3*samples_per_second): imu_samples.append([0., 0., 0.]) # Stop in 1 second for i in range(samples_per_second): # Deccelerate at a constant rate until we reach 0 velocity imu_samples.append([0., -walking_speed/1, 0.]) # Accelerate left until reaching 1/2 walking speed for i in range(samples_per_second): # Accelerate at a constant rate until we reach 1/2 walking speed at 1 second imu_samples.append([-0.5*walking_speed, 0., 0.]) # Continue left for 1 second for i in range(samples_per_second): imu_samples.append([0., 0., 0.]) # Stop in 1/2 second for i in range(int(samples_per_second/2)): # Deccelerate at a constant rate until we reach 0 velocity # We are stopping in 1/2 second from 1/2 walking speed, so the rate is walking_speed/second imu_samples.append([walking_speed, 0., 0.]) # Reverse the above action to return to the starting location imu_samples = numpy.array(imu_samples) imu_samples = numpy.concatenate((imu_samples, (imu_samples * [-1, -1, 1]))) # Make times for the samples times = numpy.arange(len(imu_samples)) * delta_t # Add noise accel_data = imu_samples + numpy.random.normal(imu_mean, imu_var, imu_samples.shape) ``` --- ## Outputs * Let's run this data through our filter ```python kfilter = KalmanFilter(delta_t, system_noise = system_noise, measurement_noise=imu_var) filtered_data = [] # Subtract the bias, which gets rid of gravity for idx, datum in enumerate((accel_data - imu_mean)): kfilter.predict() # Normal kalman update kfilter.update(datum) filtered_data.append(kfilter.x.flatten()) filtered_data = numpy.array(filtered_data) ``` --- ## Results: Acceleration
--- ## Results: Velocity
--- ## Results: XYZ Position
--- ## Results: Map Position
--- ## Noise * The noise I'm using comes from the resting variance of my phone IMU * When in motion, the variance is orders of magnitude greater * Contains both sensor noise and noise from me * So let's generate the regenerate those figures with noise included --- ## Results: Acceleration
--- ## Reasonable?
* Gravity is $9.8m/s^2$ * Do we accelerate as much to move? * Think how we bounce when we walk
--- ## Results: Velocity
--- # So Bad!
* Mistakes in acceleration accumulate in velocity * And those then accumulate in position
--- ## Results: XYZ Position
* Velocities never come back to 0 * Position drifts off
--- ## Results: Map Position
--- ## Doomed? * So it looks like we'll never be able to use an IMU in real-time * Smoothing or particle filtering give more information * But noise ruins our estimates * But what if we combine the IMU with other information? --- ## Fusion * Remember the full update equation: * $z_t = A_tz_{t-1} + B_tu_t + \epsilon_t$ * We can inject information through $Bu$ * Or, alternatively, we can introduce new values for $A$ with different observations --- ## Fusing What? * The obvious answer is location information * But our problem stems from residual errors in the velocity * So let's begin with that * Whenever the system comes to a stop, we'll inform the Kalman filter --- ## Control Update ```python def predict(self): # Update mean and variance predictions using the transition matrix self.x = self.A @ self.x self.P = self.A @ self.P @ self.A.T + self.Q def control(self, B, u): # Call after predict to update with a control input # If the control matrix exists we follow the full equation: # \mu_{t|t-1} \triangleeq A_t\mu{t-1} + B_t u_t self.x += B @ u def _internal_update(self, C, y, measure_noise): # Compare the prediction to the observed state y_hat = C @ self.x residual = y - y_hat # Update the variance S = C @ self.P @ C.T + measure_noise # Kalman gain matrix K = self.P @ C.T @ numpy.linalg.inv(S) # Update from the mean and covariance. 18.31-18.32 in Murphy's book. self.x = self.x + K @ residual self.P = (numpy.eye(len(self.P)) - K @ C) @ self.P def update(self, y): """ Arguments: y: y is the observed state. """ y = y.reshape(-1, 1) self._internal_update(self.C, y, self.R) ``` --- ## Control * This is described as the "heavy handed" approach ```python for idx, datum in enumerate((accel_data - imu_mean)): kfilter.predict() if use_stops and idx in stop_times: # If we were using a control input we would do this: kfilter.control(B, kfilter.x) datum = datum * 0 kfilter.update(datum) else: # Normal kalman update kfilter.update(datum) filtered_data.append(kfilter.x.flatten()) filtered_data = numpy.array(filtered_data) ``` --- ## Control Acceleration
* We forced acceleration and velocity 0 at several points * It is instantaneous, so not obvious in acceleration
--- ## Control Velocity
--- ## Control XYZ Position
* Now our velocities go to 0, so there is no drifting * Graph is ugly
--- ## Control Map Position
* But the final result is much improved
--- ## Gentler Solution * The gentler solution is to introduce a new observation through a observation matrix * Tugs the means back to 0 without blowing away the statistics ```python def zero_velocity_update(self): # Make an observation matrix that contains the velocity for 0 velocity updates. self.zeroC = numpy.array([[1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0]]) # As an alternative to the control input, we can also expand the # observation to include velocity, and indicate that it is 0. # This doesn't interrupt the statistics of the system, so it could be preferable. y = numpy.zeros((6, 1)) self._internal_update(self.zeroC, y, self.jitter) ``` --- ## Zero Update Velocity
--- ## Zero Update XYZ Position
* We don't abruptly change our running estimates * So the graph remains smooth
--- ## Zero Update Map Position
* The shape of the path is correct * Still room to improve
--- ## Fusing Location * Location information is like cheating * But we often have GPS, landmarks, or some other intermittent localization * Let's add 3 locations, at 4, 8, and 12 seconds ```python waypoint_times = [4*samples_per_second, 8*samples_per_second, 12*samples_per_second] waypoints = numpy.array([[0., 5.5, 0.], [-1.45, 6.75, 0.], [-1.45, 0.2, 0.]]) ``` --- ## Location Observation Update * We'll need to add another observation matrix for location updates ```python def fused_location_update(self, y_accel, y_location): # Make an observation matrix that contains the location for location updates. self.fuseLocation = numpy.array([[1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1]]) # As an alternative to the control input, we can also expand the # observation to include location. # This doesn't interrupt the statistics of the system, so it could be preferable. y = numpy.concatenate((y_accel.reshape(-1, 1), self.x[3:6], y_location.reshape(-1, 1)), axis=0) self._internal_update(self.fuseLocation, y, self.fusion_jitter) ``` --- ## Location+Stop Velocity
--- ## Location+Stop XYZ Position
--- ## Location+Stop Map Position
--- # Map Comparison
--- ## Real World Fusion * Navigation with IMU only is common * GPS + IMU is common * Optical flow + IMU is used on drones --- ## Dead Reckoning * Path integration, or dead reckoning, is the ability to move "directly" to a target location purely based upon current location and pose estimates * This is a highly desirable ability * And even more desirable is if it works with a simple IMU fused with something robust * Let's talk about IMU + Landmarks + Sun Azimuth Angle --- ## Robust Navigation * IMUs are an internal sensor * External information is required to make them robust * GPS is possible, but landmarks work in almost any environment and give roughly equivalent localization * In large, open areas with no landmarks we could use a clock and the angle of the sun * And uses the same image system as the landmark detector! --- ## Required Information * We want this system to be able to move to a location using only two pieces of information * Distance * Angle to move relative to the sun * What system are we describing? --- ## Robust Navigating Agent
--- ## Biological Models * Biological system are robust * So, if we understand how they work, imitating them is often a good plan * Do we know that bees navigate using sensor fusion? --- ## Evidence for Fusion * [Honeybee Long-distance Orientation in a Controlled Environment](https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1439-0310.1995.tb01093.x) * [The Influences of Landmarks on Distance Estimation of Honey Bees](https://www.sciencedirect.com/science/article/pii/S0003347285702175) --- ## Summary * Bees use sun angle and a sense of movement to navigate * But they use landmarks to adjust their location estimates * If the landmark is grossly out of place, it is rejected * So this implies a probability model * "What is the probability that I moved 270m given that I feel like I've moved 180m?" ---
--- ## Lessons * Bee sensor fusion is a wonderful lesson in robust systems * They are complicated! * The bee is making multiple complex decisions * Do I accept this landmark? * How much noise is there in my distance estimation? * How accurate was information from my sister's waggle dance? --- ## Parallels * When we combine a noisy IMU with a navigation map through a particle filter, that is similar to sensor fusion * The constraints exist on the location estimates * We can also learn how to do landmark-based navigation * With one-shot learning of landmarks! --- ## AI Navigation * Plenty of current AI systems use maps * But that is cheating! What if we need to learn our environment "on the fly"? * "dead reckoning" should be possible * [lagr project](https://cs.nyu.edu/~yann/research/lagr/index.html) * [Fast Incremental Learning for Off-Road Robot Navigation](https://arxiv.org/abs/1606.08057) --- ## Next Topics * We want to get to learning, but let's work through a bit more statistics * Then we can do all the learning systems and algorithms at once * First, let's finish decision-making * How should an agent evaluate decisions? * How are outcomes scored and compared?