# CS 530 - Lecture 01

## Principles of AI

Bernhard Firner

2026-01-20

---

## Course Details

* My email: `bfirner@cs.rutgers.edu`
* Canvas
* Office: Hill 273
* Office hours: TBD

---

## Syllabus: Book

* "Artificial Intelligence: A Modern Approach," fourth edition, by Russell and Norvig.
    * Recommended
    * Less depth than focused texts, but broad
    * Gives a good starting point
* I also recommend "The Mind of a Bee," by Lars Chittka
    * Bees have around 1 million neurons and can outperform anything we can build

---

## Syllabus: Topics

* Follows CS 440 or CS 520
* Precedes more advanced courses
    * CS [533](https://www.cs.rutgers.edu/academics/graduate/m-s-program/course-synopses/course-details/16-198-533-natural-language-processing) (natural language processing)
    * CS [535](https://www.cs.rutgers.edu/academics/graduate/m-s-program/course-synopses/course-details/16-198-535-pattern-recognition-theory-and-applications) (pattern recognition)
    * CS [536](https://www.cs.rutgers.edu/academics/graduate/m-s-program/course-synopses/course-details/16-198-536-machine-learning) (machine learning)
* Less focus on *implementation*, more focus on practical *application*

---

## Philosophy

* The topics may, at times, feel slightly philosophical
* They will also be driven by my own personal experiences
    * Mostly self-driving cars: [https://arxiv.org/pdf/2010.08776](https://arxiv.org/pdf/2010.08776)

---

## Syllabus: Tools and Languages

* Python, scikit-learn, and PyTorch
    * [https://scikit-learn.org/stable/](https://scikit-learn.org/stable/)
    * [https://numpy.org/doc/stable/reference/index.html#reference](https://numpy.org/doc/stable/reference/index.html)
    * [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)
    * [https://docs.pytorch.org/docs/stable/index.html](https://docs.pytorch.org/docs/stable/index.html)
* Farama Foundation Gymnasium for RL Examples
    * [https://github.com/Farama-Foundation/Gymnasium](https://github.com/Farama-Foundation/Gymnasium)

---

## Structure

* Lectures
(hopefully interactive)
* Examinations
* Projects
    * Will ask you to evaluate and plot more than implement
    * Final project will feel like preparatory work for a large-scale project

---

## Syllabus: Grading

* 15% Midterm
* 30% Final
* 20% Homework
* 35% Final Project and Report

---

## Academic Integrity

* [CS Academic Integrity Policy](https://www.cs.rutgers.edu/academics/undergraduate/academic-integrity-policy)
* Academic Integrity applies to both exams and assignments
* Violating the integrity policy negatively impacts your classmates, and current and future Rutgers students
    * So enforcement will be strict

---

## Assistance on Assignments

* Your work must be your own
* You may ask for advice from other students, on Canvas, in recitation, and in office hours
* And you can use online material, including LLMs, to clarify any confusion
* If you make heavy use of assistance, from a person, tool, or website, you must cite them

---

## Topics

* Introduction to agents and environments (ch 1-2)
* Probability and Planning with Uncertainty (ch 11-14)
* Evaluating Decisions and Actuation (ch 15-16)
* Learning Behaviors from Data (ch 19-23)

---

## Real-World Progress

* Progress in real-world academic and industry research is an arduous journey
* It takes a deep understanding of the problem domain, available sensors and actuators, and data

---

## Diving In

* Let's start off with agents and environments
* Chapters 1-2 in Russell and Norvig's book

---

## Agents

* What are agents?
    * They interact with an environment
    * They perceive and they act
* What is not an agent?
    * A program that takes in a medical image and prints out a list of diagnoses and probabilities

---

## Agent Difficulties

* Interaction with the environment means that an agent can influence the data that it sees
* Imagine a self-driving car
    * The current view of the environment is influenced by past steering
* This makes evaluation difficult

---

## Ramifications

* Agent actuation actually changes the environment
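
To make this coupling concrete, here is a minimal sketch in Python. The lane model, the gains, and the names `simulate` and `steering_gain` are assumptions made up for illustration; this is not real vehicle dynamics. Two controllers start from the same lateral offset, yet each one sees a different sequence of observations, because every observation depends on that controller's own earlier steering.

```python
def simulate(steering_gain, steps=5, start_offset=1.0):
    """Return the lateral offsets (meters) that an agent observes over time.

    Toy model (an assumption, not real vehicle dynamics): at each step the
    agent observes its offset from the lane center, then steers
    proportionally, which shifts the offset it will observe next.
    """
    offset = start_offset
    observations = []
    for _ in range(steps):
        observations.append(offset)        # perceive
        action = -steering_gain * offset   # act: steer toward the center
        offset = offset + action           # the action changes the next percept
    return observations


gentle = simulate(steering_gain=0.25)
aggressive = simulate(steering_gain=0.5)

# Same starting state, different data: only the first observation matches.
print(gentle)
print(aggressive)
```

Data recorded from the gentle controller never contains the states that the aggressive controller visits, which hints at why evaluating an agent on previously recorded data is hard.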

---

## Evaluating an Agent

* Let's say that I have a system that predicts the proper steering angle for a vehicle
* Through extensive testing, over multiple continents, weather conditions, and lighting, I can guarantee that the steering angle predicted has an error of at most $\frac{1}{100}$ of a degree
* Is this a good agent?

---

## Evaluating an Agent

* If the agent is *biased*, then it could steer to the left by $\frac{1}{100}$ of a degree at all times
    * It won't be long before we crash
* If the agent is too high-latency, it will not be able to track a curve
    * This will lead to oscillations as well
* Agents must be evaluated on their successful interaction with their environment

---

## Another Example

* I have a lane detection system that is always correct to within 1 cm, out to 300 meters
* Can I build an autonomous vehicle?

---

## Sufficient

* Lane markers are highly correlated with where vehicles drive
* But humans ignore them all the time, when necessary
    * Obviously we change lanes
    * We also drive around double-parked cars and pedestrians
    * We also drive in places where there are no lane markers
* Perfect knowledge of lanes is insufficient to drive

---

## Environments and Metrics

* It is tempting to simplify a problem by evaluating metrics on the *environment* rather than on the *agent*
* This divides the problem into multiple pieces
    * Usually a good software engineering approach
* Software engineering is able to decouple components
* But when an agent interacts with an environment, they are tightly coupled

---

## Sensing the Environment

* Cameras, radar, lidar, ultrasonics, GPS, accelerometer, thermometer, barometer, magnetometer, microphone, etc.
* Nearly all sensors are *discrete* in time
* This is a problem, as the world is continuous

---

## Discrete Vs Continuous

* In a game of checkers or chess, squares are either occupied or not
* In the real world, things are messier
    * Pay attention to how you park, shifting from one side of the space to the
other depending upon the adjacent vehicles
* Not only that, moves in chess are instantaneous: you cannot intercept a rook with your pawn as it moves across the board
* If you've ever attempted to merge in traffic, you must have noticed that driving is continuous in both time and space

---

## Static Vs Dynamic

* This brings up another distinction
* A chessboard is static, meaning that it does not change as your agent deliberates
    * Also true for some continuous environments, such as a factory
    * Not true for driving, cyber intrusion detection, a robotic catheter, etc.
* Sometimes we are caught in between
    * Playing chess on a timer, for example, makes the game semi-dynamic

---

## Ramifications

* Actuation is also (generally) a continuous activity
    * e.g. the car keeps moving, unlike a chess piece
* An agent's updates are generally limited by its sensors
    * So, perhaps 30 fps for cameras
* We also need to "fuse" different sensors, despite their different sensing rates

---

## Solutions

* Discrete predictions are ill-suited for continuous environments
    * Steering angles for a car, or instantaneous yaw-pitch-roll commands for a drone
* Instead, we would be better off predicting a continuous path
    * Or a path with enough points on it that we could interpolate
* The agent then issues actuation commands that attempt to follow the path
* This transforms our discrete predictions into continuous actuation

---

## Forced Decision Cadence

* Physical systems often force decisions at some rate
* For example, self-driving cars must keep steering if they aren't sitting still
* Another example: automatically steered catheters
    * [https://journals.sagepub.com/doi/abs/10.1177/0278364920903785](https://journals.sagepub.com/doi/abs/10.1177/0278364920903785)

---

## Discrete Environments

* Many digital systems are discrete (although still complicated)
    * Network security, such as agents that defend or exploit cyber systems
        * See
[https://cage-challenge.github.io/cage-challenge-4/](https://cage-challenge.github.io/cage-challenge-4/)
    * Video games
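
As a small illustration of what a discrete environment offers, the sketch below defines a toy environment in Python. `LineWorld`, its five positions, and its two actions are hypothetical and not part of any library; the `reset` and `step` method names only loosely echo the style of RL toolkits. Because the states and actions are finite, every transition can be listed in a table.

```python
from itertools import product


class LineWorld:
    """Hypothetical toy environment: an agent on positions 0..size-1."""

    ACTIONS = (-1, +1)  # move left or move right

    def __init__(self, size=5):
        self.size = size
        self.position = 0

    def reset(self):
        """Return to the known starting state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply one instantaneous move; the walls clamp the position."""
        self.position = min(self.size - 1, max(0, self.position + action))
        done = self.position == self.size - 1  # the goal is the rightmost cell
        return self.position, done


def transition_table(size=5):
    """Enumerate every (state, action) -> next_state pair."""
    env = LineWorld(size)
    table = {}
    for state, action in product(range(size), LineWorld.ACTIONS):
        env.position = state
        next_state, _ = env.step(action)
        table[(state, action)] = next_state
    return table


table = transition_table()
print(len(table))  # 5 states x 2 actions = 10 entries
```

No comparable table can be written down for driving, where position and time are not countable.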

---

## Assembly Line Example

* Sometimes, when we can control an environment, we can simplify a continuous world to make it seem discrete
* Consider a fully-automated assembly line
    * The assembly line can be made to limit activity until each robotic action completes
    * The individual actions may be continuous, but the steps become fully discrete
* This is similar to using clock-driven sequential logic rather than asynchronous logic

---

## Determinism

* Is the environment fully known?
* Many are not
    * Too much complexity to measure
    * Sensors are insufficient
    * The world is stochastic in nature
* Outside of games, most environments are not deterministic

---

## More Details

* Even some things that seem deterministic are not, in practice
* For example, a game using pseudo-random numbers is technically deterministic
    * But, because we do not know the RNG state, it won't seem that way
* This is a problem of observability

---

## Observability

* Hidden state means that an environment is only partially observable
* Most environments are not entirely observable
    * Any sensor has limited temporal and spatial resolution, for example

---

## Number of Agents

* The number of agents can also make an environment more difficult
* Single-agent environments are straightforward
    * That means that most problems we want to work on will be multi-agent
* Driving is, once again, a great example

---

## Cooperativity

* Are other drivers cooperative? Competitive?
* Probably easiest to describe them as a mix
* For a non-game example of a competitive system, consider high-frequency trading agents working on a trading market

---

## Episodic Vs Sequential

* Games reset themselves to a known starting state
    * This makes the decision space easier to explore
* The real world is sequential, with the current state always dependent upon the previous ones
* If possible, we want a way to "reset" an environment, but that is often impossible

---

## Known Vs Unknown

* This is the final attribute to consider
* Do we know all of the rules of our environment?
* This determines our ability to simulate something
* And that determines how well reinforcement learning (RL) will work

---

## Known Vs Unknown

* The rules of chess, shogi, and go are all known
    * So, even with huge state spaces, reinforcement learning is effective
* How about driving or the stock market?
    * Anything can happen
* If our simulation is an approximation, then RL can be insufficient
* And *that* means that we'll have to actually collect real world data

---

## Worst-Case Environment

* In the worst case, we have to deal with a partially observable, multi-agent, non-deterministic, sequential, dynamic, continuous environment with unknown rules
* Notice that this includes autonomous driving, drone delivery systems, robotic housekeepers, and so on
* I would not expect any of those problems to be fully solved any time soon

---

## Reflex Agents

* Since environments are so complicated, sometimes it makes sense to simplify things
* A **simple reflex agent** only cares about the current state of an environment
* For many fundamental rules, this is fine
    * Slam on the brakes if the car in front of you suddenly stops, for example

---

## Failures of Reflex Agents

* Reflex agents fail when the environment is not fully observable
* Imagine a cleaning robot that goes into one room and cleans it
    * Now what? It needs to remember if the last room was also clean.
    * If not, it will wander forever, never knowing if its task is complete

---

## Randomness as a Solution

* *Velella velella* is a hydrozoan that drifts on the surface of the ocean
    * [Citizen Science Writeup](https://theoryandpractice.citizenscienceassociation.org/articles/10.5334/cstp.847)
* They are pushed by the wind and cannot steer themselves, often ending up beached
* Some sails steer left while others steer right, so they cannot all suffer the same fate
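
The benefit of mixed sail directions can be sketched with a toy simulation in Python. Everything here is an assumption for illustration: the function name, the population size, and the rule that an individual is beached when its sail carries it toward the side of the wind where the shore lies. It is not a model of real *Velella* drift.

```python
import random


def beached_fraction(sail_sides, shore_side):
    """Fraction of the population carried onto the shore.

    sail_sides: list of 'left' or 'right', the side of the wind that each
                individual drifts toward (toy rule, an assumption).
    shore_side: the side of the wind on which the shore lies.
    """
    beached = sum(1 for side in sail_sides if side == shore_side)
    return beached / len(sail_sides)


rng = random.Random(0)  # fixed seed so that runs are repeatable
population_size = 1000

# A population where every sail is the same, and one with random sails.
uniform = ["left"] * population_size
mixed = [rng.choice(["left", "right"]) for _ in range(population_size)]

for shore_side in ("left", "right"):
    print(
        shore_side,
        beached_fraction(uniform, shore_side),
        beached_fraction(mixed, shore_side),
    )
```

In this sketch the uniform population is either untouched or wiped out, depending on where the shore lies, while the mixed population keeps some survivors either way.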

---

## Planning

* Randomness works with swarm systems, but is a poor choice for most agents
* Planning is a better alternative
    * Evaluate the environment, then formulate some plan
    * Plan could be a path or a sequence of actions
* The agent evaluates its progress, allowing it to identify deficiencies

---

## Metacognition

* This is a term that means "thoughts about thinking"
* Agents must do more than simply interact with an environment
    * They must take feedback from the environment after acting
    * They must hold on to state
* A planning agent is able to evaluate its own actions, going beyond a reflexive agent

---

## Biological Agent Example

* Sphex wasp

---

## Another Driving Example

* Let's say that your desired speed is 65 mph, but you are only moving at 55
* $55 < 65$, so you push down the pedal
* You don't go any faster
* Response?

---

## Possible Responses

* We could evaluate $55 < 65$ in a loop, pushing down the pedal until it is fully depressed
* A better agent would evaluate why we aren't accelerating
    * Maybe we are missing the pedal with our foot?
    * Perhaps we are going up a steep incline and cannot expect to reach 65 mph?
    * Or perhaps we have lost traction and we should not accelerate

---

## Exploring the Environment

* If we want to know if our agent properly deals with those kinds of complicated situations, we must explore our environment
* Here, environment means everything that our agent can experience through its sensors
* We need a good sampling of the environment for testing, at least
* Most modern techniques also require data for training

---

## The Data Problem

* Current machine learning techniques are data hungry
* Put to shame by biological systems
    * A few hundred hours is sufficient training for a bee to gather honey, protect its hive, and build honeycombs
    * Sure, some of that is instinctive behavior, but all of it is fine-tuned for each specific individual
    * And if we knew how to hard-code any "instincts" into our learning systems we would

---

## Course Goal

* Learning good approaches to AI agent development is the desired outcome
* What problems can AI solve and when does current AI struggle?
* What can we do to simplify problems so that AI can solve them?
* What specific AI techniques can we apply to different environments?