* Remember the [Never Give Up: Learning Directed Exploration Strategies](https://arxiv.org/abs/2002.06038) paper?
* The authors use a predictive task to force a DNN to learn the embedding
* See Figure 2 in the paper
---
## Embeddings and Dimensionality
* We use embeddings all the time
* And then we use them for clustering and distance measures
* But is that meaningful?
* If each dimension has the same amount of information, then it is
* But what if multiple features are entangled into one embedding?
* DNNs are doing compression, but we don't have a good way to measure what is actually happening
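
To make the question concrete, here is a minimal sketch of how a nearest-neighbor judgment can flip when the dimensions of an embedding do not carry comparable information. The vectors and the per-dimension `scale` values are hypothetical, chosen only for illustration; they do not come from any real model.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 3-d embeddings. Dimension 0 spans a much larger range
# than dimensions 1 and 2 (an assumption made for illustration).
query = [10.0, 0.0, 0.0]
candidate_a = [11.0, 1.0, 1.0]  # close in dim 0, differs in dims 1 and 2
candidate_b = [14.0, 0.0, 0.0]  # farther in dim 0, identical in dims 1 and 2

# Raw distances: candidate_a looks like the nearer neighbor.
d_a = euclidean(query, candidate_a)
d_b = euclidean(query, candidate_b)

# Hypothetical typical spread of each dimension.
scale = [10.0, 0.1, 0.1]

def rescale(v):
    """Divide each dimension by its assumed spread."""
    return [x / s for x, s in zip(v, scale)]

# After rescaling, candidate_b is the nearer neighbor instead.
d_a_scaled = euclidean(rescale(query), rescale(candidate_a))
d_b_scaled = euclidean(rescale(query), rescale(candidate_b))
```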
---
## Explicit Compression
* Researchers may be making progress on this problem
* In [Generative Latent Coding for Ultra-Low Bitrate Image Compression](https://openaccess.thecvf.com/content/CVPR2024/html/Jia_Generative_Latent_Coding_for_Ultra-Low_Bitrate_Image_Compression_CVPR_2024_paper.html) the authors use generative models for image compression
* Generative models use a latent vector, $z$, as a basis for data generation
* Since $z$ can be decoded into an entire image, it also serves as a compressor as long as $z$ takes fewer bits to store than the image, written loosely as $||z|| < ||image||$
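
The size condition above can be checked with simple bookkeeping. This is a sketch with hypothetical numbers: the image size, latent length, and bits per latent dimension are assumptions for illustration, not values from the GLC paper.

```python
# Hypothetical image: 512 x 512 pixels, 24 bits per pixel.
width, height = 512, 512
image_bits = width * height * 24

# Hypothetical latent: 1024 values stored at 8 bits each.
latent_dims = 1024
bits_per_dim = 8
latent_bits = latent_dims * bits_per_dim

# z only acts as a compressor if it is smaller than the image it decodes to.
is_compressor = latent_bits < image_bits

# How many times smaller the latent is than the raw image.
compression_ratio = image_bits / latent_bits
```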
---
## Framework
* The Generative Latent Coding (GLC) model is trained in three stages
* First, train an auto-encoder to make visually correct images
* Second, train a module to predict the latent code for images
* Third, co-train the auto-encoder and code predictor together for fine-tuning
* This glosses over many details, but such high compression suggests a better estimate of the information in an image
---
> [T]hese methods often lack a careful consideration of the
correlation among the latents, resulting in a insufficient redundancy reduction
and consequently a high bit cost. In GLC, we introduce a transform coding
module to compress the latent, replacing the vector-quantization step for more
effective reduction of latent redundancy.
---
## Results
* It's worth looking at the paper's PDF, [Generative Latent Coding for Ultra-Low Bitrate Image Compression](https://openaccess.thecvf.com/content/CVPR2024/html/Jia_Generative_Latent_Coding_for_Ultra-Low_Bitrate_Image_Compression_CVPR_2024_paper.html), so you can zoom in on the individual results
* The authors reported compression at 0.04 bpp on natural images and 0.01 bpp on faces with seemingly high quality
* For comparison, assume 24-bit pixels
* High quality for JPEG is around 2.7:1, or about 9 bpp (24 / 2.7)
* Good quality is around 23:1, or about 1 bpp
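
The conversion between compression ratio and bits per pixel is a single division. The sketch below assumes 24-bit raw pixels, as on this slide; the function names are made up for illustration. Under that assumption 2.7:1 works out to roughly 8.9 bpp and 23:1 to roughly 1.04 bpp, while 0.04 bpp and 0.01 bpp correspond to roughly 600:1 and 2400:1.

```python
RAW_BITS_PER_PIXEL = 24  # assumption from the slide: 8 bits each for R, G, B

def ratio_to_bpp(ratio):
    """Bits per pixel left after compressing a 24-bit image by ratio:1."""
    return RAW_BITS_PER_PIXEL / ratio

def bpp_to_ratio(bpp):
    """Compression ratio implied by a bits-per-pixel figure."""
    return RAW_BITS_PER_PIXEL / bpp

# JPEG reference points quoted on the slide.
jpeg_high_bpp = ratio_to_bpp(2.7)
jpeg_good_bpp = ratio_to_bpp(23)

# GLC figures quoted on the slide, expressed as ratios.
glc_natural_ratio = bpp_to_ratio(0.04)
glc_faces_ratio = bpp_to_ratio(0.01)
```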
---
## Experience
* The ability of GLC to compress images is based upon its *experience*, in the form of the encoder-decoder parameters
* Meaning that if the model was trained on faces, it should do a good job compressing faces
* But if they were trained on only faces and then used to compress flowers, the results would not be good
* Bringing this back around to knowledge compression, humans who are good at games are known to effectively compress the current game states
* This is known as [Chunking Theory](https://en.wikipedia.org/wiki/Chunking_(psychology)) in psychology
---
## Learning and Compression
* We know that the latent space learned through various training schemes can form a compressed representation of a scene
* Compression applies to more than just pixels
* In CompressARC this is enough to solve logic problems
* In Never Give Up, it was used to identify the novelty of game states
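
As a rough illustration of novelty in an embedding space, the sketch below scores a state by its mean distance to the nearest embeddings already in memory. This is a simplification and an assumption on my part: Never Give Up computes its episodic reward with a kernel over the nearest neighbors of a learned embedding, and the mean distance here is only a stand-in. The function name and the 2-d vectors are hypothetical.

```python
import math

def knn_novelty(embedding, memory, k=3):
    """Mean Euclidean distance from `embedding` to its k nearest
    neighbors in `memory`. Larger values mean a more novel state."""
    if not memory:
        return float("inf")  # nothing seen yet, so everything is novel
    distances = sorted(math.dist(embedding, m) for m in memory)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)

# Hypothetical 2-d embeddings of states visited so far.
memory = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]

familiar_score = knn_novelty([0.05, 0.05], memory)  # sits inside the cluster
novel_score = knn_novelty([3.0, 4.0], memory)       # far from anything seen
```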
---
## World Models
* So here is a thought to bounce around your brain:
* Should problem solving be happening in the latent space itself?
* The latent space is basically a model of the current world state
* And if we can estimate how the world state changes with actions, we should be able to do planning in that space
* It turns out that this works!
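
A toy version of planning in latent space: given a function that predicts the next latent state from an action, search over short action sequences and keep the one whose predicted final state lands closest to a goal. Everything here is hypothetical. `predict_next` is a hand-written stand-in for a learned world model, and exhaustive search is used only because the example is tiny.

```python
import itertools
import math

# Hypothetical action set: unit moves in a 2-d latent space.
ACTIONS = ((1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0))

def predict_next(z, action):
    """Stand-in for a learned world model: the latent shifts by the action."""
    return [z[0] + action[0], z[1] + action[1]]

def rollout(z, plan):
    """Apply a sequence of actions using only the latent dynamics."""
    for action in plan:
        z = predict_next(z, action)
    return z

def plan_in_latent(z0, goal, horizon=3):
    """Try every action sequence of length `horizon` and return the one
    whose predicted final latent is closest to `goal`."""
    candidates = itertools.product(ACTIONS, repeat=horizon)
    best = min(candidates, key=lambda plan: math.dist(rollout(z0, plan), goal))
    return list(best)

start = [0.0, 0.0]
goal = [2.0, 1.0]
plan = plan_in_latent(start, goal)
final_latent = rollout(start, plan)
```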
---
## Latent Space "Imagination"
* [DreamerV3](https://www.nature.com/articles/s41586-025-08744-2) is an approach to reinforcement learning where simulation is done in the latent space
> The algorithm consists of three neural networks: the world model predicts
the outcomes of potential actions, the critic judges the value of each
outcome, and the actor chooses actions to reach the most valuable outcomes.
---
## Thoughts
* High-level view
* World model predictions are in $z$, a discrete encoding of the state
* Actor-Critic learning is performed on both observed and estimated states
* We may be able to do operations using the latent space without fully understanding it
* But we could probably do a better job if we had more control over it
* It's something to think about!