* The components of $w$ can be thought of as coarse, medium, and fine, depending upon where they influence the output
* This leads to great control over the output images, as can be seen from example images in the [paper](https://openaccess.thecvf.com/content_CVPR_2019/html/Karras_A_Style-Based_Generator_Architecture_for_Generative_Adversarial_Networks_CVPR_2019_paper.html)
See the UDL text, Figure 15.19.
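
As a rough illustration of coarse, medium, and fine control, the sketch below mixes two latent codes by layer group. It assumes an 18-layer synthesis network with a 512-dimensional $w$ (the 1024x1024 configuration described in the StyleGAN paper); the helper name `mix_styles` and the `GROUPS` table are hypothetical, and no real generator is called.

```python
import random

NUM_LAYERS = 18  # assumed: synthesis layers for a 1024x1024 generator
W_DIM = 512      # assumed: dimensionality of w

# Assumed layer groups: coarse (4x4 to 8x8), medium (16x16 to 32x32), fine (64x64 and up)
GROUPS = {"coarse": range(0, 4), "medium": range(4, 8), "fine": range(8, 18)}


def mix_styles(w_a, w_b, take_from_b):
    """Return one style vector per layer: w_a everywhere, w_b in the named groups."""
    swapped = {i for group in take_from_b for i in GROUPS[group]}
    return [list(w_b) if i in swapped else list(w_a) for i in range(NUM_LAYERS)]


random.seed(0)
w_a = [random.gauss(0.0, 1.0) for _ in range(W_DIM)]
w_b = [random.gauss(0.0, 1.0) for _ in range(W_DIM)]

# Keep pose and overall shape from w_a, borrow fine detail (e.g. color scheme) from w_b
mixed = mix_styles(w_a, w_b, ["fine"])
```

Feeding `mixed` to a generator layer by layer would, under these assumptions, keep the coarse structure of image A while taking fine details from image B.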
---
## More Control
* We can add more control over the latent code by using other methods
* StyleGAN uses a vector, S, to insert a style into the synthetic image
* [StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery](https://openaccess.thecvf.com/content/ICCV2021/html/Patashnik_StyleCLIP_Text-Driven_Manipulation_of_StyleGAN_Imagery_ICCV_2021_paper.html), from 2021, showed that S could be manipulated using text prompts and CLIP
* The goal is to get a new latent code, $w$, given a source code, $w_s$, and a text prompt, $t$
---
## Advances: Editing Directions
* So optimize this:
* $\underset{w \in \mathcal{W}+}{\arg\min}\; D_{CLIP}(G(w), t) + \lambda_{L2} \|w - w_s\|_2 + \lambda_{ID} \mathcal{L}_{ID}(w)$
* $D_{CLIP}(G(w), t)$ is the cosine distance between the embedding of the Generator's output and the text embedding
* $\mathcal{L}_{ID}$ is an identity loss that penalizes the output image for drifting away from the source image
* Basically, find a new $w$ that changes the image as little as possible but brings its CLIP embedding as close as possible to the text embedding
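
The objective can be sketched in a few lines. This is a minimal illustration, not the StyleCLIP implementation: `generate`, `clip_embed`, and `id_embed` are hypothetical stand-ins for the generator, the CLIP image encoder, and a face-recognition encoder, the identity term is assumed to be a cosine distance between identity embeddings, and the $\lambda$ values are arbitrary placeholders.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def edit_objective(w, w_s, text_emb, generate, clip_embed, id_embed,
                   lambda_l2=0.01, lambda_id=0.01):  # placeholder weights
    image, source_image = generate(w), generate(w_s)
    clip_term = cosine_distance(clip_embed(image), text_emb)              # match the text
    l2_term = l2_distance(w, w_s)                                         # stay near w_s
    id_term = cosine_distance(id_embed(image), id_embed(source_image))    # keep identity
    return clip_term + lambda_l2 * l2_term + lambda_id * id_term


# Toy stand-ins: every "network" is the identity function on 2-D vectors
identity_fn = lambda v: v
w_s = [1.0, 0.0]
text_emb = [0.0, 1.0]

start = edit_objective(w_s, w_s, text_emb, identity_fn, identity_fn, identity_fn)
moved = edit_objective([1.0, 1.0], w_s, text_emb, identity_fn, identity_fn, identity_fn)
```

In this toy setting, moving $w$ toward the text embedding lowers the objective (`moved < start`) even though the L2 and identity penalties grow slightly. In practice the minimization would be done by gradient descent through a differentiable generator.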
---
## Generators Overview
* So generators transform a latent variable into an output
* The latent input has both semantic meaning and noise components
* Let's say we had an output that didn't involve noise; what would that be?
* Just a semantic meaning vector that produces a fixed output
* Sounds like compression, right?
* Let's move on to our last, and most explicit, example
---
## Puzzle Solving
* In 2019, [On the Measure of Intelligence](https://arxiv.org/abs/1911.01547) suggested that "superhuman" performance of DNNs on various benchmarks was overselling progress
* Humans didn't evolve to classify 224x224 pixel images and the DNNs get to train with millions of examples
* AlphaGoZero played tens of millions of games, but a human playing 10 games a day for 100 years will only reach about 365,000 (10 × 365 × 100).
* Is that really a proper comparison?
* The author suggests testing the *acquisition* of new problem solving abilities
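
The experience gap is easy to check with a little arithmetic. The snippet below assumes 10 million games as a conservative lower bound for "tens of millions"; that figure is an assumption for illustration, not a number from the paper.

```python
GAMES_PER_DAY = 10
DAYS_PER_YEAR = 365
YEARS = 100

human_games = GAMES_PER_DAY * DAYS_PER_YEAR * YEARS  # 365000 games in a lifetime

MACHINE_GAMES_LOWER_BOUND = 10_000_000  # assumed lower bound for "tens of millions"
ratio = MACHINE_GAMES_LOWER_BOUND / human_games

print(f"Human lifetime games: {human_games:,}")
print(f"Machine played at least {ratio:.1f}x as many")
```

Even with this generous human schedule, the machine sees well over an order of magnitude more games.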
---
> That is to say, intelligence is the rate at which a learner turns its experience and priors into new skills at valuable tasks that involve uncertainty and adaptation.
> If an AI system has access to extensive, task-specific prior knowledge that is not available to a human, its performance on that task becomes a measure of the developer's cleverness in encoding that knowledge, not the AI's inherent intelligence.
---
## ARC
* ARC is the Abstraction and Reasoning Corpus
* It is designed to measure *skill acquisition efficiency*, meaning how well an agent can learn to solve puzzles
* Each puzzle expects the solver to infer its rule from a small number of examples
* They are now on iteration three of the [ARC-AGI challenge](https://arcprize.org/arc-agi)
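
To make "infer the rule from a small number of examples" concrete, here is a toy sketch. The task layout (a `train` list of input/output grid pairs plus a `test` list, with grids as lists of integer color codes) mirrors the public ARC dataset format; the tiny set of candidate rules and the `infer_rule` helper are hypothetical, and real puzzles need a far richer search space.

```python
def identity(grid):
    return [row[:] for row in grid]

def flip_horizontal(grid):
    return [row[::-1] for row in grid]

def flip_vertical(grid):
    return [row[:] for row in grid[::-1]]

def transpose(grid):
    return [list(col) for col in zip(*grid)]

# Hypothetical, deliberately tiny hypothesis space
CANDIDATE_RULES = {
    "identity": identity,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}

def infer_rule(train_pairs):
    """Return the name of the first rule consistent with every training pair."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(pair["input"]) == pair["output"] for pair in train_pairs):
            return name
    return None

task = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0], [0, 0, 4]], "output": [[0, 3, 3], [4, 0, 0]]},
    ],
    "test": [{"input": [[5, 0], [0, 6]]}],
}

rule_name = infer_rule(task["train"])
prediction = CANDIDATE_RULES[rule_name](task["test"][0]["input"])
```

Two training pairs are enough here to single out `flip_horizontal`, which is then applied to the unseen test grid.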
---
## Examples