| Basis | | | | |
|---|---|---|---|---|
| 1 | -0.45375317 | 0.6001949 | 0.64249509 | 0.14516955 |
| 2 | 0.39904723 | 0.79616951 | -0.42580043 | -0.1599044 |
| 3 | -0.576825 | 0.00578817 | -0.23609516 | -0.78198369 |
| 4 | -0.54967471 | 0.07646366 | -0.59173738 | 0.58468615 |
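
A quick sanity check on the table above: PCA basis vectors should be orthonormal, and that can be tested directly. This is a minimal sketch assuming NumPy; the names `basis` and `gram` are illustrative, and the values are copied from the table.

```python
import numpy as np

# One basis vector per row, copied from the table above.
basis = np.array([
    [-0.45375317, 0.6001949, 0.64249509, 0.14516955],
    [0.39904723, 0.79616951, -0.42580043, -0.1599044],
    [-0.576825, 0.00578817, -0.23609516, -0.78198369],
    [-0.54967471, 0.07646366, -0.59173738, 0.58468615],
])

# For an orthonormal basis, the matrix of pairwise dot products is the identity:
# each vector has unit length and is perpendicular to the others.
gram = basis @ basis.T
is_orthonormal = np.allclose(gram, np.eye(4), atol=1e-6)
print(is_orthonormal)
```

The tolerance `atol=1e-6` is an assumption chosen to absorb the rounding in the printed values.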
---
## Explained Variance
* The basis vectors explain decreasing amounts of the variance
* 0.68633893, 0.19452929, 0.09216063, 0.02697115
* All four sum to 1 (together they reconstruct the dataset exactly)
* But we could drop the later ones and lose little: the last explains under 3% of the variance
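
One way to decide how many basis vectors to keep is to look at the cumulative explained variance. The sketch below uses the four ratios from this slide; the 95% `threshold` is an assumed cutoff for illustration, not a value from the lecture.

```python
import numpy as np

# Explained variance ratios from the slide, largest first.
ratios = np.array([0.68633893, 0.19452929, 0.09216063, 0.02697115])

# Running total: how much variance the first k basis vectors explain together.
cumulative = np.cumsum(ratios)

# Assumed cutoff: keep the smallest number of vectors explaining at least 95%.
threshold = 0.95
n_keep = int(np.searchsorted(cumulative, threshold) + 1)

for k, total in enumerate(cumulative, start=1):
    print(k, round(float(total), 4))
print("keep:", n_keep)
```

With a different threshold the answer changes, so the cutoff is a judgment call that depends on how much information loss the downstream task can tolerate.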
---
## Basis 1v2
---
## Basis 1v3
---
## Basis 2v3
---
## Uses
* Dimensionality reduction is the most common use-case
* Also useful as a precursor to clustering
* Why? Because pairwise distances are approximately preserved when the dropped components carry little variance
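
The distance claim can be checked numerically. This is a minimal sketch on synthetic data (the `scales` values, the point count, and the choice of 3 components are all assumptions), with PCA done through NumPy's SVD rather than any particular library's PCA class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 points in 10 dimensions, most variance in the first 3.
scales = np.array([5.0, 4.0, 3.0, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1])
X = rng.normal(size=(200, 10)) * scales

# PCA via SVD: rows of Vt are the basis vectors, ordered by variance explained.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:3].T  # keep only the top 3 components


def pairwise_distances(A):
    """Euclidean distance between every pair of rows in A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


D_full = pairwise_distances(Xc)
D_reduced = pairwise_distances(Z)

# Compare distances before and after reduction, ignoring the zero diagonal.
mask = ~np.eye(len(X), dtype=bool)
relative_error = np.abs(D_full - D_reduced)[mask] / D_full[mask]
print(relative_error.mean())
```

Projecting onto a subset of orthonormal directions can only shrink distances, never grow them, so the error comes entirely from the variance in the dropped components.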
---
## More on Clustering
* We mentioned the `curse of dimensionality` before
* KNN struggles to find neighbors as dimensions increase
* Once the dimensions are reduced, KNN works again
* This lets KNN run on the output of anything that produces an embedding
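
To see why neighbors become hard to find, compare the nearest and farthest distances from a query point as the dimension grows. The sketch below uses uniformly random points, which is an assumption made for illustration; `distance_contrast` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)


def distance_contrast(n_dims, n_points=500):
    """Ratio of nearest to farthest distance from one random query point."""
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return float(dists.min() / dists.max())


# A ratio near 0 means the nearest neighbor is much closer than the farthest
# point; a ratio near 1 means every point is about equally far away.
contrast = {d: distance_contrast(d) for d in (2, 10, 100, 1000)}
for d, ratio in contrast.items():
    print(d, round(ratio, 3))
```

As the ratio approaches 1, "nearest" stops being meaningful, which is the practical problem that reducing dimensions first is meant to address.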
---
## Common Current Use Case
* Take a black box that compresses high dimensional data to something smaller
* Wave the black box around objects of interest
* This is your feature discovery phase
* Take PCA of those discovered features
* Now you can use clustering to find similarities
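
The steps above can be sketched end to end. Everything here is a stand-in chosen for illustration: the data is synthetic, `black_box` is a hypothetical fixed random map (in practice it would be a trained model), PCA is done with NumPy's SVD, and the clustering is a bare-bones K-Means with a simple farthest-point initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic groups of "objects of interest" in a 200-dimensional raw space.
n_per_group = 50
group_a = rng.normal(0.0, 0.1, size=(n_per_group, 200))
group_b = rng.normal(3.0, 0.1, size=(n_per_group, 200))
raw = np.vstack([group_a, group_b])

# Hypothetical black box: a fixed random nonlinear map down to 16 features.
W = rng.normal(size=(200, 16)) / np.sqrt(200)


def black_box(x):
    return np.tanh(x @ W)


# Feature discovery: run the black box over every object.
features = black_box(raw)

# PCA of the discovered features via SVD, keeping 2 components.
centered = features - features.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ Vt[:2].T

# Minimal K-Means with k=2. Initialization (an assumption of this sketch):
# the first point, plus the point farthest from it.
centers = np.empty((2, 2))
centers[0] = reduced[0]
centers[1] = reduced[np.argmax(np.linalg.norm(reduced - reduced[0], axis=1))]
for _ in range(20):
    dists = np.linalg.norm(reduced[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)  # hard assignment to the nearest center
    centers = np.array([reduced[labels == k].mean(axis=0) for k in range(2)])

print(labels)
```

The groups here are deliberately well separated; with real embeddings the number of components and clusters would need to be chosen from the data.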
---
## The black box
* What is the black box?
* Nowadays, a neural network
* We'll see later that they are excellent at compressing high-dimensional features to something reasonable
---
## Sample Questions
When is it not appropriate to use PCA?
a. When there are only a few columns of data.
b. When all of your data columns are orthogonal.
c. When there are too many columns of data.
d. When the correlation matrix values are high.
---
## Sample Questions
Which of the following is an unsupervised technique?
a. Decision Trees
b. Logistic Regression
c. PCA
d. Least Squares Regression
---
## Sample Questions
Which of the following is an unsupervised technique?
a. Decision Trees
b. Logistic Regression
c. Mixture of Experts
d. Least Squares Regression
---
## Sample Questions
Which of the following statements about unsupervised learning is **true**?
a. Unsupervised learning does not need test and validation sets.
b. Unsupervised learning can be applied to more data than supervised learning.
c. Unsupervised learning algorithms are always slower than supervised learning.
d. Supervised learning techniques have a stronger mathematical foundation for their approach.
---
## Sample Questions
Which statements about the K-Means algorithm are **false**?
a. K-Means is not guaranteed to converge on the best split.
b. K-Means works poorly when variables are correlated.
c. K-Means deals well with overlapping clusters of different classes.
d. K-Means uses hard clustering, assigning each point to the nearest cluster center.
---
## Sample Questions
Which is **false** about soft clustering?
a. Soft clustering is used in mixture-of-experts models.
b. Soft clustering is used during model training and not during classification.
c. If two cluster means are equidistant, the cluster with the lower variance has higher responsibility for the point.
d. Soft clustering is used in K-Means.