* The *sigmoid* function
* Also called the logistic function; its inverse is the logit (log-odds)
* $\text{sigm}(\eta) \triangleq \frac{1}{1+\exp(-\eta)} = \frac{e^\eta}{e^\eta + 1}$
* We will be seeing this function again
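
A minimal sketch of the definition, assuming NumPy is available. The `logit` helper is added here only to illustrate that it is the inverse of the sigmoid; it is not part of the original slides.

```python
import numpy as np

def sigmoid(eta):
    """Map a real number (or array) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

def logit(p):
    """Inverse of the sigmoid: the log-odds of a probability p."""
    return np.log(p / (1.0 - p))

eta = np.array([-2.0, 0.0, 2.0])
p = sigmoid(eta)
print(p)         # probabilities; sigmoid(0) is 0.5
print(logit(p))  # recovers eta up to floating-point error
```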
---
## Intuition
* For $x$ on the decision boundary, we expect $p(y=1|x)$ to be near 0.5
* If the classes are linearly separable (meaning they can be split by a single line):
  * Everything with $p > 0.5$ must lie to one side (left or right) of that $x$
  * Everything with $p < 0.5$ must lie in the other direction
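
To make the intuition concrete, here is a small one-dimensional sketch. The weight `w` and bias `b` are hypothetical values picked for illustration, and the boundary sits where $wx + b = 0$, since that is where the sigmoid returns exactly 0.5.

```python
import numpy as np

# Hypothetical 1-D model parameters, chosen only for illustration
w, b = 2.0, -3.0

def predict_proba(x):
    """p(y=1|x) for the hypothetical model sigm(w * x + b)."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

boundary = -b / w  # w * x + b = 0 here, so p = 0.5
x = np.array([0.0, 1.0, 2.0, 3.0])
p = predict_proba(x)
for x_i, p_i in zip(x, p):
    side = "right" if x_i > boundary else "left"
    print(f"x={x_i:.1f} p={p_i:.3f} ({side} of the boundary at {boundary})")
```

With a positive `w`, points to the right of the boundary get $p > 0.5$; a negative `w` would flip the sides.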
---
## Notation
* Going to break $\beta$ into two parts
  * bias ($\beta_0$)
  * weight (everything else)
* Will just look at weights, $w$, next
---
## Formulation
* Gauss gave least squares a straightforward formulation
* Not so for logistic regression
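* The probability mass function of a Bernoulli variable is
* $f(k;p) = p^k(1 - p)^{1-k}$
* $k$ is 0 or 1
* For a sample $(x_i, y_i)$ with predicted probability $\hat{p}(x_i)$, this becomes
* $P(y_i \mid x_i)=\hat{p}(x_i)^{y_i}(1-\hat{p}(x_i))^{1-y_i}$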
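
As a quick check of the formula, the sketch below evaluates the Bernoulli mass function for both outcomes. The value `p_hat = 0.8` is a hypothetical prediction, not something taken from the slides.

```python
def bernoulli_pmf(k, p):
    """f(k; p) = p^k * (1 - p)^(1 - k) for k in {0, 1}."""
    return p**k * (1.0 - p)**(1 - k)

p_hat = 0.8  # hypothetical predicted probability that y = 1
print(bernoulli_pmf(1, p_hat))  # likelihood of observing y = 1, i.e. p_hat
print(bernoulli_pmf(0, p_hat))  # likelihood of observing y = 0, i.e. 1 - p_hat
```

The exponent simply selects one of the two factors, which is why the same expression works for either label.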
---
## Optimization
* The conditional likelihood is the product over all samples
* $CL(w, t)=\prod_{i}\hat{p}(x_i)^{y_i}(1-\hat{p}(x_i))^{1-y_i}$
* Common practice is to take the log and work with the log conditional likelihood
* $LCL(w, t)=\sum_{i}\left[y_{i}\ln\hat{p}(x_i)+(1-y_i)\ln(1-\hat{p}(x_i))\right]$
* It's also common to negate this and minimize the negative log likelihood
* It turns out that the optimum has no analytic (closed-form) solution
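
The product and the sum above can be compared numerically. This is a sketch on hypothetical labels and predicted probabilities, assuming NumPy; it is meant only to show that the log turns the product into a sum without changing the result.

```python
import numpy as np

def log_conditional_likelihood(y, p_hat):
    """Sum of y * ln(p) + (1 - y) * ln(1 - p) over all samples."""
    return np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

# Hypothetical labels and predicted probabilities
y = np.array([1, 0, 1, 1])
p_hat = np.array([0.9, 0.2, 0.7, 0.6])

cl = np.prod(p_hat**y * (1 - p_hat)**(1 - y))
lcl = log_conditional_likelihood(y, p_hat)
print(cl, np.exp(lcl))  # the two agree up to rounding
print(-lcl)             # negative log likelihood, the quantity to minimize
```

One practical reason to prefer the sum: a product of many probabilities underflows to zero in floating point long before the sum of their logs becomes a problem.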
---
## Solutions
* See reading for interpretations
* We can look at the gradients to intuit the solution space
* Brute force solution, at step $k$:
* $w_{k+1} = w_{k} - \eta_{k} g_{k}$
* where $g_k$ is the gradient over all samples at step $k$
* and $\eta_k$ is the learning rate at step $k$ (not the sigmoid's argument from earlier)
* This is called "steepest descent"
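
The update rule can be sketched generically. The example below assumes a constant learning rate and uses a hypothetical quadratic objective so that the answer is known in advance; it is not the logistic regression loss.

```python
def steepest_descent(grad, w0, learning_rate, steps):
    """Repeat w <- w - eta * g with a constant learning rate eta."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Hypothetical objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_min = steepest_descent(lambda w: 2.0 * (w - 3.0), w0=0.0,
                         learning_rate=0.1, steps=100)
print(w_min)  # approaches the minimizer w = 3
```

A learning rate that is too large makes the iterates overshoot and diverge, which is why $\eta_k$ is often allowed to change from step to step.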
---
## Gradients
* Let's focus on a linear solution
* $y = wx + b$
* We somehow end up with boring gradients again
* $\frac{d}{dw}f(w) = \sum_{i}(\mu_i - y_i)x_i=X^{T}(\mu - y)$, where $f$ is the negative log likelihood and $\mu_i = \text{sigm}(wx_i + b)$
* Every implementation I've seen also normalizes by the length of $y$
* That's the number of samples
* For $b$, the $x$ vector is just 1s
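
One way to convince yourself of these gradients is a finite-difference check. The sketch below assumes a one-dimensional model and hypothetical data, and compares the analytic (unnormalized) gradients against central differences of the negative log likelihood.

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def nll(w, b, x, y):
    """Negative log conditional likelihood for a 1-D linear model."""
    mu = sigmoid(w * x + b)
    return -np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))

def nll_gradients(w, b, x, y):
    """Analytic gradients: sum((mu - y) * x) for w and sum(mu - y) for b."""
    mu = sigmoid(w * x + b)
    return (mu - y) @ x, np.sum(mu - y)

# Hypothetical data and parameters
x = np.array([-1.0, 0.5, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b, eps = 0.3, -0.1, 1e-6

dw, db = nll_gradients(w, b, x, y)
dw_numeric = (nll(w + eps, b, x, y) - nll(w - eps, b, x, y)) / (2 * eps)
db_numeric = (nll(w, b + eps, x, y) - nll(w, b - eps, x, y)) / (2 * eps)
print(dw, dw_numeric)  # analytic vs. numeric gradient for w
print(db, db_numeric)  # analytic vs. numeric gradient for b
```

Dividing both gradients by `len(y)` gives the normalized form mentioned above; it rescales the step without changing its direction.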
---
## Example
```python
#! /usr/bin/python3
import numpy as np
import sys


def sigmoid(eta):
    """
    The sigmoid function.
    """
    return 1 / (1 + np.exp(-eta))


def logistic_regression(file_path, x_label, y_label, y_target, learning_rate, epochs):
    """
    Performs logistic regression to draw a decision boundary through
    (hopefully) linearly separable data.

    Args:
        file_path (str): The path to a file containing x and y data.
        x_label (str): Name of the x column data
        y_label (str): Name of the y column data
        y_target (int): Target y class
        learning_rate (float): Update rate
        epochs (int): Number of times to iterate through the data
    """
    # Get the first line of the file with the column names
    with open(file_path) as f:
        header_row = f.readline().strip('\n')
    column_names = [name.strip(' ') for name in header_row.split(',')]
    # Load data
    columns = np.loadtxt(file_path, delimiter=',', skiprows=1, unpack=True)
    x_index = column_names.index(x_label)
    y_index = column_names.index(y_label)
    x = columns[x_index]
    y = columns[y_index]
    # Set everything with the target class to 1 (True), everything else to 0
    y = y == y_target
    # We are going to call our parameters w, for weight, and b, for bias
    w = 0
    b = 0
    # Gradient descent
    for i in range(epochs):
        # Get the predicted probabilities
        y_hat = sigmoid(w * x + b)
        # Calculate gradients w.r.t. w and b
        # Average over the size of the dataset
        dw = (1 / len(x)) * (y_hat - y) @ x
        db = (1 / len(x)) * np.sum(y_hat - y)
        # Update w and b
        w -= learning_rate * dw
        b -= learning_rate * db
        print(f"epoch {i} error is {np.mean(y_hat-y):.4f}")
    # Determine the accuracy
    y_hat = sigmoid(x * w + b)
    predictions = (y_hat > 0.5).astype(int)
    accuracy = np.mean(predictions == y)
    print(f"Final accuracy is {accuracy}")
    np.set_printoptions(precision=3, suppress=True)
    print(f"Linear model was {w} * x + {b:.3f}")
    return w, b, x, y_hat, y


if __name__ == "__main__":
    if len(sys.argv) < 7:
        print("Provide the data file, the x column, the y column, the y target class, the learning rate, and the epochs")
    else:
        w, b, x, y_hat, y = logistic_regression(sys.argv[1], sys.argv[2], sys.argv[3], int(sys.argv[4]), float(sys.argv[5]), int(sys.argv[6]))
        # Print out results: input, predicted probability, true label
        for x_i, y_hat_i, y_i in zip(x, y_hat, y):
            print(f"{x_i} {y_hat_i} {y_i}")
```
---
## Random dataset
```python
#! /usr/bin/python3
import numpy as np
import sys
num_samples = int(sys.argv[1])
mean_one = float(sys.argv[2])
stddev_one = float(sys.argv[3])
mean_two = float(sys.argv[4])
stddev_two = float(sys.argv[5])
rng = np.random.default_rng()
# Class 1 and class 2 samples
class_one = rng.normal(loc=mean_one, scale=stddev_one, size=(num_samples, 1))
class_two = rng.normal(loc=mean_two, scale=stddev_two, size=(num_samples, 1))
class_one = np.concatenate((class_one, np.ones((num_samples, 1))), axis=1)
class_two = np.concatenate((class_two, 2*np.ones((num_samples, 1))), axis=1)
samples = np.concatenate((class_one, class_two), axis=0)
rng.shuffle(samples)
print("x, class")
for i in range(2*num_samples):
    print(f"{samples[i][0]}, {int(samples[i][1])}")
```
---
## Classes