ML - Logistic Regression

2017/11/02

Categories: Python machineLearning Tags: logisticRegression regularization

Logistic Regression Model

$$h(x)=h_{\theta}(x) = \Pr(y=1|x;\theta)=\mathrm{sigmoid}(\theta^T x)=\frac{1}{1+e^{-\theta^T x}}$$

$$ x = \begin{pmatrix}x_0\newline x_1\newline\vdots\newline x_p\end{pmatrix}\in\mathbb{R}^{p+1}, x_0=1,y\in\{0,1\} $$


Logistic Regression Cost/Loss Function

$$ J(\theta)=\frac{1}{N}\sum_{i=1}^N L\left(h(x^{(i)}),y^{(i)}\right), \text{where} $$

$$ L(h(x), y)=\begin{cases} -\log h(x), &y=1\newline -\log(1-h(x)), &y=0 \end{cases} $$

$$ \Rightarrow L(h(x), y)=-y\log h(x)-(1-y)\log(1-h(x))=-\log\left[h(x)^y (1-h(x))^{1-y}\right] $$



Gradient Descent

$$ \frac{\partial}{\partial\theta_j}J(\theta) = \frac{1}{N} \sum_{i=1}^N \frac{\partial}{\partial\theta_j}L(h(x^{(i)}),y^{(i)}) $$

Let $z = \theta^T x$ and $a=h_{\theta}(x)=\mathrm{sigmoid}(z)$. By the chain rule, $$ \frac{\partial L}{\partial\theta_j}=\frac{\partial L}{\partial a}\frac{\partial a}{\partial z}\frac{\partial z}{\partial\theta_j},\ \ \ \text{where}\\
\frac{\partial L}{\partial a} = -\frac{y}{a}+\frac{1-y}{1-a}=\frac{a-y}{a(1-a)},\ \ \ \frac{\partial a}{\partial z}=a(1-a), \ \ \ \frac{\partial z}{\partial\theta_j}=x_j\\
\Rightarrow \frac{\partial L}{\partial\theta_j}=(a-y)x_j=(h(x)-y)x_j $$
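
As a sanity check on this derivation, a centered finite difference of $L$ should match the analytic gradient $(a-y)x_j$. A minimal sketch (the point x, label y, and parameters below are arbitrary illustrative values, not from the original notes):

import numpy as np

def sigmoid(u):
    return 1/(1 + np.exp(-u))

def loss(theta, x, y):
    a = sigmoid(np.dot(theta, x))
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

x = np.array([1.0, 0.5, -1.2])   # x_0 = 1, plus two features
y = 1.0
theta = np.array([0.1, -0.3, 0.7])

# analytic gradient: (a - y) * x_j
analytic = (sigmoid(np.dot(theta, x)) - y) * x

# centered finite differences, one coordinate at a time
eps = 1e-6
numerical = np.zeros_like(theta)
for j in range(theta.size):
    e = np.zeros_like(theta)
    e[j] = eps
    numerical[j] = (loss(theta + e, x, y) - loss(theta - e, x, y)) / (2 * eps)

print(np.max(np.abs(analytic - numerical)))  # ~1e-10: the two gradients agree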

Algorithm:

repeat{

$\theta_j \leftarrow \theta_j -\alpha\frac{\partial}{\partial \theta_j}J(\mathbf{\theta})=\theta_j -\alpha\frac{1}{N}\sum _{i=1}^N \left(h(x^{(i)}) - y^{(i)}\right)x^{(i)}_j$

} (simultaneously update for $j=0,1,...,p$)

$\alpha$: learning rate (step size)

Vectorization

$$ \theta \leftarrow \theta-\frac{\alpha}{N} X^T \left(h(X)-y\right), \ \ \ \text{where } h(X)=\mathrm{sigmoid}(X\theta) \text{ applied elementwise} $$

import numpy as np

def sigmoid(u):
    return 1/(1 + np.exp(-u))

# X: design matrix, N*(p+1), with x_0 = 1 in the first column
# Y: labels, N*1, values in {0, 1}
# theta: parameters, (p+1)*1

# compute hypothesis: N*1 vector of predicted probabilities
h = sigmoid(np.dot(X, theta))

# compute cost (average cross-entropy)
J = -np.mean(np.log(h) * Y + np.log(1 - h) * (1 - Y))

# compute gradient
N = X.shape[0]
dtheta = np.dot(X.T, h - Y) / N

# update params
theta -= learning_rate * dtheta
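
Putting the pieces together, a full training run just repeats the update. A minimal end-to-end sketch (the synthetic data, learning_rate, and n_iters here are illustrative assumptions):

import numpy as np

def sigmoid(u):
    return 1/(1 + np.exp(-u))

rng = np.random.default_rng(0)

# hypothetical synthetic data: N = 100 points, p = 2 features
N, p = 100, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, p))])  # prepend x_0 = 1
true_theta = np.array([[-0.5], [2.0], [-1.0]])
# sample Bernoulli labels with probabilities sigmoid(X @ true_theta)
Y = (rng.uniform(size=(N, 1)) < sigmoid(np.dot(X, true_theta))).astype(float)

theta = np.zeros((p + 1, 1))
learning_rate, n_iters = 0.1, 1000

for _ in range(n_iters):
    h = sigmoid(np.dot(X, theta))
    theta -= learning_rate * np.dot(X.T, h - Y) / N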

Advanced Optimization Methods
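
Optimizers such as conjugate gradient, BFGS, and L-BFGS typically converge faster than plain gradient descent and pick the step size automatically; they only need a function returning the cost and its gradient. A minimal sketch using scipy.optimize (assuming the X, Y, and sigmoid from the vectorization section above; jac=True tells minimize that the objective returns both cost and gradient):

import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta_flat):
    theta = theta_flat.reshape(-1, 1)
    h = sigmoid(np.dot(X, theta))
    J = -np.mean(np.log(h) * Y + np.log(1 - h) * (1 - Y))
    grad = (np.dot(X.T, h - Y) / X.shape[0]).ravel()
    return J, grad

res = minimize(cost_and_grad, x0=np.zeros(X.shape[1]), jac=True, method='L-BFGS-B')
theta = res.x.reshape(-1, 1)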


Multiclass Classification: One-vs-All

Train $K$ binary classifiers, where $h^{(k)}_{\theta}(x)$ treats class $k$ as positive ($y=1$) and all other classes as negative ($y=0$); a new input is assigned to the most confident classifier (a sketch follows below):

$$ x \rightarrow \text{class } k^*=\arg\max_{k=1,2,\ldots,K} h^{(k)}_{\theta}(x) $$
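
A minimal one-vs-all sketch under the conventions above (the labels array holding integer classes 0, ..., K-1, and the hyperparameter defaults, are illustrative assumptions):

import numpy as np

def sigmoid(u):
    return 1/(1 + np.exp(-u))

def train_one_vs_all(X, labels, K, learning_rate=0.1, n_iters=1000):
    # labels: length-N integer array with values 0, ..., K-1
    N, d = X.shape
    Theta = np.zeros((d, K))  # column k = parameters of classifier k
    for k in range(K):
        y_k = (labels == k).astype(float).reshape(-1, 1)  # class k vs. rest
        theta = np.zeros((d, 1))
        for _ in range(n_iters):
            h = sigmoid(np.dot(X, theta))
            theta -= learning_rate * np.dot(X.T, h - y_k) / N
        Theta[:, k:k+1] = theta
    return Theta

def predict_one_vs_all(X, Theta):
    return np.argmax(sigmoid(np.dot(X, Theta)), axis=1)  # most confident class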


Solving the Problem of Overfitting

A hypothesis with many (e.g., high-order polynomial) features can fit the training set very well yet generalize poorly. Regularization mitigates this by penalizing large parameters $\theta_1,\ldots,\theta_p$ (the intercept $\theta_0$ is conventionally not penalized), trading a small increase in bias for a reduction in variance.


Regularized Linear Regression

$$ J(\theta)=\frac{1}{2N}\left(\sum_{i=1}^N (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda\sum_{j=1}^p \theta_j^2\right) $$


Gradient Descent

Algorithm:

repeat{

$\theta_0 \leftarrow \theta_0 -\alpha\frac{\partial}{\partial \theta_0}J(\mathbf{\theta})=\theta_0 -\alpha\frac{1}{N}\sum _{i=1}^N \left(h(x^{(i)}) - y^{(i)}\right)$

$\theta_j \leftarrow \theta_j -\alpha\frac{\partial}{\partial \theta_j}J(\mathbf{\theta})=\theta_j -\frac{\alpha}{N}\left[\sum _{i=1}^N \left(h(x^{(i)}) - y^{(i)}\right)x^{(i)}_j+\lambda\theta_j\right],\ \ \ \ j=1,...,p$

} (simultaneously update)

$\alpha$: learning rate (step size)

Equivalently, the $\theta_j$ update is a weight-decay step: each iteration first shrinks $\theta_j$ by the constant factor $\left(1-\alpha\frac{\lambda}{N}\right)$, then applies the usual unregularized gradient step:

$$\theta_j \leftarrow \left(1-\alpha\frac{\lambda}{N}\right)\theta_j -\alpha\frac{1}{N}\sum _{i=1}^N \left(h(x^{(i)}) - y^{(i)}\right)x^{(i)}_j,\ \ \ \ j=1,\ldots,p$$


Normal Equation

$$ 2NJ(\theta) =(X\theta-y)^T(X\theta-y) +\lambda \theta^TH\theta, \ \ \ \text{where}\\
H = \begin{pmatrix} 0 &0 &\cdots &0\newline 0 &1 &\cdots &0\newline \vdots &\vdots &\ddots &\vdots\newline 0 &0 &\cdots &1 \end{pmatrix}_{(p+1)\times(p+1)} $$

$$ \Rightarrow N\frac{\partial J}{\partial\theta}=X^TX\theta-X^Ty+\lambda H\theta $$

$$ \Rightarrow \theta^* = (X^TX +\lambda H)^{-1}X^Ty $$
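
A direct NumPy translation of this closed-form solution (a sketch; np.linalg.solve is used instead of forming the inverse explicitly, which is cheaper and numerically safer):

import numpy as np

def ridge_normal_equation(X, y, lam):
    d = X.shape[1]
    H = np.eye(d)
    H[0, 0] = 0.0  # theta_0 is not penalized
    # solve (X^T X + lambda * H) theta = X^T y
    return np.linalg.solve(np.dot(X.T, X) + lam * H, np.dot(X.T, y))

Since $x_0=1$, for $\lambda>0$ the matrix $X^TX+\lambda H$ is invertible even when $X^TX$ is not (e.g., $N \le p$ or linearly dependent features).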


Regularized Logistic Regression

$$ J(\theta)=-\frac{1}{N}\sum_{i=1}^N\left[y^{(i)}\log h(x^{(i)}) + (1-y^{(i)})\log\left (1-h(x^{(i)})\right)\right] + \frac{\lambda}{2N}\sum_{j=1}^p \theta_j^2 $$

Algorithm:

repeat{

$\theta_0 \leftarrow \theta_0 -\alpha\frac{\partial}{\partial \theta_0}J(\mathbf{\theta})=\theta_0 -\alpha\frac{1}{N}\sum _{i=1}^N \left(h(x^{(i)}) - y^{(i)}\right)$

$\theta_j \leftarrow \theta_j -\alpha\frac{\partial}{\partial \theta_j}J(\mathbf{\theta})=\theta_j -\frac{\alpha}{N}\left[\sum _{i=1}^N \left(h(x^{(i)}) - y^{(i)}\right)x^{(i)}_j+\lambda\theta_j\right],\ \ \ \ j=1,...,p$

} (simultaneously update)

$\alpha$: learning rate (step size)

$$ \theta_j \leftarrow \left(1-\alpha\frac{\lambda}{N}\right)\theta_j -\alpha\frac{1}{N}\sum_{i=1}^N \left(h(x^{(i)}) - y^{(i)}\right)x^{(i)}_j,\ \ \ \ j=1,\ldots,p $$


Code in Python
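
A minimal self-contained sketch implementing the regularized updates derived above; the function names, threshold, and hyperparameter defaults are illustrative assumptions:

import numpy as np

def sigmoid(u):
    return 1/(1 + np.exp(-u))

def train_logistic_regression(X, y, lam=1.0, learning_rate=0.1, n_iters=5000):
    # X: N*(p+1) design matrix with x_0 = 1; y: N*1 labels in {0, 1}
    N, d = X.shape
    theta = np.zeros((d, 1))
    for _ in range(n_iters):
        h = sigmoid(np.dot(X, theta))
        grad = np.dot(X.T, h - y) / N
        grad[1:] += (lam / N) * theta[1:]  # theta_0 is not regularized
        theta -= learning_rate * grad
    return theta

def predict(X, theta, threshold=0.5):
    # classify as 1 when the predicted probability reaches the threshold
    return (sigmoid(np.dot(X, theta)) >= threshold).astype(int)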