ML - Linear Regression with Multiple Variables

2017/11/01

Categories: Python machineLearning Tags: linearRegression

Linear Regression with One Variable

$$J(\theta_0,\theta_1)=\frac{1}{2N}\sum_{i=1}^{N}\left(\hat{y}^{(i)}-y^{(i)}\right)^2=\frac{1}{2N}\sum_{i=1}^{N}\left(h(x^{(i)})-y^{(i)}\right)^2$$

To break it apart, it is $\frac{1}{2}\bar{x}$ where $\bar{x}$ is the mean of the squares of $h_\theta(x^{(i)})-y^{(i)}$, the differences between the predicted values and the actual values.

The mean is halved (the $\frac{1}{2}$ factor) as a convenience for the computation of gradient descent, since differentiating the square term produces a factor of 2 that cancels the $\frac{1}{2}$.
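As a quick sketch in NumPy (the function name `compute_cost` and the toy data are my own, for illustration only), the one-variable cost can be computed directly:

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Halved mean of squared errors: J(theta0, theta1)."""
    predictions = theta0 + theta1 * x          # h_theta(x^(i)) for every i
    return np.mean((predictions - y) ** 2) / 2

# toy data lying exactly on the line y = 2x, so the cost at (0, 2) is 0
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(compute_cost(0.0, 2.0, x, y))  # 0.0
```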

Outline:

  1. Initialize $\theta_0,\theta_1$
  2. Keep changing $\theta_0,\theta_1$ to reduce $J(\theta_0,\theta_1)$ until we hopefully end up at a minimum

Algorithm:

repeat until convergence{

$$\theta_j \leftarrow \theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1),\quad j=0,1$$

}

α: learning rate (step size)


Correct: simultaneous update

$$\text{temp0} \leftarrow \theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)$$

$$\text{temp1} \leftarrow \theta_1-\alpha\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)$$

$$\theta_0 \leftarrow \text{temp0}$$

$$\theta_1 \leftarrow \text{temp1}$$


Incorrect: $\theta_0$ is overwritten before the gradient for $\theta_1$ is computed, so the $\theta_1$ update uses the new $\theta_0$ instead of the old one; the parameters are no longer updated simultaneously.

$$\text{temp0} \leftarrow \theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)$$

$$\theta_0 \leftarrow \text{temp0}$$

$$\text{temp1} \leftarrow \theta_1-\alpha\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)$$

$$\theta_1 \leftarrow \text{temp1}$$

$$\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)=\frac{\partial}{\partial\theta_j}\frac{1}{2N}\sum_{i=1}^{N}\left(h(x^{(i)})-y^{(i)}\right)^2=\begin{cases}\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\left(h(x^{(i)})-y^{(i)}\right), & j=0\\[6pt]\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\left(h(x^{(i)})-y^{(i)}\right)x^{(i)}, & j=1\end{cases}$$
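The two derivative cases plug straight into the update rule. A minimal simultaneous-update sketch (function name and toy data are my own, not from the original notes):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=1000):
    """Simultaneous-update gradient descent for h(x) = theta0 + theta1 * x."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iters):
        resid = theta0 + theta1 * x - y   # h(x^(i)) - y^(i)
        grad0 = np.mean(resid)            # j = 0 case of the derivative
        grad1 = np.mean(resid * x)        # j = 1 case of the derivative
        theta0 -= alpha * grad0           # both gradients were computed
        theta1 -= alpha * grad1           # from the OLD thetas: simultaneous
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x                         # true line: theta0 = 1, theta1 = 2
theta0, theta1 = gradient_descent(x, y)
print(theta0, theta1)                     # approaches (1.0, 2.0)
```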


Multivariate Linear Regression

$$h_\theta(x)=\theta^Tx=\begin{pmatrix}\theta_0&\theta_1&\cdots&\theta_p\end{pmatrix}\begin{pmatrix}x_0\\x_1\\\vdots\\x_p\end{pmatrix}$$

$$J(\theta)=\frac{1}{2N}\sum_{i=1}^{N}\left(h(x^{(i)})-y^{(i)}\right)^2$$

Algorithm:

repeat until convergence{

$$\theta_j \leftarrow \theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)=\theta_j-\alpha\frac{1}{N}\sum_{i=1}^{N}\left(h(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

} (simultaneously update for $j=0,1,\dots,p$)

α: learning rate (step size)


Gradient Descent in Practice

Feature Scaling

Gradient descent descends slowly over features with large ranges, and will oscillate inefficiently down to the optimum when the feature ranges are very uneven.

Feature scaling can speed up Gradient Descent.

$$x_i \leftarrow \frac{x_i-\mu_i}{s_i},\quad s_i = \text{range or standard deviation}$$
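A minimal standardization sketch, using the standard deviation as $s_i$ (the function name and the toy feature matrix are my own):

```python
import numpy as np

def scale_features(X):
    """Standardize each column: subtract its mean, divide by its std."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)   # s_i could also be the range (max - min)
    return (X - mu) / sigma, mu, sigma

# hypothetical features with very uneven ranges (size vs. bedroom count)
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])
X_scaled, mu, sigma = scale_features(X)
print(X_scaled.mean(axis=0))  # ~0 in each column after scaling
```

Keep `mu` and `sigma` around: any new example must be scaled with the same statistics before prediction.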



Learning Rate

If $\alpha$ is too small, convergence is slow; if $\alpha$ is too large, $J(\theta)$ may fail to decrease on every iteration and can even diverge. Plotting $J(\theta)$ against the number of iterations is a practical way to choose $\alpha$.
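To see the effect numerically, here is a toy comparison of a few learning rates on the same one-variable problem (my own construction, not from the original notes):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x                     # true line: theta0 = 1, theta1 = 2

def run(alpha, n_iters=50):
    """Run gradient descent and return the final cost J(theta)."""
    theta0 = theta1 = 0.0
    for _ in range(n_iters):
        resid = theta0 + theta1 * x - y
        theta0, theta1 = (theta0 - alpha * np.mean(resid),      # tuple assign
                          theta1 - alpha * np.mean(resid * x))  # = simultaneous
    return np.mean((theta0 + theta1 * x - y) ** 2) / 2

print(run(0.01))   # small alpha: J still decreasing, but slowly
print(run(0.1))    # reasonable alpha: J close to zero
print(run(1.0))    # too-large alpha: J blows up (diverges)
```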


Vectorization

$$J(\theta)=\frac{1}{2N}(X\theta-y)^T(X\theta-y)$$

```python
import numpy as np

# residual: X theta - y
resid = np.dot(X, theta) - y
# cost J(theta): halved mean of squared residuals
cost = np.mean(resid ** 2) / 2
```

$$\theta_j \leftarrow \theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)\;\Longrightarrow\;\theta \leftarrow \theta-\alpha\frac{d}{d\theta}J(\theta)=\theta-\frac{\alpha}{N}X^T(X\theta-y)$$

```python
# vectorized update for all theta_j at once (N = number of examples)
theta -= learning_rate / N * np.dot(X.T, resid)
```
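Putting the pieces above into a complete vectorized loop, a minimal sketch (the `fit` name and toy data are my own):

```python
import numpy as np

def fit(X, y, learning_rate=0.1, n_iters=2000):
    """Vectorized gradient descent; X already includes the bias column of ones."""
    N = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        resid = X.dot(theta) - y                     # X theta - y
        theta -= learning_rate / N * X.T.dot(resid)  # theta - alpha/N X^T resid
    return theta

# design matrix with a leading column of ones; targets follow y = 1 + 2x
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(fit(X, y))  # close to [1. 2.]
```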

Normal Equation

$$X_{N\times(p+1)}=\begin{pmatrix}\mathbf{1}&\mathbf{x}_1&\cdots&\mathbf{x}_p\end{pmatrix},\quad\text{where }\mathbf{x}_j=\begin{pmatrix}x_j^{(1)}\\x_j^{(2)}\\\vdots\\x_j^{(N)}\end{pmatrix},\quad y=\begin{pmatrix}y^{(1)}\\y^{(2)}\\\vdots\\y^{(N)}\end{pmatrix}$$

$$\hat{\theta}=(X^TX)^{-1}X^Ty$$
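A quick check of the closed-form solution on toy data (my own example; `np.linalg.solve` is used rather than forming the inverse explicitly, which is numerically preferable):

```python
import numpy as np

# normal equation: solve X^T X theta = X^T y (no iterations, no alpha)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])   # exactly y = 1 + 2x

theta_hat = np.linalg.solve(X.T.dot(X), X.T.dot(y))
print(theta_hat)  # [1. 2.]
```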

| Gradient Descent | Normal Equation |
| --- | --- |
| need to choose learning rate $\alpha$ | no need to choose learning rate |
| need feature scaling | no need to do feature scaling |
| many iterations | no need to iterate |
| $O(kp^2)$, $k$ is the no. of iterations | $O(p^3)$, need to compute $(X^TX)^{-1}$ |
| works well when $p$ is large | slow if $p$ is very large |

Normal Equation Noninvertibility
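$X^TX$ is noninvertible when the columns of $X$ are linearly dependent (redundant features) or when there are more features than examples ($p \ge N$). One common remedy is the pseudoinverse; a small sketch with my own deliberately redundant toy data:

```python
import numpy as np

# X^T X is singular here: the last column is exactly twice the second,
# so the columns of X are linearly dependent.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 6.0]])
y = np.array([3.0, 5.0, 7.0])

# the plain inverse would fail; the pseudoinverse still returns a solution
theta_hat = np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)
print(X.dot(theta_hat))  # reproduces y even though theta is not unique
```

Dropping the redundant feature (or using regularization) is usually the cleaner fix.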

Code in Python

jupyter notebook