ML - Support Vector Machines

2017/11/06

Categories: Python machineLearning Tags: svm logisticRegression

Cost Function

Logistic regression:

$$ \min_{\theta}\frac{1}{N}\sum_{i=1}^N \left[y^{(i)}\left(-\log \frac{1}{1+e^{-\theta^T x^{(i)}}}\right)+(1-y^{(i)})\left(-\log \frac{e^{-\theta^T x^{(i)}}}{1+e^{-\theta^T x^{(i)}}}\right)\right] +\frac{\lambda}{2N}\sum_{j=1}^p\theta_j^2 $$

[Figure: logistic cost function]


SVM:

$$ \min_{\theta} C\sum_{i=1}^N \left[y^{(i)}cost_1(\theta^Tx^{(i)}) +(1-y^{(i)})cost_0(\theta^T x^{(i)})\right] +\frac{1}{2}\sum_{j=1}^p\theta_j^2 $$
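The notes do not spell out $cost_1$ and $cost_0$. The sketch below assumes the usual hinge-style choices, $cost_1(z)=\max(0, 1-z)$ and $cost_0(z)=\max(0, 1+z)$, which are piecewise-linear stand-ins for the two logistic terms. The function names and the plain-list data layout are illustrative assumptions, not part of the original notes.

```python
import math


def logistic_cost_1(z):
    """-log(sigmoid(z)): logistic cost of a positive example (y = 1)."""
    return math.log1p(math.exp(-z))


def logistic_cost_0(z):
    """-log(1 - sigmoid(z)): logistic cost of a negative example (y = 0)."""
    return math.log1p(math.exp(z))


def cost_1(z):
    """Assumed hinge form for y = 1: zero once z >= 1, linear below that."""
    return max(0.0, 1.0 - z)


def cost_0(z):
    """Assumed hinge form for y = 0: zero once z <= -1, linear above that."""
    return max(0.0, 1.0 + z)


def svm_objective(theta, X, y, C):
    """C * sum of per-example costs + 0.5 * sum_{j>=1} theta_j^2.

    Each row of X is assumed to start with the bias feature x_0 = 1,
    so theta[0] is the intercept and is left out of the penalty.
    """
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * v for t, v in zip(theta, x_i))
        data_term += y_i * cost_1(z) + (1 - y_i) * cost_0(z)
    penalty = 0.5 * sum(t * t for t in theta[1:])
    return C * data_term + penalty
```

With `theta = [0.0, 1.0]` and the two examples `[1.0, 2.0]` (label 1) and `[1.0, -2.0]` (label 0), both examples sit outside the margin, so only the penalty term contributes to the objective.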


Large Margin Classification

$$ \min_{\theta} \frac{1}{2}\sum_{j=1}^p\theta_j^2\\
\text{s.t. } \begin{cases} \theta^Tx^{(i)}\geq 1 &\text{if } y^{(i)}=1\newline \theta^Tx^{(i)}\leq -1 &\text{if } y^{(i)}=0\newline \end{cases} $$

Simplification: $\theta_0=0, p=2$

$$ \min_{\theta} \frac{1}{2}\Vert\theta\Vert_2^2\\
\text{s.t. } \begin{cases} u^{(i)}\Vert\theta\Vert \geq 1 &\text{if } y^{(i)}=1\newline u^{(i)}\Vert\theta\Vert \leq -1 &\text{if } y^{(i)}=0\newline \end{cases} $$

where $u^{(i)}$ is the signed projection of $x^{(i)}$ onto $\theta$, so that $\theta^Tx^{(i)}=u^{(i)}\Vert\theta\Vert$. Since $\Vert\theta\Vert$ is being minimized, the constraints can only hold if every $\vert u^{(i)}\vert$ is large, which is what produces a large margin.
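The rewriting of the constraints rests on the identity $\theta^Tx = u\Vert\theta\Vert$. A small sketch that checks it numerically and shows the smallest projection a positive example needs for a given $\theta$; the helper names are hypothetical and the vectors are made-up illustration values.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def signed_projection(x, theta):
    """Signed length u of the projection of x onto theta."""
    return dot(x, theta) / norm(theta)


def min_projection_required(theta):
    """Smallest u that satisfies u * ||theta|| >= 1 for a positive example."""
    return 1.0 / norm(theta)


theta = [3.0, 4.0]   # ||theta|| = 5
x = [2.0, 1.0]       # theta^T x = 10

u = signed_projection(x, theta)
# The two sides of the identity theta^T x = u * ||theta|| agree.
assert math.isclose(dot(theta, x), u * norm(theta))
```

Halving $\theta$ to `[1.5, 2.0]` halves $\Vert\theta\Vert$ and doubles `min_projection_required`, which is the trade-off the minimization exploits: a smaller norm is only feasible when the examples lie far from the decision boundary.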

[Figure: large margin]


Kernels

$$ f :=\begin{pmatrix} f_0\newline f_1\newline \vdots\newline f_N \end{pmatrix},\ \ \text{where } f_0:=1, f_k :=\text{similarity}(x, l^{(k)})=\text{similarity}(x, x^{(k)}) $$

$$ \min_{\theta}C\sum_{i=1}^N\left[y^{(i)}cost_1(\theta^T f^{(i)}) + (1-y^{(i)})cost_0(\theta^T f^{(i)})\right]+\frac{1}{2}\sum_{j=1}^N \theta_j^2 $$
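The notes leave $\text{similarity}$ unspecified. A minimal sketch of building the feature vector $f$, assuming the Gaussian (RBF) kernel $\exp\left(-\Vert x-l\Vert^2 / (2\sigma^2)\right)$, which is the common choice when $\sigma^2$ is the hyperparameter. The function names are hypothetical, and the landmarks are taken to be the training examples, as in the definition of $f_k$.

```python
import math


def gaussian_similarity(x, landmark, sigma2):
    """Assumed Gaussian kernel: exp(-||x - l||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-squared_distance / (2.0 * sigma2))


def kernel_features(x, landmarks, sigma2):
    """Feature vector f = [f_0, f_1, ..., f_N] with f_0 = 1."""
    return [1.0] + [gaussian_similarity(x, l, sigma2) for l in landmarks]


# Two training examples double as the landmarks l^(1), l^(2).
landmarks = [[0.0, 0.0], [1.0, 0.0]]
f = kernel_features([0.0, 0.0], landmarks, sigma2=0.5)
# f has N + 1 = 3 entries: the bias feature, then one similarity per landmark.
```

Because each training example is a landmark, $f^{(i)}$ always contains a 1 at position $i$ (the similarity of $x^{(i)}$ with itself), and $\theta$ has $N+1$ entries rather than $p+1$.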


Hyperparameters

$C=\frac{1}{\lambda}$

$\sigma^2$

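To address overfitting (high variance), decrease $C$ (equivalently, increase $\lambda$) or increase $\sigma^2$. To address underfitting (high bias), do the opposite.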
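One way to see the role of $\sigma^2$ is to look at how fast the similarity decays with distance from a landmark. The sketch assumes the Gaussian kernel (the notes do not name the kernel), and the function name and the $\sigma^2$ values are illustrative only.

```python
import math


def similarity_at_distance(distance, sigma2):
    """Assumed Gaussian similarity for a point at the given distance from a landmark."""
    return math.exp(-distance ** 2 / (2.0 * sigma2))


# Compare how quickly the feature drops off for a narrow, medium, and wide kernel.
for sigma2 in (0.1, 1.0, 10.0):
    row = [similarity_at_distance(d, sigma2) for d in (0.0, 1.0, 2.0)]
    print(sigma2, [round(v, 3) for v in row])
```

A small $\sigma^2$ makes each feature react only to points very close to its landmark, giving a wiggly boundary that can overfit; a large $\sigma^2$ makes the features vary smoothly, which trades variance for bias.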


SVM in Practice


Multiclass SVM Classification


Logistic Regression versus SVMs