Introduction
My ongoing notes on machine learning.
Definitions
$$
\begin{aligned}
\theta &= \text{weights/parameters} \\
x &= \text{features} \\
J(\theta) &= \text{cost function} \\
h_\theta(x) &= \text{predicted value for input } x \\
n &= \text{number of features} \\
m &= \text{size of the training set} \\
\alpha &= \text{learning rate} \\
\lambda &= \text{regularization constant} \\
\end{aligned}
$$
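As a point of reference for the sections below, here is how these symbols map onto NumPy arrays (a minimal sketch with my own variable names and shapes; the notes themselves don't prescribe any code):

```python
import numpy as np

m, n = 100, 3                    # m training examples, n features
X = np.ones((m, n + 1))          # design matrix; column 0 stays all ones (bias)
X[:, 1:] = np.random.randn(m, n)
y = np.random.randn(m)           # one target per training example
theta = np.zeros(n + 1)          # weights/parameters; theta[0] is the bias
alpha, lam = 0.01, 1.0           # learning rate and regularization constant
```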
Linear Regression
In order: hypothesis, cost, gradient-descent update, normal equation, and the regularization terms added to the cost and to the gradient (for j ≥ 1):

$$
\begin{aligned}
h_\theta(x) &= \theta^Tx \\
J(\theta) &= \frac{1}{2m}\sum\limits_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})^2 \\
\theta &= \theta - \alpha\frac{1}{m}\sum\limits_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)} \\
\theta &= (X^TX)^{-1}X^Ty \\
J(\theta)_{\text{reg}} &= \frac{\lambda}{2m}\sum\limits_{j=1}^n\theta_j^2 \\
\theta_{\text{reg}} &= \frac{\lambda}{m}\theta \\
\end{aligned}
$$
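A minimal NumPy sketch of these formulas (function names are mine; it assumes the design matrix `X` carries a leading column of ones and that the bias `theta[0]` is left out of the regularization, matching the j = 1..n sums above):

```python
import numpy as np

def predict(X, theta):
    # h_theta(x) = theta^T x, vectorized over all m rows of X
    return X @ theta

def cost(X, y, theta, lam=0.0):
    # J(theta) = (1/2m) sum (h - y)^2 + (lambda/2m) sum_{j>=1} theta_j^2
    m = len(y)
    err = predict(X, theta) - y
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # bias not regularized
    return (err @ err) / (2 * m) + reg

def gradient_step(X, y, theta, alpha, lam=0.0):
    # theta := theta - alpha * (1/m) X^T (h - y), plus (lambda/m) theta_j for j >= 1
    m = len(y)
    grad = X.T @ (predict(X, theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return theta - alpha * grad

def normal_equation(X, y):
    # theta = (X^T X)^{-1} X^T y; pinv also handles a singular X^T X
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```

With λ = 0 and well-scaled features, a few hundred `gradient_step` iterations should land close to the `normal_equation` solution; the normal equation is exact but its matrix inversion gets expensive as n grows.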
Logistic Regression
Same gradient update as linear regression, but the hypothesis passes θᵀx through the sigmoid and the cost is the cross-entropy loss:

$$
\begin{aligned}
h_\theta(x) &= g(\theta^Tx) \text{ where } g(z) = \frac{1}{1 + e^{-z}} \\
J(\theta) &= -\frac{1}{m}\sum\limits_{i=1}^m\left(y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right) \\
\theta &= \theta - \alpha\frac{1}{m}\sum\limits_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)} \\
J(\theta)_{\text{reg}} &= \frac{\lambda}{2m}\sum\limits_{j=1}^n\theta_j^2 \\
\theta_{\text{reg}} &= \frac{\lambda}{m}\theta \\
\end{aligned}
$$
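The same sketch adapted to logistic regression (again my own names and the same bias/regularization assumptions); only `predict` and `cost` change, the update rule keeps its form:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, theta):
    # h_theta(x) = g(theta^T x)
    return sigmoid(X @ theta)

def cost(X, y, theta, lam=0.0):
    # J(theta) = -(1/m) sum [ y log h + (1-y) log(1-h) ] + (lambda/2m) sum_{j>=1} theta_j^2
    m = len(y)
    h = predict(X, theta)
    eps = 1e-12  # keeps the logs away from log(0)
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return -(y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps)) / m + reg

def gradient_step(X, y, theta, alpha, lam=0.0):
    # identical in form to the linear-regression update; h is now the sigmoid output
    m = len(y)
    grad = X.T @ (predict(X, theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return theta - alpha * grad
```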
Neural Networks
Support Vector Machines
Kernels
K-means Clustering
Recommender Systems