Intro

what is machine learning

Function with unknown parameters
$y = b + wx_1$ -> Model (a linear model)
$x_1$ -> feature; $y$ -> model output (prediction)
$w$ -> weight
$b$ -> bias
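The model can be written as a minimal Python sketch. It assumes a single feature $x_1$ and reads $0.5k$ as 500. The function name `linear_model` is hypothetical.

```python
def linear_model(x1: float, w: float, b: float) -> float:
    """Predict y from a single feature x1 with the linear model y = b + w * x1."""
    return b + w * x1


# With b = 0.5k (read here as 500) and w = 1, an input x1 = 4800 gives y = 5300.
print(linear_model(4800.0, w=1.0, b=500.0))  # 5300.0
```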
Loss
$L(b, w)$ is a function of the parameters, computed from the training data
Example with $b = 0.5k$, $w = 1$: $x_1$ -> $y = 0.5k + 1x_1$ (prediction); true value $\widehat{y}$ -> label
$e_1 = |y - \widehat{y}|$, … (one error $e_n$ per training example) -> Loss: $L = \frac{1}{N}\sum_{n=1}^{N} e_n$
$e = |y - \widehat{y}|$ -> MAE
$e = (y - \widehat{y})^2$ -> MSE
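Here is a sketch of the two losses. It assumes a small made-up training set with three hypothetical examples and uses the example parameters $b = 0.5k = 500$, $w = 1$. Following the notes, `preds` holds the model outputs $y$ and `labels` holds the true values $\widehat{y}$. The names `mae` and `mse` are hypothetical helpers.

```python
from typing import Sequence


def mae(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Mean absolute error: average of |y - y_hat| over the N examples."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    return sum(abs(y - y_hat) for y, y_hat in zip(preds, labels)) / len(preds)


def mse(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Mean squared error: average of (y - y_hat)^2 over the N examples."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(preds, labels)) / len(preds)


# Hypothetical toy training set (made-up numbers); model with b = 500, w = 1.
xs = [4800.0, 4900.0, 5000.0]
labels = [5000.0, 5400.0, 5200.0]      # y_hat: the true values
preds = [500.0 + 1.0 * x for x in xs]  # y: [5300.0, 5400.0, 5500.0]

print(mae(preds, labels))  # 200.0
print(mse(preds, labels))  # 60000.0
```

MSE squares each error, so the two errors of 300 dominate it far more than they dominate MAE.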
If $y$ and $\widehat{y}$ are both probability distributions -> Cross-entropy
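Cross-entropy for the discrete case can be sketched as follows. The sketch assumes the natural logarithm and a one-hot label. The name `cross_entropy` and the `eps` clamp against $\log 0$ are illustrative choices and do not come from the notes.

```python
import math
from typing import Sequence


def cross_entropy(label: Sequence[float], pred: Sequence[float], eps: float = 1e-12) -> float:
    """H(y_hat, y) = -sum_i y_hat_i * ln(y_i) for two discrete distributions."""
    if len(label) != len(pred):
        raise ValueError("label and pred must have the same length")
    # eps avoids log(0) when the model gives a class zero probability
    return -sum(p * math.log(max(q, eps)) for p, q in zip(label, pred))


label = [0.0, 1.0, 0.0]  # y_hat: one-hot true distribution
pred = [0.2, 0.7, 0.1]   # y: model's predicted distribution
print(cross_entropy(label, pred))  # about 0.3567, i.e. -ln(0.7)
```

With a one-hot label, only the probability the model assigns to the correct class contributes to the loss.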
error surface: the values of $L$ over different combinations of $(w, b)$
optimization
$w^*, b^* = \arg\min_{w, b} L$
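The sketch below makes the error surface and the $\arg\min$ concrete. It evaluates the MAE loss on a coarse grid of $(w, b)$ values and picks the lowest grid point. This brute-force grid search is only an illustration. It uses hypothetical data generated by $y = 1 + 2x_1$, and it is not the optimization method in these notes, which is gradient descent. The helper name `loss` is hypothetical.

```python
from typing import Sequence


def loss(w: float, b: float, xs: Sequence[float], labels: Sequence[float]) -> float:
    """MAE loss L(b, w) of the model y = b + w * x on the given data."""
    return sum(abs(b + w * x - y_hat) for x, y_hat in zip(xs, labels)) / len(xs)


# Hypothetical training data generated exactly by y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
labels = [1.0, 3.0, 5.0, 7.0]

ws = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
bs = [-1.0, 0.0, 1.0, 2.0, 3.0]

# surface[i][j] is L at (w = ws[i], b = bs[j]): one point on the error surface
surface = [[loss(w, b, xs, labels) for b in bs] for w in ws]

# Approximate arg min: the lowest point on this grid
best_loss, best_w, best_b = min(
    (surface[i][j], w, b) for i, w in enumerate(ws) for j, b in enumerate(bs)
)
print(best_w, best_b, best_loss)  # 2.0 1.0 0.0
```

The grid search only works when there are very few parameters, because the number of grid points grows exponentially with the number of parameters. That growth is why an iterative method such as gradient descent is used instead.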
gradient descent
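- Randomly pick an initial value $w^0$
- Compute the derivative of $L$ with respect to $w$ at $w = w^0$: $\frac{\partial L}{\partial w}\big|_{w=w^0}$
- If the slope is negative, increase $w$; if it is positive, decrease $w$
- The step size depends on the slope and the learning rate $\eta$ (a hyperparameter, set by hand): $w^1 = w^0 - \eta \frac{\partial L}{\partial w}\big|_{w=w^0}$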
- $w^0$ -> $w^1$ -> …
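- Stop after a preset maximum number of updates, or when the derivative at $w^T$ is 0
- Gradient descent may stop at a local minimum instead of the global minimum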
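The steps above can be sketched in Python for a single parameter $w$, as in the notes. $b$ would be updated the same way. Several details are assumptions: the function name, the default learning rate, and the tolerance `tol`, which stands in for "derivative = 0" in floating point. The usage example first runs the sketch on a convex loss. It then runs it on a non-convex loss $L(w) = w^4 - 3w^2 + w$, chosen for illustration, to show that the starting point decides which minimum is reached.

```python
from typing import Callable


def gradient_descent(
    dloss_dw: Callable[[float], float],
    w0: float,
    lr: float = 0.01,
    max_updates: int = 1000,
    tol: float = 1e-8,
) -> float:
    """Minimize a 1-D loss L(w), given its derivative dL/dw, by gradient descent.

    Stops after `max_updates` updates, or earlier once |dL/dw| < tol
    (a floating-point stand-in for "the derivative is 0").
    """
    w = w0
    for _ in range(max_updates):
        grad = dloss_dw(w)
        if abs(grad) < tol:
            break
        # Negative slope -> w increases; positive slope -> w decreases.
        # The step size is the learning rate times the slope.
        w = w - lr * grad
    return w


def dloss_convex(w: float) -> float:
    """Derivative of L(w) = (w - 3)^2, which has a single minimum at w = 3."""
    return 2 * (w - 3)


def dloss_nonconvex(w: float) -> float:
    """Derivative of L(w) = w^4 - 3w^2 + w, which has two minima."""
    return 4 * w**3 - 6 * w + 1


w_convex = gradient_descent(dloss_convex, w0=0.0, lr=0.1)
print(round(w_convex, 4))  # 3.0

w_local = gradient_descent(dloss_nonconvex, w0=2.0)    # near 1.13: local minimum, L about -1.07
w_global = gradient_descent(dloss_nonconvex, w0=-2.0)  # near -1.30: global minimum, L about -3.51
print(w_local, w_global)
```

Starting at $w^0 = 2$ the updates settle in the nearer local minimum. Starting at $w^0 = -2$ they reach the global one.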
evaluation