Exponentially Weighted Average

$$ V_t = \beta V_{t-1} + (1-\beta)\theta_t $$

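$\theta_t$ is the actual measurement at time $t$.
$V_t$ is approximately the average of the most recent $\frac{1}{1-\beta}$ values of $\theta$.
When $\beta$ is large, the $V$ curve is smoother, but it shifts to the right relative to the $\theta$ curve (it adapts more slowly).
When $\beta$ is small, the $V$ curve is noisier, but it tracks the $\theta$ curve more closely.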
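
The trade-off between smoothness and lag can be checked numerically. The following is a minimal sketch, assuming the running value starts at 0 before the first sample; the function name `ewa`, the helper `spread`, and the two test inputs are illustrative choices, not part of the original notes.

```python
def ewa(thetas, beta):
    """Return [V_1, V_2, ...] for V_t = beta * V_{t-1} + (1 - beta) * theta_t.

    Assumption: the running value is 0 before the first sample.
    """
    v = 0.0
    out = []
    for theta in thetas:
        v = beta * v + (1.0 - beta) * theta
        out.append(v)
    return out


def spread(values):
    """Peak-to-peak range of a sequence, used here as a crude jitter measure."""
    return max(values) - min(values)


# Lag: a step input that jumps from 0 to 1 after five samples.
step = [0.0] * 5 + [1.0] * 20
step_slow = ewa(step, beta=0.9)  # large beta: adapts slowly
step_fast = ewa(step, beta=0.5)  # small beta: adapts quickly

# Jitter: an input that alternates between 0 and 2 (mean 1).
noisy = [0.0, 2.0] * 50
jitter_slow = spread(ewa(noisy, beta=0.9)[-10:])  # large beta: small ripple
jitter_fast = spread(ewa(noisy, beta=0.5)[-10:])  # small beta: large ripple

print("first value after the step:", step_slow[5], step_fast[5])
print("ripple on the alternating input:", jitter_slow, jitter_fast)
```

With the larger $\beta$ the output climbs toward the new level more slowly but shows less ripple on the alternating input, which matches the two observations above.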

Exponential Decay

Expanding the formula for $V_t$ gives:
$$ \begin{aligned} V_t &= (1-\beta)\theta_t + (1-\beta)\beta\theta_{t-1} + (1-\beta)\beta^2\theta_{t-2} + \cdots \\ &= \sum_{i=0}^t(1-\beta)\beta^i\theta_{t-i} \end{aligned} $$

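The coefficient of each $\theta_{t-i}$ is $(1-\beta)\beta^i$, which is exponential in $i$; hence the name exponentially weighted average.
Each time a new $\theta$ arrives, the weight of every older $\theta$ shrinks by another factor of $\beta$, so old values decay exponentially.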

In the formula, $t$ is the current time step and $i$ is how many steps back from $t$ a sample lies.
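
As a sanity check, the recursive definition and the expanded weighted sum can be compared directly. This is a sketch under the assumption that the samples are indexed $\theta_0, \dots, \theta_t$ and that the running value is 0 before $\theta_0$; the function names and the sample data are made up for illustration.

```python
def ewa_recursive(thetas, beta):
    """Final V_t from the recursion V_t = beta * V_{t-1} + (1 - beta) * theta_t."""
    v = 0.0  # assumption: running value is 0 before the first sample
    for theta in thetas:
        v = beta * v + (1.0 - beta) * theta
    return v


def ewa_expanded(thetas, beta):
    """Final V_t from the expanded sum over i of (1 - beta) * beta**i * theta_{t-i}."""
    t = len(thetas) - 1
    return sum((1.0 - beta) * beta**i * thetas[t - i] for i in range(t + 1))


thetas = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # arbitrary sample data
beta = 0.9

# weights[i] is the coefficient applied to the sample i steps back.
weights = [(1.0 - beta) * beta**i for i in range(len(thetas))]

print(ewa_recursive(thetas, beta), ewa_expanded(thetas, beta))
print(weights)
```

The two functions agree up to floating-point rounding, and each weight is $\beta$ times the previous one. The weights sum to $1-\beta^{t+1}$ rather than exactly 1, so early values of $V_t$ are biased toward 0 under this initialization.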

$$ \beta^{\frac{1}{1-\beta}} = (1-\epsilon)^{\frac{1}{\epsilon}} \approx \frac{1}{e} \approx 0.37, \qquad \epsilon = 1-\beta \text{ small} $$

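When $i > \frac{1}{1-\beta}$, the weight of $\theta_{t-i}$ has dropped below roughly $\frac{1}{e}$ of the weight of the newest sample, so its influence on $V_t$ is treated as negligible. This is why $V_t$ is described as approximately the average of the last $\frac{1}{1-\beta}$ values of $\theta$.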
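
The $\frac{1}{e}$ rule of thumb is easy to check for a few values of $\beta$. The sketch below computes $\beta^{1/(1-\beta)}$, the weight of the sample $\frac{1}{1-\beta}$ steps back relative to the newest sample; the function name `weight_ratio_at_horizon` and the chosen values of $\beta$ are illustrative.

```python
import math


def weight_ratio_at_horizon(beta):
    """Relative weight beta**(1/(1-beta)) of the sample 1/(1-beta) steps back."""
    horizon = 1.0 / (1.0 - beta)
    return beta**horizon


for beta in (0.5, 0.9, 0.98, 0.999):
    horizon = 1.0 / (1.0 - beta)
    print(f"beta={beta}  horizon={horizon:.0f}  ratio={weight_ratio_at_horizon(beta):.4f}")

print(f"1/e = {1.0 / math.e:.4f}")
```

The ratio approaches $\frac{1}{e}$ from below as $\beta$ approaches 1, so the approximation is best for the large values of $\beta$ typically used for smoothing and noticeably loose for something like $\beta = 0.5$.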

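The top panel shows the raw data and the bottom panel shows the weights. This figure reminds me of "excitation signal * original signal" in DSP. I never really understood convolution in signal processing before, but now it seems to make some sense.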