P72
Sampling-based Policy Optimization
✅ Sampling-based methods.
Iterative methods

Goal: find the optimal policy \(\pi(s;\theta)\) that minimizes the objective \(J(\theta)=\sum_{t=0}^{T}h(s_t,a_t)\)

- Initialize the policy parameters \(\theta\) of \(\pi(s;\theta)\)
- Repeat:
  - Propose a set of candidate parameters \(\{\theta_i\}\) based on the current estimate of \(\theta\)
  - Simulate the agent under the control of each policy \(\pi(s;\theta_i)\)
  - Evaluate the objective function \(J(\theta_i)\) on the simulated state-action sequences
  - Update the estimate of \(\theta\) based on \(\{J(\theta_i)\}\)
Example: CMA-ES models \(\theta\) as a Gaussian distribution and updates the mean and covariance of this Gaussian at each iteration (a minimal sketch follows below).
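As an illustration, here is a minimal cross-entropy-style sketch of this loop in Python. It is a simplified stand-in for CMA-ES (which additionally adapts the covariance via evolution paths and rank-based updates); `evaluate_J`, the population size, and the elite fraction are illustrative assumptions, not from the slides.

```python
import numpy as np

def sample_based_policy_search(evaluate_J, dim, iters=50, pop=32, elite_frac=0.25):
    """Cross-entropy-style loop: model theta as a Gaussian, sample
    candidates, evaluate J, then refit the mean/covariance on the elites.
    (A simplified stand-in for CMA-ES.)"""
    mean = np.zeros(dim)
    cov = np.eye(dim)
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        # Propose candidate parameters {theta_i} from the current Gaussian
        thetas = np.random.multivariate_normal(mean, cov, size=pop)
        # Evaluate J(theta_i) on the simulated state-action sequences
        costs = np.array([evaluate_J(t) for t in thetas])
        # Keep the lowest-cost candidates and refit the Gaussian
        elites = thetas[np.argsort(costs)[:n_elite]]
        mean = elites.mean(axis=0)
        cov = np.cov(elites, rowvar=False) + 1e-6 * np.eye(dim)
    return mean
```

Here `evaluate_J(theta)` is assumed to simulate the agent under \(\pi(s;\theta)\) and return the accumulated cost \(J(\theta)\).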
P73
Example: Locomotion Controller with Linear Policy
🔎 [Liu et al. 2012 – Terrain Runner]
P74
Stage 1a: Open-loop Policy
Find open-loop control using SAMCON
✅ Open-loop trajectory optimization (SAMCON) is used to obtain the open-loop control trajectory.
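As a rough illustration of what SAMCON does here, below is a greedy single-pass sketch of its sample-simulate-select loop, assuming a hypothetical one-segment simulator `step(s, a)`. The actual algorithm keeps several elite samples per segment and refines the result over multiple iterations, but the core loop has this shape.

```python
import numpy as np

def samcon_sketch(step, s0, ref_states, act_dim, n_samples=200, noise=0.1):
    """Greedy single-pass sketch: for each control segment, sample
    perturbed controls, simulate the segment, and keep the sample that
    ends closest to the reference state."""
    s, controls = s0, []
    for s_ref in ref_states:
        # Sample candidate control offsets (here: Gaussian noise around zero)
        candidates = noise * np.random.randn(n_samples, act_dim)
        next_states = [step(s, a) for a in candidates]   # simulate each candidate
        errs = [np.linalg.norm(sn - s_ref) for sn in next_states]
        best = int(np.argmin(errs))
        controls.append(candidates[best])
        s = next_states[best]
    return np.array(controls)    # the open-loop control sequence
```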
P76
Stage 1b: Linear Feedback Policy

✅ Feedback control is used to update the control signal. Because the relationship is assumed to be linear, the control adjustment (offset) can be obtained directly from the state deviation (offset).
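A minimal sketch of this feedback step, anticipating the form \(\delta a = M\delta s + \hat{a}\) given on P81 (the variable names and the reference-state argument are assumptions):

```python
import numpy as np

def feedback_action(a_open, s, s_ref, M, a_hat):
    """Linear feedback correction on top of the open-loop control:
    delta_a = M @ delta_s + a_hat, where delta_s is the deviation
    from the reference state."""
    delta_s = s - s_ref                # deviation from the reference trajectory
    delta_a = M @ delta_s + a_hat      # linear feedback plus constant offset
    return a_open + delta_a
```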
P78
Stage 1b: Reduced-order Closed-loop Policy

✅ Factor \(M\) into two matrices, \(M_{A\times B} = M_{A\times C}\cdot M_{C\times B}\). If \(C\) is small, this significantly reduces the number of parameters in the matrix.
✅ Benefits: (1) fewer parameters, which simplifies the optimization; (2) unneeded information in the state is projected away.
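A small sketch of the parameter saving, using the dimensions from the running task on P79–P81 (state dimension 12, action dimension 9, and reduced dimension \(C=3\)):

```python
import numpy as np

A, B, C = 9, 12, 3               # action dim, state dim, reduced dim (C is small)

# Full feedback matrix: A*B parameters
M_full = np.zeros((A, B))        # 9 * 12 = 108 parameters

# Low-rank factorization M = M1 @ M2: A*C + C*B parameters
M1 = np.zeros((A, C))            # 9 * 3  = 27
M2 = np.zeros((C, B))            # 3 * 12 = 36  -> 63 in total
M = M1 @ M2                      # rank(M) <= C: unneeded state info is projected away

print(M_full.size, M1.size + M2.size)   # 108 63
```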
P79
Some engineering tricks
Manually-selected States: s
- Running: 12 dimensions
✅ (1) root joint rotation, (2) center-of-mass position, (3) center-of-mass velocity, (4) stance foot position
P80
Manually-selected Controls: a
- For all skills: 9 dimensions

✅ Feedback is applied to only a few joints.
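For illustration, a sketch of how such a hand-picked 12-D running state could be assembled; the even 3-D split of the four components is an assumption made here for concreteness, not a detail taken from the paper:

```python
import numpy as np

def build_state(root_rot, com_pos, com_vel, stance_foot_pos):
    """Hand-picked running state: root rotation, center-of-mass position,
    center-of-mass velocity, stance foot position (assumed 3-D each)."""
    return np.concatenate([root_rot, com_pos, com_vel, stance_foot_pos])

s = build_state(np.zeros(3), np.zeros(3), np.zeros(3), np.zeros(3))
assert s.shape == (12,)    # the 12-D state used for running
```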
P81
Optimization
$$ \delta a=M\delta s+\hat{a} $$
- Optimize \(M\)
- CMA, Covariance Matrix Adaptation ([Hansen 2006])
- For the running task:
- #optimization variables: \(12\times 9 = 108\) for the full matrix \(M\), vs. \(12\times 3 + 3\times 9 = 63\) for the reduced-order factorization
- 12 minutes on 24 cores
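As a sketch of how this setup could be wired together with Hansen's open-source `cma` package (`pip install cma`): the factored matrix is flattened into the 63 optimization variables and each candidate is scored by a rollout. `rollout_cost` is a placeholder assumption standing in for the actual physics simulation.

```python
import numpy as np
import cma   # Hansen's reference CMA-ES implementation

A, B, C = 9, 12, 3             # action dim, state dim, reduced dim (P78/P81)
n_vars = A * C + C * B         # 27 + 36 = 63 optimization variables

def unpack(x):
    """Rebuild the factored feedback matrix M = M1 @ M2 from a flat vector."""
    M1 = x[:A * C].reshape(A, C)
    M2 = x[A * C:].reshape(C, B)
    return M1 @ M2

def rollout_cost(M):
    # Placeholder: the real objective simulates the runner under
    # delta_a = M @ delta_s + a_hat and returns the accumulated cost J.
    return float(np.sum(M * M))   # dummy quadratic cost for the sketch

es = cma.CMAEvolutionStrategy(np.zeros(n_vars), 0.1)
while not es.stop():
    xs = es.ask()                                       # propose candidates {theta_i}
    es.tell(xs, [rollout_cost(unpack(x)) for x in xs])  # update the Gaussian
M_opt = unpack(np.asarray(es.result.xbest))
```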
This article is from CaterpillarStudyGroup. Please credit the source when reposting.
https://caterpillarstudygroup.github.io/GAMES105_mdbook/