Content Deformation Fields for Temporally Consistent Video Processing
Content Deformation Field (CoDeF)
Edit a video = edit a canonical image + learned deformaeon field
-
Limitations of Neural Layered Atlases
- Limited capacity for faithfully reconstructing intricate video details, missing subtle motion features like blinking eyes and slight smiles
- Distorted nature of the estimated atlas leads to impaired semantic information
-
Content Deformation Field: inspired by dynamic NeRF works, a new way of representing video, as a 2d canonical image + 3D deformation field over time
Problem Formulation
- Decode a video into a 2D canonical field and a 3D temporal deformation field
- Deformation Field: video (x, y, t) → canonical image coordinate (x’, y’)
- Canonical Field: (x’, y’) → (r, g, b), like a “2D image”
P251
CoDeF compared to Atlas
- Superior robustness to non-rigid motion
- Effective reconstruction of subtle movements (e.g. eyes blinking)
- More accurate reconstruction: 4.4dB higher PSNR
✅ CoDef 把 3D 视频压缩为 2D Image,因此可以利用很多 2D 算法,再把 deformation 传递到整个视频。
P252
✅ 在时序上有比较好的一致性。
✅ 由于使用了 control net,与原视频在 Spatial level 也保持得非常好。
P253