DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

Edit a video = edit a canonical 3D NeRF (rather than a 2D canonical image)

Canonical image in CoDeF is still 2D

Can we represent the video in a truly 3D space?
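The canonical-representation idea behind this question can be sketched in a hypothetical pure-Python toy (not CoDeF's or DynVideo-E's actual code): every frame is the shared canonical content sampled through a per-frame deformation, so editing the single canonical representation propagates to all frames.

```python
# Toy sketch of a canonical-representation video (hypothetical):
# each frame = canonical content warped by a per-frame deformation field
# (here just an integer translation on a 4x4 grid of fake pixel values).

CANONICAL = {(x, y): x * 10 + y for x in range(4) for y in range(4)}

def deform(frame_idx, x, y):
    """Per-frame deformation field: maps frame coords -> canonical coords."""
    dx = frame_idx % 2                  # toy motion: shift alternates per frame
    return (x + dx) % 4, y

def render_frame(frame_idx):
    return {(x, y): CANONICAL[deform(frame_idx, x, y)]
            for x in range(4) for y in range(4)}

# One edit to the canonical content shows up in every frame:
CANONICAL[(0, 0)] = 999
frame0 = render_frame(0)
frame1 = render_frame(1)
```

DynVideo-E's answer to the question is to make the canonical content a 3D NeRF instead of a 2D image, so the same propagate-by-deformation mechanism works under large motion and view changes.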

P255

✅ Leverage existing mature techniques: represent the 3D scene with a NeRF, so the editing is also performed in 3D.

P256

✅ NeRF works well for rendering the human body.
✅ Dynamic NeRF is itself a difficult problem.

P257

Main idea

  • For the first time introduce the dynamic NeRF as an innovative video representation for large-scale motion- and view-change human-centric video editing.

✅ Instead of editing the images directly, edit the NeRF.
✅ (1) Assume the background is static and learn a background NeRF.
✅ Stable Diffusion is used to compute the loss.
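Using a diffusion model to supervise the edit is typically done with a score-distillation-style (SDS) objective. Below is a hypothetical pure-Python toy of that update rule: the `fake_denoiser` is a stand-in (a real setup would use a pretrained Stable Diffusion noise predictor), and the gradient is the SDS-style residual between predicted and injected noise.

```python
import random

# Toy SDS-style optimization of a single "rendered pixel" (hypothetical;
# fake_denoiser stands in for a pretrained Stable Diffusion denoiser).

def fake_denoiser(noisy_pixel, t):
    """Stand-in noise predictor: it 'prefers' pixels near 0.8, so its
    predicted noise points away from that target."""
    return noisy_pixel - 0.8

def sds_optimize(pixel, lr=0.1, steps=300, seed=0):
    rng = random.Random(seed)
    tail = []
    for i in range(steps):
        t = rng.uniform(0.1, 0.9)        # random diffusion timestep
        eps = rng.gauss(0.0, 1.0)        # injected Gaussian noise
        noisy = pixel + t * eps          # toy forward-noising of the render
        eps_pred = fake_denoiser(noisy, t)
        grad = eps_pred - eps            # SDS-style gradient (weight w(t)=1)
        pixel -= lr * grad
        if i >= steps - 100:
            tail.append(pixel)
    return sum(tail) / len(tail)         # average late iterates (noise-robust)

edited = sds_optimize(0.0)               # drifts toward the denoiser's preference
```

The point of the sketch: the diffusion model is never inverted; its noise-prediction residual alone supplies a gradient that drags the rendered content toward what the model considers likely under the edit prompt.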

P258

Following HOSNeRF, represent the video as:

  • Background NeRF
  • Human NeRF
  • Deformation Field

Edit the background NeRF and the human NeRF separately
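Rendering the decomposed scene means compositing the two fields along each ray. A hypothetical pure-Python toy of that composition (1-D ray, not the paper's implementation): densities of the fields add, colors are density-weighted, and the sum is alpha-composited front to back as in standard NeRF volume rendering.

```python
import math

# Toy composition of a background field and a human field along one ray
# (hypothetical sketch of NeRF-style volume rendering, pure Python).

def background_field(x):
    """(density, color) of the static background at 1-D depth x."""
    return (0.5, 0.2) if x > 4.0 else (0.0, 0.0)

def human_field(x):
    """(density, color) of the human (a slab near x = 2)."""
    return (5.0, 0.9) if 1.5 < x < 2.5 else (0.0, 0.0)

def render_ray(fields, near=0.0, far=6.0, n_samples=64):
    dt = (far - near) / n_samples
    transmittance, color = 1.0, 0.0
    for i in range(n_samples):
        x = near + (i + 0.5) * dt
        sigma = sum(f(x)[0] for f in fields)                    # densities add
        if sigma > 0.0:
            c = sum(f(x)[0] * f(x)[1] for f in fields) / sigma  # density-weighted color
            alpha = 1.0 - math.exp(-sigma * dt)
            color += transmittance * alpha * c                  # front-to-back compositing
            transmittance *= 1.0 - alpha
    return color, transmittance

color_both, _ = render_ray([background_field, human_field])
color_bg, _ = render_ray([background_field])
```

Because each field is a separate network, swapping in an edited human NeRF or an edited background NeRF changes the composite render without retraining the other component.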

P259

DynVideo-E significantly outperforms SOTA approaches on two challenging datasets by a large margin of 50%∼95% in terms of human preference.