3.3 Controlled Edifng (depth/pose/point/ControlNet)
✅ 已有一段视频,通过 guidance 或文本描述,修改视频。
P189
P190
Depth Control
Depth estimating network
| ID | Year | Name | Note | Tags | Link |
|---|---|---|---|---|---|
| 2022 | Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer | ✅ 深变信息 Encode 成 latent code, 与 noise concat 到一起。 | |||
| 122 | 2023 | Structure and Content-Guided Video Synthesis with Diffusion Models | Transfer the style of a video using text prompts given a “driving video”,以多种形式在预训练图像扩散模型中融入时序混合层进行扩展 | ![]() | Gen-1, Framewise, depth-guided |
| 123 | 2023 | Pix2Video: Video Editing using Image Diffusion | Framewise depth-guided video editing |
P199
ControlNet / Multiple Control
也是control net 形式,但用到更多控制条件。
| ID | Year | Name | Note | Tags | Link |
|---|---|---|---|---|---|
| 124 | 2023 | ControlVideo: Training-free Controllable Text-to-Video Generation | |||
| 2023 | VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet | Optical flow-guided video editing; I, P, B frames in video compression ✅ 内容一致性,适用于 style transfer, 但需要对物体有较大编辑力度时不适用(例如编辑物体形状)。 | ![]() | ||
| 2023 | CCEdit: Creative and Controllable Video Editing via Diffusion Models | ![]() | |||
| 2023 | VideoComposer: Compositional Video Synthesis with Motion Controllability | Image-, sketch-, motion-, depth-, mask-controlled video editing ✅ 每个 condition 进来,都过一个 STC-Encoder, 然后把不同 condition fuse 到一起,输入到 U-Net. Spako-Temporal Condikon encoder (STC-encoder): a unified input interface for condikons | ![]() | ||
| 2023 | Control-A-Video: Controllable Text-to-Video Generagon with Diffusion Models | 通过边缘图或深度图等序列化控制信号生成视频,并提出两种运动自适应噪声初始化策略 | ![]() | ||
| 2024 | Vmc: Video motion customization using temporal attention adaption for text-to-video diffusion models. | 轨迹控制 | |||
| 2023 | MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation | ![]() | |||
| 2023 | Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance | ![]() | |||
| 2023 | MagicEdit: High-Fidelity and Temporally Coherent Video Editing | ![]() | |||
| 2023 | EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints | ![]() |
P225
Point-Control
| ID | Year | Name | Note | Tags | Link |
|---|---|---|---|---|---|
| 98 | 2023 | VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence |
P226
本文出自CaterpillarStudyGroup,转载请注明出处。
https://caterpillarstudygroup.github.io/ImportantArticles/








