3.2 Training-free
| ID | Year | Name | Note | Tags | Link |
|---|---|---|---|---|---|
| 117 | 2023 | TokenFlow: Consistent Diffusion Features for Consistent Video Editing | | | |
| | 2023 | FateZero: Fusing Attentions for Zero-shot Text-based Video Editing | Fuses attention maps for better temporal consistency: during DDIM inversion, cache the inverted self-/cross-attention maps; during editing, blend the cached maps with the maps generated from the edited prompt (sketched below). | | |
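The attention-fusion recipe can be summarized in a few lines. The sketch below is illustrative only, not the authors' code: `inverted_maps` and `generated_maps` are assumed to be per-step caches of self-/cross-attention tensors, and `blend_mask` is a hypothetical spatial mask (1 where the edit should apply), e.g. derived from the cross-attention of the edited words.

```python
import torch

def fuse_attention(inverted_maps, generated_maps, blend_mask: torch.Tensor,
                   step: int, total_steps: int = 50, self_attn_ratio: float = 0.5):
    """FateZero-style attention fusion (illustrative sketch).

    inverted_maps / generated_maps: dict step -> {"self": Tensor, "cross": Tensor}
    blend_mask: values in [0, 1], broadcastable to the cross-attention shape.
    """
    inv, gen = inverted_maps[step], generated_maps[step]
    fused = {}
    # Cross-attention: keep the inverted maps outside the edited region so the
    # structure and background of the source video are preserved.
    fused["cross"] = blend_mask * gen["cross"] + (1 - blend_mask) * inv["cross"]
    # Self-attention: reuse the inverted maps during the early denoising steps,
    # which carry most of the source layout; switch to the generated maps later.
    use_inverted = step < self_attn_ratio * total_steps
    fused["self"] = inv["self"] if use_inverted else gen["self"]
    return fused
```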
More Works
- MeDM (Chu et al.): Optical flow-based guidance for temporal consistency (see the flow-warping sketch after this list). "MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance," arXiv 2023.
- Ground-A-Video (Jeong et al.): Improves temporal consistency via modified attention and optical flow. "Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models," arXiv 2023.
- Gen-L-Video (Wang et al.): Edits very long videos using existing generators. "Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising," arXiv 2023.
- FLATTEN (Cong et al.): Optical flow-guided attention for temporal consistency. "FLATTEN: Optical Flow-Guided Attention for Consistent Text-to-Video Editing," arXiv 2023.
- InFusion (Khandelwal et al.): Improves temporal consistency via fusing latents. "InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing," ICCVW 2023.
- Vid2Vid-Zero (Wang et al.): Improves temporal consistency via cross-attention guidance and null-text inversion. "Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models," arXiv 2023.
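Several of the works above share one core mechanism: use optical flow to tie corresponding pixels across frames. As a rough illustration (tied to none of the papers, and assuming a precomputed flow field and occlusion mask), warping the previous frame and penalizing disagreement looks like this:

```python
import torch
import torch.nn.functional as F

def warp_with_flow(frame, flow):
    """Backward-warp `frame` (B, C, H, W) with optical flow `flow` (B, 2, H, W)
    given in pixels, via grid_sample. A generic utility, not any paper's code."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize sampling coordinates to [-1, 1] as grid_sample expects.
    grid = torch.stack(
        (2 * grid_x / (w - 1) - 1, 2 * grid_y / (h - 1) - 1), dim=-1
    )
    return F.grid_sample(frame, grid, align_corners=True)

def temporal_consistency_loss(curr, prev, flow, occlusion_mask):
    """Penalize pixels that disagree with the flow-warped previous frame;
    `occlusion_mask` (1 = valid) excludes occluded regions."""
    warped_prev = warp_with_flow(prev, flow)
    return (occlusion_mask * (curr - warped_prev).abs()).mean()
```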
✅ For each word token of the input text, the attention map gives its approximate location in the image. Mask out the token to be removed and keep the rest; in the generated image, mask out the non-token region instead, then fuse the two parts (see the sketch below).
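A minimal sketch of this token-mask fusion, with hypothetical names: `cross_attn` is assumed to be a spatial cross-attention map of shape (H, W, num_tokens), and the 0.5 threshold is arbitrary.

```python
import torch

def attention_token_mask(cross_attn, token_idx, threshold=0.5):
    """Derive a binary spatial mask for one text token from a cross-attention
    map of shape (H, W, num_tokens): high attention ~= where the token lives."""
    attn = cross_attn[..., token_idx]
    attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    return (attn > threshold).float()

def fuse_by_token_mask(source, generated, mask):
    """Keep the source outside the token region and take the generated
    content inside it (mask is 1 where the edited token attends)."""
    return mask * generated + (1 - mask) * source
```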
✅ Various ControlNet variants, each conditioned on a different control signal.
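For reference, this is roughly how a single-signal ControlNet is driven through the diffusers library; the checkpoints are the commonly published ones and `frame.png` is a placeholder input. Swapping the conditioning signal (depth, pose, segmentation, ...) just means swapping the ControlNet checkpoint and the preprocessor.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Preprocess: extract the control signal (Canny edges here) from a frame.
frame = np.array(Image.open("frame.png").convert("RGB"))  # placeholder path
gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
edges = cv2.Canny(gray, 100, 200)
edge_map = Image.fromarray(np.stack([edges] * 3, axis=-1))

# Each control signal has its own ControlNet checkpoint.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Generate an edited image that follows the edge structure of the frame.
image = pipe("a watercolor landscape", image=edge_map).images[0]
```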
This article is from CaterpillarStudyGroup. Please credit the source when reposting.
https://caterpillarstudygroup.github.io/ImportantArticles/






