3.2 Training-free

P178

| ID | Year | Name | Note | Tags | Link |
|----|------|------|------|------|------|
| 117 | 2023 | TokenFlow: Consistent Diffusion Features for Consistent Video Editing | | | |
| | 2023 | FateZero: Fusing Attentions for Zero-shot Text-based Video Editing | Fuses attention maps for better temporal consistency:<br>- During DDIM inversion, save the inverted self-/cross-attention maps<br>- During editing, blend the inverted maps with the newly generated maps | | |
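The two-step recipe above can be sketched as a simple mask-based blend of attention maps: keep the inverted maps where the source structure should be preserved, and the newly generated maps where the edit should take effect. The array shapes and function names below are illustrative, not taken from the FateZero codebase:

```python
import numpy as np

def blend_attention(inv_attn, gen_attn, edit_mask):
    """Blend attention maps saved during DDIM inversion with maps
    produced during editing. `edit_mask` is 1 where the prompt edit
    should apply (use the generated map) and 0 elsewhere (keep the
    inverted map, preserving the source video's structure)."""
    return edit_mask * gen_attn + (1.0 - edit_mask) * inv_attn

# Toy 4x4 "attention maps": inverted = zeros, generated = ones,
# edit region = the central 2x2 patch.
inv = np.zeros((4, 4))
gen = np.ones((4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
out = blend_attention(inv, gen, mask)
# Outside the edit region `out` matches the inverted map;
# inside it, the generated map.
```

In the actual method this blend is applied per diffusion step and per attention layer, with the mask derived from the edited tokens' cross-attention.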


P187

More Works

- **MeDM (Chu et al.)**: Optical flow-based guidance for temporal consistency. "MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance," arXiv 2023.
- **Ground-A-Video (Jeong et al.)**: Improves temporal consistency via modified attention and optical flow. "Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models," arXiv 2023.
- **Gen-L-Video (Wang et al.)**: Edits very long videos using existing generators. "Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising," arXiv 2023.
- **FLATTEN (Cong et al.)**: Optical flow-guided attention for temporal consistency. "FLATTEN: Optical Flow-guided Attention for Consistent Text-to-Video Editing," arXiv 2023.
- **InFusion (Khandelwal et al.)**: Improves temporal consistency by fusing latents. "InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing," ICCVW 2023.
- **Vid2Vid-Zero (Wang et al.)**: Improves temporal consistency via cross-attention guidance and null-text inversion. "Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models," arXiv 2023.

P194

✅ For each word token in the input text, its approximate spatial location in the image can be found via the attention map. The token to be removed is masked out and the rest of the source is kept; in the generated image, the non-token region is masked out instead, and the two parts are fused.
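A minimal sketch of this token-mask fusion, assuming the cross-attention has already been aggregated into one `(H, W, num_tokens)` array. All names and the thresholding choice are illustrative assumptions, not a specific paper's implementation:

```python
import numpy as np

def token_mask_from_attention(cross_attn, token_idx, threshold=0.5):
    """Derive a spatial mask for one word token by normalizing its
    cross-attention map to [0, 1] and thresholding it.
    cross_attn: (H, W, num_tokens) array of attention weights."""
    a = cross_attn[..., token_idx]
    a = (a - a.min()) / (a.max() - a.min() + 1e-8)
    return (a > threshold).astype(np.float32)

def fuse(source_img, edited_img, mask):
    """Keep the source image outside the token's region and the
    generated (edited) image inside it."""
    m = mask[..., None]  # broadcast over the channel axis
    return m * edited_img + (1.0 - m) * source_img

# Toy example: token 1 attends only to the top-left 2x2 patch.
attn = np.zeros((4, 4, 2))
attn[0:2, 0:2, 1] = 1.0
mask = token_mask_from_attention(attn, token_idx=1)
result = fuse(np.zeros((4, 4, 3)), np.ones((4, 4, 3)), mask)
```

In practice the mask is usually computed per diffusion step and smoothed, rather than hard-thresholded once as here.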

P197

✅ Various ControlNet variants conditioned on different control signals.
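As a reminder of the mechanism these variants share, here is a minimal sketch of ControlNet-style conditioning: control-signal features (edges, depth, pose, ...) pass through a zero-initialized projection and are added to the UNet features, so at initialization the conditioned model behaves exactly like the original. Shapes and names are illustrative, not the actual ControlNet code:

```python
import numpy as np

def zero_conv(x, w):
    """1x1 'convolution': per-pixel channel mixing of an (H, W, C) map."""
    return np.einsum('hwc,cd->hwd', x, w)

def controlnet_block(unet_feat, control_feat, w_zero):
    """Add the projected control features as a residual to the UNet
    features. With w_zero all zeros (the zero-init), the residual is
    zero and the block is an identity on unet_feat."""
    return unet_feat + zero_conv(control_feat, w_zero)

feat = np.random.rand(8, 8, 4)   # UNet features
ctrl = np.random.rand(8, 8, 4)   # encoded control signal
w0 = np.zeros((4, 4))            # zero-initialized projection weights
out = controlnet_block(feat, ctrl, w0)
# At init, `out` equals `feat`: the control branch has no effect yet.
```

Training gradually moves `w0` away from zero, which is what lets the same recipe be reused for many different conditioning signals without destabilizing the pretrained model.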


This article is from CaterpillarStudyGroup. Please credit the source when reposting.

https://caterpillarstudygroup.github.io/ImportantArticles/