AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
T2I -> T2V
Transform domain-specific T2I models to T2V models
- Domain-specific (personalized) models are widely available for image
- Domain-specific finetuning methodologies: LoRA, DreamBooth…
- Communities: Hugging Face, CivitAI…
- Task: turn these image models into T2V models, without specific finetuning
✅ (1) Generating the noise for all frames from the same pattern may make the resulting images more consistent.
✅ (2) Keep the features of intermediate frames consistent.
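The "same noise pattern" idea above can be sketched as sharing one base noise tensor across frames and adding only a small per-frame residual. This is an illustrative construction, not the paper's exact scheme; `make_shared_noise` and the `shared_ratio` mixing weight are assumptions.

```python
import numpy as np

def make_shared_noise(num_frames, shape, shared_ratio=0.9, seed=0):
    """Hypothetical sketch: per-frame noise that shares a common base pattern."""
    rng = np.random.default_rng(seed)
    base = rng.standard_normal(shape)          # pattern shared by every frame
    frames = []
    for _ in range(num_frames):
        residual = rng.standard_normal(shape)  # small per-frame variation
        # mix so total variance stays ~1 (base and residual are independent)
        frames.append(np.sqrt(shared_ratio) * base
                      + np.sqrt(1.0 - shared_ratio) * residual)
    return np.stack(frames)

noise = make_shared_noise(num_frames=4, shape=(4, 32, 32))
```

Because every frame contains the same base pattern, adjacent frames' latents are strongly correlated, which is the intuition behind more consistent decoded images.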
Methodology
- Train a motion modeling module (some temporal layers) together with frozen base T2I model
- Plug it into a domain-specific T2I model during inference
✅ Advantage: plug-and-play with all kinds of user-customized (personalized) models.
✅ Content can be edited through the noise, i.e., by defining the first frame's noise and the motion trend of the noise in subsequent frames.
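The plug-in pattern above can be sketched in terms of tensor shapes: the frozen spatial layers treat the frames as a larger batch, while the inserted motion module reshapes the activations to mix information along the frame axis. Both layer bodies below are placeholder math, not the real network; only the reshape pattern reflects the described design.

```python
import numpy as np

def spatial_layer(x):
    # stands in for a frozen T2I layer: acts on each frame independently
    return x * 0.5

def motion_module(x, num_frames):
    # inserted temporal layer: reshape so it can mix along the frame axis
    bf, c, h, w = x.shape
    b = bf // num_frames
    x = x.reshape(b, num_frames, c, h, w)
    # placeholder temporal mixing: blend each frame with the clip mean
    x = 0.5 * x + 0.5 * x.mean(axis=1, keepdims=True)
    return x.reshape(bf, c, h, w)

B, F, C, H, W = 2, 4, 8, 16, 16
x = np.random.default_rng(0).standard_normal((B * F, C, H, W))
y = motion_module(spatial_layer(x), num_frames=F)
```

Because the motion module only inserts extra layers and never touches the spatial weights, the same module can be dropped into any personalized T2I model that shares the base architecture.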
Training
- Train on WebVid-10M, resized to 256x256 (experiments show it generalizes to higher resolutions)
✅ Trained on low-resolution data, yet the results generalize to higher resolutions.
✅ Keep intermediate frames as similar as possible.
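The training setup above (motion module trained, base T2I frozen) can be sketched as an update step that touches only the motion-module parameters. The parameter names, gradient values, and `sgd_step` helper are illustrative assumptions.

```python
def sgd_step(params, grads, lr=0.1):
    # update only motion-module weights; base T2I weights stay frozen
    return {name: (value - lr * grads[name]
                   if name.startswith("motion.") else value)
            for name, value in params.items()}

# toy parameters: one frozen base weight, one trainable motion weight
params = {"unet.conv.weight": 1.0, "motion.temporal_attn.weight": 1.0}
grads  = {"unet.conv.weight": 0.5, "motion.temporal_attn.weight": 0.5}
new_params = sgd_step(params, grads)
```

Freezing the base weights is what keeps the learned motion module compatible with other personalized checkpoints at inference time.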
✅ Mask out the background and smooth it.
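One way to read the note above: separate the (largely static) background with a mask and temporally smooth it, leaving the foreground per frame. This is a minimal sketch under that assumption; `smooth_background` and the use of a plain temporal average are illustrative, not the paper's procedure.

```python
import numpy as np

def smooth_background(frames, bg_mask):
    # frames: (F, H, W); bg_mask: (H, W) boolean, True where background
    smoothed = frames.mean(axis=0)       # temporal average of each pixel
    out = frames.copy()
    out[:, bg_mask] = smoothed[bg_mask]  # background identical in all frames
    return out

F, H, W = 4, 8, 8
rng = np.random.default_rng(0)
clip = rng.standard_normal((F, H, W))
mask = np.zeros((H, W), dtype=bool)
mask[:, :4] = True                       # left half treated as background
result = smooth_background(clip, mask)
```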