Controllable Generation

ID	Year	Name	Note	Tags	Link
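65	2023	T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models	1. Uses a lightweight adapter to align external control signals (e.g., sketches, depth maps) with the model's internal knowledge, giving more precise control over generation. 2. Only the adapter is optimized, so training is efficient. 3. Non-uniform timestep sampling: raises the sampling probability of the early sampling stage (when image structure forms), which makes the control signal more effective.	Training efficiency	link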
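66	2023	Adding Conditional Control to Text-to-Image Diffusion Models	Clones the network blocks of a pretrained model and connects the copies through "zero convolutions", so that conditional control is learned without damaging the original model's capabilities.	ControlNet	link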
67	2023	GLIGEN: Open-Set Grounded Text-to-Image Generation			link
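
The "zero convolution" idea in the ControlNet row can be illustrated with a toy sketch. Here a 1x1 convolution is modelled as a linear map on a small feature vector in plain Python; `ZeroConv`, `frozen_block`, `trainable_copy`, and `controlled_block` are hypothetical stand-ins, not the paper's code. Because the zero convolutions start with zero weights and bias, the combined output initially equals the frozen model's output, and the condition only starts to matter once training moves those weights.

```python
class ZeroConv:
    """Toy 1x1 convolution on a feature vector, initialised to all zeros."""

    def __init__(self, dim):
        self.weight = [[0.0] * dim for _ in range(dim)]
        self.bias = [0.0] * dim

    def __call__(self, x):
        return [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weight, self.bias)
        ]


def frozen_block(x):
    # Stand-in for a pretrained block whose weights stay locked.
    return [2.0 * xi + 1.0 for xi in x]


def trainable_copy(x):
    # Stand-in for the cloned block; it starts from the same weights.
    return [2.0 * xi + 1.0 for xi in x]


def controlled_block(x, condition, zero_in, zero_out):
    # y = F(x) + Z_out(F_copy(x + Z_in(c)))
    injected = [xi + ci for xi, ci in zip(x, zero_in(condition))]
    residual = zero_out(trainable_copy(injected))
    return [yi + ri for yi, ri in zip(frozen_block(x), residual)]


x = [0.5, -1.0, 3.0]
condition = [1.0, 1.0, 1.0]  # e.g. features extracted from an edge map
zero_in, zero_out = ZeroConv(3), ZeroConv(3)

before_training = controlled_block(x, condition, zero_in, zero_out)

# Simulate a training update that moves one zero-conv weight off zero.
zero_out.weight[0][0] = 0.1
after_training = controlled_block(x, condition, zero_in, zero_out)
```

Before any update, `before_training` is identical to `frozen_block(x)`, which is the sense in which the original model's behaviour is preserved at the start of training.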

Image Editing

Gaussian Noise Methods

ID	Year	Name	Note	Tags	Link
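22	2022	SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations	Proposes a unified framework that needs no additional training: the input is first noised and then denoised through the reverse process of a stochastic differential equation (SDE), which covers both image synthesis and editing.		link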
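
The noise-then-denoise procedure described for SDEdit can be sketched on a 1-D toy problem. Everything below is an assumption made for illustration: the "images" are scalars drawn from a Gaussian, `predict_noise` is the analytic noise predictor for that Gaussian rather than a trained network, and the linear beta schedule and the names `sdedit` and `average_output` are hypothetical. The sketch shows how the starting noise level `t0` trades faithfulness to the guide against realism.

```python
import math
import random

T = 100
betas = [1e-3 + (0.2 - 1e-3) * i / (T - 1) for i in range(T)]
alpha_bars = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)

DATA_MEAN, DATA_STD = 0.0, 1.0  # toy "realistic image" distribution


def predict_noise(x_t, t):
    # Analytic noise predictor for 1-D Gaussian data (toy assumption).
    ab = alpha_bars[t]
    var_t = ab * DATA_STD ** 2 + (1.0 - ab)
    return math.sqrt(1.0 - ab) * (x_t - math.sqrt(ab) * DATA_MEAN) / var_t


def sdedit(guide, t0, rng):
    # 1) Perturb the guide to noise level t0 (1 <= t0 <= T).
    ab = alpha_bars[t0 - 1]
    x = math.sqrt(ab) * guide + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
    # 2) Run the reverse (denoising) process from t0 back to 0.
    for t in range(t0 - 1, -1, -1):
        eps = predict_noise(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / math.sqrt(1.0 - betas[t])
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
        x = mean + math.sqrt(betas[t]) * noise
    return x


def average_output(guide, t0, n=500, seed=0):
    rng = random.Random(seed)
    return sum(sdedit(guide, t0, rng) for _ in range(n)) / n


guide = 5.0  # far from the data mean, like a rough stroke painting
faithful = average_output(guide, t0=10)    # little noise: stays near the guide
realistic = average_output(guide, t0=100)  # full noise: falls back to the data
```

With a small `t0` the result stays close to the guide, while `t0 = T` discards the guide almost entirely and returns a sample from the data distribution.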

DDIM Inversion Methods

ID	Year	Name	Note	Tags	Link
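23	2023	Dual diffusion implicit bridges for image-to-image translation	DDIB exploits the alignment of diffusion latent spaces and proposes a DDIM-based image-to-image translation method that performs cross-domain translation through implicit bridges.	DDIM	link
24	2023	DiffEdit: Diffusion-based semantic image editing with mask guidance	Uses the difference between the diffusion model's noise predictions under different text conditions to generate a mask of the region that is semantically relevant to the edit, enabling precise local editing.	DDIM, auto mask	link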
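
The automatic mask in the DiffEdit row can be sketched as follows. This is a minimal illustration, not the paper's code: the noise predictions are hard-coded hypothetical numbers for a 6-pixel "image", and the min-max rescaling and the 0.5 threshold are assumptions of this sketch.

```python
def diffedit_mask(noise_src, noise_tgt, threshold=0.5):
    """Binary edit mask from two lists of per-pixel noise predictions.

    noise_src[k][i] / noise_tgt[k][i]: prediction for pixel i under the k-th
    noise draw, conditioned on the source / target text respectively.
    """
    n_draws, n_pixels = len(noise_src), len(noise_src[0])
    # Average the absolute difference over the noise draws.
    diff = [
        sum(abs(noise_src[k][i] - noise_tgt[k][i]) for k in range(n_draws)) / n_draws
        for i in range(n_pixels)
    ]
    # Rescale to [0, 1] and binarise.
    lo, hi = min(diff), max(diff)
    scale = (hi - lo) or 1.0
    return [1 if (d - lo) / scale >= threshold else 0 for d in diff]


# Hypothetical predictions for 6 pixels and 2 noise draws.
# Only pixels 2 and 3 react strongly to changing the prompt.
noise_src = [[0.1, 0.2, 0.9, -0.8, 0.0, 0.1],
             [0.0, 0.1, 0.7, -0.6, 0.1, 0.2]]
noise_tgt = [[0.1, 0.25, -0.5, 0.6, 0.0, 0.1],
             [0.05, 0.1, -0.4, 0.5, 0.1, 0.15]]

mask = diffedit_mask(noise_src, noise_tgt)
```

Pixels where the two text conditions disagree most end up inside the mask; the rest of the image is left untouched during editing.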

Editing the Text Embedding

ID	Year	Name	Note	Tags	Link
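25	2023	Imagic: Text-Based Real Image Editing with Diffusion Models	1. Uses a T2I model for text-based editing of real images. 2. Requires fine-tuning the T2I model. 3. First solves for \(T_{orig}\), then interpolates between \(T_{orig}\) and \(T_{tgt}\).		link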
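76	2022	NULL-text Inversion for Editing Real Images Using Guided Diffusion Models	Targets editing of real (not generated) images. Built on CFG: the conditional branch is kept fixed and the embedding of the unconditional (null-text) branch is optimized, so that guided sampling stays close to the DDIM inversion trajectory of the real image.	DDIM	link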
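
The two operations that this table relies on, interpolating between embeddings (Imagic) and combining the conditional and unconditional branches under CFG (Null-text Inversion), can be written down in a few lines. This is a hedged sketch on plain Python lists: `e_orig` and `e_tgt` play the roles of \(T_{orig}\) and \(T_{tgt}\), and the function names and toy vectors are hypothetical.

```python
def interpolate_embeddings(e_orig, e_tgt, eta):
    """Linear interpolation: eta * e_tgt + (1 - eta) * e_orig."""
    return [eta * t + (1.0 - eta) * o for o, t in zip(e_orig, e_tgt)]


def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


e_orig = [0.0, 1.0, 2.0]  # optimised embedding that reconstructs the input image
e_tgt = [1.0, 1.0, 0.0]   # embedding of the target text
e_edit = interpolate_embeddings(e_orig, e_tgt, eta=0.5)

# With w = 1 the guided prediction reduces to the conditional branch.
eps_plain = cfg_noise([0.0, 0.2], [1.0, 0.4], guidance_scale=1.0)
# With w > 1 the prediction is pushed further away from the unconditional one.
eps_guided = cfg_noise([0.0, 0.2], [1.0, 0.4], guidance_scale=7.5)
```

In the CFG formula the unconditional prediction enters every guided step, which is why adjusting only the null-text embedding is enough to change the sampling trajectory while the conditional branch stays fixed.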

Attention-Based Methods

ID	Year	Name	Note	Tags	Link
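20	2023	Prompt-to-Prompt Image Editing with Cross-Attention Control	Cross-attention layers determine how the text prompt relates to the spatial layout of the image, so modifying the attention maps is enough to edit an image without destroying its original structure. Applies only to images generated by the same pretrained model.	attention control	link
77	2022	Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation	Directly manipulates the spatial features and self-attention inside the diffusion model for fine-grained control over generation. Core idea: extract intermediate spatial features and self-attention maps from the source image and inject them into the generation of the target image, so that the semantic layout of the source is preserved while its appearance changes according to the text prompt.	attention control	link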
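21	2023	InstructPix2Pix: Learning to Follow Image Editing Instructions	When an image already exists, typing a complete descriptive prompt does not match how users work; the user only needs to tell the model how to modify the image. The paired training data behind this (instruction plus before/after images generated from full prompts) is synthesized with a language model and Prompt-to-Prompt.		link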
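
The attention-map injection behind Prompt-to-Prompt can be sketched with a tiny cross-attention layer in plain Python. This is an illustrative toy under stated assumptions: two "pixels", two prompt tokens, hand-picked numbers, and hypothetical function names. It shows that reusing the source attention map keeps the pixel-to-token layout while the values come from the edited prompt.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """softmax(Q K^T / sqrt(d)): one row of token weights per pixel."""
    d = len(keys[0])
    return [
        softmax([sum(q * k for q, k in zip(qv, kv)) / math.sqrt(d) for kv in keys])
        for qv in queries
    ]


def cross_attention(queries, keys, values, injected_map=None):
    """Return (output, map); reuse `injected_map` instead of a fresh map if given."""
    attn = injected_map if injected_map is not None else attention_map(queries, keys)
    out = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in attn
    ]
    return out, attn


# Two "pixels" (queries) and two prompt tokens (keys/values); toy numbers.
pixels = [[1.0, 0.0], [0.0, 1.0]]
src_keys, src_values = [[4.0, 0.0], [0.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]
tgt_keys, tgt_values = [[0.0, 4.0], [4.0, 0.0]], [[5.0, 5.0], [0.0, 1.0]]

_, src_map = cross_attention(pixels, src_keys, src_values)
free_out, free_map = cross_attention(pixels, tgt_keys, tgt_values)
edit_out, edit_map = cross_attention(pixels, tgt_keys, tgt_values, injected_map=src_map)
```

Without injection the edited prompt produces a different layout (`free_map`); with injection each pixel still attends to the same token position as in the source, and only the content attached to that token changes.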

Subject-Specific Customized Image Generation

ID	Year	Name	Note	Tags	Link
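62	2023	DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation	Each subject is assigned a rare token (e.g., "sks") as its text identifier. The general-purpose diffusion model is fine-tuned so that it can accurately generate that specific subject.	finetune	link
63	2023	An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion	Leaves the model weights unchanged and instead optimizes a new embedding vector in the text embedding space to represent the target concept. The vector can be inserted into natural-language prompts like an ordinary word, guiding the model to generate images that contain the concept.	Textual Inversion, optimization	link
38	2021	LoRA: Low-Rank Adaptation of Large Language Models	Fine-tunes an already trained large model to produce the desired style by learning a residual on top of the frozen weights. The residual can usually be fitted by a low-rank matrix, hence the name low-rank adaptation. The benefit of low rank is that very few parameters need to be trained or adjusted.	Training efficiency	link
		LoRA + DreamBooth (by Simo Ryu)	No paper found		https://github.com/cloneofsimo/lora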
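
The low-rank residual described in the LoRA row can be made concrete with a small sketch. The layer size, rank, and helper names below are assumptions chosen for illustration; the frozen weight `w` is updated only through the product of two small matrices `b` and `a`, and `b` starts at zero so the adapted layer initially matches the pretrained one.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, b, a, scale=1.0):
    """Effective weight W + scale * (B @ A)."""
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]


rng = random.Random(0)
d_out, d_in, rank = 64, 64, 4

# Frozen pretrained weight (random stand-in).
w = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: A is small random, B is zero-initialised.
a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]

full_params = d_out * d_in           # parameters touched by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters touched by the low-rank update

w_at_init = lora_weight(w, b, a)
```

For this toy layer the low-rank update trains 512 parameters instead of 4096, and the saving grows with the layer size because the count scales with `rank * (d_in + d_out)` rather than `d_in * d_out`.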

Multi-Subject Customized Image Generation

ID	Year	Name	Note	Tags	Link
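52	2024	Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models	Image generation with multiple specific subjects: several customized subjects are placed in a single image, and 2D poses control their actions.	TI, LoRA	link
64	2023	Multi-Concept Customization of Text-to-Image Diffusion	1. Uses regularization to prevent confusion between multiple concepts. 2. Fine-tunes only the K and V projections to improve training efficiency. 3. Uses multi-concept compositional optimization to merge several concepts.	Training efficiency, TI	link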
79	2023	Key-Locked Rank One Editing for Text-to-Image Personalization	Method: dynamic rank-one update. Perfusion addresses overfitting in image personalization as follows: (1) during training, it introduces a new xxxx that locks the new concept's cross-attention keys to its superordinate category; (2) at inference, a gated rank-one approach controls the influence of the learned concept; (3) the model can combine different concepts and learn the relations between them. Results: it models the interaction of two concepts well.		link
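
The "fine-tune only K and V" strategy noted for Multi-Concept Customization amounts to freezing every parameter except the key and value projections of the cross-attention layers. A minimal sketch of that selection step follows; the parameter names are hypothetical, loosely following a common UNet layout in which `attn1` is self-attention and `attn2` is cross-attention, and `select_trainable` is not an API of any library.

```python
def select_trainable(param_names, keywords=("attn2.to_k", "attn2.to_v")):
    """Return the parameter names that stay trainable; all others are frozen."""
    return [name for name in param_names if any(k in name for k in keywords)]


# Hypothetical parameter names for one block of a text-conditioned UNet.
param_names = [
    "down.0.attn1.to_q.weight",
    "down.0.attn1.to_k.weight",
    "down.0.attn2.to_q.weight",
    "down.0.attn2.to_k.weight",
    "down.0.attn2.to_v.weight",
    "down.0.attn2.to_out.weight",
    "down.0.resnet.conv1.weight",
]

trainable = select_trainable(param_names)
frozen = [name for name in param_names if name not in trainable]
```

Only the projections that read the text embedding are updated, which keeps the number of trained parameters small compared with fine-tuning the whole network.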

Other applications

Your Diffusion Model is Secretly a Zero-Shot Classifier

✅ A pretrained diffusion model (e.g., Stable Diffusion) can be used as a classifier without any additional training, and can even perform zero-shot classification.

Li et al., "Your Diffusion Model is Secretly a Zero-Shot Classifier", arXiv 2023

Pipeline

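✅ Given an input image \(x\), add random noise \(\epsilon\); then predict the noise \(\epsilon_\theta\) under a condition \(c\). Search over the candidate conditions for the \(c\) whose \(\epsilon_\theta\) is closest to \(\epsilon\); that \(c\) is the predicted class.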
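
The pipeline above can be sketched on a 1-D toy problem. The setup is an assumption made for illustration: each class is a Gaussian with its own mean, `predict_noise` is the analytic class-conditional noise predictor for that Gaussian instead of a real text-conditioned network, and the class names, noise levels, and function names are hypothetical.

```python
import math
import random

CLASS_MEANS = {"cat": -2.0, "dog": 2.0}  # toy class-conditional data means
DATA_STD = 0.5
ALPHA_BARS = [0.9, 0.7, 0.5, 0.3, 0.1]   # a few noise levels to average over


def predict_noise(x_t, alpha_bar, label):
    # Analytic noise predictor for 1-D Gaussian class-conditional data.
    var_t = alpha_bar * DATA_STD ** 2 + (1.0 - alpha_bar)
    centred = x_t - math.sqrt(alpha_bar) * CLASS_MEANS[label]
    return math.sqrt(1.0 - alpha_bar) * centred / var_t


def classify(x, n_trials=50, seed=0):
    rng = random.Random(seed)
    errors = {label: 0.0 for label in CLASS_MEANS}
    for _ in range(n_trials):
        alpha_bar = rng.choice(ALPHA_BARS)
        eps = rng.gauss(0.0, 1.0)
        x_t = math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * eps
        # Reuse the same noise level and eps for every class so errors are comparable.
        for label in CLASS_MEANS:
            errors[label] += (eps - predict_noise(x_t, alpha_bar, label)) ** 2
    # The class whose condition explains the added noise best wins.
    return min(errors, key=errors.get)


prediction_a = classify(1.8)
prediction_b = classify(-2.3)
```

Averaging the error over several noise draws and noise levels reduces the variance of the comparison; the classes are only compared with each other, so no classifier head or extra training is involved.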

Improving Robustness using Generated Data

✅ Uses a diffusion model for data augmentation.

Overview of the approach:

Train a generative model and a non-robust classifier, which are used to provide pseudo-labels for the generated data.
The generated and original training data are combined to train a robust classifier.
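
The two steps of this overview can be sketched as a data pipeline. The generator and the classifier below are trivial stand-ins (a uniform sampler and a sign threshold), and all names are hypothetical; the robust training step itself is omitted.

```python
import random


def generate_samples(n, rng):
    # Stand-in for sampling from a trained generative (e.g. diffusion) model.
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def nonrobust_classifier(x):
    # Stand-in for a standard (non-robust) classifier trained on the real data.
    return 1 if x >= 0.0 else 0


rng = random.Random(0)
original_data = [(-0.8, 0), (-0.3, 0), (0.4, 1), (0.9, 1)]

# Step 1: pseudo-label the generated samples with the non-robust classifier.
generated = generate_samples(8, rng)
pseudo_labelled = [(x, nonrobust_classifier(x)) for x in generated]

# Step 2: merge both sets; a robust classifier would then be trained on
# `combined` (that training step is not shown here).
combined = original_data + pseudo_labelled
```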

Gowal et al., "Improving Robustness using Generated Data", NeurIPS 2021

Better Diffusion Models Further Improve Adversarial Training

Wang et al., "Better Diffusion Models Further Improve Adversarial Training", ICML 2023

Multi-Modal Generation

ID	Year	Name	Note	Tags	Link
74	2023	One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale		U-ViT base model	link

Reference

Li et al., "Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models", NeurIPS 2022
Avrahami et al., "Blended Diffusion for Text-driven Editing of Natural Images", CVPR 2022
Sarukkai et al., "Collage Diffusion", arXiv 2023
Bar-Tal et al., "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation", ICML 2023
Kumari et al., "Multi-Concept Customization of Text-to-Image Diffusion", CVPR 2023
Tewel et al., "Key-Locked Rank One Editing for Text-to-Image Personalization", SIGGRAPH 2023
Zhao et al., "A Recipe for Watermarking Diffusion Models", arXiv 2023
Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models", ICLR 2022
Avrahami et al., "SpaText: Spatio-Textual Representation for Controllable Image Generation", CVPR 2023
Orgad et al., "Editing Implicit Assumptions in Text-to-Image Diffusion Models", arXiv 2023
Han et al., "SVDiff: Compact Parameter Space for Diffusion Fine-Tuning", arXiv 2023
Xie et al., "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning", arXiv 2023
Saharia et al., "Palette: Image-to-Image Diffusion Models", SIGGRAPH 2022
Whang et al., "Deblurring via Stochastic Refinement", CVPR 2022
Xu et al., "Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models", arXiv 2023
Saxena et al., "Monocular Depth Estimation using Diffusion Models", arXiv 2023
Li et al., "Your Diffusion Model is Secretly a Zero-Shot Classifier", arXiv 2023
Gowal et al., "Improving Robustness using Generated Data", NeurIPS 2021
Wang et al., "Better Diffusion Models Further Improve Adversarial Training", ICML 2023

This article is from CaterpillarStudyGroup; please credit the source when reposting.

https://caterpillarstudygroup.github.io/ImportantArticles/