SDS

ID	Year	Name	Note	Tags	Link
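82	2023	Magic3D: High-Resolution Text-to-3D Content Creation	Builds on entry 68: 1. Uses a two-stage coarse-to-fine optimization strategy that pairs diffusion models and scene representations of different resolutions; the coarse stage is faster, and the fine stage adds detail. 2. The coarse stage uses Instant-NGP + eDiff-I, which converges quickly and handles complex topology changes well. 3. The fine stage uses DMTet + LDM.	SDS, Coarse-to-Fine	link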
	2023	Wang et al., "Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation"		Alternative to SDS
	2023	Wang et al., "ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation"		Alternative to SDS

P31

Alternative to SDS: Score Jacobian Chaining

A different formulation, motivated by approximating the 3D score.

In principle, the diffusion model provides the noisy 2D score (the score of the clean-image distribution after noise is added),
but in practice, the diffusion model suffers from out-of-distribution (OOD) issues!

For a diffusion model trained on noisy images, non-noisy images are OOD!

✅ 2D sample, 3D score
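To make the "2D sample, 3D score" idea concrete, here is a minimal sketch of the chain rule behind SJC under toy assumptions that are not from the paper: the hypothetical renderer is linear (camera c maps parameters θ to an image x_c = A_c θ, so the Jacobian is A_c), and a Gaussian log-density stands in for the pretrained 2D score. The 3D score is the camera-averaged J_c^T · (2D score); a finite-difference check confirms the chaining.

```python
import numpy as np

def score_2d(x, mean, var):
    # Stand-in for a 2D image score: grad_x log N(x; mean, var * I).
    return -(x - mean) / var

def sjc_3d_score(theta, render_mats, mean, var):
    # Hypothetical linear "renderer": camera c produces x_c = A_c @ theta,
    # so the Jacobian dx_c/dtheta is simply A_c.
    # Chain rule: grad_theta log p(x_c) = J_c^T @ score_2d(x_c); average over cameras.
    grads = [A.T @ score_2d(A @ theta, mean, var) for A in render_mats]
    return np.mean(grads, axis=0)

def log_density(theta, render_mats, mean, var):
    # Camera-averaged 2D log-density (up to an additive constant).
    return np.mean([-np.sum((A @ theta - mean) ** 2) / (2 * var) for A in render_mats])

rng = np.random.default_rng(0)
theta = rng.standard_normal(4)                                # toy "3D parameters"
render_mats = [rng.standard_normal((6, 4)) for _ in range(3)]  # 3 toy cameras, 6-pixel images
mean, var = np.zeros(6), 0.5

g = sjc_3d_score(theta, render_mats, mean, var)

# Check the chained score against a central finite difference.
h = 1e-5
fd = np.array([
    (log_density(theta + h * e, render_mats, mean, var)
     - log_density(theta - h * e, render_mats, mean, var)) / (2 * h)
    for e in np.eye(4)
])
print(np.allclose(g, fd, atol=1e-5))
```

In SJC the renderer is a differentiable NeRF and the 2D score comes from the diffusion model, so J_c^T · score is obtained by backpropagation rather than an explicit matrix.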

P32

Score Jacobian Chaining

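SJC approximates the score of a clean render with “Perturb-and-Average Scoring” (PAAS), a step not present in SDS.

Apply the score model to multiple noise-perturbed copies of the render, then average the results.

✅ This approximates the model's output on the clean image, working around the OOD problem for clean images.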
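A minimal sketch of Perturb-and-Average Scoring, assuming a toy stand-in for the pretrained denoiser (`toy_denoiser`, hypothetical: the exact denoiser for Gaussian data) and the standard denoiser-to-score conversion (D(y, σ) − x)/σ². In this linear toy there is no OOD problem, so averaging simply reproduces the analytic score; the example only shows the mechanics of perturbing, scoring, and averaging.

```python
import numpy as np

def toy_denoiser(y, sigma, s=1.0):
    # Hypothetical stand-in for a pretrained denoiser D(y, sigma):
    # the exact MMSE denoiser E[x0 | y] for data x0 ~ N(0, s^2 I).
    return (s**2 / (s**2 + sigma**2)) * y

def paas_score(x, sigma, denoiser, n_samples=1000, rng=None):
    # Perturb-and-Average Scoring: perturb the clean input x with noise of
    # level sigma, query the denoiser on each perturbed copy, convert each
    # output to a score estimate (D - x) / sigma^2, and average.
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal((n_samples,) + x.shape)
    perturbed = x + sigma * noise
    return np.mean((denoiser(perturbed, sigma) - x) / sigma**2, axis=0)

x = np.array([1.0, -2.0, 0.5])  # a "clean render"
sigma = 0.5
est = paas_score(x, sigma, toy_denoiser, n_samples=50_000, rng=np.random.default_rng(0))
exact = -x / (1.0 + sigma**2)   # analytic score of N(0, (1 + sigma^2) I) at x
print(np.allclose(est, exact, atol=0.05))
```

With a real diffusion model the denoiser is only reliable on noisy inputs, which is why querying it at x + σn instead of at the clean x matters.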

P33

SJC and SDS

SJC is a competitive alternative to SDS.

P34

Alternative to SDS: ProlificDreamer

SDS-based methods often set the classifier-free guidance weight to 100, which limits the “diversity” of the generated samples.
ProlificDreamer reduces this to 7.5, leading to more diverse samples.
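For reference, one common convention for classifier-free guidance combines the unconditional and conditional noise predictions as below. The weight w scales the guidance term, so w = 100 pushes roughly 13× harder toward the prompt than w = 7.5. The prediction values are made up for illustration.

```python
import numpy as np

def cfg_eps(eps_uncond, eps_cond, w):
    # Classifier-free guidance (one common convention): start from the
    # unconditional prediction and move w times along (conditional - unconditional).
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_uncond = np.array([0.1, -0.2])  # made-up predictions for illustration
eps_cond = np.array([0.3, 0.0])

print(cfg_eps(eps_uncond, eps_cond, 7.5))    # ProlificDreamer-style weight
print(cfg_eps(eps_uncond, eps_cond, 100.0))  # typical SDS weight
```

Note that w = 1 recovers the plain conditional prediction; some papers write the same rule with a weight shifted by one.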

P35

ProlificDreamer and Variational Score Distillation

Instead of maximizing the likelihood under the diffusion model, VSD minimizes the KL divergence via variational inference.

$$ \min _ {\mu} D _ {\mathrm{KL}}\left(q^\mu _ 0(\mathbf{x} _ 0|y)\,\|\,p _ 0(\mathbf{x} _ 0|y)\right), $$

where $\mu$ is the distribution of NeRFs.

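Suppose $\theta _ \tau \sim \mu _ \tau$ is a NeRF sample at optimization time $\tau$, rendered from camera $c$ as $\mathbf{g}(\theta _ \tau, c)$; then VSD simulates this ODE:

$$ \frac{d\theta _ \tau}{d\tau} = -\mathbb{E} _ {t,\epsilon,c}\left[\omega(t)\left(-\sigma _ t\nabla _ {\mathbf{x} _ t}\log p _ t(\mathbf{x} _ t|y) - \left(-\sigma _ t\nabla _ {\mathbf{x} _ t}\log q^{\mu _ \tau} _ t(\mathbf{x} _ t|c,y)\right)\right)\frac{\partial \mathbf{g}(\theta _ \tau,c)}{\partial \theta _ \tau}\right] $$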

The diffusion model can be used to approximate the score of noisy real images (the first term).
But what about the score of noisy rendered images (the second term)?

✅ The first term comes from the pretrained diffusion model and is treated as the ground truth (GT) here.

P36

Learn another diffusion model to approximate the score of noisy rendered images!

✅ LoRA is used to approximate the second term.
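A toy sketch of learning a second model for the score of noisy rendered images. Here the render distribution is assumed to be a 1D Gaussian, and a linear ε-predictor fitted by least squares on the denoising loss stands in for the LoRA-tuned UNet (a purely illustrative substitution, not ProlificDreamer's implementation). The fitted coefficients approach the analytic optimum E[ε | x_t], which equals −σ_t times the noisy-render score, i.e. the second term.

```python
import numpy as np

# Hypothetical toy: renders x0 ~ N(m, v) (one scalar "pixel"), noised as
# x_t = alpha * x0 + sigma * eps. A linear eps-predictor a * x_t + b stands in
# for the LoRA model and is fit with the usual denoising (MSE) loss.
rng = np.random.default_rng(0)
m, v = 0.5, 0.25
alpha, sigma = 0.8, 0.6
n = 200_000

x0 = m + np.sqrt(v) * rng.standard_normal(n)  # samples of rendered images
eps = rng.standard_normal(n)                   # injected noise
x_t = alpha * x0 + sigma * eps                 # noisy renders

# Minimize mean((a * x_t + b - eps)^2) by linear least squares.
design = np.stack([x_t, np.ones(n)], axis=1)
(a, b), *_ = np.linalg.lstsq(design, eps, rcond=None)

# Optimal predictor E[eps | x_t] = sigma * (x_t - alpha * m) / (alpha^2 v + sigma^2).
denom = alpha**2 * v + sigma**2
a_opt, b_opt = sigma / denom, -sigma * alpha * m / denom
print(np.allclose([a, b], [a_opt, b_opt], atol=0.02))
```

In VSD the render distribution changes as the NeRFs are optimized, so the render model is updated alongside them rather than fitted once.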

P37

Why does VSD work in practice?

The NeRFs that validly match a text prompt form a distribution with infinitely many possibilities!
In SDS, epsilon is the score of a noisy “Dirac distribution” over a finite set of renders, which converges to the true score only with infinitely many renders!
In VSD, the LoRA model aims to represent the (true) score of the noisy distribution over an infinite number of renders!
If the generated NeRF distribution is a single point and LoRA overfits perfectly, then VSD = SDS!
But LoRA generalizes well (and learns from a trajectory of NeRFs), so its estimate is closer to the true score!
This is analogous to:
- Representing the dataset score via mixture of Gaussians on the dataset (SDS), versus
- Representing the dataset score via the LoRA UNet (VSD)
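The claim that VSD reduces to SDS in the degenerate case can be checked on a toy example. The sketch below assumes per-pixel Gaussian image and render distributions, for which the optimal ε-prediction has a closed form (`eps_optimal`, hypothetical), and works on the rendered image only, before chaining through ∂g/∂θ. When the render distribution is a single point and is modeled exactly, the VSD direction coincides with the SDS direction; a spread-out render distribution gives a different update.

```python
import numpy as np

def eps_optimal(x_t, alpha_t, sigma_t, mean, var):
    # Optimal eps-prediction when clean samples are x0 ~ N(mean, var) per pixel
    # and x_t = alpha_t * x0 + sigma_t * eps. var = 0 is a Dirac at `mean`.
    return sigma_t * (x_t - alpha_t * mean) / (alpha_t**2 * var + sigma_t**2)

def sds_direction(eps_pretrained, eps_noise, w_t=1.0):
    # SDS: compare the pretrained prediction with the injected noise.
    return w_t * (eps_pretrained - eps_noise)

def vsd_direction(eps_pretrained, eps_render, w_t=1.0):
    # VSD: compare the pretrained prediction with the render-distribution model.
    return w_t * (eps_pretrained - eps_render)

rng = np.random.default_rng(0)
x0 = np.array([0.3, -0.7, 1.2])  # a single rendered image (3 pixels)
alpha_t, sigma_t = 0.8, 0.6
eps = rng.standard_normal(x0.shape)
x_t = alpha_t * x0 + sigma_t * eps

# Stand-in "pretrained" model: images ~ N(0, 1) per pixel (hypothetical).
eps_pre = eps_optimal(x_t, alpha_t, sigma_t, mean=0.0, var=1.0)

# Render distribution collapsed to one point and modeled exactly: VSD == SDS.
eps_point = eps_optimal(x_t, alpha_t, sigma_t, mean=x0, var=0.0)
print(np.allclose(vsd_direction(eps_pre, eps_point), sds_direction(eps_pre, eps)))

# A spread-out render distribution gives a different (smoothed) estimate.
eps_spread = eps_optimal(x_t, alpha_t, sigma_t, mean=0.0, var=0.5)
print(np.allclose(vsd_direction(eps_pre, eps_spread), sds_direction(eps_pre, eps)))
```

The Dirac case works because, for a single clean render, the best possible ε-prediction from x_t is exactly the injected noise, which is what SDS subtracts.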

This article is from CaterpillarStudyGroup; please credit the source when reposting.

https://caterpillarstudygroup.github.io/ImportantArticles/