ReadPapers
  1. Introduction
  2. Animate3d: Animating any 3d model with multi-view video diffusion
  3. Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos
  4. HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
  5. PIG: Physically-based Multi-Material Interaction with 3D Gaussians
  6. EnliveningGS: Active Locomotion of 3DGS
  7. SplineGS: Learning Smooth Trajectories in Gaussian Splatting for Dynamic Scene Reconstruction
  8. PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation
  9. PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning
  10. Length-Aware Motion Synthesis via Latent Diffusion
  11. IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model
  12. UniMoGen: Universal Motion Generation
  13. AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion
  14. Flame: Free-form language-based motion synthesis & editing
  15. Human Motion Diffusion as a Generative Prior
  16. Text-driven Human Motion Generation with Motion Masked Diffusion Model
  17. ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
  18. MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
  19. ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment
  20. Absolute Coordinates Make Motion Generation Easy
  21. Seamless Human Motion Composition with Blended Positional Encodings
  22. FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
  23. Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
  24. Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
  25. StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework
  26. EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation
  27. Motion Mamba: Efficient and Long Sequence Motion Generation
  28. M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
  29. T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences
  30. AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
  31. BAD: Bidirectional Auto-Regressive Diffusion for Text-to-Motion Generation
  32. MMM: Generative Masked Motion Model
  33. Priority-Centric Human Motion Generation in Discrete Latent Space
  34. AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
  35. MotionGPT: Human Motion as a Foreign Language
  36. Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation
  37. PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
  38. Incorporating Physics Principles for Precise Human Motion Prediction
  39. PIMNet: Physics-infused Neural Network for Human Motion Prediction
  40. PhysDiff: Physics-Guided Human Motion Diffusion Model
  41. NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
  42. Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields
  43. Geometric Neural Distance Fields for Learning Human Motion Priors
  44. Character Controllers Using Motion VAEs
  45. Improving Human Motion Plausibility with Body Momentum
  46. MoGlow: Probabilistic and controllable motion synthesis using normalising flows
  47. MoDi: Unconditional motion synthesis from diverse data
  48. MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
  49. A deep learning framework for character motion synthesis and editing
  50. Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors
  51. TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos
  52. X-MoGen: Unified Motion Generation across Humans and Animals
  53. Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
  54. MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation
  55. Drop: Dynamics responses from human motion prior and projective dynamics
  56. POMP: Physics-constrainable Motion Generative Model through Phase Manifolds
  57. Dreamgaussian4d: Generative 4d gaussian splatting
  58. Drive Any Mesh: 4D Latent Diffusion for Mesh Deformation from Video
  59. AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
  60. ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
  61. Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
  62. Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
  63. Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation
  64. Generating time-consistent dynamics with discriminator-guided image diffusion models
  65. GENMO: A Generalist Model for Human Motion
  66. HGM3: Hierarchical Generative Masked Motion Modeling with Hard Token Mining
  67. Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion
  68. MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation
  69. FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
  70. VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
  71. DragAnything: Motion Control for Anything using Entity Representation
  72. PhysAnimator: Physics-Guided Generative Cartoon Animation
  73. SOAP: Style-Omniscient Animatable Portraits
  74. Neural Discrete Representation Learning
  75. TSTMotion: Training-free Scene-aware Text-to-motion Generation
  76. Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis
  77. A lip sync expert is all you need for speech to lip generation in the wild
  78. MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
  79. LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
  80. T2M-GPT: Generating human motion from textual descriptions with discrete representations
  81. MotionGPT: Finetuned LLMs are general-purpose motion generators
  82. Guided Motion Diffusion for Controllable Human Motion Synthesis
  83. OmniControl: Control Any Joint at Any Time for Human Motion Generation
  84. Learning Long-form Video Prior via Generative Pre-Training
  85. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
  86. Magic3D: High-Resolution Text-to-3D Content Creation
  87. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
  88. One-Minute Video Generation with Test-Time Training
  89. Key-Locked Rank One Editing for Text-to-Image Personalization
  90. Marching Cubes: A High Resolution 3D Surface Construction Algorithm
  91. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
  92. Null-text Inversion for Editing Real Images Using Guided Diffusion Models
  93. simple diffusion: End-to-end diffusion for high resolution images
  94. One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
  95. Scalable Diffusion Models with Transformers
  96. All are Worth Words: a ViT Backbone for Score-based Diffusion Models
  97. An image is worth 16x16 words: Transformers for image recognition at scale
  98. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
  99. Photorealistic text-to-image diffusion models with deep language understanding (Imagen)
  100. DreamFusion: Text-to-3D using 2D Diffusion
  101. GLIGEN: Open-Set Grounded Text-to-Image Generation
  102. Adding Conditional Control to Text-to-Image Diffusion Models
  103. T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
  104. Multi-Concept Customization of Text-to-Image Diffusion
  105. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
  106. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
  107. VisorGPT: Learning Visual Prior via Generative Pre-Training
  108. NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
  109. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
  110. ModelScope Text-to-Video Technical Report
  111. Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
  112. Make-A-Video: Text-to-Video Generation without Text-Video Data
  113. Video Diffusion Models
  114. Learning Transferable Visual Models From Natural Language Supervision
  115. Implicit Warping for Animation with Image Sets
  116. Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
  117. Motion-Conditioned Diffusion Model for Controllable Video Synthesis
  118. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  119. UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
  120. Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
  121. Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
  122. A Recipe for Scaling up Text-to-Video Generation
  123. High-Resolution Image Synthesis with Latent Diffusion Models
  124. Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
  125. Dataset: HumanVid
  126. HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
  127. StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  128. Dataset: Zoo-300K
  129. Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
  130. LoRA: Low-Rank Adaptation of Large Language Models
  131. TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
  132. GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
  133. MagicPony: Learning Articulated 3D Animals in the Wild
  134. Splatter a Video: Video Gaussian Representation for Versatile Processing
  135. Dataset: Dynamic Furry Animal Dataset
  136. Artemis: Articulated Neural Pets with Appearance and Motion Synthesis
  137. SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
  138. CAT3D: Create Anything in 3D with Multi-View Diffusion Models
  139. PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
  140. Humans in 4D: Reconstructing and Tracking Humans with Transformers
  141. Learning Human Motion from Monocular Videos via Cross-Modal Manifold Alignment
  142. PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
  143. Imagic: Text-Based Real Image Editing with Diffusion Models
  144. DiffEdit: Diffusion-based semantic image editing with mask guidance
  145. Dual diffusion implicit bridges for image-to-image translation
  146. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
  147. Prompt-to-Prompt Image Editing with Cross-Attention Control
  148. WANDR: Intention-guided Human Motion Generation
  149. TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
  150. 3D Gaussian Splatting for Real-Time Radiance Field Rendering
  151. Decoupling Human and Camera Motion from Videos in the Wild
  152. HMP: Hand Motion Priors for Pose and Shape Estimation from Video
  153. HuMoR: 3D Human Motion Model for Robust Pose Estimation
  154. Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video
  155. Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation
  156. WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
  157. Tackling the Generative Learning Trilemma with Denoising Diffusion GANs
  158. Elucidating the Design Space of Diffusion-Based Generative Models
  159. Score-Based Generative Modeling through Stochastic Differential Equations
  160. Consistency Models
  161. Classifier-Free Diffusion Guidance
  162. Cascaded Diffusion Models for High Fidelity Image Generation
  163. Learning Energy-Based Models by Diffusion Recovery Likelihood
  164. On Distillation of Guided Diffusion Models
  165. Denoising Diffusion Implicit Models
  166. Progressive Distillation for Fast Sampling of Diffusion Models
  167. Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
  168. ControlVideo: Training-free Controllable Text-to-Video Generation
  169. Pix2Video: Video Editing using Image Diffusion
  170. Structure and Content-Guided Video Synthesis with Diffusion Models
  171. MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
  172. MotionDirector: Motion Customization of Text-to-Video Diffusion Models
  173. Dreamix: Video Diffusion Models are General Video Editors
  174. Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
  175. TokenFlow: Consistent Diffusion Features for Consistent Video Editing
  176. DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
  177. Content Deformation Fields for Temporally Consistent Video Processing
  178. PFNN: Phase-Functioned Neural Networks

ReadPapers

VisorGPT

Can we model such a visual prior with an LLM?

P114

Prompt design

P118
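The exact prompt format isn't recorded in these notes, so the following is a minimal sketch under assumed conventions: a layout is flattened into a plain text sequence carrying the annotation type, instance count, categories, and quantized box coordinates, which a GPT-style decoder can then model. The field names and the 0-511 coordinate bins below are illustrative, not the paper's actual scheme.

```python
# Hypothetical serialization of a box layout into a prompt-friendly string.
# Field names ("type", "n", ...) and the 512-bin quantization are assumptions
# for illustration, not VisorGPT's real prompt format.

def quantize(v, bins=512):
    """Map a normalized coordinate in [0, 1] to an integer bin."""
    return min(bins - 1, int(v * bins))

def serialize_layout(annotation_type, categories, boxes, bins=512):
    """Flatten one layout into a single text sequence.

    boxes: list of (x0, y0, x1, y1) in normalized [0, 1] coordinates,
    aligned with `categories`.
    """
    header = f"type={annotation_type}; n={len(categories)};"
    body = " ".join(
        f"{cat} [{quantize(x0, bins)} {quantize(y0, bins)} "
        f"{quantize(x1, bins)} {quantize(y1, bins)}]"
        for cat, (x0, y0, x1, y1) in zip(categories, boxes)
    )
    return f"{header} {body}"

# Example: two objects in a scene.
print(serialize_layout(
    "bbox",
    ["person", "dog"],
    [(0.10, 0.20, 0.45, 0.90), (0.50, 0.55, 0.80, 0.95)],
))
# -> "type=bbox; n=2; person [51 102 230 460] dog [256 281 409 486]"
```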

Modeling Visual Prior via Generative Pre-Training

P119
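Generative pre-training then reduces to an ordinary next-token (causal LM) objective over a large corpus of such serialized layouts. A sketch, assuming a generic Hugging Face GPT-2 backbone stands in for the actual model; the checkpoint, toy corpus, and sequence format are placeholders, not the paper's.

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Toy corpus of serialized layouts; in practice this would be millions of
# sequences built from detection / keypoint / mask annotations.
corpus = [
    "type=bbox; n=2; person [51 102 230 460] dog [256 281 409 486]",
    "type=bbox; n=1; car [40 300 470 480]",
]

batch = tokenizer(corpus, return_tensors="pt", padding=True)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100        # don't score padding tokens

# Causal LM loss: predict each token of the layout from its prefix,
# i.e. learn the "visual prior" as if it were language.
loss = model(**batch, labels=labels).loss
loss.backward()                                    # one step of standard LM training
```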

Sample from the LLM that has learned the visual prior

P120
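Sampling is then plain autoregressive generation: the prompt pins down what is wanted (annotation type, instance count, a category), the model completes plausible coordinates, and the integer bins are de-quantized back into boxes. Again a sketch: the stock `gpt2` checkpoint has learned no visual prior, so this only becomes meaningful after the pre-training step above, and the regex simply mirrors the assumed serialization format.

```python
import re
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# The prompt fixes the conditioning; the model fills in plausible coordinates.
prompt = "type=bbox; n=2; person ["
inputs = tokenizer(prompt, return_tensors="pt")
ids = model.generate(
    **inputs,
    do_sample=True,                    # sample from the learned prior, not argmax
    top_p=0.9,
    max_new_tokens=40,
    pad_token_id=tokenizer.eos_token_id,
)
text = tokenizer.decode(ids[0], skip_special_tokens=True)

# Decode "category [x0 y0 x1 y1]" groups back to normalized boxes (bins=512).
boxes = [
    (cat, [int(v) / 512 for v in coords.split()])
    for cat, coords in re.findall(r"(\w+) \[([\d ]+)\]", text)
]
print(boxes)
```

Sampled layouts like these can then be used to condition layout-to-image models such as GLIGEN or ControlNet (both in the list above).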