ReadPapers

1. Introduction
2. FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
3. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations
4. ParticleGS: Particle-Based Dynamics Modeling of 3D Gaussians for Prior-free Motion Extrapolation
5. Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
6. Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos
7. HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
8. PIG: Physically-based Multi-Material Interaction with 3D Gaussians
9. EnliveningGS: Active Locomotion of 3DGS
10. SplineGS: Learning Smooth Trajectories in Gaussian Splatting for Dynamic Scene Reconstruction
11. PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation
12. PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning
13. Length-Aware Motion Synthesis via Latent Diffusion
14. IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model
15. UniMoGen: Universal Motion Generation
16. AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion
17. FLAME: Free-form Language-based Motion Synthesis & Editing
18. Human Motion Diffusion as a Generative Prior
19. Text-driven Human Motion Generation with Motion Masked Diffusion Model
20. ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
21. MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
22. ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment
23. Absolute Coordinates Make Motion Generation Easy
24. Seamless Human Motion Composition with Blended Positional Encodings
25. FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
26. Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
27. Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
28. StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework
29. EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation
30. Motion Mamba: Efficient and Long Sequence Motion Generation
31. M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
32. T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences
33. AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
34. BAD: Bidirectional Auto-Regressive Diffusion for Text-to-Motion Generation
35. MMM: Generative Masked Motion Model
36. Priority-Centric Human Motion Generation in Discrete Latent Space
37. AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
38. MotionGPT: Human Motion as a Foreign Language
39. Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation
40. PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
41. Incorporating Physics Principles for Precise Human Motion Prediction
42. PIMNet: Physics-infused Neural Network for Human Motion Prediction
43. PhysDiff: Physics-Guided Human Motion Diffusion Model
44. NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
45. Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields
46. Geometric Neural Distance Fields for Learning Human Motion Priors
47. Character Controllers Using Motion VAEs
48. Improving Human Motion Plausibility with Body Momentum
49. MoGlow: Probabilistic and controllable motion synthesis using normalising flows
50. MoDi: Unconditional Motion Synthesis from Diverse Data
51. MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
52. A deep learning framework for character motion synthesis and editing
53. Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors
54. TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos
55. X-MoGen: Unified Motion Generation across Humans and Animals
56. Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
57. MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation
58. DROP: Dynamics Responses from Human Motion Prior and Projective Dynamics
59. POMP: Physics-constrainable Motion Generative Model through Phase Manifolds
60. DreamGaussian4D: Generative 4D Gaussian Splatting
61. Drive Any Mesh: 4D Latent Diffusion for Mesh Deformation from Video
62. AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
63. ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
64. Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
65. Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
66. Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation
67. Generating time-consistent dynamics with discriminator-guided image diffusion models
68. GENMO: A Generalist Model for Human Motion
69. HGM3: Hierarchical Generative Masked Motion Modeling with Hard Token Mining
70. Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion
71. MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation
72. FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
73. VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
74. DragAnything: Motion Control for Anything using Entity Representation
75. PhysAnimator: Physics-Guided Generative Cartoon Animation
76. SOAP: Style-Omniscient Animatable Portraits
77. Neural Discrete Representation Learning
78. TSTMotion: Training-free Scene-aware Text-to-motion Generation
79. Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis
80. A lip sync expert is all you need for speech to lip generation in the wild
81. MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
82. LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
83. T2M-GPT: Generating human motion from textual descriptions with discrete representations
84. MotionGPT: Finetuned LLMs are general-purpose motion generators
85. Guided Motion Diffusion for Controllable Human Motion Synthesis
86. OmniControl: Control Any Joint at Any Time for Human Motion Generation
87. Learning Long-form Video Prior via Generative Pre-Training
88. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
89. Magic3D: High-Resolution Text-to-3D Content Creation
90. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
91. One-Minute Video Generation with Test-Time Training
92. Key-Locked Rank One Editing for Text-to-Image Personalization
93. Marching Cubes: A High Resolution 3D Surface Construction Algorithm
94. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
95. NULL-text Inversion for Editing Real Images Using Guided Diffusion Models
96. simple diffusion: End-to-end diffusion for high resolution images
97. One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
98. Scalable Diffusion Models with Transformers
99. All are Worth Words: a ViT Backbone for Score-based Diffusion Models
100. An image is worth 16x16 words: Transformers for image recognition at scale
101. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
102. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen)
103. DreamFusion: Text-to-3D using 2D Diffusion
104. GLIGEN: Open-Set Grounded Text-to-Image Generation
105. Adding Conditional Control to Text-to-Image Diffusion Models
106. T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
107. Multi-Concept Customization of Text-to-Image Diffusion
108. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
109. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
110. VisorGPT: Learning Visual Prior via Generative Pre-Training
111. NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
112. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
113. ModelScope Text-to-Video Technical Report
114. Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
115. Make-A-Video: Text-to-Video Generation without Text-Video Data
116. Video Diffusion Models
117. Learning Transferable Visual Models From Natural Language Supervision
118. Implicit Warping for Animation with Image Sets
119. Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
120. Motion-Conditioned Diffusion Model for Controllable Video Synthesis
121. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
122. UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
123. Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
124. Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
125. A Recipe for Scaling up Text-to-Video Generation
126. High-Resolution Image Synthesis with Latent Diffusion Models
127. Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
128. Dataset: HumanVid
129. HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
130. StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
131. Dataset: Zoo-300K
132. Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
133. LoRA: Low-Rank Adaptation of Large Language Models
134. TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
135. GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
136. MagicPony: Learning Articulated 3D Animals in the Wild
137. Splatter a Video: Video Gaussian Representation for Versatile Processing
138. Dataset: Dynamic Furry Animal Dataset
139. Artemis: Articulated Neural Pets with Appearance and Motion Synthesis
140. SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
141. CAT3D: Create Anything in 3D with Multi-View Diffusion Models
142. PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
143. Humans in 4D: Reconstructing and Tracking Humans with Transformers
144. Learning Human Motion from Monocular Videos via Cross-Modal Manifold Alignment
145. PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
146. Imagic: Text-Based Real Image Editing with Diffusion Models
147. DiffEdit: Diffusion-based semantic image editing with mask guidance
148. Dual diffusion implicit bridges for image-to-image translation
149. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
150. Prompt-to-Prompt Image Editing with Cross-Attention Control
151. WANDR: Intention-guided Human Motion Generation
152. TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
153. 3D Gaussian Splatting for Real-Time Radiance Field Rendering
154. Decoupling Human and Camera Motion from Videos in the Wild
155. HMP: Hand Motion Priors for Pose and Shape Estimation from Video
156. HuMoR: 3D Human Motion Model for Robust Pose Estimation
157. Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video
158. Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation
159. WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
160. Tackling the Generative Learning Trilemma with Denoising Diffusion GANs
161. Elucidating the Design Space of Diffusion-Based Generative Models
162. Score-Based Generative Modeling through Stochastic Differential Equations
163. Consistency Models
164. Classifier-Free Diffusion Guidance
165. Cascaded Diffusion Models for High Fidelity Image Generation
166. Learning Energy-Based Models by Diffusion Recovery Likelihood
167. On Distillation of Guided Diffusion Models
168. Denoising Diffusion Implicit Models
169. Progressive Distillation for Fast Sampling of Diffusion Models
170. Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
171. ControlVideo: Training-free Controllable Text-to-Video Generation
172. Pix2Video: Video Editing using Image Diffusion
173. Structure and Content-Guided Video Synthesis with Diffusion Models
174. MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
175. MotionDirector: Motion Customization of Text-to-Video Diffusion Models
176. Dreamix: Video Diffusion Models are General Video Editors
177. Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
178. TokenFlow: Consistent Diffusion Features for Consistent Video Editing
179. DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
180. Content Deformation Fields for Temporally Consistent Video Processing
181. PFNN: Phase-Functioned Neural Networks


VisorGPT

Can we model such a visual prior with an LLM?

P114

Prompt design

P118

Modeling Visual Prior via Generative Pre-Training

P119

Sample from the LLM that has learned the visual prior

P120
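
The pipeline sketched in these notes is: serialize a visual prior (e.g., an object layout of bounding boxes) into discrete tokens, learn it with generative pre-training in a GPT-style model, and then sample new priors from a prompt. The snippet below is a minimal sketch of that tokenize-then-sample loop, not VisorGPT's actual code: the coordinate quantization scheme, the token vocabulary, and the tiny untrained GRU model standing in for the LLM are all assumptions made for illustration.

```python
# Illustrative sketch only (NOT VisorGPT's implementation): quantize box
# coordinates into a discrete vocabulary, and sample a layout autoregressively
# from a toy language-model-style network. The model here is untrained, so the
# sampled layouts are random; it only demonstrates the serialization/sampling loop.
import torch
import torch.nn as nn

NUM_BINS = 256                       # quantize normalized [0, 1] coordinates into 256 bins
BOS, EOS = NUM_BINS, NUM_BINS + 1    # special start/end tokens
VOCAB_SIZE = NUM_BINS + 2

def boxes_to_tokens(boxes):
    """Serialize normalized (x1, y1, x2, y2) boxes into one flat token sequence."""
    tokens = [BOS]
    for box in boxes:
        tokens += [int(round(c * (NUM_BINS - 1))) for c in box]
    tokens.append(EOS)
    return tokens

class TinyPriorLM(nn.Module):
    """Toy autoregressive model standing in for the GPT-style visual-prior LM."""
    def __init__(self, vocab_size=VOCAB_SIZE, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)                     # next-token logits at every position

@torch.no_grad()
def sample_layout(model, max_boxes=8, temperature=1.0):
    """Autoregressively sample a token sequence and decode it back into boxes."""
    seq = torch.tensor([[BOS]])
    for _ in range(4 * max_boxes + 1):               # 4 coordinate tokens per box, plus EOS
        logits = model(seq)[:, -1] / temperature
        next_tok = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        seq = torch.cat([seq, next_tok], dim=1)
        if next_tok.item() == EOS:
            break
    coords = [t for t in seq[0, 1:].tolist() if t < NUM_BINS]
    coords = coords[: len(coords) // 4 * 4]          # drop an incomplete trailing box
    return [[c / (NUM_BINS - 1) for c in coords[i:i + 4]]
            for i in range(0, len(coords), 4)]

print(boxes_to_tokens([[0.1, 0.2, 0.5, 0.9]]))       # serialization direction
model = TinyPriorLM()                                # untrained stand-in for the pre-trained LM
print(sample_layout(model, max_boxes=4))             # sampling direction
```

In the actual setting the model would be pre-trained on large corpora of serialized annotations (boxes, keypoints, masks) and prompted with a task-specific prefix before sampling; the toy model above only illustrates the loop itself.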