1. ReadPapers
  2. 1. Introduction
  3. 2. Regional Time Stepping for SPH
  4. 3. FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
  5. 4. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations
  6. 5. ParticleGS: Particle-Based Dynamics Modeling of 3D Gaussians for Prior-free Motion Extrapolation
  7. 6. Animate3d: Animating any 3d model with multi-view video diffusion
  8. 7. Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos
  9. 8. HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
  10. 9. PIG: Physically-based Multi-Material Interaction with 3D Gaussians
  11. 10. EnliveningGS: Active Locomotion of 3DGS
  12. 11. SplineGS: Learning Smooth Trajectories in Gaussian Splatting for Dynamic Scene Reconstruction
  13. 12. PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation
  14. 13. PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning
  15. 14. Length-Aware Motion Synthesis via Latent Diffusion
  16. 15. IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model
  17. 16. UniMoGen: Universal Motion Generation
  18. 17. AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion
  19. 18. Flame: Free-form language-based motion synthesis & editing
  20. 19. Human Motion Diffusion as a Generative Prior
  21. 20. Text-driven Human Motion Generation with Motion Masked Diffusion Model
  22. 21. ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
  23. 22. MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
  24. 23. ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment
  25. 24. Absolute Coordinates Make Motion Generation Easy
  26. 25. Seamless Human Motion Composition with Blended Positional Encodings
  27. 26. FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
  28. 27. Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
  29. 28. Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
  30. 29. StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework
  31. 30. EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation
  32. 31. Motion Mamba: Efficient and Long Sequence Motion Generation
  33. 32. M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
  34. 33. T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences
  35. 34. AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
  36. 35. BAD: Bidirectional Auto-Regressive Diffusion for Text-to-Motion Generation
  37. 36. MMM: Generative Masked Motion Model
  38. 37. Priority-Centric Human Motion Generation in Discrete Latent Space
  39. 38. AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
  40. 39. MotionGPT: Human Motion as a Foreign Language
  41. 40. Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation
  42. 41. PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
  43. 42. Incorporating Physics Principles for Precise Human Motion Prediction
  44. 43. PIMNet: Physics-infused Neural Network for Human Motion Prediction
  45. 44. PhysDiff: Physics-Guided Human Motion Diffusion Model
  46. 45. NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
  47. 46. Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields
  48. 47. Geometric Neural Distance Fields for Learning Human Motion Priors
  49. 48. Character Controllers Using Motion VAEs
  50. 49. Improving Human Motion Plausibility with Body Momentum
  51. 50. MoGlow: Probabilistic and controllable motion synthesis using normalising flows
  52. 51. Modi: Unconditional motion synthesis from diverse data
  53. 52. MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
  54. 53. A deep learning framework for character motion synthesis and editing
  55. 54. Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors
  56. 55. TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos
  57. 56. X-MoGen: Unified Motion Generation across Humans and Animals
  58. 57. Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
  59. 58. MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation
  60. 59. Drop: Dynamics responses from human motion prior and projective dynamics
  61. 60. POMP: Physics-constrainable Motion Generative Model through Phase Manifolds
  62. 61. Dreamgaussian4d: Generative 4d gaussian splatting
  63. 62. Drive Any Mesh: 4D Latent Diffusion for Mesh Deformation from Video
  64. 63. AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
  65. 64. ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
  66. 65. Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
  67. 66. Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
  68. 67. Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation
  69. 68. Generating time-consistent dynamics with discriminator-guided image diffusion models
  70. 69. GENMO: A Generalist Model for Human Motion
  71. 70. HGM3: Hierarchical Generative Masked Motion Modeling with Hard Token Mining
  72. 71. Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion
  73. 72. MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation
  74. 73. FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
  75. 74. VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
  76. 75. DragAnything: Motion Control for Anything using Entity Representation
  77. 76. PhysAnimator: Physics-Guided Generative Cartoon Animation
  78. 77. SOAP: Style-Omniscient Animatable Portraits
  79. 78. Neural Discrete Representation Learning
  80. 79. TSTMotion: Training-free Scene-aware Text-to-motion Generation
  81. 80. Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis
  82. 81. A lip sync expert is all you need for speech to lip generation in the wild
  83. 82. MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
  84. 83. LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
  85. 84. T2m-gpt: Generating human motion from textual descriptions with discrete representations
  86. 85. MotionGPT: Finetuned LLMs are general-purpose motion generators
  87. 86. Guided Motion Diffusion for Controllable Human Motion Synthesis
  88. 87. OmniControl: Control Any Joint at Any Time for Human Motion Generation
  89. 88. Learning Long-form Video Prior via Generative Pre-Training
  90. 89. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
  91. 90. Magic3D: High-Resolution Text-to-3D Content Creation
  92. 91. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
  93. 92. One-Minute Video Generation with Test-Time Training
  94. 93. Key-Locked Rank One Editing for Text-to-Image Personalization
  95. 94. Marching Cubes: A High Resolution 3D Surface Construction Algorithm
  96. 95. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
  97. 96. NULL-text Inversion for Editing Real Images Using Guided Diffusion Models
  98. 97. simple diffusion: End-to-end diffusion for high resolution images
  99. 98. One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
  100. 99. Scalable Diffusion Models with Transformers
  101. 100. All are Worth Words: a ViT Backbone for Score-based Diffusion Models
  102. 101. An image is worth 16x16 words: Transformers for image recognition at scale
  103. 102. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
  104. 103. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen)
  105. 104. DreamFusion: Text-to-3D using 2D Diffusion
  106. 105. GLIGEN: Open-Set Grounded Text-to-Image Generation
  107. 106. Adding Conditional Control to Text-to-Image Diffusion Models
  108. 107. T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
  109. 108. Multi-Concept Customization of Text-to-Image Diffusion
  110. 109. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
  111. 110. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
  112. 111. VisorGPT: Learning Visual Prior via Generative Pre-Training
  113. 112. NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
  114. 113. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
  115. 114. ModelScope Text-to-Video Technical Report
  116. 115. Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
  117. 116. Make-A-Video: Text-to-Video Generation without Text-Video Data
  118. 117. Video Diffusion Models
  119. 118. Learning Transferable Visual Models From Natural Language Supervision
  120. 119. Implicit Warping for Animation with Image Sets
  121. 120. Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
  122. 121. Motion-Conditioned Diffusion Model for Controllable Video Synthesis
  123. 122. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  124. 123. UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
  125. 124. Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
  126. 125. Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
  127. 126. A Recipe for Scaling up Text-to-Video Generation
  128. 127. High-Resolution Image Synthesis with Latent Diffusion Models
  129. 128. Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
  130. 129. Dataset: HumanVid
  131. 130. HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
  132. 131. StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  133. 132. Dataset: Zoo-300K
  134. 133. Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
  135. 134. LoRA: Low-Rank Adaptation of Large Language Models
  136. 135. TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
  137. 136. GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
  138. 137. MagicPony: Learning Articulated 3D Animals in the Wild
  139. 138. Splatter a Video: Video Gaussian Representation for Versatile Processing
  140. 139. Dataset: Dynamic Furry Animal Dataset
  141. 140. Artemis: Articulated Neural Pets with Appearance and Motion Synthesis
  142. 141. SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
  143. 142. CAT3D: Create Anything in 3D with Multi-View Diffusion Models
  144. 143. PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
  145. 144. Humans in 4D: Reconstructing and Tracking Humans with Transformers
  146. 145. Learning Human Motion from Monocular Videos via Cross-Modal Manifold Alignment
  147. 146. PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
  148. 147. Imagic: Text-Based Real Image Editing with Diffusion Models
  149. 148. DiffEdit: Diffusion-based semantic image editing with mask guidance
  150. 149. Dual diffusion implicit bridges for image-to-image translation
  151. 150. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
  152. 151. Prompt-to-Prompt Image Editing with Cross-Attention Control
  153. 152. WANDR: Intention-guided Human Motion Generation
  154. 153. TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
  155. 154. 3D Gaussian Splatting for Real-Time Radiance Field Rendering
  156. 155. Decoupling Human and Camera Motion from Videos in the Wild
  157. 156. HMP: Hand Motion Priors for Pose and Shape Estimation from Video
  158. 157. HuMoR: 3D Human Motion Model for Robust Pose Estimation
  159. 158. Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video
  160. 159. Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation
  161. 160. WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
  162. 161. Tackling the Generative Learning Trilemma with Denoising Diffusion GANs
  163. 162. Elucidating the Design Space of Diffusion-Based Generative Models
  164. 163. Score-Based Generative Modeling through Stochastic Differential Equations
  165. 164. Consistency Models
  166. 165. Classifier-Free Diffusion Guidance
  167. 166. Cascaded Diffusion Models for High Fidelity Image Generation
  168. 167. Learning Energy-Based Models by Diffusion Recovery Likelihood
  169. 168. On Distillation of Guided Diffusion Models
  170. 169. Denoising Diffusion Implicit Models
  171. 170. Progressive Distillation for Fast Sampling of Diffusion Models
  172. 171. Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
  173. 172. ControlVideo: Training-free Controllable Text-to-Video Generation
  174. 173. Pix2Video: Video Editing using Image Diffusion
  175. 174. Structure and Content-Guided Video Synthesis with Diffusion Models
  176. 175. MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
  177. 176. MotionDirector: Motion Customization of Text-to-Video Diffusion Models
  178. 177. Dreamix: Video Diffusion Models are General Video Editors
  179. 178. Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
  180. 179. TokenFlow: Consistent Diffusion Features for Consistent Video Editing
  181. 180. DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
  182. 181. Content Deformation Fields for Temporally Consistent Video Processing
  183. 182. PFNN: Phase-Functioned Neural Networks


VisorGPT

Can we model such a visual prior with an LLM? (P114)

Prompt design (P118)

Modeling the visual prior via generative pre-training (P119)

Sample from the LLM that has learned the visual prior (P120)
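
To make the last note concrete, here is a minimal sketch of the idea behind VisorGPT (paper 111 above): a visual prior, such as an object bounding-box layout, is serialized into a plain-text sequence; a causal language model is pre-trained on many such sequences; plausible new layouts are then obtained by sampling from it. The serialization format and the gpt2 checkpoint below are placeholders chosen for illustration, not VisorGPT's actual prompt format or released model.

```python
# Sketch (assumed serialization, stand-in checkpoint): treat a box layout as text,
# then sample continuations from a causal LM that has (in the real method) been
# pre-trained on large corpora of serialized boxes/keypoints/masks.
from transformers import AutoTokenizer, AutoModelForCausalLM

def serialize_layout(category, boxes):
    """Turn a list of (x0, y0, x1, y1) boxes into one token-friendly string."""
    parts = [f"{category} [{x0} {y0} {x1} {y1}]" for (x0, y0, x1, y1) in boxes]
    return "layout: " + " ; ".join(parts)

# Prompt with one known box; the model is asked to continue the layout.
prompt = serialize_layout("person", [(12, 40, 88, 200)]) + " ;"

# gpt2 here is only a stand-in for a model pre-trained on serialized visual priors.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok(prompt, return_tensors="pt")
out = lm.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9,
                  pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))  # sampled layout continuation
```

Sampled layouts of this kind can then be parsed back into boxes or keypoints and used to condition layout-aware image generators such as GLIGEN or ControlNet (papers 105 and 106 above).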