1. ReadPapers
  1. Introduction
  2. Seamless Human Motion Composition with Blended Positional Encodings
  3. FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
  4. Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
  5. Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
  6. StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework
  7. EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation
  8. Motion Mamba: Efficient and Long Sequence Motion Generation
  9. M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
  10. T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences
  11. AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
  12. BAD: Bidirectional Auto-Regressive Diffusion for Text-to-Motion Generation
  13. MMM: Generative Masked Motion Model
  14. Priority-Centric Human Motion Generation in Discrete Latent Space
  15. AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
  16. MotionGPT: Human Motion as a Foreign Language
  17. Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation
  18. PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
  19. Incorporating Physics Principles for Precise Human Motion Prediction
  20. PIMNet: Physics-infused Neural Network for Human Motion Prediction
  21. PhysDiff: Physics-Guided Human Motion Diffusion Model
  22. NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
  23. Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields
  24. Geometric Neural Distance Fields for Learning Human Motion Priors
  25. Character Controllers Using Motion VAEs
  26. Improving Human Motion Plausibility with Body Momentum
  27. MoGlow: Probabilistic and controllable motion synthesis using normalising flows
  28. MoDi: Unconditional motion synthesis from diverse data
  29. MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
  30. A deep learning framework for character motion synthesis and editing
  31. Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors
  32. TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos
  33. X-MoGen: Unified Motion Generation across Humans and Animals
  34. Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
  35. MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation
  36. Drop: Dynamics responses from human motion prior and projective dynamics
  37. POMP: Physics-constrainable Motion Generative Model through Phase Manifolds
  38. Dreamgaussian4d: Generative 4d gaussian splatting
  39. Drive Any Mesh: 4D Latent Diffusion for Mesh Deformation from Video
  40. AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
  41. ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
  42. Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
  43. Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
  44. Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation
  45. Generating time-consistent dynamics with discriminator-guided image diffusion models
  46. GENMO: A GENeralist Model for Human MOtion
  47. HGM3: Hierarchical Generative Masked Motion Modeling with Hard Token Mining
  48. Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion
  49. MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation
  50. FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
  51. VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
  52. DragAnything: Motion Control for Anything using Entity Representation
  53. PhysAnimator: Physics-Guided Generative Cartoon Animation
  54. SOAP: Style-Omniscient Animatable Portraits
  55. Neural Discrete Representation Learning
  56. TSTMotion: Training-free Scene-aware Text-to-motion Generation
  57. Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis
  58. A lip sync expert is all you need for speech to lip generation in the wild
  59. MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
  60. LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
  61. T2M-GPT: Generating human motion from textual descriptions with discrete representations
  62. MotionGPT: Finetuned LLMs are general-purpose motion generators
  63. Guided Motion Diffusion for Controllable Human Motion Synthesis
  64. OmniControl: Control Any Joint at Any Time for Human Motion Generation
  65. Learning Long-form Video Prior via Generative Pre-Training
  66. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
  67. Magic3D: High-Resolution Text-to-3D Content Creation
  68. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
  69. One-Minute Video Generation with Test-Time Training
  70. Key-Locked Rank One Editing for Text-to-Image Personalization
  71. Marching Cubes: A High Resolution 3D Surface Construction Algorithm
  72. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
  73. Null-text Inversion for Editing Real Images Using Guided Diffusion Models
  74. simple diffusion: End-to-end diffusion for high resolution images
  75. One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
  76. Scalable Diffusion Models with Transformers
  77. All are Worth Words: a ViT Backbone for Score-based Diffusion Models
  78. An image is worth 16x16 words: Transformers for image recognition at scale
  79. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
  80. Photorealistic text-to-image diffusion models with deep language understanding (Imagen)
  81. DreamFusion: Text-to-3D using 2D Diffusion
  82. GLIGEN: Open-Set Grounded Text-to-Image Generation
  83. Adding Conditional Control to Text-to-Image Diffusion Models
  84. T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
  85. Multi-Concept Customization of Text-to-Image Diffusion
  86. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
  87. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
  88. VisorGPT: Learning Visual Prior via Generative Pre-Training
  89. NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
  90. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
  91. ModelScope Text-to-Video Technical Report
  92. Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
  93. Make-A-Video: Text-to-Video Generation without Text-Video Data
  94. Video Diffusion Models
  95. Learning Transferable Visual Models From Natural Language Supervision
  96. Implicit Warping for Animation with Image Sets
  97. Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
  98. Motion-Conditioned Diffusion Model for Controllable Video Synthesis
  99. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  100. UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
  101. Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
  102. Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
  103. A Recipe for Scaling up Text-to-Video Generation
  104. High-Resolution Image Synthesis with Latent Diffusion Models
  105. Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
  106. Dataset: HumanVid
  107. HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
  108. StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  109. Dataset: Zoo-300K
  110. Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
  111. LoRA: Low-Rank Adaptation of Large Language Models
  112. TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
  113. GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
  114. MagicPony: Learning Articulated 3D Animals in the Wild
  115. Splatter a Video: Video Gaussian Representation for Versatile Processing
  116. Dataset: Dynamic Furry Animal Dataset
  117. Artemis: Articulated Neural Pets with Appearance and Motion Synthesis
  118. SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
  119. CAT3D: Create Anything in 3D with Multi-View Diffusion Models
  120. PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
  121. Humans in 4D: Reconstructing and Tracking Humans with Transformers
  122. Learning Human Motion from Monocular Videos via Cross-Modal Manifold Alignment
  123. PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
  124. Imagic: Text-Based Real Image Editing with Diffusion Models
  125. DiffEdit: Diffusion-based semantic image editing with mask guidance
  126. Dual diffusion implicit bridges for image-to-image translation
  127. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
  128. Prompt-to-Prompt Image Editing with Cross-Attention Control
  129. WANDR: Intention-guided Human Motion Generation
  130. TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
  131. 3D Gaussian Splatting for Real-Time Radiance Field Rendering
  132. Decoupling Human and Camera Motion from Videos in the Wild
  133. HMP: Hand Motion Priors for Pose and Shape Estimation from Video
  134. HuMoR: 3D Human Motion Model for Robust Pose Estimation
  135. Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video
  136. Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation
  137. WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
  138. Tackling the Generative Learning Trilemma with Denoising Diffusion GANs
  139. Elucidating the Design Space of Diffusion-Based Generative Models
  140. Score-Based Generative Modeling through Stochastic Differential Equations
  141. Consistency Models
  142. Classifier-Free Diffusion Guidance
  143. Cascaded Diffusion Models for High Fidelity Image Generation
  144. Learning Energy-Based Models by Diffusion Recovery Likelihood
  145. On Distillation of Guided Diffusion Models
  146. Denoising Diffusion Implicit Models
  147. Progressive Distillation for Fast Sampling of Diffusion Models
  148. Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
  149. ControlVideo: Training-free Controllable Text-to-Video Generation
  150. Pix2Video: Video Editing using Image Diffusion
  151. Structure and Content-Guided Video Synthesis with Diffusion Models
  152. MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
  153. MotionDirector: Motion Customization of Text-to-Video Diffusion Models
  154. Dreamix: Video Diffusion Models are General Video Editors
  155. Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
  156. TokenFlow: Consistent Diffusion Features for Consistent Video Editing
  157. DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
  158. Content Deformation Fields for Temporally Consistent Video Processing
  159. PFNN: Phase-Functioned Neural Networks

ReadPapers

VisorGPT

Can we model such a visual prior with an LLM?

Prompt design

Modeling Visual Prior via Generative Pre-Training

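A minimal sketch of this idea (my own illustration; the serialization format below is an assumption, not VisorGPT's exact prompt template): flatten a visual prior such as object bounding boxes into a plain-text sequence, then pre-train a causal LM on such sequences with the ordinary next-token likelihood.

```python
# Sketch: serialize bounding-box priors into text and apply the causal-LM objective.
# The prompt/serialization format here is assumed for illustration.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def serialize_boxes(category_boxes, image_size=512):
    # category_boxes: list of (class_name, (x0, y0, x1, y1)) in pixels.
    # Discretized coordinates become ordinary "words" the LM can model.
    parts = [f"boxes; {len(category_boxes)} instances; size {image_size}"]
    for name, (x0, y0, x1, y1) in category_boxes:
        parts.append(f"{name} [{x0} {y0} {x1} {y1}]")
    return " ; ".join(parts)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # any causal LM works for the sketch

text = serialize_boxes([("person", (96, 40, 230, 480)), ("dog", (260, 300, 400, 470))])
ids = tokenizer(text, return_tensors="pt").input_ids

# Generative pre-training = maximize likelihood of the serialized prior,
# i.e. the standard causal-LM loss with the sequence as its own label.
loss = model(ids, labels=ids).loss
loss.backward()  # an optimizer step would follow in a real training loop
print(text, float(loss))
```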

Sample from the LLM that has learned the visual prior

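Sampling sketch under the same assumptions (the prompt vocabulary is mine; "gpt2" stands in for the prior-trained LM): a partial prompt fixes the data type, instance count, and classes, the model completes the coordinates, and the decoded text is parsed back into boxes, e.g. as spatial conditions for ControlNet or GLIGEN.

```python
# Sketch: prompt-conditioned sampling of the learned prior (illustrative only).
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch, re

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # placeholder for the prior-trained LM
model.eval()

# Partial prompt in the same (assumed) serialization format as the training sketch.
prompt = "boxes; 2 instances; size 512 ; person ["
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    out = model.generate(
        ids,
        do_sample=True,            # sample rather than argmax: we want diverse layouts
        top_k=50,
        max_new_tokens=40,
        pad_token_id=tokenizer.eos_token_id,
    )

completion = tokenizer.decode(out[0], skip_special_tokens=True)
# Parse coordinates back out of the generated text; this is only meaningful once
# the LM has actually been pre-trained on serialized priors as sketched above.
boxes = [tuple(map(int, m)) for m in re.findall(r"\[(\d+) (\d+) (\d+) (\d+)\]", completion)]
print(completion)
print(boxes)
```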