ReadPapers
  1. Introduction
  2. Locomotion Technical Insights
  3. AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control
  4. ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters
  5. Feature-Based Locomotion Controllers
  6. DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills
  7. ControlVAE: Model-Based Learning of Generative Controllers for Physics-Based Characters
  8. Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
  9. UniPhys: Unified Planner and Controller with Diffusion for Flexible Humanoid Control
  10. Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control
  11. PDP: Physics-Based Character Animation via Diffusion Policy
  12. DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets
  13. Perpetual Humanoid Control for Real-time Simulated Avatars
  14. CALM: Conditional Adversarial Latent Models for Directable Virtual Characters
  15. Universal Humanoid Motion Representations for Physics-Based Control
  16. DReCon: Data-Driven Responsive Control of Physics-Based Characters
  17. PARC: Physics-based Augmentation with Reinforcement Learning for Character Controllers
  18. TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
  19. SAM 3: Segment Anything with Concepts
  20. CLoSD: Closing the Loop Between Simulation and Diffusion for Multi-Task Character Control
  21. MotionPersona: Characteristics-aware Locomotion Control
  22. Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control
  23. Gait-Conditioned Reinforcement Learning with Multi-Phase Curriculum for Humanoid Locomotion
  24. UniPhys: Unified Planner and Controller with Diffusion for Flexible Humanoid Control
  25. MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting
  26. Regional Time Stepping for SPH
  27. FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
  28. Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations
  29. ParticleGS: Particle-Based Dynamics Modeling of 3D Gaussians for Prior-free Motion Extrapolation
  30. Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
  31. Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos
  32. HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
  33. PIG: Physically-based Multi-Material Interaction with 3D Gaussians
  34. EnliveningGS: Active Locomotion of 3DGS
  35. SplineGS: Learning Smooth Trajectories in Gaussian Splatting for Dynamic Scene Reconstruction
  36. PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation
  37. PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning
  38. Length-Aware Motion Synthesis via Latent Diffusion
  39. IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model
  40. UniMoGen: Universal Motion Generation
  41. AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion
  42. FLAME: Free-form Language-based Motion Synthesis & Editing
  43. Human Motion Diffusion as a Generative Prior
  44. Text-driven Human Motion Generation with Motion Masked Diffusion Model
  45. ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
  46. MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
  47. ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment
  48. Absolute Coordinates Make Motion Generation Easy
  49. Seamless Human Motion Composition with Blended Positional Encodings
  50. FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
  51. Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
  52. Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
  53. StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework
  54. EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation
  55. Motion Mamba: Efficient and Long Sequence Motion Generation
  56. M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
  57. T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences
  58. AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
  59. BAD: Bidirectional Auto-Regressive Diffusion for Text-to-Motion Generation
  60. MMM: Generative Masked Motion Model
  61. Priority-Centric Human Motion Generation in Discrete Latent Space
  62. AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
  63. MotionGPT: Human Motion as a Foreign Language
  64. Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation
  65. PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
  66. Incorporating Physics Principles for Precise Human Motion Prediction
  67. PIMNet: Physics-infused Neural Network for Human Motion Prediction
  68. PhysDiff: Physics-Guided Human Motion Diffusion Model
  69. NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
  70. Riemannian Motion Generation: A Unified Framework for Human Motion Representation and Generation via Riemannian Flow Matching
  71. GaussiAnimate: Rethinking Gaussian Splatting for Articulated Models via Skeleton-Aware Representation
  72. SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
  73. FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On
  74. Geometric Neural Distance Fields for Learning Human Motion Priors
  75. Character Controllers Using Motion VAEs
  76. Improving Human Motion Plausibility with Body Momentum
  77. MoGlow: Probabilistic and Controllable Motion Synthesis Using Normalising Flows
  78. MoDi: Unconditional Motion Synthesis from Diverse Data
  79. MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
  80. A Deep Learning Framework for Character Motion Synthesis and Editing
  81. Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors
  82. TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos
  83. X-MoGen: Unified Motion Generation across Humans and Animals
  84. Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
  85. MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation
  86. DROP: Dynamics Responses from Human Motion Prior and Projective Dynamics
  87. POMP: Physics-constrainable Motion Generative Model through Phase Manifolds
  88. DreamGaussian4D: Generative 4D Gaussian Splatting
  89. Drive Any Mesh: 4D Latent Diffusion for Mesh Deformation from Video
  90. AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
  91. ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
  92. Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
  93. Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
  94. Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation
  95. Generating Time-Consistent Dynamics with Discriminator-Guided Image Diffusion Models
  96. GENMO: A Generalist Model for Human Motion
  97. HGM3: Hierarchical Generative Masked Motion Modeling with Hard Token Mining
  98. Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion
  99. MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation
  100. FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
  101. VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
  102. DragAnything: Motion Control for Anything using Entity Representation
  103. PhysAnimator: Physics-Guided Generative Cartoon Animation
  104. SOAP: Style-Omniscient Animatable Portraits
  105. Neural Discrete Representation Learning
  106. TSTMotion: Training-free Scene-aware Text-to-motion Generation
  107. Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis
  108. A Lip Sync Expert Is All You Need for Speech to Lip Generation in the Wild
  109. MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
  110. LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
  111. T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations
  112. MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators
  113. Guided Motion Diffusion for Controllable Human Motion Synthesis
  114. OmniControl: Control Any Joint at Any Time for Human Motion Generation
  115. Learning Long-form Video Prior via Generative Pre-Training
  116. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
  117. Magic3D: High-Resolution Text-to-3D Content Creation
  118. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
  119. One-Minute Video Generation with Test-Time Training
  120. Key-Locked Rank One Editing for Text-to-Image Personalization
  121. Marching Cubes: A High Resolution 3D Surface Construction Algorithm
  122. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
  123. Null-text Inversion for Editing Real Images Using Guided Diffusion Models
  124. simple diffusion: End-to-end diffusion for high resolution images
  125. One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
  126. Scalable Diffusion Models with Transformers
  127. All are Worth Words: A ViT Backbone for Score-based Diffusion Models
  128. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale
  129. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
  130. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen)
  131. DreamFusion: Text-to-3D using 2D Diffusion
  132. GLIGEN: Open-Set Grounded Text-to-Image Generation
  133. Adding Conditional Control to Text-to-Image Diffusion Models
  134. T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
  135. Multi-Concept Customization of Text-to-Image Diffusion
  136. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
  137. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
  138. VisorGPT: Learning Visual Prior via Generative Pre-Training
  139. NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
  140. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
  141. ModelScope Text-to-Video Technical Report
  142. Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
  143. Make-A-Video: Text-to-Video Generation without Text-Video Data
  144. Video Diffusion Models
  145. Learning Transferable Visual Models From Natural Language Supervision
  146. Implicit Warping for Animation with Image Sets
  147. Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
  148. Motion-Conditioned Diffusion Model for Controllable Video Synthesis
  149. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  150. UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
  151. Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
  152. Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
  153. A Recipe for Scaling up Text-to-Video Generation
  154. High-Resolution Image Synthesis with Latent Diffusion Models
  155. Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
  156. Dataset: HumanVid
  157. HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
  158. StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  159. Dataset: Zoo-300K
  160. Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
  161. LoRA: Low-Rank Adaptation of Large Language Models
  162. TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
  163. GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
  164. MagicPony: Learning Articulated 3D Animals in the Wild
  165. Splatter a Video: Video Gaussian Representation for Versatile Processing
  166. Dataset: Dynamic Furry Animal Dataset
  167. Artemis: Articulated Neural Pets with Appearance and Motion Synthesis
  168. SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
  169. CAT3D: Create Anything in 3D with Multi-View Diffusion Models
  170. PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
  171. Humans in 4D: Reconstructing and Tracking Humans with Transformers
  172. Learning Human Motion from Monocular Videos via Cross-Modal Manifold Alignment
  173. PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
  174. Imagic: Text-Based Real Image Editing with Diffusion Models
  175. DiffEdit: Diffusion-based Semantic Image Editing with Mask Guidance
  176. Dual Diffusion Implicit Bridges for Image-to-Image Translation
  177. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
  178. Prompt-to-Prompt Image Editing with Cross-Attention Control
  179. WANDR: Intention-guided Human Motion Generation
  180. TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
  181. 3D Gaussian Splatting for Real-Time Radiance Field Rendering
  182. Decoupling Human and Camera Motion from Videos in the Wild
  183. HMP: Hand Motion Priors for Pose and Shape Estimation from Video
  184. HuMoR: 3D Human Motion Model for Robust Pose Estimation
  185. Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video
  186. Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation
  187. WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
  188. Tackling the Generative Learning Trilemma with Denoising Diffusion GANs
  189. Elucidating the Design Space of Diffusion-Based Generative Models
  190. Score-Based Generative Modeling through Stochastic Differential Equations
  191. Consistency Models
  192. Classifier-Free Diffusion Guidance
  193. Cascaded Diffusion Models for High Fidelity Image Generation
  194. Learning Energy-Based Models by Diffusion Recovery Likelihood
  195. On Distillation of Guided Diffusion Models
  196. Denoising Diffusion Implicit Models
  197. Progressive Distillation for Fast Sampling of Diffusion Models
  198. Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
  199. ControlVideo: Training-free Controllable Text-to-Video Generation
  200. Pix2Video: Video Editing using Image Diffusion
  201. Structure and Content-Guided Video Synthesis with Diffusion Models
  202. MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
  203. MotionDirector: Motion Customization of Text-to-Video Diffusion Models
  204. Dreamix: Video Diffusion Models are General Video Editors
  205. Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
  206. TokenFlow: Consistent Diffusion Features for Consistent Video Editing
  207. DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
  208. Content Deformation Fields for Temporally Consistent Video Processing
  209. PFNN: Phase-Functioned Neural Networks for Character Control
  210. Recurrent Transition Networks for Character Locomotion
  211. Real-Time Style Modelling of Human Locomotion
  212. Motion In-Betweening with Phase Manifolds
  213. Mode-Adaptive Neural Networks for Quadruped Motion Control
  214. Few-shot Learning of Homogeneous Human Locomotion Styles
  215. Learning Predict-and-Simulate Policies from Unorganized Human Motion Data
  216. Local Motion Phases for Learning Multi-Contact Character Movements
  217. Interactive Control of Diverse Complex Characters with Neural Networks
  218. Accelerated Auto-regressive Motion Diffusion Model
  219. DARTControl: A Diffusion-based Autoregressive Motion Model for Real-time Text-driven Motion Control
  220. Interactive Character Control with Auto-Regressive Motion Diffusion Models
  221. Taming Diffusion Probabilistic Models for Character Control
  222. Learned Motion Matching
  223. MOCHA: Real-Time Motion Characterization via Context Matching
  224. DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning
  225. Benchmarking Deep Reinforcement Learning for Continuous Control
  226. SIMBICON: Simple Biped Locomotion Control
  227. RLOC: Terrain-Aware Legged Locomotion using Reinforcement Learning and Optimal Control
  228. Efficient Self-Supervised Data Collection for Offline Robot Learning
  229. Learning Robust Autonomous Navigation and Locomotion for Wheeled-Legged Robots
  230. Dataset Distillation for Offline Reinforcement Learning
  231. mimic-one: A Scalable Model Recipe for General Purpose Robot Dexterity

FLAME: Free-form Language-based Motion Synthesis & Editing