ReadPapers
Introduction
Regional Time Stepping for SPH
FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations
ParticleGS: Particle-Based Dynamics Modeling of 3D Gaussians for Prior-free Motion Extrapolation
Animate3d: Animating any 3d model with multi-view video diffusion
Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos
HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
PIG: Physically-based Multi-Material Interaction with 3D Gaussians
EnliveningGS: Active Locomotion of 3DGS
SplineGS: Learning Smooth Trajectories in Gaussian Splatting for Dynamic Scene Reconstruction
PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation
PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning
LengthAware Motion Synthesis via Latent Diffusion
IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model
UniMoGen: Universal Motion Generation
AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion
Flame: Free-form language-based motion synthesis & editing
Human Motion Diffusion as a Generative Prior
Text-driven Human Motion Generation with Motion Masked Diffusion Model
ReMoDiffuse: RetrievalAugmented Motion Diffusion Model
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment
Absolute Coordinates Make Motion Generation Easy
Seamless Human Motion Composition with Blended Positional Encodings
FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework
EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation
Motion Mamba: Efficient and Long Sequence Motion Generation
M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences
AttT2M:Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
BAD: Bidirectional Auto-Regressive Diffusion for Text-to-Motion Generation
MMM: Generative Masked Motion Model
Priority-Centric Human Motion Generation in Discrete Latent Space
AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
MotionGPT: Human Motion as a Foreign Language
Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation
PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
Incorporating Physics Principles for Precise Human Motion Prediction
PIMNet: Physics-infused Neural Network for Human Motion Prediction
PhysDiff: Physics-Guided Human Motion Diffusion Model
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields
Geometric Neural Distance Fields for Learning Human Motion Priors
Character Controllers Using Motion VAEs
Improving Human Motion Plausibility with Body Momentum
MoGlow: Probabilistic and controllable motion synthesis using normalising flows
Modi: Unconditional motion synthesis from diverse data
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
A deep learning framework for character motion synthesis and editing
Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors
TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos
X-MoGen: Unified Motion Generation across Humans and Animals
Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation
Drop: Dynamics responses from human motion prior and projective dynamics
POMP: Physics-constrainable Motion Generative Model through Phase Manifolds
Dreamgaussian4d: Generative 4d gaussian splatting
Drive Any Mesh: 4D Latent Diffusion for Mesh Deformation from Video
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation
Generating time-consistent dynamics with discriminator-guided image diffusion models
GENMO:AGENeralist Model for Human MOtion
HGM3: HIERARCHICAL GENERATIVE MASKED MOTION MODELING WITH HARD TOKEN MINING
Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion
MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation
FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
DragAnything: Motion Control for Anything using Entity Representation
PhysAnimator: Physics-Guided Generative Cartoon Animation
SOAP: Style-Omniscient Animatable Portraits
Neural Discrete Representation Learning
TSTMotion: Training-free Scene-aware Text-to-motion Generation
Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis
A lip sync expert is all you need for speech to lip generation in the wild
MUSETALK: REAL-TIME HIGH QUALITY LIP SYN-CHRONIZATION WITH LATENT SPACE INPAINTING
LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
T2m-gpt: Generating human motion from textual descriptions with discrete representations
Motiongpt: Finetuned llms are general-purpose motion generators
Guided Motion Diffusion for Controllable Human Motion Synthesis
OmniControl: Control Any Joint at Any Time for Human Motion Generation
Learning Long-form Video Prior via Generative Pre-Training
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Magic3D: High-Resolution Text-to-3D Content Creation
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
One-Minute Video Generation with Test-Time Training
Key-Locked Rank One Editing for Text-to-Image Personalization
MARCHING CUBES: A HIGH RESOLUTION 3D SURFACE CONSTRUCTION ALGORITHM
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
NULL-text Inversion for Editing Real Images Using Guided Diffusion Models
simple diffusion: End-to-end diffusion for high resolution images
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
Scalable Diffusion Models with Transformers
All are Worth Words: a ViT Backbone for Score-based Diffusion Models
An image is worth 16x16 words: Transformers for image recognition at scale
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Photorealistic text-to-image diffusion models with deep language understanding||Imagen
DreamFusion: Text-to-3D using 2D Diffusion
GLIGEN: Open-Set Grounded Text-to-Image Generation
Adding Conditional Control to Text-to-Image Diffusion Models
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
Multi-Concept Customization of Text-to-Image Diffusion
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
VisorGPT: Learning Visual Prior via Generative Pre-Training
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
ModelScope Text-to-Video Technical Report
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Make-A-Video: Text-to-Video Generation without Text-Video Data
Video Diffusion Models
Learning Transferable Visual Models From Natural Language Supervision
Implicit Warping for Animation with Image Sets
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Motion-Conditioned Diffusion Model for Controllable Video Synthesis
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
A Recipe for Scaling up Text-to-Video Generation
High-Resolution Image Synthesis with Latent Diffusion Models
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
数据集：HumanVid
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
数据集：Zoo-300K
Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
LORA: LOW-RANK ADAPTATION OF LARGE LAN-GUAGE MODELS
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
MagicPony: Learning Articulated 3D Animals in the Wild
Splatter a Video: Video Gaussian Representation for Versatile Processing
数据集：Dynamic Furry Animal Dataset
Artemis: Articulated Neural Pets with Appearance and Motion Synthesis
SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
CAT3D: Create Anything in 3D with Multi-View Diffusion Models
PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
Humans in 4D: Reconstructing and Tracking Humans with Transformers
Learning Human Motion from Monocular Videos via Cross-Modal Manifold Alignment
PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
Imagic: Text-Based Real Image Editing with Diffusion Models
DiffEdit: Diffusion-based semantic image editing with mask guidance
Dual diffusion implicit bridges for image-to-image translation
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
Prompt-to-Prompt Image Editing with Cross-Attention Control
WANDR: Intention-guided Human Motion Generation
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Decoupling Human and Camera Motion from Videos in the Wild
HMP: Hand Motion Priors for Pose and Shape Estimation from Video
HuMoR: 3D Human Motion Model for Robust Pose Estimation
Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video
Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation
WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
Tackling the Generative Learning Trilemma with Denoising Diffusion GANs
Elucidating the Design Space of Diffusion-Based Generative Models
SCORE-BASED GENERATIVE MODELING THROUGHSTOCHASTIC DIFFERENTIAL EQUATIONS
Consistency Models
Classifier-Free Diffusion Guidance
Cascaded Diffusion Models for High Fidelity Image Generation
LEARNING ENERGY-BASED MODELS BY DIFFUSIONRECOVERY LIKELIHOOD
On Distillation of Guided Diffusion Models
Denoising Diffusion Implicit Models
PROGRESSIVE DISTILLATION FOR FAST SAMPLING OF DIFFUSION MODELS
Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
ControlVideo: Training-free Controllable Text-to-Video Generation
Pix2Video: Video Editing using Image Diffusion
Structure and Content-Guided Video Synthesis with Diffusion Models
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
MotionDirector: Motion Customization of Text-to-Video Diffusion Models
Dreamix: Video Diffusion Models are General Video Editors
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
TokenFlow: Consistent Diffusion Features for Consistent Video Editing
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
Content Deformation Fields for Temporally Consistent Video Processing
PFNN: Phase-Functioned Neural Networks

ReadPapers

Flame: Free-form language-based motion synthesis & editing