ReadPapers
1. Introduction
2. Seamless Human Motion Composition with Blended Positional Encodings
3. FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
4. Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
5. Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
6. StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework
7. EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation
8. Motion Mamba: Efficient and Long Sequence Motion Generation
9. M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
10. T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences
11. AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
12. BAD: Bidirectional Auto-Regressive Diffusion for Text-to-Motion Generation
13. MMM: Generative Masked Motion Model
14. Priority-Centric Human Motion Generation in Discrete Latent Space
15. AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
16. MotionGPT: Human Motion as a Foreign Language
17. Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation
18. PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
19. Incorporating Physics Principles for Precise Human Motion Prediction
20. PIMNet: Physics-infused Neural Network for Human Motion Prediction
21. PhysDiff: Physics-Guided Human Motion Diffusion Model
22. NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
23. Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields
24. Geometric Neural Distance Fields for Learning Human Motion Priors
25. Character Controllers Using Motion VAEs
26. Improving Human Motion Plausibility with Body Momentum
27. MoGlow: Probabilistic and controllable motion synthesis using normalising flows
28. MoDi: Unconditional motion synthesis from diverse data
29. MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
30. A deep learning framework for character motion synthesis and editing
31. Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors
32. TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos
33. X-MoGen: Unified Motion Generation across Humans and Animals
34. Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
35. MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation
36. DROP: Dynamics responses from human motion prior and projective dynamics
37. POMP: Physics-constrainable Motion Generative Model through Phase Manifolds
38. DreamGaussian4D: Generative 4D Gaussian Splatting
39. Drive Any Mesh: 4D Latent Diffusion for Mesh Deformation from Video
40. AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
41. ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
42. Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
43. Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals
44. Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation
45. Generating time-consistent dynamics with discriminator-guided image diffusion models
46. GENMO: A Generalist Model for Human Motion
47. HGM3: Hierarchical Generative Masked Motion Modeling with Hard Token Mining
48. Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion
49. MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation
50. FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
51. VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
52. DragAnything: Motion Control for Anything using Entity Representation
53. PhysAnimator: Physics-Guided Generative Cartoon Animation
54. SOAP: Style-Omniscient Animatable Portraits
55. Neural Discrete Representation Learning
56. TSTMotion: Training-free Scene-aware Text-to-motion Generation
57. Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis
58. A lip sync expert is all you need for speech to lip generation in the wild
59. MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
60. LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync
61. T2M-GPT: Generating human motion from textual descriptions with discrete representations
62. MotionGPT: Finetuned LLMs are general-purpose motion generators
63. Guided Motion Diffusion for Controllable Human Motion Synthesis
64. OmniControl: Control Any Joint at Any Time for Human Motion Generation
65. Learning Long-form Video Prior via Generative Pre-Training
66. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
67. Magic3D: High-Resolution Text-to-3D Content Creation
68. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
69. One-Minute Video Generation with Test-Time Training
70. Key-Locked Rank One Editing for Text-to-Image Personalization
71. Marching Cubes: A High Resolution 3D Surface Construction Algorithm
72. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
73. Null-text Inversion for Editing Real Images Using Guided Diffusion Models
74. simple diffusion: End-to-end diffusion for high resolution images
75. One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
76. Scalable Diffusion Models with Transformers
77. All are Worth Words: a ViT Backbone for Score-based Diffusion Models
78. An image is worth 16x16 words: Transformers for image recognition at scale
79. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
80. Photorealistic text-to-image diffusion models with deep language understanding (Imagen)
81. DreamFusion: Text-to-3D using 2D Diffusion
82. GLIGEN: Open-Set Grounded Text-to-Image Generation
83. Adding Conditional Control to Text-to-Image Diffusion Models
84. T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
85. Multi-Concept Customization of Text-to-Image Diffusion
86. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
87. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
88. VisorGPT: Learning Visual Prior via Generative Pre-Training
89. NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
90. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
91. ModelScope Text-to-Video Technical Report
92. Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
93. Make-A-Video: Text-to-Video Generation without Text-Video Data
94. Video Diffusion Models
95. Learning Transferable Visual Models From Natural Language Supervision
96. Implicit Warping for Animation with Image Sets
97. Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
98. Motion-Conditioned Diffusion Model for Controllable Video Synthesis
99. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
100. UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
101. Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
102. Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
103. A Recipe for Scaling up Text-to-Video Generation
104. High-Resolution Image Synthesis with Latent Diffusion Models
105. Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
106. Dataset: HumanVid
107. HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
108. StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
109. Dataset: Zoo-300K
110. Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
111. LoRA: Low-Rank Adaptation of Large Language Models
112. TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
113. GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
114. MagicPony: Learning Articulated 3D Animals in the Wild
115. Splatter a Video: Video Gaussian Representation for Versatile Processing
116. Dataset: Dynamic Furry Animal Dataset
117. Artemis: Articulated Neural Pets with Appearance and Motion Synthesis
118. SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
119. CAT3D: Create Anything in 3D with Multi-View Diffusion Models
120. PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
121. Humans in 4D: Reconstructing and Tracking Humans with Transformers
122. Learning Human Motion from Monocular Videos via Cross-Modal Manifold Alignment
123. PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
124. Imagic: Text-Based Real Image Editing with Diffusion Models
125. DiffEdit: Diffusion-based semantic image editing with mask guidance
126. Dual diffusion implicit bridges for image-to-image translation
127. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
128. Prompt-to-Prompt Image Editing with Cross-Attention Control
129. WANDR: Intention-guided Human Motion Generation
130. TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
131. 3D Gaussian Splatting for Real-Time Radiance Field Rendering
132. Decoupling Human and Camera Motion from Videos in the Wild
133. HMP: Hand Motion Priors for Pose and Shape Estimation from Video
134. HuMoR: 3D Human Motion Model for Robust Pose Estimation
135. Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video
136. Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation
137. WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
138. Tackling the Generative Learning Trilemma with Denoising Diffusion GANs
139. Elucidating the Design Space of Diffusion-Based Generative Models
140. Score-Based Generative Modeling through Stochastic Differential Equations
141. Consistency Models
142. Classifier-Free Diffusion Guidance
143. Cascaded Diffusion Models for High Fidelity Image Generation
144. Learning Energy-Based Models by Diffusion Recovery Likelihood
145. On Distillation of Guided Diffusion Models
146. Denoising Diffusion Implicit Models
147. Progressive Distillation for Fast Sampling of Diffusion Models
148. Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
149. ControlVideo: Training-free Controllable Text-to-Video Generation
150. Pix2Video: Video Editing using Image Diffusion
151. Structure and Content-Guided Video Synthesis with Diffusion Models
152. MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
153. MotionDirector: Motion Customization of Text-to-Video Diffusion Models
154. Dreamix: Video Diffusion Models are General Video Editors
155. Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
156. TokenFlow: Consistent Diffusion Features for Consistent Video Editing
157. DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
158. Content Deformation Fields for Temporally Consistent Video Processing
159. PFNN: Phase-Functioned Neural Networks
Dataset: Zoo-300K

The dataset contains roughly 300,000 pairs of text descriptions and corresponding animal motions spanning 65 distinct animal categories. It is assembled from the following sources and annotations (a loading sketch follows the list):

- Raw data: the Truebones Zoo [2] dataset
- Synthetic data: motions augmented from the raw data
- Manual annotation: annotated with text labels denoting the animal and motion categories
- Generated annotation
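Given this composition, each sample pairs a text description with a motion clip plus its animal and motion-category tags. Below is a minimal Python sketch of how such text-motion pairs might be represented and loaded; the `ZooSample` field names, the JSON layout, and the `zoo300k_annotations.json` filename are illustrative assumptions, not the format actually released with the Motion Avatar paper.

```python
from dataclasses import dataclass
from typing import List
import json


# Hypothetical record layout for one Zoo-300K sample. Field names and the
# JSON file format are assumptions for illustration only.
@dataclass
class ZooSample:
    text: str                   # natural-language motion description
    animal: str                 # one of the ~65 animal categories
    motion_label: str           # motion category tag (e.g. "walk", "jump")
    motion: List[List[float]]   # per-frame pose parameters, frames x dims


def load_samples(path: str) -> List[ZooSample]:
    """Read a JSON list of annotated text-motion pairs (assumed format)."""
    with open(path, "r", encoding="utf-8") as f:
        raw = json.load(f)
    return [ZooSample(**item) for item in raw]


if __name__ == "__main__":
    # Hypothetical annotation file; replace with the real release path.
    samples = load_samples("zoo300k_annotations.json")
    print(f"{len(samples)} text-motion pairs across "
          f"{len({s.animal for s in samples})} animal categories")
```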
Reference

Paper: link