Awesome-Human-Motion

Awesome Human Motion

An aggregation of human motion understanding research; feel free to contribute.

Reviews & Surveys
Motion Generation
Motion Editing
Motion Stylization
Human-Object Interaction
Human-Scene Interaction
Human-Human Interaction
Datasets
Humanoid
Bio-stuff
Human Reconstruction
Human-Object/Scene/Human Interaction Reconstruction
Motion Controlled Image/Video Generation
Human Pose Estimation/Recognition
Human Motion Understanding

Reviews & Surveys

(JEB 2025) McAllister et al: Behavioural energetics in human locomotion: how energy use influences how we move, McAllister et al.
(ICER 2025) Zhao et al: Motion Generation Review: Exploring Deep Learning for Lifelike Animation with Manifold, Zhao et al.
(ArXiv 2025) Advances in 4D Representation: Geometry, Motion, and Interaction, Zhao et al.
(ArXiv 2025) Motion Generation: A Survey of Generative Approaches and Benchmarks, Khani et al.
(ArXiv 2025) Segado et al: Grounding Intelligence in Movement, Segado et al.
(ArXiv 2025) Multimodal Generative AI with Autoregressive LLMs for Human Motion Understanding and Generation: A Way Forward, Islam et al.
(ArXiv 2025) Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions, Abootorabi et al.
(ArXiv 2025) Sui et al: A Survey on Human Interaction Motion Generation, Sui et al.
(ArXiv 2025) 3D Human Interaction Generation: A Survey, Fan et al.
(ArXiv 2025) Human-Centric Foundation Models: Perception, Generation and Agentic Modeling, Tang et al.
(ArXiv 2025) Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning, Gu et al.
(ArXiv 2024) A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights, Lei et al.
(ArXiv 2024) Human Motion Video Generation: A survey, Xue et al.
(Neurocomputing) Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A survey, Liu et al.
(TVCG 2024) Loi et al: Machine Learning Approaches for 3D Motion Synthesis and Musculoskeletal Dynamics Estimation: A Survey, Loi et al.
(T-PAMI 2023) Zhu et al: Human Motion Generation: A Survey, Zhu et al.

Motion Generation, Text/Speech/Music-Driven

2026

(WACV 2026) SegMo: Segment-aligned Text to 3D Human Motion Generation, Dang et al.
(AAAI 2026) ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment, Weng et al.
(AAAI 2026) FineXtrol: Controllable Motion Generation via Fine-Grained Text, Shen et al.

2025

(NeurIPS 2025) HMVLM: Human Motion-Vision-Lanuage Model via MoE LoRA, Hu et al.
(NeurIPS 2025) TransPhase: Deep Compositional Phase Diffusion for Long Motion Sequence Generation, Au et al.
(SIGGRAPH Asia 2025) TCM: Learning Human Motion with Temporally Conditional Mamba, Nguyen et al.
(TMLR 2025) MoReact: Generating Reactive Motion from Textual Descriptions, Xu et al.
(ICCV 2025) Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation, Fan et al.
(ICCV 2025) UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation, Patel et al.
(ICCV 2025) FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing, Wu et al.
(ICCV 2025) PUMPS: Skeleton-Agnostic Point-based Universal Motion Pre-Training for Synthesis in Human Motion Tasks, Mo et al.
(ICCV 2025) GENMO: A GENeralist Model for Human MOtion, Li et al.
(ICCV 2025) InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation, Zhuo et al.
(ICCV 2025) Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data, Fan et al.
(ICCV 2025) Morph: A Motion-free Physics Optimization Framework for Human Motion Generation, Li et al.
(ICCV 2025) DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding, Cho et al.
(ICCV 2025) SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis, Zhang et al.
(ICCV 2025) KinMo: Kinematic-aware Human Motion Understanding and Generation, Zhang et al.
(ICCV 2025) GestureLSM: Latent Shortcut-based Co-Speech Gesture Generation with Spatial-Temporal Modeling, Liu et al.
(ICCV 2025) Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation, Pi et al.
(ICCV 2025) MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm, Guo et al.
(ICCV 2025) SFControl: Motion Synthesis with Sparse and Flexible Keyjoint Control, Hwang et al.
(ICCV 2025) Less Is More: Improving Motion Diffusion Models with Sparse Keyframes, Bae et al.
(ICCV 2025) ControlMM: Controllable Masked Motion Generation, Pinyoanuntapong et al.
(ICCV 2025) PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning, Zhang et al.
(ICCV 2025) HERO: Human Reaction Generation from Videos, Yu et al.
(ICCV 2025) MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space, Xiao et al.
(ICCV 2025) GenM3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation, Shi et al.
(ACM MM 2025) ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion, Wang et al.
(ICML 2025) Being-M0: Scaling Motion Generation Models with Million-Level Human Motions, Wang et al.
(TOG 2025) Sketch2Anim: Towards Transferring Sketch Storyboards into 3D Animation, Zhong et al.
(SIGGRAPH 2025) MECo: Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models, Chen et al.
(SIGGRAPH 2025) Chang et al.: Large-Scale Multi-Character Interaction Synthesis, Chang et al.
(SIGGRAPH 2025) AnyTop: Character Animation Diffusion with Any Topology, Gat et al.
(CVPR 2025) DSDFM: Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis, Hua et al.
(CVPR 2025) EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation, Hua et al.
(CVPR 2025) UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing, Li et al.
(CVPR 2025) From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models, Barquero et al.
(CVPR 2025) Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions, Liao et al.
(CVPR 2025) MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities, Wu et al.
(CVPR 2025) SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing, Hong et al.
(CVPR 2025) PersonalBooth: Personalized Text-to-Motion Generation, Kim et al.
(CVPR 2025) MARDM: Rethinking Diffusion for Text-Driven Human Motion Generation, Meng et al.
(CVPR 2025) StickMotion: Generating 3D Human Motions by Drawing a Stickman, Wang et al.
(CVPR 2025) LLaMo: Human Motion Instruction Tuning, Li et al.
(CVPR 2025) HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation, Cheng et al.
(CVPR 2025) AtoM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward, Han et al.
(CVPR 2025) EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space, Zhang et al.
(CVPR 2025) The Languate of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion, Chen et al.
(CVPR 2025) ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model, Lu et al.
(CVPR 2025) Move in 2D: 2D-Conditioned Human Motion Generation, Huang et al.
(CVPR 2025) SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters, Jiang et al.
(CVPR 2025) MVLift: Lifting Motion to the 3D World via 2D Diffusion, Li et al.
(CVPR 2025 Workshop) MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation, Maldonado et al.
(CVPR 2025 Workshop) Dyadic Mamba: Long-term Dyadic Human Motion Synthesis, Tanke et al.
(ACM Sensys 2025) SHADE-AD: An LLM-Based Framework for Synthesizing Activity Data of Alzheimer’s Patients, Fu et al.
(ICRA 2025) MotionGlot: A Multi-Embodied Motion Generation Model, Harithas et al.
(ICLR 2025) CLoSD: Closing the Loop between Simulation and Diffusion for Multi-Task Character Control, Tevet et al.
(ICLR 2025) PedGen: Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels, Liu et al.
(ICLR 2025) HGM³: Hierarchical Generative Masked Motion Modeling with Hard Token Mining, Jeong et al.
(ICLR 2025) LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning, Li et al.
(ICLR 2025) MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer, Wang et al.
(ICLR 2025) Lyu et al: Towards Unified Human Motion-Language Understanding via Sparse Interpretable Characterization, Lyu et al.
(ICLR 2025) DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control, Zhao et al.
(ICLR 2025) Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs, Wu et al.
(TMM 2025) MCG-IMM: A Plug-and-Play Multi-Criteria Guidance for Diverse In-Betweening Human Motion Generation, Yu et al.
(IJCV 2025) Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation, Wang et al.
(TCSVT 2025) Zeng et al: Progressive Human Motion Generation Based on Text and Few Motion Frames, Zeng et al.
(Arxiv 2025) HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation, Wen et al.
(Arxiv 2025) DeMoGen: Towards Decompositional Human Motion Generation with Energy-Based Diffusion Models, Zhang et al.
(Arxiv 2025) Jeong et al: Pose-Guided Residual Refinement for Interpretable Text-to-Motion Generation and Editing, Jeong et al.
(Arxiv 2025) FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation, Yang et al.
(ArXiv 2025) OmniMoGen: Unifying Human Motion Generation via Learning from Interleaved Text-Motion Instructions, Bu et al.
(ArXiv 2025) MoLingo: Motion–Language Alignment for Text-to-Human Motion Generation, He et al.
(ArXiv 2025) FunPhase: A Periodic Functional Autoencoder for Motion Generation via Phase Manifolds, Pegoraro et al.
(ArXiv 2025) IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation, Li et al.
(ArXiv 2025) Kinetic Mining in Context: Few-Shot Action Synthesis via Text-to-Motion Distillations, Cazzola et al.
(ArXiv 2025) COMET: Controllable Long-term Motion Generation with Extended Joint Targets, Li et al.
(ArXiv 2025) Back to Basics: Motion Representation Matters for Human Motion Generation Using Diffusion Model, Jin et al.
(ArXiv 2025) UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework, Pang et al.
(ArXiv 2025) Free3D: 3D Human Motion Emerges from Single-View 2D Supervision, Liu et al.
(ArXiv 2025) Pressure2Motion: Hierarchical Motion Synthesis from Ground Pressure with Text Guidance, Li et al.
(ArXiv 2025) Mem-MLP: Real-Time 3D Human Motion Generation from Sparse Inputs, Mutlu et al.
(ArXiv 2025) The Quest for Generalizable Motion Generation: Data, Model, and Evaluation, Lin et al.
(ArXiv 2025) MoSa: Motion Generation with Scalable Autoregressive Modeling, Liu et al.
(ArXiv 2025) OmniMotion-X: Versatile Multimodal Whole-Body Motion Generation, Xu et al.
(ArXiv 2025) OmniMotion: Multimodal Motion Generation with Continuous Masked Autoregression, Li et al.
(ArXiv 2025) No MoCap Needed: Post-Training Motion Diffusion Models with Reinforcement Learning using Only Textual Prompts, Girolamo et al.
(ArXiv 2025) Pulp Motion: Framing-aware multimodal camera and human motion generation, Courant et al.
(ArXiv 2025) MonSTeR: a Unified Model for Motion, Scene, Text Retrieval, Collorone et al.
(ArXiv 2025) MoGIC: Boosting Motion Generation via Intention Understanding and Visual Context, Shi et al.
(ArXiv 2025) Gupta et al: Unified Multi-Modal Interactive & Reactive 3D Motion Generation via Rectified Flow, Gupta et al.
(ArXiv 2025) LaMoGen: Laban Movement-Guided Diffusion for Text-to-Motion Generation, Kim et al.
(ArXiv 2025) LUMA: Low-Dimension Unified Motion Alignment with Dual-Path Anchoring for Text-to-Motion Diffusion Model, Jia et al.
(ArXiv 2025) SimDiff: Simulator-constrained Diffusion Model for Physically Plausible Motion Generation, Watanabe et al.
(ArXiv 2025) SmooGPT: Stylized Motion Generation using Large Language Models, Zhong et al.
(ArXiv 2025) Embracing Aleatoric Uncertainty: Generating Diverse 3D Human Motion, Qin et al.
(ArXiv 2025) MotionFLUX: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment, Gao et al.
(ArXiv 2025) VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models, Xu et al.
(ArXiv 2025) MSQ: Spatial-Temporal Multi-Scale Quantizationfor Flexible Motion Generation, Wang et al.
(ArXiv 2025) X-MoGen: Unified Motion Generation across Humans and Animals, Wang et al.
(ArXiv 2025) ReMoMask: Retrieval-Augmented Masked Motion Generation, Li et al.
(ArXiv 2025) OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation, Gan et al.
(ArXiv 2025) SpeakerVid-5M: A Large-Scale High-Quality Dataset for audio-visual Dyadic Interactive Human Generation, Zhang et al.
(ArXiv 2025) EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation, Meng et al.
(ArXiv 2025) <a href=https://arxiv.org/abs/2507.11949">MOSPA</a>: Human Motion Generation Driven by Spatial Audio, Xu et al.
(ArXiv 2025) SnapMoGen: Human Motion Generation from Expressive Texts, Wang et al.
(ArXiv 2025) MOST: Motion Diffusion Model for Rare Text via Temporal Clip Banzhaf Interaction, Wang et al.
(ArXiv 2025) Grounded Gestures: Language, Motion and Space, Deichler et al.
(ArXiv 2025) MotionGPT3: Human Motion as a Second Modality, Zhu et al.
(ArXiv 2025) HumanAttr: Generating Attribute-Aware Human Motions from Textual Prompt, Wang et al.
(ArXiv 2025) PlanMoGPT: Flow-Enhanced Progressive Planning for Text to Motion Synthesis, Jin et al.
(ArXiv 2025) Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation, Ouyang et al.
(ArXiv 2025) ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model, Chen et al.
(ArXiv 2025) MotionRAG-Diff: A Retrieval-Augmented Diffusion Framework for Long-Term Music-to-Dance Generation, Huang et al.
(ArXiv 2025) IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model, Zhao et al.
(ArXiv 2025) Li et al: How Much Do Large Language Models Know about Human Motion? A Case Study in 3D Avatar Control, Li et al.
(ArXiv 2025) UniMoGen: Universal Motion Generation, Khani et al.
(ArXiv 2025) Wang et al: Semantics-Aware Human Motion Generation from Audio Instructions, Wang et al.
(ArXiv 2025) ACMDM: Absolute Coordinates Make Motion Generation Easy, Meng et al.
(ArXiv 2025) PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation, Zhu et al.
(ArXiv 2025) Intentional Gesture: Deliver Your Intentions with Gestures for Speech, Liu et al.
(ArXiv 2025) MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis, Yang et al.
(ArXiv 2025) M3G: Multi-Granular Gesture Generator for Audio-Driven Full-Body Human Motion Synthesis, Yin et al.
(ArXiv 2025) ReactDance: Progressive-Granular Representation for Long-Term Coherent Reactive Dance Generation, Lin et al.
(ArXiv 2025) PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning, Xi et al.
(ArXiv 2025) DanceMosaic: High-Fidelity Dance Generation with Multimodal Editability, Shah et al.
(ArXiv 2025) ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer, Xie et al.
(ArXiv 2025) HMU: Human Motion Unlearning, Matteis et al.
(ArXiv 2025) ACMo: Attribute Controllable Motion Generation, Wei et al.
(ArXiv 2025) BioMoDiffuse: Physics-Guided Biomechanical Diffusion for Controllable and Authentic Human Motion Synthesis, Kang et al.
(ArXiv 2025) ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis, Zhou et al.
(ArXiv 2025) Motion Anything: Any to Motion Generation, Zhang et al.
(ArXiv 2025) GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music, Liu et al.
(ArXiv 2025) CASIM: Composite Aware Semantic Injection for Text to Motion Generation, Chang et al.
(ArXiv 2025) MotionPCM: Real-Time Motion Synthesis with Phased Consistency Model, Jiang et al.
(ArXiv 2025) Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss, Chen et al.
(ArXiv 2025) FlexMotion: Lightweight, Physics-Aware, and Controllable Human Motion Generation, Tashakori et al.
(ArXiv 2025) HiSTF Mamba: Hierarchical Spatiotemporal Fusion with Multi-Granular Body-Spatial Modeling for High-Fidelity Text-to-Motion Generation, Zhan et al.
(ArXiv 2025) PackDiT: Joint Human Motion and Text Generation via Mutual Prompting, Jiang et al.
(3DV 2025) Unimotion: Unifying 3D Human Motion Synthesis and Understanding, Li et al.
(3DV 2025) HoloGest: Decoupled Diffusion and Motion Priors for Generating Holisticly Expressive Co-speech Gestures, Cheng et al.
(AAAI 2025) RemoGPT: Part-Level Retrieval-Augmented Motion-Language Models, Yu et al.
(AAAI 2025) UniMuMo: Unified Text, Music and Motion Generation, Yang et al.
(AAAI 2025) EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning, Chen et al.
(AAAI 2025) ALERT-Motion: Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion, Miao et al.
(AAAI 2025) MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls, Bian et al.
(AAAI 2025) Light-T2M: A Lightweight and Fast Model for Text-to-Motion Generation, Zeng et al.
(WACV 2025 Worhshop) LS-GAN: Human Motion Synthesis with Latent-space GANs, Amballa et al.
(WACV 2025) ReinDiffuse: Crafting Physically Plausible Motions with Reinforced Diffusion Model, Han et al.
(WACV 2025) MoRAG: Multi-Fusion Retrieval Augmented Generation for Human Motion, Shashank et al.
(WACV 2025) Mandelli et al: Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models, Mandelli et al.

2024

(ArXiv 2024) MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model, Wang et al.
(ArXiv 2024) InterDance: Reactive 3D Dance Generation with Realistic Duet Interactions, Li et al.
(ArXiv 2024) Mogo: RQ Hierarchical Causal Transformer for High-Quality 3D Human Motion Generation, Fu et al.
(ArXiv 2024) CoMA: Compositional Human Motion Generation with Multi-modal Agents, Sun et al.
(ArXiv 2024) SoPo: Text-to-Motion Generation Using Semi-Online Preference Optimization, Tan et al.
(ArXiv 2024) RMD: A Simple Baseline for More General Human Motion Generation via Training-free Retrieval-Augmented Motion Diffuse, Liao et al.
(ArXiv 2024) BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis, Hong et al.
(ArXiv 2024) MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks, Wue et al.
(ArXiv 2024) FTMoMamba: Motion Generation with Frequency and Text State Space Models, Li et al.
(ArXiv 2024) KMM: Key Frame Mask Mamba for Extended Motion Generation, Zhang et al.
(ArXiv 2024) MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding, Wang et al.
(ArXiv 2024) Lodge++: High-quality and Long Dance Generation with Vivid Choreography Patterns, Li et al.
(ArXiv 2024) MotionCLR: Motion Generation and Training-Free Editing via Understanding Attention Mechanisms, Chen et al.
(ArXiv 2024) LEAD: Latent Realignment for Human Motion Diffusion, Andreou et al.
(ArXiv 2024) Leite et al. Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing, Leite et al.
(ArXiv 2024) MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning, Liu et al.
(ArXiv 2024) MotionLLM: Understanding Human Behaviors from Human Motions and Videos, Chen et al.
(ArXiv 2024) T2M-X: Learning Expressive Text-to-Motion Generation from Partially Annotated Data, Liu et al.
(ArXiv 2024) BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation, Hosseyni et al.
(ArXiv 2024) synNsync: Synergy and Synchrony in Couple Dances, Manukele et al.
(EMNLP 2024) Dong et al: Word-Conditioned 3D American Sign Language Motion Generation, Dong et al.
(NeurIPS D&B 2024) Kim et al: Text to Blind Motion, Kim et al.
(NeurIPS 2024) UniMTS: Unified Pre-training for Motion Time Series, Zhang et al.
(NeurIPS 2024) Christopher et al.: Constrained Synthesis with Projected Diffusion Models, Christopher et al.
(NeurIPS 2024) MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence, You et al.
(NeurIPS 2024) MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling, Yuan et al.
(NeurIPS 2024) M3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation, Luo et al.
(NeurIPS Workshop 2024) Bikov et al: Fitness Aware Human Motion Generation with Fine-Tuning, Bikov et al.
(NeurIPS Workshop 2024) DGFM: Full Body Dance Generation Driven by Music Foundation Models, Liu et al.
(ICPR 2024) FG-MDM: Towards Zero-Shot Human Motion Generation via ChatGPT-Refined Descriptions, Shi et al.
(ACM MM 2024) SynTalker: Enabling Synergistic Full-Body Control in Prompt-Based Co-Speech Motion Generation, Chen et al.
(ACM MM 2024) L3EM: Towards Emotion-enriched Text-to-Motion Generation via LLM-guided Limb-level Emotion Manipulating. Yu et al.
(ACM MM 2024) StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework, Huang et al.
(ACM MM 2024) SATO: Stable Text-to-Motion Framework, Chen et al.
(ICANN 2024) PIDM: Personality-Aware Interaction Diffusion Model for Gesture Generation, Shibasaki et al.
(HFES 2024) Macwan et al: High-Fidelity Worker Motion Simulation With Generative AI, Macwan et al.
(ECCV 2024) Jin et al: Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation, Jin et al.
(ECCV 2024) Motion Mamba: Efficient and Long Sequence Motion Generation, Zhong et al.
(ECCV 2024) EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation, Zhou et al.
(ECCV 2024) CoMo: Controllable Motion Generation through Language Guided Pose Code Editing, Huang et al.
(ECCV 2024) CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion, Sun et al.
(ECCV 2024) Shan et al: Towards Open Domain Text-Driven Synthesis of Multi-Person Motions, Shan et al.
(ECCV 2024) ParCo: Part-Coordinating Text-to-Motion Synthesis, Zou et al.
(ECCV 2024) Sampieri et al: Length-Aware Motion Synthesis via Latent Diffusion, Sampieri et al.
(ECCV 2024) ChroAccRet: Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models, Fujiwara et al.
(ECCV 2024) MHC: Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs, Liu et al.
(ECCV 2024) ProMotion: Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation, Liu et al.
(ECCV 2024) FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models, Zhang et al.
(ECCV 2024) Text Motion Translator: A Bi-Directional Model for Enhanced 3D Human Motion Generation from Open-Vocabulary Descriptions, Qian et al.
(ECCV 2024) FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis, Fan et al.
(ECCV 2024) Kinematic Phrases: Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases, Liu et al.
(ECCV 2024) MotionChain: Conversational Motion Controllers via Multimodal Prompts, Jiang et al.
(ECCV 2024) SMooDi: Stylized Motion Diffusion Model, Zhong et al.
(ECCV 2024) BAMM: Bidirectional Autoregressive Motion Model, Pinyoanuntapong et al.
(ECCV 2024) MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model, Dai et al.
(ECCV 2024) Ren et al: Realistic Human Motion Generation with Cross-Diffusion Models, Ren et al.
(ECCV 2024) M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models, Chi et al.
(ECCV 2024) LMM: Large Motion Model for Unified Multi-Modal Motion Generation, Zhang et al.
(ECCV 2024) TesMo: Generating Human Interaction Motions in Scenes with Text Control, Yi et al.
(ECCV 2024) TLcontrol: Trajectory and Language Control for Human Motion Synthesis, Wan et al.
(ICME 2024) ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance, Cheng et al.
(ICME Workshop 2024) Chen et al: Anatomically-Informed Vector Quantization Variational Auto-Encoder for Text-to-Motion Generation, Chen et al.
(ICML 2024) HumanTOMATO: Text-aligned Whole-body Motion Generation, Lu et al.
(ICML 2024) GPHLVM: Bringing Motion Taxonomies to Continuous Domains via GPLVM on Hyperbolic Manifolds, Jaquier et al.
(SIGGRAPH 2024) DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models, Sun et al.
(SIGGRAPH 2024) CondMDI: Flexible Motion In-betweening with Diffusion Models, Cohan et al.
(SIGGRAPH 2024) CAMDM: Taming Diffusion Probabilistic Models for Character Control, Chen et al.
(SIGGRAPH 2024) LGTM: Local-to-Global Text-Driven Human Motion Diffusion Models, Sun et al.
(SIGGRAPH 2024) TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis, Zhang et al.
(SIGGRAPH 2024) A-MDM: Interactive Character Control with Auto-Regressive Motion Diffusion Models, Shi et al.
(SIGGRAPH 2024) Starke et al: Categorical Codebook Matching for Embodied Character Controllers, Starke et al.
(SIGGRAPH 2024) SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation, Juravsky et al.
(CVPR 2024) ProgMoGen: Programmable Motion Generation for Open-set Motion Control Tasks, Liu et al.
(CVPR 2024) PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios, Wang et al.
(CVPR 2024) AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion, Chhatre et al.
(CVPR 2024) Liu et al: Towards Variable and Coordinated Holistic Co-Speech Motion Generation, Liu et al.
(CVPR 2024) MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion, Kapon et al.
(CVPR 2024) WANDR: Intention-guided Human Motion Generation, Diomataris et al.
(CVPR 2024) MoMask: Generative Masked Modeling of 3D Human Motions, Guo et al.
(CVPR 2024) ChatPose: Chatting about 3D Human Pose, Feng et al.
(CVPR 2024) AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond, Zhou et al.
(CVPR 2024) MMM: Generative Masked Motion Model, Pinyoanuntapong et al.
(CVPR 2024) AAMDM: Accelerated Auto-regressive Motion Diffusion Model, Li et al.
(CVPR 2024) OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers, Liang et al.
(CVPR 2024) FlowMDM: Seamless Human Motion Composition with Blended Positional Encodings, Barquero et al.
(CVPR 2024) Digital Life Project: Autonomous 3D Characters with Social Intelligence, Cai et al.
(CVPR 2024) EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling, Liu et al.
(CVPR Workshop 2024) STMC: Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation, Petrovich et al.
(CVPR Workshop 2024) InstructMotion: Exploring Text-to-Motion Generation with Human Preference, Sheng et al.
(ICLR 2024) Single Motion Diffusion: Raab et al.
(ICLR 2024) NeRM: Learning Neural Representations for High-Framerate Human Motion Synthesis, Wei et al.
(ICLR 2024) PriorMDM: Human Motion Diffusion as a Generative Prior, Shafir et al.
(ICLR 2024) OmniControl: Control Any Joint at Any Time for Human Motion Generation, Xie et al.
(ICLR 2024) Adiya et al.: Bidirectional Temporal Diffusion Model for Temporally Consistent Human Animation, Adiya et al.
(ICLR 2024) Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment, Li et al.
(AAAI 2024) HuTuDiffusion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback, Han et al.
(AAAI 2024) AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion, Jing et al.
(AAAI 2024) MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation, Hoang et al.
(AAAI 2024) B2A-HDM: Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model, Xie et al.
(AAAI 2024) Everything2Motion: Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis, Fan et al.
(AAAI 2024) MotionGPT: Finetuned LLMs are General-Purpose Motion Generators, Zhang et al.
(AAAI 2024) Dong et al: Enhanced Fine-grained Motion Diffusion for Text-driven Human Motion Synthesis, Dong et al.
(AAAI 2024) UNIMASKM: A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis, Mascaro et al.
(AAAI 2024) B2A-HDM: Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model, Xie et al.
(TPAMI 2024) GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation, Gao et al.
(WACV 2024) Xie et al.: Sign Language Production with Latent Motion Transformer, Xie et al.

2023

(NeurIPS 2023) GraphMotion: Act As You Wish: Fine-grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs, Jin et al.
(NeurIPS 2023) MotionGPT: Human Motion as Foreign Language, Jiang et al.
(NeurIPS 2023) FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing, Zhang et al.
(NeurIPS 2023) InsActor: Instruction-driven Physics-based Characters, Ren et al.
(ICCV 2023) AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism, Zhong et al.
(ICCV 2023) TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis, Petrovich et al.
(ICCV 2023) MAA: Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation, Azadi et al.
(ICCV 2023) PhysDiff: Physics-Guided Human Motion Diffusion Model, Yuan et al.
(ICCV 2023) ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model, Zhang et al.
(ICCV 2023) BelFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction, Barquero et al.
(ICCV 2023) GMD: Guided Motion Diffusion for Controllable Human Motion Synthesis, Karunratanakul et al.
(ICCV 2023) HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations, Aliakbarian et al.
(ICCV 2023) SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation, Athanasiou et al.
(ICCV 2023) Kong et al.: Priority-Centric Human Motion Generation in Discrete Latent Space, Kong et al.
(ICCV 2023) Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model, Wang et al.
(ICCV 2023) EMS: Breaking The Limits of Text-conditioned 3D Motion Synthesis with Elaborative Descriptions, Qian et al.
(SIGGRAPH 2023) GenMM: Example-based Motion Synthesis via Generative Motion Matching, Li et al.
(SIGGRAPH 2023) GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents, Ao et al.
(SIGGRAPH 2023) BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer, Pang et al.
(SIGGRAPH 2023) Alexanderson et al.: Listen, denoise, action! Audio-driven motion synthesis with diffusion models, Alexanderson et al.
(CVPR 2023) AGroL: Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model, Du et al.
(CVPR 2023) TALKSHOW: Generating Holistic 3D Human Motion from Speech, Yi et al.
(CVPR 2023) T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations, Zhang et al.
(CVPR 2023) UDE: A Unified Driving Engine for Human Motion Generation, Zhou et al.
(CVPR 2023) OOHMG: Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training, Lin et al.
(CVPR 2023) EDGE: Editable Dance Generation From Music, Tseng et al.
(CVPR 2023) MLD: Executing your Commands via Motion Diffusion in Latent Space, Chen et al.
(CVPR 2023) MoDi: Unconditional Motion Synthesis from Diverse Data, Raab et al.
(CVPR 2023) MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis, Dabral et al.
(CVPR 2023) Mo et al.: Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation, Mo et al.
(ICLR 2023) HMDM: Human Motion Diffusion Model, Tevet et al.
(TPAMI 2023) MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model, Zhang et al.
(TPAMI 2023) Bailando++: 3D Dance GPT with Choreographic Memory, Li et al.
(ArXiv 2023) UDE-2: A Unified Framework for Multimodal, Multi-Part Human Motion Synthesis, Zhou et al.
(ArXiv 2023) Motion Script: Natural Language Descriptions for Expressive 3D Human Motions, Yazdian et al.

2022 and earlier

(NeurIPS 2022) NeMF: Neural Motion Fields for Kinematic Animation, He et al.
(SIGGRAPH Asia 2022) PADL: Language-Directed Physics-Based Character, Juravsky et al.
(SIGGRAPH Asia 2022) Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings, Ao et al.
(3DV 2022) TEACH: Temporal Action Composition for 3D Human, Athanasiou et al.
(ECCV 2022) Implicit Motion: Implicit Neural Representations for Variable Length Human Motion Generation, Cervantes et al.
(ECCV 2022) Zhong et al.: Learning Uncoupled-Modulation CVAE for 3D Action-Conditioned Human Motion Synthesis, Zhong et al.
(ECCV 2022) MotionCLIP: Exposing Human Motion Generation to CLIP Space, Tevet et al.
(ECCV 2022) PoseGPT: Quantizing human motion for large scale generative modeling, Lucas et al.
(ECCV 2022) TEMOS: Generating diverse human motions from textual descriptions, Petrovich et al.
(ECCV 2022) TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts, Guo et al.
(SIGGRAPH 2022) AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars, Hong et al.
(SIGGRAPH 2022) DeepPhase: Periodic autoencoders for learning motion phase manifolds, Starke et al.
(CVPR 2022) Guo et al.: Generating Diverse and Natural 3D Human Motions from Text, Guo et al.
(CVPR 2022) Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory, Li et al.
(ICCV 2021) ACTOR: Action-Conditioned 3D Human Motion Synthesis with Transformer VAE, Petrovich et al.
(ICCV 2021) AIST++: AI Choreographer: Music Conditioned 3D Dance Generation with AIST++, Li et al.
(SIGGRAPH 2021) Starke et al.: Neural animation layering for synthesizing martial arts movements, Starke et al.
(CVPR 2021) MOJO: We are More than Our Joints: Predicting how 3D Bodies Move, Zhang et al.
(ECCV 2020) DLow: Diversifying Latent Flows for Diverse Human Motion Prediction, Yuan et al.
(SIGGRAPH 2020) Starke et al.: Local motion phases for learning multi-contact character movements, Starke et al.

Motion Editing

(IVA 2025) TF-JAX-IK: Real-Time Inverse Kinematics for Generating Multi-Constrained Movements of Virtual Human Characters, Voss et al.
(ICCV 2025) PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning, Zhang et al.
(CVPR 2025) SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing, Hong et al.
(CVPR 2025) MixerMDM: Learnable Composition of Human Motion Diffusion Models, Ruiz-Ponce et al.
(CVPR 2025) AnyMoLe: Any Character Motion In-Betweening Leveraging Video Diffusion Models, Yun et al.
(CVPR 2025) SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction, Li et al.
(CVPR 2025) MotionReFit: Dynamic Motion Blending for Versatile Motion Editing, Jiang et al.
(ArXiv 2025) StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data, Mu et al.
(ArXiv 2025) Dai et al: Towards Synthesized and Editable Motion In-Betweening Through Part-Wise Phase Representation, Dai et al.
(SIGGRAPH Asia 2024) MotionFix: Text-Driven 3D Human Motion Editing, Athanasiou et al.
(NeurIPS 2024) CigTime: Corrective Instruction Generation Through Inverse Motion Editing, Fang et al.
(SIGGRAPH 2024) Iterative Motion Editing: Iterative Motion Editing with Natural Language, Goel et al.
(CVPR 2024) DNO: Optimizing Diffusion Noise Can Serve As Universal Motion Priors, Karunratanakul et al.

Motion Stylization

(ICCV 2025) StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion, Guo et al.
(CVPR 2025) Visual Persona: Foundation Model for Full-Body Human Customization, Nam et al.
(ArXiv 2025) ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion Generation, Chen et al.
(ArXiv 2025) MotionPersona: Characteristics-aware Locomotion Control, Shi et al.
(ArXiv 2025) AStF: Motion Style Transfer via Adaptive Statistics Fusor, Chen et al.
(ArXiv 2025) Dance Like a Chicken: Low-Rank Stylization for Human Motion Diffusion, Sawdayee et al.
(ArXiv 2024) MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow, Li et al.
(TSMC 2024) D-LORD: D-LORD for Motion Stylization, Gupta et al.
(ECCV 2024) HUMOS: Human Motion Model Conditioned on Body Shape, Tripathi et al.
(SIGGRAPH 2024) SMEAR: Stylized Motion Exaggeration with ARt-direction, Basset et al.
(SIGGRAPH 2024) Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior, Wu et al.
(CVPR 2024) MCM-LDM: Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model, Song et al.
(CVPR 2024) MoST: Motion Style Transformer between Diverse Action Contents, Kim et al.
(ICLR 2024) GenMoStyle: Generative Human Motion Stylization in Latent Space, Guo et al.

Human-Object Interaction

2025

(NeurIPS 2025) HHOI: Learning to Generate Human-Human-Object Interactions from Textual Descriptions, Na et al.
(ACM MM 2025) PA-HOI: A Physics-Aware Human and Object Interaction Dataset, Wang et al.
(ACM MM 2025) OnlineHOI: Towards Online Human-Object Interaction Generation and Perception, Ji et al.
(ICCV 2025) Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions, Xu et al.
(ICCV 2025) TriDi: Trilateral Diffusion of 3D Humans, Objects and Interactions, Petrov et al.
(ICCV 2025) SMGDiff: Soccer Motion Generation using diffusion probabilistic models, Yang et al.
(ICCV 2025) SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis, He et al.
(ICCV 2025) Wu et al: Human-Object Interaction from Human-Level Instructions, Wu et al.
(ICCV 2025) HUMOTO: A 4D Dataset of Mocap Human Object Interactions, Lu et al.
(SIGGRAPH 2025) PhysicsFC: Learning User-Controlled Skills for a Physics-Based Football Player Controller, Kim et al.
(SIGGRAPH 2025) SkillMimic-v2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations, Yu et al.
(Bioengineering 2025) MeLLO: The Utah Manipulation and Locomotion of Large Objects (MeLLO) Data Library, Luttmer et al.
(CVPR 2025) ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation, Zeng et al.
(CVPR 2025) HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models, Huang et al.
(CVPR 2025) Hui et al: An Image-like Diffusion Method for Human-Object Interaction Detection, Hui et al.
(CVPR 2025) PersonaHOI: Effortlessly Improving Personalized Face with Human-Object Interaction Generation, Hu et al.
(CVPR 2025) InteractVLM: 3D Interaction Reasoning from 2D Foundational Models, Dwivedi et al.
(CVPR 2025) PICO: Reconstructing 3D People In Contact with Objects, Cseke et al.
(CVPR 2025) EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild, Liu et al.
(CVPR 2025) FIction: 4D Future Interaction Prediction from Video, Ashutosh et al.
(CVPR 2025) ROG: Guiding Human-Object Interactions with Rich Geometry and Relations, Xue et al.
(CVPR 2025) SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance, Cong et al.
(CVPR 2025) Phys-Reach-Grasp: Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References, Li et al.
(CVPR 2025) ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions, Kim et al.
(CVPR 2025) InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions, Xu et al.
(CVPR 2025) CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement, Zhang et al.
(CVPR 2025) InteractAnything: Zero-shot Human Object-Interaction Synthesis via LLM Feedback and Object Affordance Parsing, Zhang et al.
(CVPR 2025) SkillMimic: Learning Reusable Basketball Skills from Demonstrations, Wang et al.
(CVPR 2025) MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data, Wang et al.
(AAAI 2025) ARDHOI: Auto-Regressive Diffusion for Generating 3D Human-Object Interactions, Geng et al.
(AAAI 2025) DiffGrasp: Whole-Body Grasping Synthesis Guided by Object Motion Using a Diffusion Model, Zhang et al.
(3DV 2025) Paschalidis et al: 3D Whole-body Grasp Synthesis with Directional Controllability, Paschalidis et al.
(3DV 2025) InterTrack: Tracking Human Object Interaction without Object Templates, Xie et al.
(3DV 2025) FORCE: Dataset and Method for Intuitive Physics Guided Human-object Interaction, Zhang et al.
(PAMI 2025) MotionVerse: A Unified Multimodal Framework for Motion Comprehension, Generation and Editing, Hou et al.
(PAMI 2025) EigenActor: Variant Body-Object Interaction Generation Evolved from Invariant Action Basis Reasoning, Guo et al.
(ArXiv 2025) InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects, Cai et al.
(ArXiv 2025) InterPose: Learning to Generate Human-Object Interactions from Large-Scale Web Videos, Zhang et al.
(ArXiv 2025) ECHO: Ego-Centric modeling of Human-Object interactions, Petrov et al.
(ArXiv 2025) CoopDiff: Anticipating 3D Human-object Interactions via Contact-consistent Decoupled Diffusion, Lin et al.
(ArXiv 2025) HOI-Dyn: Learning Interaction Dynamics for Human-Object Motion Diffusion, Wu et al.
(ArXiv 2025) HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization, Ron et al.
(ArXiv 2025) GenHOI: Generalizing Text-driven 4D Human-Object Interaction Synthesis for Unseen Objects, Li et al.
(ArXiv 2025) HOI-PAGE: Zero-Shot Human-Object Interaction Generation with Part Affordance Guidance, Li et al.
(ArXiv 2025) HOSIG: Full-Body Human-Object-Scene Interaction Generation, Yao et al.
(ArXiv 2025) CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects, Pi et al.
(ArXiv 2025) MaskedManipulator: Versatile Whole-Body Control for Loco-Manipulation, Tessler et al.
(ArXiv 2025) UniHM: Universal Human Motion Generation with Object Interactions in Indoor Scenes, Geng et al.
(ArXiv 2025) EJIM: Efficient Explicit Joint-level Interaction Modeling with Mamba for Text-guided HOI Generation, Huang et al.
(ArXiv 2025) ZeroHOI: Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors, Lou et al.
(ArXiv 2025) RMD-HOI: Human-Object Interaction with Vision-Language Model Guided Relative Movement Dynamics, Deng et al.
(ArXiv 2025) Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction, Jiang et al.

2024

(ArXiv 2024) CHOICE: Coordinated Human-Object Interaction in Cluttered Environments for Pick-and-Place Actions, Lu et al.
(ArXiv 2024) OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domains, Zhang et al.
(ArXiv 2024) COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models, Daiya et al.
(NeurIPS 2024) HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid, Xu et al.
(NeurIPS 2024) OmniGrasp: Grasping Diverse Objects with Simulated Humanoids, Luo et al.
(NeurIPS 2024) EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views, Yang et al.
(NeurIPS 2024) CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics, Gao et al.
(NeurIPS 2024) InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction, Xu et al.
(NeurIPS 2024) PiMForce: Posture-Informed Muscular Force Learning for Robust Hand Pressure Estimation, Seo et al.
(ECCV 2024) InterFusion: Text-Driven Generation of 3D Human-Object Interaction, Dai et al.
(ECCV 2024) CHOIS: Controllable Human-Object Interaction Synthesis, Li et al.
(ECCV 2024) F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions, Yang et al.
(ECCV 2024) HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects, Lv et al.
(SIGGRAPH 2024) PhysicsPingPong: Strategy and Skill Learning for Physics-based Table Tennis Animation, Wang et al.
(CVPR 2024) NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis, Kulkarni et al.
(CVPR 2024) HOI Animator: Generating Text-Prompt Human-Object Animations using Novel Perceptive Diffusion Models, Son et al.
(CVPR 2024) CG-HOI: Contact-Guided 3D Human-Object Interaction Generation, Diller et al.
(IJCV 2024) InterCap: Joint Markerless 3D Tracking of Humans and Objects in Interaction, Huang et al.
(3DV 2024) Phys-Fullbody-Grasp: Physically Plausible Full-Body Hand-Object Interaction Synthesis, Braun et al.
(3DV 2024) GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency, Taheri et al.
(AAAI 2024) FAVOR: Full-Body AR-driven Virtual Object Rearrangement Guided by Instruction Text, Li et al.

2023 and earlier

(SIGGRAPH Asia 2023) OMOMO: Object Motion Guided Human Motion Synthesis, Li et al.
(ICCV 2023) CHAIRS: Full-Body Articulated Human-Object Interaction, Jiang et al.
(ICCV 2023) HGHOI: Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models, Pi et al.
(ICCV 2023) InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion, Xu et al.
(CVPR 2023) Object Pop Up: Can we infer 3D objects and their poses from human interactions alone? Petrov et al.
(CVPR 2023) ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation, Fan et al.
(ECCV 2022) TOCH: Spatio-Temporal Object-to-Hand Correspondence for Motion Refinement, Zhou et al.
(ECCV 2022) COUCH: Towards Controllable Human-Chair Interactions, Zhang et al.
(ECCV 2022) SAGA: Stochastic Whole-Body Grasping with Contact, Wu et al.
(CVPR 2022) GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping, Taheri et al.
(CVPR 2022) BEHAVE: Dataset and Method for Tracking Human Object Interactions, Bhatnagar et al.
(ECCV 2020) GRAB: A Dataset of Whole-Body Human Grasping of Objects, Taheri et al.

Human-Scene Interaction

2025

(ICCV 2025) Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model, Cao et al.
(ICCV 2025) SceneMI: Motion In-Betweening for Modeling Human-Scene Interactions, Hwang et al.
(ICCV 2025) SIMS: Simulating Human-Scene Interactions with Real World Script Planning, Wang et al.
(ICCV 2025) Lim et al: Event-Driven Storytelling with Multiple Lifelike Humans in a 3D scene, Lim et al.
(ICME 2025) TSTMotion: Training-free Scene-aware Text-to-motion Generation, Guo et al.
(CVPR 2025) HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction. Wang et al.
(CVPR 2025) Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-informed Affordance in 3D Scenes. Yu et al.
(CVPR 2025) Yi et al: Estimating Body and Hand Motion in an Ego‑sensed World, Yi et al.
(CVPR 2025) EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling. Xia et al.
(CVPR 2025) TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization, Pan et al.
(ICLR 2025) Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes, Chen et al.
(3DV 2025) Paschalidis et al: 3D Whole-body Grasp Synthesis with Directional Controllability, Paschalidis et al.
(WACV 2025) GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts, Milacski et al.
(ArXiv 2025) Prime and Reach: Synthesising Body Motion for Gaze-Primed Object Reach, Hatano et al.
(ArXiv 2025) Uni-Inter: Unifying 3D Human Motion Synthesis Across Diverse Interaction Contexts, Liu et al.
(ArXiv 2025) SSOMotion: HumanMotion Synthesis in 3D Scenes via Unified Scene Semantic Occupancy, Cho et al.
(ArXiv 2025) SceneAdapt: Scene-aware Adaptation of Human Motion Diffusion, Cho et al.
(ArXiv 2025) FantasyHSI: Video-Generation-Centric 4D Human Synthesis In Any Scene through A Graph-based Multi-Agent Framework, Mu et al.
(ArXiv 2025) Half-Physics: Enabling Kinematic 3D Human Model with Physical Interactions, Li et al.
(ArXiv 2025) GenHSI: Controllable Generation of Human-Scene Interaction Videos, Li et al.
(ArXiv 2025) RMD-HOI: Human-Object Interaction with Vision-Language Model Guided Relative Movement Dynamics, Deng et al.
(ArXiv 2025) HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding, Zhao et al.
(ArXiv 2025) Jointly Understand Your Command and Intention: Reciprocal Co-Evolution between Scene-Aware 3D Human Motion Synthesis and Analysis, Gao et al.

2024

(ArXiv 2024) ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation, Li et al.
(ArXiv 2024) Mimicking-Bench: A Benchmark for Generalizable Humanoid-Scene Interaction Learning via Human Mimicking, Liu et al.
(ArXiv 2024) SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control, Zhang et al.
(ArXiv 2024) Diffusion Implicit Policy: Diffusion Implicit Policy for Unpaired Scene-aware Motion synthesis, Gong et al.
(ArXiv 2024) LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment, Cong et al.
(SIGGRAPH Asia 2024) LINGO: Autonomous Character-Scene Interaction Synthesis from Text Instruction, Jiang et al.
(NeurIPS 2024) DiMoP3D: Harmonizing Stochasticity and Determinism: Scene-responsive Diverse Human Motion Prediction, Lou et al.
(ECCV 2024) MOB: Revisit Human-Scene Interaction via Space Occupancy, Liu et al.
(ECCV 2024) TesMo: Generating Human Interaction Motions in Scenes with Text Control, Yi et al.
(ECCV 2024 Workshop) SAST: Massively Multi-Person 3D Human Motion Forecasting with Scene Context, Mueller et al.
(Eurographics 2024) Kang et al: Learning Climbing Controllers for Physics-Based Characters, Kang et al.
(CVPR 2024) Afford-Motion: Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance, Wang et al.
(CVPR 2024) GenZI: Zero-Shot 3D Human-Scene Interaction Generation, Li et al.
(CVPR 2024) Cen et al.: Generating Human Motion in 3D Scenes from Text Descriptions, Cen et al.
(CVPR 2024) TRUMANS: Scaling Up Dynamic Human-Scene Interaction Modeling, Jiang et al.
(ICLR 2024) UniHSI: Unified Human-Scene Interaction via Prompted Chain-of-Contacts, Xiao et al.
(3DV 2024) Purposer: Putting Human Motion Generation in Context, Ugrinovic et al.
(3DV 2024) InterScene: Synthesizing Physically Plausible Human Motions in 3D Scenes, Pan et al.
(3DV 2024) Mir et al: Generating Continual Human Motion in Diverse 3D Scenes, Mir et al.

2023 and earlier

(ICCV 2023) DIMOS: Synthesizing Diverse Human Motions in 3D Indoor Scenes, Zhao et al.
(ICCV 2023) LAMA: Locomotion-Action-Manipulation: Synthesizing Human-Scene Interactions in Complex 3D Environments, Lee et al.
(ICCV 2023) Narrator: Towards Natural Control of Human-Scene Interaction Generation via Relationship Reasoning, Xuan et al.
(CVPR 2023) CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-Scene Interactions, Yan et al.
(CVPR 2023) Scene-Ego: Scene-aware Egocentric 3D Human Pose Estimation, Wang et al.
(CVPR 2023) SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments, Dai et al.
(CVPR 2023) CIRCLE: Capture in Rich Contextual Environments, Araujo et al.
(CVPR 2023) SceneDiffuser: Diffusion-based Generation, Optimization, and Planning in 3D Scenes, Huang et al.
(CVPR 2023) MIME: Human-Aware 3D Scene Generation, Yi et al.
(SIGGRAPH 2023) PMP: Learning to Physically Interact with Environments using Part-wise Motion Priors, Bae et al.
(SIGGRAPH 2023) QuestEnvSim: Environment-Aware Simulated Motion Tracking from Sparse Sensors, Lee et al.
(SIGGRAPH 2023) Hassan et al.: Synthesizing Physical Character-Scene Interactions, Hassan et al.
(NeurIPS 2022) Mao et al.: Contact-Aware Human Motion Forecasting, Mao et al.
(NeurIPS 2022) HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes, Wang et al.
(NeurIPS 2022) EmbodiedPose: Embodied Scene-aware Human Pose Estimation, Luo et al.
(ECCV 2022) GIMO: Gaze-Informed Human Motion Prediction in Context, Zheng et al.
(ECCV 2022) COINS: Compositional Human-Scene Interaction Synthesis with Semantic Control, Zhao et al.
(CVPR 2022) Wang et al.: Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis, Wang et al.
(CVPR 2022) GAMMA: The Wanderings of Odysseus in 3D Scenes, Zhang et al.
(ICCV 2021) SAMP: Stochastic Scene-Aware Motion Prediction, Hassan et al.
(ICCV 2021) LEMO: Learning Motion Priors for 4D Human Body Capture in 3D Scenes, Zhang et al.
(3DV 2020) PLACE: Proximity Learning of Articulation and Contact in 3D Environments, Zhang et al.
(SIGGRAPH 2020) Starke et al.: Local motion phases for learning multi-contact character movements, Starke et al.
(CVPR 2020) PSI: Generating 3D People in Scenes without People, Zhang et al.
(SIGGRAPH Asia 2019) NSM: Neural State Machine for Character-Scene Interactions, Starke et al.
(ICCV 2019) PROX: Resolving 3D Human Pose Ambiguities with 3D Scene Constraints, Hassan et al.

Human-Human Interaction

(ICCV 2025) Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation, Liu et al.
(ICCV 2025) Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions, Xu et al.
(ICCV 2025) Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis, Ji et al.
(ICCV 2025) PINO: Person-Interaction Noise Optimization for Long-Duration and Customizable Motion Generation of Arbitrary-Sized Groups, Ota et al.
(SIGGRAPH 2025) Xu et al: Multi-Person Interaction Generation from Two-Person Motion Priors, Xu et al.
(SIGGRAPH 2025) DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling, Ghosh et al.
(CVPR 2025) TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation, Wang et al.
(ICLR 2025) Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation, Tan et al.
(ICLR 2025) Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation, Cen et al.
(ICLR 2025) InterMask: 3D Human Interaction Generation via Collaborative Masked Modelling, Javed et al.
(3DV 2025) Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting, Liu et al.
(AAAI 2026) InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE, Wang et al.
(ArXiv 2025) Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models, Ruiz-Ponce et al.
(ArXiv 2025) Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation, Wu et al.
(ArXiv 2025) InterAct: A Large-Scale Dataset of Dynamic, Expressive and Interactive Activities between Two People in Daily Scenarios, Ho et al.
(ArXiv 2025) E-React: Towards Emotionally Controlled Synthesis of Human Reactions, Zhu et al.
(ArXiv 2025) Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset, Agrawal et al.
(ArXiv 2025) MAMMA: Markerless & Automatic Multi-Person Motion Action Capture, Cuevas-Velasquez et al.
(ArXiv 2025) PhysInter: Integrating Physical Mapping for High-Fidelity Human Interaction Generation, Yao et al.
(ArXiv 2025) InterMamba: Efficient Human-Human Interaction Generation with Adaptive Spatio-Temporal Mamba, Wu et al.
(ArXiv 2025) MARRS: MaskedAutoregressive Unit-based Reaction Synthesis, Wang et al.
(ArXiv 2025) SocialGen: Modeling Multi-Human Social Interaction with Language Models, Yu et al.
(ArXiv 2025) ARFlow: Human Action-Reaction Flow Matching with Physical Guidance, Jiang et al.
(ArXiv 2025) Fan et al: 3D Human Interaction Generation: A Survey, Fan et al.
(ArXiv 2025) Invisible Strings: Revealing Latent Dancer-to-Dancer Interactions with Graph Neural Networks, Zerkowski et al.
(ArXiv 2025) Leader and Follower: Interactive Motion Generation under Trajectory Constraints, Wang et al.
(ArXiv 2024) Two in One: Unified Multi-Person Interactive Motion Generation by Latent Diffusion Transformer, Li et al.
(ArXiv 2024) It Takes Two: Real-time Co-Speech Two-person’s Interaction Generation via Reactive Auto-regressive Diffusion Model, Shi et al.
(ArXiv 2024) COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models, Daiya et al.
(NeurIPS 2024) InterControl: Generate Human Motion Interactions by Controlling Every Joint, Wang et al.
(ACM MM 2024) PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation, Liu et al.
(ECCV 2024) Shan et al: Towards Open Domain Text-Driven Synthesis of Multi-Person Motions, Shan et al.
(ECCV 2024) ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions, Ghosh et al.
(CVPR 2024) Inter-X: Towards Versatile Human-Human Interaction Analysis, Xu et al.
(CVPR 2024) ReGenNet: Towards Human Action-Reaction Synthesis, Xu et al.
(CVPR Workshop 2024) in2IN: in2IN: Leveraging Individual Information to Generate Human INteractions, Ruiz-Ponce et al.
(IJCV 2024) InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions, Liang et al.
(ICCV 2023) ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation, Xu et al.
(ICCV 2023) Tanaka et al.: Role-aware Interaction Generation from Textual Description, Tanaka et al.
(CVPR 2023) Hi4D: 4D Instance Segmentation of Close Human Interaction, Yin et al.
(CVPR 2022) ExPI: Multi-Person Extreme Motion Prediction, Guo et al.
(CVPR 2020) CHI3D: Three-Dimensional Reconstruction of Human Interactions, Fieraru et al.

Datasets & Benchmarks

2025

(ICCV 2025) PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation, Zhao et al.
(ICCV 2025) MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation, Gupta et al.
(ACM MM 2025) Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions, Xu et al.
(Bioengineering 2025) MeLLO: The Utah Manipulation and Locomotion of Large Objects (MeLLO) Data Library, Luttmer et al.
(CVPR 2025) OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation, Xu et al.
(CVPR 2025) InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation, Xu et al.
(CVPR 2025) MotionPro: Exploring the Role of Pressure in Human MoCap and Beyond, Ren et al.
(CVPR 2025) GORP: Real-Time Motion Generation with Rolling Prediction Models, Barquero et al.
(CVPR 2025) ClimbingCap: ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate, Yan et al.
(CVPR 2025) AtoM: AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward, Han et al.
(CVPR 2025) CORE4D: CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement, Zhang et al.
(ICLR 2025) MotionCritic: Aligning Human Motion Generation with Human Perceptions, Wang et al.
(ICLR 2025) LocoVR: LocoVR: Multiuser Indoor Locomotion Dataset in Virtual Reality, Takeyama et al.
(ICLR 2025) PMR: Pedestrian Motion Reconstruction: A Large-scale Benchmark via Mixed Reality Rendering with Multiple Perspectives and Modalities, Wang et al.
(AAAI 2025) EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs, Fan et al.
(ArXiv 2025) H2IAD: 3D Human-Human Interaction Anomaly Detection, Maeda et al.
(ArXiv 2025) RoleMotion: A Large-Scale Dataset towards Robust Scene-Specific Role-Playing Motion Synthesis with Fine-grained Descriptions, Peng et al.
(ArXiv 2025) Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset, McLean et al.
(ArXiv 2025) CaddieSet: A Golf Swing Dataset with Human Joint Features and Ball Information, Jung et al.
(ArXiv 2025) Waymo-3DSkelMo: A Multi-Agent 3D Skeletal Motion Dataset for Pedestrian Interaction Modeling in Autonomous Driving, Zhu et al.
(ArXiv 2025) SpeakerVid-5M: A Large-Scale High-Quality Dataset for audio-visual Dyadic Interactive Human Generation, Zhang et al.
(ArXiv 2025) AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability, Suzuki et al.
(ArXiv 2025) MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding, Li et al.
(ArXiv 2025) FLEX: A Large-Scale Multi-Modal Multi-Action Dataset for Fitness Action Quality Assessment, Yin et al.
(ArXiv 2025) From Motion to Behavior: Hierarchical Modeling of Humanoid Generative Behavior Control, Zhang et al.
(ArXiv 2025) Rekik et al: Quality assessment of 3D human animation: Subjective and objective evaluation, Rekik et al.
(ArXiv 2025) K2MUSE: A Large-scale Human Lower limb Dataset of Kinematics, Kinetics, amplitude Mode Ultrasound and Surface Electromyography, Li et al.
(ArXiv 2025) RMD-HOI: Human-Object Interaction with Vision-Language Model Guided Relative Movement Dynamics, Deng et al.
(ArXiv 2025) SGA-INTERACT: SGA-INTERACT: A3DSkeleton-based Benchmark for Group Activity Understanding in Modern Basketball Tactic, Yang et al.
(ArXiv 2025) Kaiwu: Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction, Jiang et al.
(ArXiv 2025) Motion-X++: Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset, Zhang et al.

2024

(ArXiv 2024) Mimicking-Bench: A Benchmark for Generalizable Humanoid-Scene Interaction Learning via Human Mimicking, Liu et al.
(ArXiv 2024) LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment, Cong et al.
(ArXiv 2024) SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control, Zhang et al.
(ArXiv 2024) synNsync: Synergy and Synchrony in Couple Dances, Manukele et al.
(ArXiv 2024) MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations, Xu et al.
(Github 2024) CMP & CMR: AnimationGPT: An AIGC tool for generating game combat motion assets, Liao et al.
(Scientific Data 2024) Evans et al: Synchronized Video, Motion Capture and Force Plate Dataset for Validating Markerless Human Movement Analysis, Evans et al.
(Scientific Data 2024) MultiSenseBadminton: MultiSenseBadminton: Wearable Sensor–Based Biomechanical Dataset for Evaluation of Badminton Performance, Seong et al.
(SIGGRAPH Asia 2024) LINGO: Autonomous Character-Scene Interaction Synthesis from Text Instruction, Jiang et al.
(NeurIPS 2024) Harmony4D: Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions, Khirodkar et al.
(NeurIPS D&B 2024) EgoSim: EgoSim: An Egocentric Multi-view Simulator for Body-worn Cameras during Human Motion, Hollidt et al.
(NeurIPS D&B 2024) Muscles in Time: Muscles in Time: Learning to Understand Human Motion by Simulating Muscle Activations, Schneider et al.
(NeurIPS D&B 2024) Text to blind motion: Text to blind motion, Kim et al.
(ACM MM 2024) CLaM: CLaM: An Open-Source Library for Performance Evaluation of Text-driven Human Motion Generation, Chen et al.
(ECCV 2024) AddBiomechanics: AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale, Werling et al.
(ECCV 2024) LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment, Ren et al.
(ECCV 2024) SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark, Yu et al.
(ECCV 2024) Nymeria: A massive collection of multimodal egocentric daily motion in the wild, Ma et al.
(Multibody System Dynamics 2024) Human3.6M+: Using musculoskeletal models to generate physically-consistent data for 3D human pose, kinematic, dynamic, and muscle estimation, Nasr et al.
(CVPR 2024) Inter-X: Towards Versatile Human-Human Interaction Analysis, Xu et al.
(CVPR 2024) HardMo: A Large-Scale Hardcase Dataset for Motion Capture, Liao et al.
(CVPR 2024) Xie et al: Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation, Xie et al.
(CVPR 2024) MMVP: MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors, Zhang et al.
(CVPR 2024) RELI11D: RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method, Yan et al.

2023 and earlier

(SIGGRAPH Asia 2023) GroundLink: A Dataset Unifying Human Body Movement and Ground Reaction Dynamics, Han et al.
(NeurIPS D&B 2023) HOH: Markerless Multimodal Human-Object-Human Handover Dataset with Large Object Count, Wiederhold et al.
(NeurIPS D&B 2023) Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset, Lin et al.
(NeurIPS D&B 2023) Humans in Kitchens: A Dataset for Multi-Person Human Motion Forecasting with Scene Context, Tanke et al.
(ICCV 2023) CHAIRS: Full-Body Articulated Human-Object Interaction, Jiang et al.
(ICCV 2023) EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild, Kaufmann et al.
(CVPR 2023) MOYO: 3D Human Pose Estimation via Intuitive Physics, Tripathi et al.
(CVPR 2023) CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-Scene Interactions, Yan et al.
(CVPR 2023) FLAG3D: A 3D Fitness Activity Dataset with Language Instruction, Tang et al.
(CVPR 2023) Hi4D: 4D Instance Segmentation of Close Human Interaction, Yin et al.
(CVPR 2023) CIRCLE: Capture in Rich Contextual Environments, Araujo et al.
(CVPR 2023) BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion, Black et al.
(CVPR 2023) SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments, Dai et al.
(CVPR 2023) MIME: Human-Aware 3D Scene Generation, Yi et al.
(NeurIPS 2022) MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control, Wagener et al.
(ACM MM 2022) ForcePose: Learning to Estimate External Forces of Human Motion in Video, Louis et al.
(ECCV 2022) BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis, Liu et al.
(ECCV 2022) BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis, Moltisanti et al.
(ECCV 2022) EgoBody: Human body shape and motion of interacting people from head-mounted devices, Zhang et al.
(ECCV 2022) GIMO: Gaze-Informed Human Motion Prediction in Context, Zheng et al.
(ECCV 2022) HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling, Cai et al.
(CVPR 2022) ExPI: Multi-Person Extreme Motion Prediction, Guo et al.
(CVPR 2022) HumanML3D: Generating Diverse and Natural 3D Human Motions from Text, Guo et al.
(CVPR 2022) Putting People in their Place: Monocular Regression of 3D People in Depth, Sun et al.
(CVPR 2022) BEHAVE: Dataset and Method for Tracking Human Object Interactions, Bhatnagar et al.
(ICCV 2021) AIST++: AI Choreographer: Music Conditioned 3D Dance Generation with AIST++, Li et al.
(CVPR 2021) Fit3D: AIFit: Automatic 3D Human-Interpretable Feedback Models for Fitness Training, Fieraru et al.
(CVPR 2021) BABEL: Bodies, Action, and Behavior with English Labels, Punnakkal et al.
(AAAI 2021) HumanSC3D: Learning complex 3d human self-contact, Fieraru et al.
(CVPR 2020) CHI3D: Three-Dimensional Reconstruction of Human Interactions, Fieraru et al.
(ICCV 2019) PROX: Resolving 3D Human Pose Ambiguities with 3D Scene Constraints, Hassan et al.
(ICCV 2019) AMASS: Archive of Motion Capture As Surface Shapes, Mahmood et al.

Humanoid, Simulated or Real

2026

(ArXiv 2026) UniAct: Unified Motion Generation and Action Streaming for Humanoid Robots, Jiang et al.

2025

(CoRL 2025) Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids, Hu et al.
(CoRL 2025) HuB: Learning Extreme Humanoid Balance, Zhang et al.
(CoRL 2025) CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks, Li et al.
(CoRL 2025) Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching, Ye et al.
(ICCV 2025) SIMS: Simulating Human-Scene Interactions with Real World Script Planning, Wang et al.
(ICCV 2025) ModSkill: Physical Character Skill Modularization, Huang et al.
(ICCV 2025) UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control, Wu et al.
(RSS 2025) HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit, Ben et al.
(RSS 2025) BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds, Wang et al.
(RSS 2025) ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills, He et al.
(RSS 2025) HumanUP: Learning Getting-Up Policies for Real-World Humanoid Robots, He et al.
(RSS 2025) Demonstrating Berkeley Humanoid Lite: An Open-source, Accessible, and Customizable 3D-printed Humanoid Robot, Chi et al.
(RSS 2025) AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control, Li et al.
(RSS 2025) HoST: Learning Humanoid Standing-up Control across Diverse Postures, Huang et al.
(RSS 2025 Workshop) Exbody2: Advanced Expressive Humanoid Whole-Body Control, Ji et al.
(SIGGRPAH 2025) Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control, Huang et al.
(SIGGRAPH 2025) AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning, Alegre et al.
(SIGGRAPH 2025) PARC: Physics-based Augmentation with Reinforcement Learning for Character Controllers, Xu et al.
(SIGGRAPH 2025) SkillMimic-v2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations, Yu et al.
(CVPR 2025) POMP: Physics-constrainable Motion Generative Model through Phase Manifolds, Ji et al.
(CVPR 2025) Let Humanoids Hike! Integrative Skill Development on Complex Trails, Lin et al.
(CVPR 2025) GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill, Cui et al.
(CVPR 2025) InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions, Xu et al.
(CVPR 2025) SkillMimic: Learning Reusable Basketball Skills from Demonstrations, Wang et al.
(CVPR 2025) Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning, Hao et al.
(Eurographics 2025) Bae et al: Versatile Physics-based Character Control with Hybrid Latent Representation, Bae et al.
(ICRA 2025) Boguslavskii et al: Human-Robot Collaboration for the Remote Control of Mobile Humanoid Robots with Torso-Arm Coordination, Boguslavskii et al.
(ICRA 2025) HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots, He et al.
(ICRA 2025) PIM: Learning Humanoid Locomotion with Perceptive Internal Model, Long et al.
(ICRA 2025) Think on your feet: Seamless Transition between Human-like Locomotion in Response to Changing Commands, Huang et al.
(ICLR 2025) MimicLabs: What Matters in Learning from Large-Scale Datasets for Robot Manipulation, Saxena et al.
(ICLR 2025) Puppeteer: Hierarchical World Models as Visual Whole-Body Humanoid Controllers, Hansen et al.
(ICLR 2025) FB-CPR: Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models, Tirinzoni et al.
(ICLR 2025) MPC2: Motion Control of High-Dimensional Musculoskeletal System with Hierarchical Model-Based Planning, Wei et al.
(ICLR 2025) CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control, Tevet et al.
(ICLR 2025) HiLo: Learning Whole-Body Human-like Locomotion with Motion Tracking Controller, Zhang et al.
(Github 2025) MobilityGen: MobilityGen.
(ArXiv 2025) Do You Have Freestyle?: Expressive Humanoid Locomotion via Audio Control, Li et al.
(ArXiv 2025) RoboMirror: Understand Before You Imitate for Video to Humanoid Locomotion, Li et al.
(ArXiv 2025) EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control, Yang et al.
(ArXiv 2025) CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation, Chen et al.
(ArXiv 2025) Spraggett et al: Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy, Spraggett et al.
(ArXiv 2025) PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations, Yuan et al.
(ArXiv 2025) Mimic2DM: Learning to Control Physically-simulated 3D Characters via Generating and Mimicking 2D Motions, Li et al.
(ArXiv 2025) WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control, Jiang et al.
(ArXiv 2025) Xu et al.: Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input, Xu et al.
(ArXiv 2025) Song et al.: Gait-Adaptive Perceptive Humanoid Locomotion with Real-Time Under-Base Terrain Reconstruction, Song et al.
(ArXiv 2025) Toward Seamless Physical Human-Humanoid Interaction: Insights from Control, Intent, and Modeling with a Vision for What Comes Next, Cardona et al.
(ArXiv 2025) Kumbhar et al: Efficient and Compliant Control Framework for Versatile Human-Humanoid Collaborative Transportation, Kumbhar et al.
(ArXiv 2025) SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control, Mu et al.
(ArXiv 2025) GenMimic: From Generated Human Videos to Physically Plausible Robot Trajectories, Ni et al.
(ArXiv 2025) H-Zero: Cross-Humanoid Locomotion Pretraining Enables Few-shot Novel Embodiment Transfer, Lin et al.
(ArXiv 2025) Xue et al: Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer, Xue et al.
(ArXiv 2025) Seo et al: Learning Sim-to-Real Humanoid Locomotion in 15 Minutes, Seo et al.
(ArXiv 2025) Seo et al: Discovering Self-Protective Falling Policy for Humanoid Robot via Deep Reinforcement Learning, Shi et al.
(ArXiv 2025) SafeHumanoid: VLM-RAG-driven Control of Upper Body Impedance for Humanoid Robot, Mahmoud et al.
(ArXiv 2025) Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary, Liu et al.
(ArXiv 2025) Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data, Pan et al.
(ArXiv 2025) SafeFall: Learning Protective Control for Humanoid Robots, Meng et al.
(ArXiv 2025) SENTINEL: A Fully End-to-End Language-Action Model for Humanoid Whole Body Control, Wang et al.
(ArXiv 2025) HAFO: Humanoid Force-Adaptive Control for Intense External Force Interaction Environments, Dong et al.
(ArXiv 2025) Xiao et al.: Kinematics-Aware Multi-Policy Reinforcement Learning for Force-Capable Humanoid Loco-Manipulation, Xiao et al.
(ArXiv 2025) Switch-JustDance: Benchmarking Whole-Body Motion Tracking Policies Using a Commercial Console Game, Kim et al.
(ArXiv 2025) VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation, He et al.
(ArXiv 2025) HMC: Learning Heterogeneous Meta-Control for Contact-Rich Loco-Manipulation, Wei et al.
(ArXiv 2025) Gallant: Voxel Grid-based Humanoid Locomotion and Local-navigation across 3D Constrained Terrains, Ben et al.
(ArXiv 2025) Liu et al.: Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning, Liu et al.
(ArXiv 2025) SPIDER: Scalable Physics-Informed DExterous Retargeting, Pan et al.
(ArXiv 2025) SCHUR: Unveiling the Impact of Data and Model Scaling on High-Level Control for Humanoid Robots, Wei et al.
(ArXiv 2025) RGMP: Recurrent Geometric-prior Multimodal Policy for Generalizable Humanoid Robot Manipulation, Li et al.
(ArXiv 2025) SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control, Luo et al.
(ArXiv 2025) FIRM: Unified Humanoid Fall-Safety Policy from a Few Demonstrations, Xu et al.
(ArXiv 2025) AHC: Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning, Zhao et al.
(ArXiv 2025) BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning, Ze et al.
(ArXiv 2025) GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction, Lu et al.
(ArXiv 2025) Wang et al: Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots, Wang et al.
(ArXiv 2025) TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System, Ze et al.
(ArXiv 2025) Huang et al: One-shot Humanoid Whole-body Motion Learning, Huang et al.
(ArXiv 2025) Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments, Li et al.
(ArXiv 2025) PHUMA: Building the Bridge Between Off-the-Shelf VLMs and the Physical World, Lee et al.
(ArXiv 2025) Kwon et al: A Humanoid Visual-Tactile-Action Dataset for Contact-Rich Manipulation, Kwon et al.
(ArXiv 2025) Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World, Jian et al.
(ArXiv 2025) Humanoid Goalkeeper: Learning from Position Conditioned Task-Motion Constraints, Ren et al.
(ArXiv 2025) SoftMimic: Learning Compliant Whole-body Control from Examples, Margolis et al.
(ArXiv 2025) AdaMimic: Towards Adaptable Humanoid Control via Adaptive Motion Tracking, Huang et al.
(ArXiv 2025) COLA: Learning Human-Humanoid Coordination for Collaborative Object Carrying, Du et al.
(ArXiv 2025) Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion, Werner et al.
(ArXiv 2025) From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance, Li et al.
(ArXiv 2025) Wu et al: Path and Motion Optimization for Efficient Multi-Location Inspection with Humanoid Robots, Wu et al.
(ArXiv 2025) DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation, Fu et al.
(ArXiv 2025) PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System, Wang et al.
(ArXiv 2025) Ego-VCP: Ego-Vision World Model for Humanoid Contact Planning, Liu et al.
(ArXiv 2025) Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation, Zhao et al.
(ArXiv 2025) DPL: Depth-only Perceptive Humanoid Locomotion via Realistic Depth Synthesis and Cross-Attention Terrain Reconstruction, Sun et al.
(ArXiv 2025) ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning, Zhao et al.
(ArXiv 2025) Retargeting Matters: General Motion Retargeting for Humanoid Motion Tracking, Ara´ujo et al.
(ArXiv 2025) PolySim: Bridging the Sim-to-Real Gap for Humanoid Control via Multi-Simulator Dynamics Randomization, Lei et al.
(ArXiv 2025) D'Elia et al: Stabilizing Humanoid Robot Trajectory Generation via Physics-Informed Learning and Control-Informed Steering, D'Elia et al.
(ArXiv 2025) OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction, Yang et al.
(ArXiv 2025) MoReFlow: Motion Retargeting Learning through Unsupervised Flow Matching, Kim et al.
(ArXiv 2025) Towards Versatile Humanoid Table Tennis: Unified Reinforcement Learning with Prediction Augmentation, Hu et al.
(ArXiv 2025) SEEC: Stable End-Effector Control with Model-Enhanced Residual Learning for Humanoid Loco-Manipulation, Jang et al.
(ArXiv 2025) RuN: Residual Policy for Natural Humanoid Locomotion, Li et al.
(ArXiv 2025) RobotDancing: Residual-Action Reinforcement Learning Enables Robust Long-Horizon Humanoid Motion Tracking, Sun et al.
(ArXiv 2025) VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation, Yin et al.
(ArXiv 2025) Chasing Stability: Humanoid Running via Control Lyapunov Function Guided Reinforcement Learning, Olkin et al.
(ArXiv 2025) RoMoCo: Robotic Motion Control Toolbox for Reduced-Order Model-Based Locomotion on Bipedal and Humanoid Robots, Dai et al.
(ArXiv 2025) HDMI: Learning Interactive Humanoid Whole-Body Control from Human Videos, Weng et al.
(ArXiv 2025) HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba, Wang et al.
(ArXiv 2025) KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Contro, Han et al.
(ArXiv 2025) IKMR: Implicit Kinodynamic Motion Retargeting for Human-to-humanoid Imitation Learning, Chen et al.
(ArXiv 2025) DreamControl: Human-Inspired Whole-Body Humanoid Control for Scene Interaction via Guided Diffusion, Kalaria et al.
(ArXiv 2025) BFM: Behavior Foundation Model for Humanoid Robots, Zeng et al.
(ArXiv 2025) Zheng et al: Embracing Bulky Objects with Humanoid Robots: Whole-Body Manipulation with Reinforcement Learning, Zheng et al.
(ArXiv 2025) StageACT: Stage-Conditioned Imitation for Robust Humanoid Door Opening, Lee et al.
(ArXiv 2025) TrajBooster: Boosting Humanoid Whole-Body Manipulation via Trajectory-Centric Learning, Liu et al.
(ArXiv 2025) Learning to Walk in Costume: Adversarial Motion Priors for Aesthetically Constrained Humanoids, Alvarez et al.
(ArXiv 2025) Ghansah et al: Hierarchical Reduced-Order Model Predictive Control for Robust Locomotion on Humanoid Robots, Ghansah et al.
(ArXiv 2025) HITTER: A HumanoId Table TEnnis Robot via Hierarchical Planning and Learning, Su et al.
(ArXiv 2025) Traversing the Narrow Path: A Two-Stage Reinforcement Learning Framework for Humanoid Beam Walking, Huang et al.
(ArXiv 2025) FARM: Frame-Accelerated Augmentation and Residual Mixture-of-Experts for Physics-Based High-Dynamic Humanoid Control, Jing et al.
(ArXiv 2025) Arnold: A Generalist Muscle Transformer Policy, Chiappa et al.
(ArXiv 2025) HumanoidVerse: A Versatile Humanoid for Vision-Language Guided Multi-Object Rearrangement, Zhang et al.
(ArXiv 2025) Ciebielski et al: Task and Motion Planning for Humanoid Loco-manipulation, Ciebielski et al.
(ArXiv 2025) SimGenHOI: Physically Realistic Whole-Body Humanoid-Object Interaction via Generative Modeling and Reinforcement Learning, Lin et al.
(ArXiv 2025) LookOut: Real-World Humanoid Egocentric Navigation, Pan et al.
(ArXiv 2025) Malhotra et al: Humanoid Motion Scripting with Postural Synergies, Malhotra et al.
(ArXiv 2025) GBC: Generalized Behavior-Cloning Framework for Whole-Body Humanoid Imitation, Yao et al.
(ArXiv 2025) SafeHumanoidsPolicy: End-to-End Humanoid Robot Safe and Comfortable Locomotion Policy, Wang et al.
(ArXiv 2025) BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion, Truong et al.
(ArXiv 2025) SE-Policy: Coordinated Humanoid Robot Locomotion with Symmetry Equivariant Reinforcement Learning Policy, Nie et al.
(ArXiv 2025) Chen et al: A Whole-Body Motion Imitation Framework from Human Data for Full-Size Humanoid Robot, Chen et al.
(ArXiv 2025) Humanoid Occupancy: Enabling A Generalized Multimodal Occupancy Perception System on Humanoid Robots, Cui et al.
(ArXiv 2025) Astribot: Towards Human-level Intelligence via Human-like Whole-Body Manipulation, Astribot Team.
(ArXiv 2025) EMP: Executable Motion Prior for Humanoid Robot Standing Upper-body Motion Imitation, Xu et al.
(ArXiv 2025) EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos, Yang et al.
(ArXiv 2025) Robot Drummer: Learning Rhythmic Skills for Humanoid Drumming, Shahid et al.
(ArXiv 2025) PL-CAP: Learning Robust Motion Skills via Critical Adversarial Attacks for Humanoid Robots, Zhang et al.
(ArXiv 2025) UniTracker: Learning Universal Whole-Body Motion Tracker for Humanoid Robots, Yin et al.
(ArXiv 2025) ULC: A Unified and Fine-Grained Controller for Humanoid Loco-Manipulation, Sun et al.
(ArXiv 2025) Schakkal et al: Hierarchical Vision-Language Planning for Multi-Step Humanoid Manipulation, Schakkal et al.
(ArXiv 2025) PIMBS: Efficient Body Schema Learning for Musculoskeletal Humanoids with Physics-Informed Neural Networks, Kawaharazuka et al.
(ArXiv 2025) Behavior Foundation Model: Towards Next-Generation Whole-Body Control System of Humanoid Robots, Yuan et al.
(ArXiv 2025) RL from Physical Feedback: Aligning Large Motion Models with Humanoid Control, Yue et al.
(ArXiv 2025) From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots, Wang et al.
(ArXiv 2025) LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction, Xue et al.
(ArXiv 2025) GMT: General Motion Tracking for Humanoid Whole-Body Control, Chen et al.
(ArXiv 2025) KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills, Xie et al.
(ArXiv 2025) SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending, Kuang et al.
(ArXiv 2025) Hold My Beer🍻: Learning Gentle Humanoid Locomotion and End-Effector Stabilization Control, Li et al.
(ArXiv 2025) FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control, Seo et al.
(ArXiv 2025) Peng et al: Gait-Conditioned Reinforcement Learning with Multi-Phase Curriculum for Humanoid Locomotion, Peng et al.
(ArXiv 2025) One Policy but Many Worlds: A Scalable Unified Policy for Versatile Humanoid Locomotion, Fan et al.
(ArXiv 2025) SMAP: Self-supervised Motion Adaptation for Physically Plausible Humanoid Whole-body Control, Zhao et al.
(ArXiv 2025) MaskedManipulator: Versatile Whole-Body Control for Loco-Manipulation, Tessler et al.
(ArXiv 2025) H2-COMPAXT: Human–Humanoid Co-Manipulation via Adaptive Contact Trajectory Policies, Bethala et al.
(ArXiv 2025) HIL: Hybrid Imitation Learning of Diverse Parkour Skills from Videos, Wang et al.
(ArXiv 2025) Dribble Master: Learning Agile Humanoid Dribbling Through Legged Locomotion, Wang et al.
(ArXiv 2025) KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture, Cotton et al.
(ArXiv 2025) PDC: Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning, Luo et al.
(ArXiv 2025) R2S2: Unleashing Humanoid Reaching Potential via Real-world-Ready Skill Space, Zhang et al.
(ArXiv 2025) Bracing for Impact: Robust Humanoid Push Recovery and Locomotion with Reduced Order Model, Yang et al.
(ArXiv 2025) SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics, Yang et al.
(ArXiv 2025) JAEGER: Dual-Level Humanoid Whole-Body Controller, Ding et al.
(ArXiv 2025) FALCON: Learning Force-Adaptive Humanoid Loco-Manipulation, Zhang et al.
(ArXiv 2025) ADD: Physics-Based Motion Imitation with Adversarial Differential Discriminators, Zhang et al.
(ArXiv 2025) VideoMimic: Visual imitation enables contextual humanoid control, Allshire et al.
(ArXiv 2025) TWIST: Teleoperated Whole-Body Imitation System, Ze et al.
(ArXiv 2025) SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings, Vahl et al.
(ArXiv 2025) ALMI: Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning, Shi et al.
(ArXiv 2025) Hao et al: Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation, Hao et al.
(ArXiv 2025) PreCi: Pre-training and Continual Improvement of Humanoid Locomotion via Model-Assumption-based Regularization, Jung et al.
(ArXiv 2025) Teacher Motion Priors: Enhancing Robot Locomotion over Challenging Terrain, Lin et al.
(ArXiv 2025) Cha et al: Sim-to-Real of Humanoid Locomotion Policies via Joint Torque Space Perturbation Injection, Cha et al.
(ArXiv 2025) FLAM: Foundation Model-Based Body Stabilization for Humanoid Locomotion and Manipulation, Zhang et al.
(ArXiv 2025) Lutz et al: Control of Humanoid Robots with Parallel Mechanisms using Kinematic Actuation Models, Lutz et al.
(ArXiv 2025) GR00T N1: An Open Foundation Model for Generalist Humanoid Robots, NVIDIA.
(ArXiv 2025) StyleLoco: Generative Adversarial Distillation for Natural Humanoid Robot Locomotion, Ma et al.
(ArXiv 2025) KINESIS: Reinforcement Learning-Based Motion Imitation for Physiologically Plausible Musculoskeletal Motor Control, Simos et al.
(ArXiv 2025) GMP: Natural Humanoid Robot Locomotion with Generative Motion Prior, Zhang et al.
(ArXiv 2025) Sun et al: Learning Perceptive Humanoid Locomotion over Challenging Terrain, Sun et al.
(ArXiv 2025) HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion, Lin et al.
(ArXiv 2025) Lin et al: Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids, Lin et al.
(ArXiv 2025) COMPASS: Cross-embOdiment Mobility Policy via ResiduAl RL and Skill Synthesis, Liu et al.
(ArXiv 2025) VB-COM: Learning Vision-Blind Composite Humanoid Locomotion Against Deficient Perception, Ren et al.
(ArXiv 2025) Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration, Ding et al.
(ArXiv 2025) Li et al: Human-Like Robot Impedance Regulation Skill Learning from Human-Human Demonstrations, Li et al.
(ArXiv 2025) RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations, Chen et al.
(ArXiv 2025) Embrace Collisions: Humanoid Shadowing for Deployable Contact-Agnostics Motion, Zhuang et al.
(ArXiv 2025) ToddlerBot: Open-Source ML-Compatible Humanoid Platform for Loco-Manipulation, Shi et al.
(ArXiv 2025) Gu et al: Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning, Gu et al.

2024

(ArXiv 2024) UH-1: Learning from Massive Human Videos for Universal Humanoid Pose Control, Mao et al.
(ArXiv 2024) Mimicking-Bench: A Benchmark for Generalizable Humanoid-Scene Interaction Learning via Human Mimicking, Liu et al.
(ArXiv 2024) Humanoidlympics: Sports Environments for Physically Simulated Humanoids, Luo et al.
(ArXiv 2024) PhySHOI: Physics-Based Imitation of Dynamic Human-Object Interaction, Wang et al.
(RA-L 2024) Murooka et al: Whole-body Multi-contact Motion Control for Humanoid Robots Based on Distributed Tactile Sensors, Murooka et al.
(RA-L 2024) Liu et al: Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration, Liu et al.
(SIGGRAPH Asia 2024) PDP: Physics-Based Character Animation via Diffusion Policy, Truong et al.
(SIGGRAPH Asia 2024) MaskedMimic: MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting, Tessler et al.
(NeurIPS 2024) HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid, Xu et al.
(NeurIPS 2024) OmniGrasp: Grasping Diverse Objects with Simulated Humanoids, Luo et al.
(NeurIPS 2024) InterControl: Generate Human Motion Interactions by Controlling Every Joint, Wang et al.
(NeurIPS 2024) CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics, Gao et al.
(NeurIPS 2024) Radosavovic et al.: Humanoid Locomotion as Next Token Prediction, Radosavovic et al.
(CoRL 2024) HARMON: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions, Jiang et al.
(CoRL 2024) OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation, Li et al.
(CoRL 2024) HumanPlus: Humanoid Shadowing and Imitation from Humans, Fu et al.
(CoRL 2024) OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning, He et al.
(Humanoids 2024) Self-Aware: Know your limits! Optimize the behavior of bipedal robots through self-awareness, Mascaro et al.
(ACM MM 2024) PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation, Liu et al.
(IROS 2024) H2O: Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation, He et al.
(ECCV 2024) MHC: Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs, Shrestha et al.
(ICML 2024) DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation, Liu et al.
(SIGGRAPH 2024) MoConVQ: Unified Physics-Based Motion Control via Scalable Discrete Representations, Yao et al.
(SIGGRAPH 2024) PhysicsPingPong: Strategy and Skill Learning for Physics-based Table Tennis Animation, Wang et al.
(SIGGRAPH 2024) SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation, Juravsky et al.
(CVPR 2024) SimXR: Real-Time Simulated Avatar from Head-Mounted Sensors, Luo et al.
(CVPR 2024) AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents, Cui et al.
(ICLR 2024) PULSE: Universal Humanoid Motion Representations for Physics-Based Control, Luo et al.
(ICLR 2024) H-GAP: Humanoid Control with a Generalist Planner, Jiang et al.
(ICLR 2024) UniHSI: Unified Human-Scene Interaction via Prompted Chain-of-Contacts, Xiao et al.
(3DV 2024) Phys-Fullbody-Grasp: Physically Plausible Full-Body Hand-Object Interaction Synthesis, Braun et al.
(RSS 2024) ExBody: Expressive Whole-Body Control for Humanoid Robots, Cheng et al.

2023 and earlier

(SIGGRAPH Asia 2023) Fatigued Movements: Discovering Fatigued Movements for Virtual Character Animation, Cheema et al.
(SIGGRAPH Asia 2023) C·ASE: Learning Conditional Adversarial Skill Embeddings for Physics-based Characters, Dou et al.
(SIGGRAPH Asia 2023) AdaptNet: Policy Adaptation for Physics-Based Character Control, Xu et al.
(SIGGRAPH Asia 2023) NCP: Neural Categorical Priors for Physics-Based Character Control, Zhu et al.
(SIGGRAPH Asia 2023) DROP: Dynamics Responses from Human Motion Prior and Projective Dynamics, Jiang et al.
(NeurIPS 2023) InsActor: InsActor: Instruction-driven Physics-based Characters, Ren et al.
(CoRL 2023) Humanoid4Parkour: Humanoid Parkour Learning, Zhuang et al.
(CoRL Workshop 2023) Words into Action: Words into Action: Learning Diverse Humanoid Robot Behaviors using Language Guided Iterative Motion Refinement, Kumar et al.
(ICCV 2023) PHC: Perpetual Humanoid Control for Real-time Simulated Avatars, Luo et al.
(CVPR 2023) Trace and Pace: Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion, Rempe et al.
(SIGGRAPH 2023) Vid2Player3D: Learning Physically Simulated Tennis Skills from Broadcast Videos, Zhang et al.
(SIGGRAPH 2023) QuestEnvSim: Environment-Aware Simulated Motion Tracking from Sparse Sensors, Lee et al.
(SIGGRAPH 2023) Hassan et al.: Synthesizing Physical Character-Scene Interactions, Hassan et al.
(SIGGRAPH 2023) CALM: Conditional Adversarial Latent Models for Directable Virtual Characters, Tessler et al.
(SIGGRAPH 2023) Composite Motion: Composite Motion Learning with Task Control, Xu et al.
(ICLR 2023) DiffMimic: Efficient Motion Mimicking with Differentiable Physics, Ren et al.
(NeurIPS 2022) EmbodiedPose: Embodied Scene-aware Human Pose Estimation, Luo et al.
(NeurIPS 2022) MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control, Wagener et al.
(SIGGRAPH Asia 2022) Gopinath et al.: Motion In-betweening for Physically Simulated Characters, Gopinath et al.
(SIGGRAPH Asia 2022) AIP: Adversarial Interaction Priors for Multi-Agent Physics-based Character Control, Younes et al.
(SIGGRAPH Asia 2022) ControlVAE: Model-Based Learning of Generative Controllers for Physics-Based Characters, Yao et al.
(SIGGRAPH Asia 2022) QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars, Winkler et al.
(SIGGRAPH Asia 2022) PADL: Language-Directed Physics-Based Character, Juravsky et al.
(SIGGRAPH Asia 2022) Wang et al.: Differentiable Simulation of Inertial Musculotendons, Wang et al.
(SIGGRAPH 2022) ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters, Peng et al.
(Journal of Neuro-Engineering and Rehabilitation 2021) Learn to Move: Deep Reinforcement Learning for Modeling Human Locomotion Control in Neuromechanical Simulation, Peng et al.
(NeurIPS 2021) KinPoly: Dynamics-Regulated Kinematic Policy for Egocentric Pose Estimation, Luo et al.
(SIGGRAPH 2021) AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control, Peng et al.
(CVPR 2021) SimPoE: Simulated Character Control for 3D Human Pose Estimation, Yuan et al.
(NeurIPS 2020) RFC: Residual Force Control for Agile Human Behavior Imitation and Extended Motion Synthesis, Yuan et al.
(ICLR 2020) Yuan et al.: Diverse Trajectory Forecasting with Determinantal Point Processes, Yuan et al.
(ICCV 2019) Ego-Pose: Ego-Pose Estimation and Forecasting as Real-Time PD Control, Yuan et al.
(SIGGRAPH 2018) DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, Peng et al.

Bio-stuff: Human Anatomy, Biomechanics, Physiology

(npj 2025) Xiang et al: Integrating personalized shape prediction, biomechanical modeling, and wearables for bone stress prediction in runners, Xiang et al.
(CVPR 2025) HSMR: Reconstructing Humans with A Biomechanically Accurate Skeleton, Xia et al.
(CVPR 2025) HDyS: Homogeneous Dynamics Space for Heterogeneous Humans, Liu et al.
(ICLR 2025) ImDy: Human Inverse Dynamics from Imitated Observations, Liu et al.
(ICLR 2025) MPC2: Motion Control of High-Dimensional Musculoskeletal System with Hierarchical Model-Based Planning, Wei et al.
(ACM Sensys 2025) SHADE-AD: An LLM-Based Framework for Synthesizing Activity Data of Alzheimer’s Patients, Fu et al.
(JEB 2025) McAllister et al: Behavioural energetics in human locomotion: how energy use influences how we move, McAllister et al.
(WACV 2025) OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics, Gozlan et al.
(RA-L 2025) Mobedi et al: A Framework for Adaptive Load Redistribution in Human-Exoskeleton-Cobot Systems, Mobedi et al.
(Preprint 2025) GaitDynamics: A Foundation Model for Analyzing Gait Dynamics, Tan et al.
(bioRxiv 2025) Richards et al: Visualising joint force-velocity proper es in musculoskeletal models, Richards et al.
(ArXiv 2025) Beyond Motion Pattern: An Empirical Study of Physical Forces for Human Motion Understanding, Dao et al.
(ArXiv 2025) CADS: A Comprehensive Anatomical Dataset and Segmentation for Whole-Body Anatomy in Computed Tomography, Xu et al.
(ArXiv 2025) Portable Biomechanics Laboratory: Clinically Accessible Movement Analysis from a Handheld Smartphone, Peiffer et al.
(ArXiv 2025) Le et al: Physics-informed Ground Reaction Dynamics from Human Motion Capture, Le et al.
(ArXiv 2025) SMS-Human: Human sensory-musculoskeletal modeling and control of whole-body movements, Zuo et al.
(ArXiv 2025) K2MUSE: A Large-scale Human Lower limb Dataset of Kinematics, Kinetics, amplitude Mode Ultrasound and Surface Electromyography, Li et al.
(ArXiv 2025) A Human-Sensitive Controller: Adapting to Human Ergonomics and Physical Constraints via Reinforcement Learning, Almeida et al.
(ArXiv 2025) Ankle Exoskeletons in Walking and Load-Carrying Tasks: Insights into Biomechanics and Human-Robot Interaction, Almeida et al.
(ArXiv 2025) GAITGen: Disentangled Motion-Pathology Impaired Gait Generative Model, Adeli et al.
(ArXiv 2025) KINESIS: Reinforcement Learning-Based Motion Imitation for Physiologically Plausible Musculoskeletal Motor Control, Simos et al.
(ArXiv 2025) Cotton et al: Biomechanical Reconstruction with Confidence Intervals from Multiview Markerless Motion Capture, Cotton et al.
(BiorXiv 2024) Lai et al: Mapping Grip Force to Muscular Activity Towards Understanding Upper Limb Musculoskeletal Intent using a Novel Grip Strength Model, Lai et al.
(ROBIO 2024) Wu et al: Muscle Activation Estimation by Optimizing the Musculoskeletal Model for Personalized Strength and Conditioning Training, Wu et al.
(IROS 2024) Shahriari et al: Enhancing Robustness in Manipulability Assessment: The Pseudo-Ellipsoid Approach, Shahriari et al.
(SIGGRAPH Asia 2024) BioDesign: Motion-Driven Neural Optimizer for Prophylactic Braces Made by Distributed Microstructures, Han et al.
(Scientific Data 2024) Evans et al: Synchronized Video, Motion Capture and Force Plate Dataset for Validating Markerless Human Movement Analysis, Evans et al.
(NeurIPS D&B 2024) Muscles in Time: Learning to Understand Human Motion by Simulating Muscle Activations, Schneider et al.
(CoRL 2024) Wei et al: Safe Bayesian Optimization for the Control of High-Dimensional Embodied Systems, Wei et al.
(HFES 2024) Macwan et al: High-Fidelity Worker Motion Simulation With Generative AI, Macwan et al.
(ECCV 2024) AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale, Werling et al.
(ECCV 2024) MANIKIN: Biomechanically Accurate Neural Inverse Kinematics for Human Motion Estimation, Jiang et al.
(TOG 2024) NICER: A New and Improved Consumed Endurance and Recovery Metric to Quantify Muscle Fatigue of Mid-Air Interactions, Li et al.
(TVCG 2024) Loi et al: Machine Learning Approaches for 3D Motion Synthesis and Musculoskeletal Dynamics Estimation: A Survey, Loi et al.
(ICML 2024) DynSyn: Dynamical Synergistic Representation for Efficient Learning and Control in Overactuated Embodied Systems, He et al.
(Multibody System Dynamics 2024) Human3.6M+: Using musculoskeletal models to generate physically-consistent data for 3D human pose, kinematic, dynamic, and muscle estimation, Nasr et al.
(CVPR 2024) HIT: Estimating Internal Human Implicit Tissues from the Body Surface, Keller et al.
(Frontiers in Neuroscience 2024) Dai et al: Full-body pose reconstruction and correction in virtual reality for rehabilitation training, Dai et al.
(ICRA 2024) Self Model for Embodied Intelligence: Modeling Full-Body Human Musculoskeletal System and Locomotion Control with Hierarchical Low-Dimensional Representation, He et al.
(SIGGRAPH Asia 2023) Fatigued Movements: Discovering Fatigued Movements for Virtual Character Animation, Cheema et al.
(SIGGRAPH Asia 2023) SKEL: From skin to skeleton: Towards biomechanically accurate 3d digital humans, Keller et al.
(SIGGRAPH Asia 2023) MuscleVAE: Model-Based Controllers of Muscle-Actuated Characters, Feng et al.
(SIGGRAPH 2023) Bidirectional GaitNet: Bidirectional GaitNet, Park et al.
(SIGGRAPH 2023) Lee et al.: Anatomically Detailed Simulation of Human Torso, Lee et al.
(ICCV 2023) MiA: Muscles in Action, Chiquer et al.
(CVPR 2022) OSSO: Obtaining Skeletal Shape from Outside, Keller et al.
(Scientific Data 2022) Xing et al: Functional movement screen dataset collected with two Azure Kinect depth sensors, Xing et al.
(NCA 2020) Zell et al: Learning inverse dynamics for human locomotion analysis, Zell et al.
(ECCV 2020) Zell et al: Weakly-supervised learning of human dynamics, Zell et al.
(SIGGRAPH 2019) LRLE: Synthesis of biologically realistic human motion using joint torque actuation, Jiang et al.
(TII 2018) Pham et al: Multicontact Interaction Force Sensing From Whole-Body Motion Capture, Pham et al.
(ICCV Workshop 2017) Zell et al: Learning-based inverse dynamics of human motion, Zell et al.
(CVPR Workshop 2017) Zell et al: Joint 3d human motion capture and physical analysis from monocular videos, Zell et al.
(AIST 2017) HuGaDb: HuGaDB: Human Gait Database for Activity Recognition from Wearable Inertial Sensor Networks, Chereshnev et al.
(SIGGRAPH 2016) Lv et al: Data-driven inverse dynamics for human motion, Lv et al.

Human Reconstruction, Motion/Interaction/Avatar

2025

(Arxiv 2025) MonoMSK: Monocular 3D Musculoskeletal Dynamics Estimation, Koleini et al.
(Arxiv 2025) SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery, Li et al.
(ICCV 2025) Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models, Jin et al.
(ICCV 2025) ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness, Li et al.
(ICML 2025) ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization, Shen et al.
(CVPR 2025) HSFM: Reconstructing People, Places, and Cameras, M\"uller et al.
(CVPR 2025) Subramanian et al: Pose Priors from Language Models, Subramanian et al.
(CVPR 2025) PromptHMR: Embodied Promptable Human Mesh Recovery, Wang et al.
(CVPR 2025) HumanMM: Global Human Motion Recovery from Multi-shot Videos, Zhang et al.
(CVPR 2025) HSMR: Reconstructing Humans with A Biomechanically Accurate Skeleton, Xia et al.
(CVPR 2025) MEGA: Masked Generative Autoencoder for Human Mesh Recovery, Fiche et al.
(CVPR 2025) DiSRT-In-Bed: TDiffusion-Based Sim-to-Real Transfer Framework for In-Bed Human Mesh Recovery, Gao et al.
(CVPR 2025) Motions as Queries: One-Stage Multi-Person Holistic Human Motion Capture, Liu et al.
(CVPR 2025) H-MoRe: Learning Human-centric Motion Representation for Action Analysis, Huang et al.
(CVPR 2025) IDOL: Instant Photorealistic 3D Human Creation from a Single Image, Zhuang et al.
(CVPR 2025) MultiGO: Towards Multi-Level Geometry Learning for Monocular 3D Textured Human Reconstruction, Zhang et al.
(CVPR 2025) GBC-Splat: Generalizable Gaussian-Based Clothed Human Digitalization under Sparse RGB Cameras, Tu et al.
(CVPR 2025 workshop) Ludwig et al: Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes, Ludwig et al.
(CVPR 2025 workshop) Ludwig et al: Efficient 2D to Full 3D Human Pose Uplifting including Joint Rotations, Ludwig et al.
(ICLR 2025) Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Dynamic Scenes, Liu et al.
(3DV 2025) CameraHMR: Aligning People with Perspective, Patel et al.
(ArXiv 2025) BaroPoser: Real-time Human Motion Tracking from IMUs and Barometers in Everyday Devices, Zhang et al.
(ArXiv 2025) Portable Biomechanics Laboratory: Clinically Accessible Movement Analysis from a Handheld Smartphone, Peiffer et al.
(ArXiv 2025) SMPLest-X: CUltimate Scaling for Expressive Human Pose and Shape Estimation, Yin et al.

2024

(SIGGRAPH Asia 2024) GVHMR: World-Grounded Human Motion Recovery via Gravity-View Coordinates, Shen et al.
(NeurIPS 2024) EVAHuman: Expressive Gaussian Human Avatars from Monocular RGB Video, Hu et al.
(ECCV 2024) TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos, Wang et al.
(ECCV 2024) ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild, Guo et al.
(ECCV 2024) WHAC: World-grounded Humans and Cameras, Yin et al.
(ECCV 2024) Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot, Baradel et al.
(ECCV 2024) Sapiens: Foundation for Human Vision Models, Khirodkar et al.
(CVPR 2024) AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation, Sun et al.
(CVPR 2024) Generative Proxemics: A Prior for 3D Social Interaction from Images, M{\“u}ller et al.
(CVPR 2024) KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation, Yang et al.
(CVPR 2024) TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation, Dwivedi et al.
(CVPR 2024) WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion, Shin et al.
(CVPR 2024) MultiPhys: Multi-Person Physics-aware 3D Motion Estimation, Ugrinovic et al.
(CVPR 2024) RoHM: Robust Human Motion Reconstruction via Diffusion, Zhang et al.

2023 and earlier

(NeurIPS 2023) SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation, Cai et al.
(NeurIPS 2023) RoboSMPLX: Towards Robust and Expressive Whole-body Human Pose and Shape Estimation, Pang et al.
(ICCV 2023) Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction, Wang et al.
(ICCV 2023) MotionBERT: A Unified Perspective on Learning Human Motion Representations, Zhu et al.
(ICCV 2023) ReFit: Recurrent Fitting Network for 3D Human Recovery, Wang et al.
(ICCV 2023) Humans in 4D: Reconstructing and Tracking Humans with Transformers, Goel et al.
(ICCV 2023) EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild, Kaufmann et al.
(CVPR 2023) Ye et al: Decoupling Human and Camera Motion from Videos in the Wild, Ye et al.
(CVPR 2023) BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion, Black et al.
(CVPR 2023) TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments, Sun et al.
(CVPR 2023) IPMAN: 3D Human Pose Estimation via Intuitive Physics, Tripathi et al.
(CVPR 2023) Rajasegaran et al: On the Benefits of 3D Pose and Tracking for Human Action Recognition, Rajasegaran et al.
(NeurIPS 2022) Pang et al: Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms, Pang et al.
(ECCV 2022) Pavlakos et al: The One Where They Reconstructed 3D Humans and Environments in TV Shows, Pavlakos et al.
(CVPR 2022) Putting People in their Place: Monocular Regression of 3D People in Depth, Sun et al.
(CVPR 2022) Pavlakos et al: Human Mesh Recovery from Multiple Shots, Pavlakos et al.
(CVPR 2022 Workshop) NeuralAnnot: Neural Annotator for 3D Human Mesh Training Sets, Moon et al.
(NeurIPS 2021) Rajasegaran et al: Tracking People with 3D Representations, Rajasegaran et al.
(ICCV 2021) Kolotouros et al: Probabilistic Modeling for Human Mesh Recovery, Kolotouros et al.
(ICCV 2021) ROMP: Monocular, One-stage, Regression of Multiple 3D People, Sun et al.
(ECCV 2020) ExPose: Monocular Expressive Body Regression through Body-Driven Attention, Choutas et al.
(CVPR 2020) Jiang et al: Coherent Reconstruction of Multiple Humans from a Single Image, Jiang et al.
(CVPR 2020) NHR: Multi-view Neural Human Rendering, Wu et al.
(CVPR 2019) Expressive Body Capture: 3D Hands, Face, and Body from a Single Image, Pavlakos et al.

Human-Object/Scene/Human Interaction Reconstruction

HOI Reconstruction

(CVPR 2025) Wen et al: Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions, Wen et al.
(CVPR 2025) InteractVLM: 3D Interaction Reasoning from 2D Foundational Models, Dwivedi et al.
(CVPR 2025) PICO: Reconstructing 3D People In Contact with Objects, Cseke et al.
(CVPR 2025) EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild, Liu et al.
(CVPR 2024) HOLD:Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video, Fan et al.
(CVPR 2024) Xie et al: Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation, Xie et al.
(ACM MM 2024) WildHOI: Monocular Human-Object Reconstruction in the Wild, Huo et al.
(ICCV 2023) DECO: Dense Estimation of 3D Human-Scene COntact in the Wild, Tripathi1 et al.
(CVPR 2022) BEHAVE: Dataset and Method for Tracking Human Object Interactions, Bhatnagar et al.
(CVPR 2022) MOVER: Human-Aware Object Placement for Visual Environment Reconstruction, Yi et al.

HSI Reconstruction

(CVPR 2025) ODHSR et al: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos, Zhang et al.
(ArXiv 2025) JOSH: Joint Optimization for 4D Human-Scene Reconstruction in the Wild, Liu et al.
(NeurIPS 2024) DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos, Chu et al.
(ECCV 2024) HSR: Holistic 3D Human-Scene Reconstruction from Monocular Videos, Xue et al.
(CVPR 2024) HUGS: Human Gaussian Splats, Kocabas et al.
(ArXiv 2024) Ge et al: 3D Human Reconstrution in the Wild with Synthetic Data using Generative Models, Ge et al.
(ICCV 2023) DECO: Dense Estimation of 3D Human-Scene COntact in the Wild, Tripathi1 et al.
(ICCV 2023) EgoHMR: Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views, Zhang et al.
(CVPR 2023) SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments, Dai et al.
(CVPR 2022) MOVER: Human-Aware Object Placement for Visual Environment Reconstruction, Yi et al.

HHI Reconstruction

(CVPR 2025) Huang et al: Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning, Huang et al.
(NeurIPS 2024) Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions, Khirodkar et al.
(ECCV 2024) AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos, Lu et al.
(CVPR 2024) Fang et al.: Capturing Closely Interacted Two-Person Motions with Reaction Priors, Fan et al.
(SIGGRAPH Asia 2024) Shuai et al.: Reconstructing Close Human Interactions from Multiple Views, Shuai et al.

Motion Controlled Image/Video Generation

video

(CVPR 2025) HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation, Wang et al.
(CVPR 2025) FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance, Shao et al.
(CVPR 2025) Na et al: Boost Your Human Image Generation Model via Direct Preference Optimization, Na et al.
(CVPR 2025) TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation, Li et al.
(ArXiv 2025) X-UniMotion: Animating Human Images with Expressive, Unified and Identity-Agnostic Motion Latents, Song et al.

image

(CVPR 2025) Na et al: Boost Your Human Image Generation Model via Direct Preference Optimization, Na et al.
(NeurIPS 2024) Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation, Wang et al.
(ICLR 2024) Shen et al: Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models, Shen et al.
(ArXiv 2024) From Text to Pose to Image: Improving Diffusion Model Control and Quality, Bonnet et al.
(TNNLS 2023) Verbal-Person Nets: Pose-Guided Multi-Granularity Language-to-Person Generation, Liu et al.
(ICCV 2023) HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation, Ju et al.
(CVPR 2023) Bhunia et al: Person Image Synthesis via Denoising Diffusion Model, Bhunia et al.

Human Pose Estimation/Recognition

(CVPR 2025) SapiensID: Foundation for Human Recognition, Kim et al.
(CVPR 2025) MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation, Chharia et al.
(CVPR 2025) EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling, Xia et al.
(CVPR 2025) ProbPose: A Probabilistic Approach to 2D Human Pose Estimation, Purkrabek et al.
(ArXiv 2025) Aytekin et al: Physics-based Human Pose Estimation from a Single Moving RGB Camera, Aytekin et al.
(ECCV 2024) Sapiens: Foundation for Human Vision Models, Khirodkar et al.
(ICCV 2023) Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation, Liu et al.
(CVPR 2023) GenVIS: A Generalized Framework for Video Instance Segmentation, Heo et al.
(ArXiv 2022) Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet, Zou et al.

Human Motion Understanding

(CVPR 2025) ChatHuman: Chatting about 3D Humans with Tools, Lin et al.
(CVPR 2025) HuMoCon: Concept Discovery for Human Motion Understanding, Fang et al.
(CVPR 2025) ExpertAF: Expert Actionable Feedback from Video, Ashutosh et al.
(ArXiv 2024) MotionLLM: Understanding Human Behaviors from Human Motions and Videos, Chen et al.
(CVPR 2024) ChatPose: Chatting about 3D Human Pose, Feng et al.

Contributors

This paper list is mainly contributed by Xinpeng Liu and Yusu Fang, feel free to contact us if you have any questions or suggestions!