MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond

Shenghao Ren*1     Yi Lu*1     Jiayi Huang1     Jiayi Zhao1
He Zhang3     Tao Yu3     Qiu Shen✉1,2     Xun Cao✉1,2    
1School of Electronic Science and Engineering, Nanjing University, Nanjing, China
2Key Laboratory of Optoelectronic Devices and Systems with Extreme Performances of MOE, Nanjing University, Nanjing, China
3BNRist, Tsinghua University, Beijing, China
*Equal Contribution    ✉Corresponding Author
MotionPRO is a large-scale Human Motion capture dataset with Pressure, RGB and Optical sensors, which comprises 70 volunteers performing 400 types of motion, encompassing a total of 12.4M pose frames.

Abstract

Existing human Motion Capture (MoCap) methods mostly focus on visual similarity while neglecting physical plausibility. As a result, downstream tasks such as driving virtual humans in 3D scenes or humanoid robots in the real world suffer from issues such as timing drift and jitter, spatial problems such as sliding and penetration, and poor global trajectory accuracy. In this paper, we revisit human MoCap from the perspective of the interaction between the human body and the physical world by exploring the role of pressure. First, we construct a large-scale Human Motion capture dataset with Pressure, RGB and Optical sensors (named MotionPRO), which comprises 70 volunteers performing 400 types of motion. Second, we examine both the necessity and the effectiveness of the pressure signal through two challenging tasks. (1) Pose and trajectory estimation based solely on pressure: we propose a network that incorporates a small-kernel decoder and a long-short-term attention module, and prove that pressure can provide an accurate global trajectory and a plausible lower-body pose. (2) Pose and trajectory estimation by fusing pressure and RGB: we impose constraints on orthographic similarity along the camera axis and on whole-body contact along the vertical axis to enhance the cross-attention strategy used to fuse pressure and RGB feature maps. Experiments demonstrate that fusing pressure with RGB features not only significantly improves performance on objective metrics, but also plausibly drives virtual humans (SMPL) in 3D scenes. Furthermore, we demonstrate that incorporating physical perception enables humanoid robots to perform more precise and stable actions, which is highly beneficial for the development of embodied artificial intelligence.

Video

Examples in MotionPRO

BibTeX

@article{Ren2025MotionPRO,
  author    = {Shenghao Ren and Yi Lu and Jiayi Huang and Jiayi Zhao and He Zhang and Tao Yu and Qiu Shen and Xun Cao},
  title     = {MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond},
  journal   = {CVPR},
  year      = {2025},
}