In This Story
Researchers at RAI Institute are exploring a variety of approaches to giving robots higher levels of manipulation dexterity than they typically exhibit – behavior that is closer to that of a person or animal. Rather than limiting manipulation to static grasping of stationary objects with tweezer-like manipulators, we are interested in giving robots the ability to interact dynamically with objects that have mass, inertia, and kinetic energy of their own, and to exploit a wide variety of contact patterns and contact conditions, such as pushing, rolling, and tapping. With that long-term goal in mind, this project has been exploring dynamic whole-body manipulation using a combination of reinforcement learning and sampling-based control. This approach enables a robot to perform dynamic and forceful interactions that require coordinated use of its arm, legs, and body, where locomotion and manipulation are tightly intertwined.
The Hierarchical Control Architecture
To handle complex manipulation tasks, we employ a hierarchical control approach that decomposes problems into smaller, manageable sub-problems. The framework divides the control problem into two complementary and synchronized layers. At the low level, a reinforcement learning-based locomotion policy handles balance, stability, and movement execution by directly controlling motor torques. This learned policy provides a robust locomotion abstraction that higher-level controllers build upon.
The high-level control varies by task. For tire uprighting, dragging, and stacking, sampling-based control is used to discover manipulation strategies by simulating potential futures. For tire rolling, reinforcement learning captures the subtle dynamics and reactive control needed to maintain stable object motion. Both high-level approaches output commands for base velocities, base pose (roll, pitch, and height), leg control, and arm actions, which the locomotion controller then executes while maintaining balance.
We find that the separation of high-level and low-level control significantly simplifies the control problem. Instead of reasoning about joint torques, contact forces, and stability constraints across dozens of degrees of freedom, the high-level controller operates in a reduced action space of base velocities and orientations, trusting the locomotion controller to handle the execution details. High-level controllers can focus on task completion without explicitly reasoning about balance constraints or ground contacts, while the learned locomotion policy handles these complexities robustly.
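The sketch below illustrates this interface in simplified form. The class names, observation layout, and policy wrapper are illustrative assumptions rather than our actual implementation; the point is that the high level reasons only over a reduced command (base velocities, base pose, and arm action) and leaves joint-level execution to the learned locomotion policy.

```python
# Minimal sketch of the hierarchical interface (illustrative names and shapes,
# not the actual implementation).
from dataclasses import dataclass
import numpy as np
import torch


@dataclass
class HighLevelCommand:
    base_velocity: np.ndarray   # (vx, vy, yaw_rate) in the base frame
    base_pose: np.ndarray       # (roll, pitch, height) targets
    arm_action: np.ndarray      # arm joint or end-effector targets


class LocomotionPolicy:
    """Learned low-level policy: command + proprioception -> joint targets."""

    def __init__(self, checkpoint_path: str):
        # Hypothetical TorchScript export of the trained locomotion policy.
        self.net = torch.jit.load(checkpoint_path)

    def act(self, proprioception: np.ndarray, command: HighLevelCommand) -> np.ndarray:
        obs = np.concatenate([proprioception,
                              command.base_velocity,
                              command.base_pose,
                              command.arm_action])
        with torch.no_grad():
            return self.net(torch.from_numpy(obs).float()).numpy()


def control_step(high_level, locomotion: LocomotionPolicy, robot_state):
    # The high level plans only in the reduced command space ...
    command = high_level.plan(robot_state)            # sampling-based or learned
    # ... and the locomotion policy handles balance and joint-level execution.
    return locomotion.act(robot_state.proprioception, command)
```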
Sampling-Based Control for Forceful Manipulation
The sampling-based controller finds effective manipulation strategies by simulating multiple futures in parallel and selecting the actions that best achieve the task objectives. For tasks requiring precise force application and multi-contact coordination, the system runs 32 parallel CPU threads, each using MuJoCo to simulate different action sequences several seconds into the future. Rather than sampling raw trajectories, we sample in the space of splines, providing smoother, more natural motions while reducing the search space dimensionality.
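A simplified version of this predictive-sampling loop is sketched below. The scene file, cost terms, and noise scale are illustrative assumptions (the actual controller builds on the Judo package with task-specific costs); what matters is the structure of sampling spline knots, rolling each candidate out in MuJoCo, and keeping the lowest-cost plan.

```python
# Sketch of predictive sampling over spline knots with MuJoCo rollouts.
# The scene file, cost, and noise scale are illustrative assumptions.
import numpy as np
import mujoco
from scipy.interpolate import CubicSpline

HORIZON = 2.0        # seconds simulated into the future
NUM_KNOTS = 4        # spline control points per action dimension
NUM_SAMPLES = 32     # candidate action sequences per planning step
NOISE_SCALE = 0.2    # std. dev. of the knot perturbations

model = mujoco.MjModel.from_xml_path("robot_and_tire.xml")  # hypothetical scene


def task_cost(data: mujoco.MjData) -> float:
    # Placeholder: a real cost would measure task progress (e.g. how far the
    # tire is from upright) plus effort and posture penalties.
    return float(np.sum(np.square(data.ctrl)))


def rollout_cost(qpos0, qvel0, ctrl_spline) -> float:
    """Simulate one candidate control spline and accumulate its cost."""
    data = mujoco.MjData(model)
    data.qpos[:] = qpos0
    data.qvel[:] = qvel0
    mujoco.mj_forward(model, data)
    cost = 0.0
    for k in range(int(HORIZON / model.opt.timestep)):
        # Assume normalized actuators in [-1, 1].
        data.ctrl[:] = np.clip(ctrl_spline(k * model.opt.timestep), -1.0, 1.0)
        mujoco.mj_step(model, data)
        cost += task_cost(data)
    return cost


def plan(qpos0, qvel0, nominal_knots):
    """Sample spline knots around the nominal plan and keep the best candidate."""
    knot_times = np.linspace(0.0, HORIZON, NUM_KNOTS)
    best_cost, best_knots = np.inf, nominal_knots
    for _ in range(NUM_SAMPLES):   # in practice, rollouts run on parallel CPU threads
        knots = nominal_knots + NOISE_SCALE * np.random.randn(*nominal_knots.shape)
        cost = rollout_cost(qpos0, qvel0, CubicSpline(knot_times, knots, axis=0))
        if cost < best_cost:
            best_cost, best_knots = cost, knots
    return best_knots
```

In such a scheme, only the beginning of the best plan is executed before the controller replans from the newly observed state, giving a receding-horizon, model-predictive style of control.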
The sampling-based controller exhibits opportunistic behavior that emerges from physics-based simulation. During tire uprighting, the controller discovers complex manipulation strategies: the robot coordinates three-legged locomotion with arm movements to create the leverage needed to lift the heavy tire. To handle a variety of initial configurations, the robot may use its arm, front legs, body, or a combination of these, adapting its strategy to the task at hand; there is no prescribed manipulation mode. Such multi-limb, multi-contact behavior arises naturally from the optimization that occurs in the sampling process, without requiring explicit programming of contact sequences.
The controller adapts its strategy based on the current configuration of the robot and the tire in the lab space. For these experiments we use a multi-camera motion capture system located in the room to provide real-time state information for the pose of the tires and the robot. One can imagine replacing the motion capture system with a camera-based vision system, possibly in conjunction with tactile and force-sensing, for practical applications.
Learning Dynamic Skills Through Reinforcement Learning
Reinforcement learning excels at producing robust controllers that handle uncertainties and disturbances, making it well suited to both low-level balance control and high-level manipulation skills. The foundation of this system is a learned locomotion policy trained using PPO (Proximal Policy Optimization) in IsaacLab. This policy provides the low-level control abstraction that makes high-level control tractable, robustly maintaining balance across diverse manipulation scenarios.
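For reference, the core of a PPO update is the clipped surrogate objective shown below. This is the standard formulation rather than our specific training configuration, and the hyperparameters are generic.

```python
# Standard PPO clipped-surrogate policy loss (generic settings, not project-specific).
import torch


def ppo_policy_loss(new_log_probs: torch.Tensor,
                    old_log_probs: torch.Tensor,
                    advantages: torch.Tensor,
                    clip_eps: float = 0.2) -> torch.Tensor:
    ratio = torch.exp(new_log_probs - old_log_probs)        # importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Maximize the surrogate objective, i.e. minimize its negation.
    return -torch.min(unclipped, clipped).mean()
```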
For tire rolling, we use reinforcement learning to handle complex friction and contact dynamics that are difficult to model accurately. We use an asymmetric actor-critic to train a high-level skill policy in approximately 24 hours on a single GPU. The policy observes the relative poses between the robot, the tire, and the goal, along with joint positions and velocities. The reward function guides the policy toward desired torso and end-effector positions computed from the object's geometry and its spatial relationship to the environment.
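In an asymmetric actor-critic, the critic is given privileged simulator information that the deployed actor never sees, which stabilizes value estimation without making the policy depend on unobservable quantities. The sketch below shows the idea; the network sizes and observation dimensions are illustrative assumptions, not the actual configuration.

```python
# Sketch of an asymmetric actor-critic: the actor sees only deployable observations
# (relative robot/tire/goal poses, joint positions and velocities), while the critic
# additionally sees privileged simulator quantities (e.g. randomized friction, mass).
import torch
import torch.nn as nn

ACTOR_OBS_DIM = 64       # assumed size of deployable observations
PRIVILEGED_DIM = 32      # assumed size of simulator-only observations
ACTION_DIM = 12          # assumed size of the high-level command


def mlp(in_dim: int, out_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ELU(),
                         nn.Linear(256, 128), nn.ELU(),
                         nn.Linear(128, out_dim))


actor = mlp(ACTOR_OBS_DIM, ACTION_DIM)                   # runs on the robot
critic = mlp(ACTOR_OBS_DIM + PRIVILEGED_DIM, 1)          # used only during training


def evaluate(actor_obs: torch.Tensor, privileged_obs: torch.Tensor):
    action_mean = actor(actor_obs)
    value = critic(torch.cat([actor_obs, privileged_obs], dim=-1))
    return action_mean, value
```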
The learned rolling policy enables the robot to dynamically adjust its torso and arm position to maintain control of the rolling tire, preventing it from toppling while guiding it toward the goal. We address the sim-to-real gap through randomization of object properties including mass, friction, and shape during training. This learned approach produces behaviors that are robust across a distribution of dynamics, accounting for variations that would be difficult to model explicitly.
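Conceptually, the randomization resamples the object's physical parameters at the start of each training episode. The MuJoCo-style sketch below illustrates the idea (our training itself runs in IsaacLab); the body and geom names and the ranges are illustrative assumptions.

```python
# Illustrative per-episode randomization of object properties (MuJoCo-style sketch;
# names and ranges are assumptions, not the actual training configuration).
import numpy as np
import mujoco


def randomize_tire(model: mujoco.MjModel, rng: np.random.Generator,
                   body_name: str = "tire", geom_name: str = "tire_geom") -> None:
    body_id = model.body(body_name).id
    geom_id = model.geom(geom_name).id
    model.body_mass[body_id] = rng.uniform(10.0, 20.0)       # mass in kg
    model.geom_friction[geom_id, 0] = rng.uniform(0.4, 1.2)  # sliding friction
    # Scale the collision geometry; a full implementation would also update
    # derived inertial quantities to stay consistent with the new shape.
    model.geom_size[geom_id] *= rng.uniform(0.9, 1.1)
```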
Real-World Performance
The system achieves consistent performance on these challenging manipulation tasks. The tire uprighting sequence completes in an average of 5.9 seconds per tire (with a best time of 3.7 seconds), approaching the speed at which a person performs the same task. This performance relies on dynamic manipulation and would not be achievable under the quasistatic assumptions that are pervasive in robotic manipulation. Under quasistatic assumptions, robot–object interactions are restricted to slow movements where the effects of acceleration can be safely neglected. Furthermore, the robot efficiently manipulates a 15 kg object, well above its peak lift capacity of 11 kg and its continuous lifting capacity of 5 kg for an object carried in the gripper.
This result illustrates that tightly coupling locomotion and manipulation expands the operational range beyond conventional pick-and-place systems. The learned locomotion abstraction also makes higher-level control methods computationally feasible, allowing the controller to focus purely on task-level decisions about where and how to manipulate objects.
Current Limitations and Future Directions
This work contributes to progress in dynamic and forceful manipulation. However, the system currently relies on motion capture for state estimation, requires offboard computation, and generalizes only modestly across object sizes. Future directions include integrating perception-driven control, broadening applicability to diverse objects and robot embodiments, and enhancing the robustness of sequential task execution. These improvements will take us closer to enabling robots to perform tasks that demand significant physical interaction—such as industrial assembly, moving furniture, or handling construction materials—in real-world settings.
Authorship
This work builds upon the Judo package for sampling-based control and the ReLIC framework for flexible interlimb coordination. Contributors are listed below:
Maks Sorokin*, Jan Brüdigam*, Brandon Hung*, Stephen Phillips, Dmitry Yershov, Farzad Niroui, Tong Zhao, Leonor Fermoselle, Xinghao Zhu, John Z. Zhang, Duy Ta, Tao Pang, Jiuguang Wang, Simon Le Cléac’h
ReLIC: Xinghao Zhu, Yuxin Chen, Lingfeng Sun, Farzad Niroui, Simon Le Cléac’h, Jiuguang Wang, Kuan Fang
Judo: Albert H. Li, Brandon Hung, Jan Brüdigam, Aaron D. Ames, Jiuguang Wang, Simon Le Cléac’h, Preston Culbertson
(* equal contributions)