Researchers at RAI Institute are exploring a variety of approaches to giving robots the ability to manipulate with higher levels of dexterity than is typically seen in robots – behavior that is more like that of a person or animal. Rather than limiting manipulation to static grasping of stationary objects with tweezer-like manipulators, we are interested in giving robots the ability to interact dynamically with objects that have mass, inertia, and kinetic energy of their own, and to allow a wide variety of contact patterns and conditions, such as pushing, rolling, and tapping. With that long-term goal in mind, this project has been exploring dynamic whole-body manipulation using a combination of reinforcement learning and sampling-based control. This approach enables a robot to perform dynamic and forceful interactions that require coordinated use of its arm, legs, and body, where locomotion and manipulation are tightly intertwined.

The Hierarchical Control Architecture

To handle complex manipulation tasks, we employ a hierarchical control approach that decomposes problems into smaller, manageable sub-problems. The framework divides the control problem into two complementary and synchronized layers. At the low level, a reinforcement learning-based locomotion policy handles balance, stability, and movement execution by directly controlling motor torques. This learned policy provides a robust locomotion abstraction that higher-level controllers build upon.

The high-level control varies by task. For tire uprighting, dragging, and stacking, sampling-based control is used to discover manipulation strategies by simulating potential futures. For tire rolling, reinforcement learning captures the subtle dynamics and reactive control needed to maintain stable object motion. Both high-level approaches output commands for base velocities, body posture (roll, pitch, and height), leg control, and arm actions, which the locomotion controller then executes while maintaining balance.

We find that the separation of high-level and low-level control significantly simplifies the control problem. Instead of reasoning about joint torques, contact forces, and stability constraints across dozens of degrees of freedom, the high-level controller operates in a reduced action space of base velocities and orientations, trusting the locomotion controller to handle the execution details. High-level controllers can focus on task completion without explicitly reasoning about balance constraints or ground contacts, while the learned locomotion policy handles these complexities robustly.
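To make this division of labor concrete, the sketch below shows what the interface between the two layers could look like. The class and field names are hypothetical and chosen only for illustration; the actual command interface on the robot may differ.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class HighLevelCommand:
    """Reduced action space exposed to the high-level controller (illustrative)."""
    base_velocity: np.ndarray = field(default_factory=lambda: np.zeros(3))    # vx, vy, yaw rate
    base_roll_pitch: np.ndarray = field(default_factory=lambda: np.zeros(2))  # body orientation
    base_height: float = 0.55                                                 # torso height [m], assumed default
    arm_action: np.ndarray = field(default_factory=lambda: np.zeros(6))       # arm joint targets

def control_step(task_controller, locomotion_policy, observation):
    """One control tick: the task layer (sampling-based or RL) picks a command,
    and the learned locomotion policy turns it into joint-level actions while
    handling balance and ground contact."""
    command = task_controller(observation)
    joint_action = locomotion_policy(observation, command)
    return joint_action
```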

Tire uprighting using multi-contact manipulation.
The robot maintains control of a rolling tire while guiding it toward a target location.

Sampling-Based Control for Forceful Manipulation

The sampling-based controller finds effective manipulation strategies by simulating multiple futures in parallel, selecting actions that best achieve the task objectives. For tasks requiring precise force application and multi-contact coordination, the system runs 32 parallel CPU threads, each using MuJoCo to simulate different action sequences several seconds into the future. Rather than sampling raw trajectories, we sample in the space of splines, providing smoother, more natural motions while reducing the search space dimensionality.
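The planning loop itself is conceptually simple. The sketch below illustrates the idea under stated assumptions: the rollout and plan helpers, knot count, horizon, and noise scale are ours for illustration and do not reproduce the Judo implementation.

```python
import numpy as np
import mujoco
from concurrent.futures import ThreadPoolExecutor
from scipy.interpolate import CubicSpline

NUM_SAMPLES = 32      # parallel rollouts, matching the 32 CPU threads described above
HORIZON_S   = 2.0     # how far into the future each rollout simulates (assumed)
NUM_KNOTS   = 4       # spline knots per control dimension (assumed)

def rollout(model, init_state, knots, cost_fn):
    """Simulate one candidate action spline and return its accumulated cost."""
    data = mujoco.MjData(model)                       # fresh simulation state per rollout
    data.qpos[:], data.qvel[:] = init_state
    spline = CubicSpline(np.linspace(0.0, HORIZON_S, NUM_KNOTS), knots, axis=0)
    cost, t = 0.0, 0.0
    while t < HORIZON_S:
        data.ctrl[:] = spline(t)                      # smooth controls from spline knots
        mujoco.mj_step(model, data)
        t += model.opt.timestep
        cost += cost_fn(data)
    return cost

def plan(model, init_state, nominal_knots, cost_fn, noise=0.1):
    """Sample perturbed splines, evaluate them in parallel, keep the best."""
    candidates = [nominal_knots + noise * np.random.randn(*nominal_knots.shape)
                  for _ in range(NUM_SAMPLES)]
    with ThreadPoolExecutor(max_workers=NUM_SAMPLES) as pool:
        costs = list(pool.map(
            lambda k: rollout(model, init_state, k, cost_fn), candidates))
    return candidates[int(np.argmin(costs))]
```

In a receding-horizon fashion, controllers of this kind typically execute only the beginning of the winning spline before replanning from the newly observed state, which is what keeps the behavior reactive.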

Visualization of the sampling-based controller using its internal physics model to explore multiple trajectory candidates (white) in parallel before selecting the optimal action (blue). First, we visualize only the optimal sample along with the executed trajectory; then we overlay the 32 samples that the robot explores in parallel, showing the diversity of explored futures.

The sampling-based controller exhibits opportunistic behavior that emerges from physics-based simulation. During tire uprighting, the controller discovers complex manipulation strategies: the robot coordinates three-legged locomotion with arm movements to create the leverage needed to lift the heavy tire. In order to handle a variety of initial configurations, the robot may use its arm, front legs, body, or a combination to adapt its strategy to the task. There is no prescribed manipulation mode. Such multi-limb, multi-contact behavior arises naturally from the optimization that occurs in the sampling process, without requiring explicit programming of contact sequences.

The controller adapts its strategy based on the current configuration of the robot and the tire in the lab space. For these experiments, a multi-camera motion capture system in the room provides real-time pose estimates of the tires and the robot. One can imagine replacing the motion capture system with a camera-based vision system, possibly in conjunction with tactile and force sensing, for practical applications.

Learning Dynamic Skills Through Reinforcement Learning

Reinforcement learning excels at producing robust controllers that handle uncertainties and disturbances, making it well suited to both low-level balance control and high-level manipulation skills. The foundation of this system is a locomotion policy trained with PPO (Proximal Policy Optimization) in IsaacLab. This policy provides the low-level control abstraction that makes high-level control tractable, robustly maintaining balance across diverse manipulation scenarios.
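For readers unfamiliar with PPO, its core is a clipped surrogate objective optimized over an actor-critic network. The minimal PyTorch sketch below shows that core only; the network sizes and activation are illustrative and do not reflect the configuration used to train the locomotion policy.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Minimal actor-critic for a locomotion-style policy (illustrative sizes)."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim))              # joint-level action means
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 1))                    # state-value estimate
        self.log_std = nn.Parameter(torch.zeros(act_dim))

def ppo_loss(log_probs, old_log_probs, advantages, clip=0.2):
    """Clipped surrogate objective at the heart of PPO."""
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantages
    return -torch.min(unclipped, clipped).mean()
```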

IsaacLab simulation showing 4096 robot instances learning the tire rolling policy simultaneously.

For tire rolling, we use reinforcement learning to handle complex friction and contact dynamics that are difficult to model accurately. We use an asymmetric actor-critic to train a high-level skill policy in approximately 24 hours on a single GPU. The policy observes the relative poses between the robot, the tire, and the goal, along with joint positions and velocities. The reward function guides the policy toward desired torso and end-effector positions computed from the object's geometry and its spatial relationship to the environment.
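The sketch below gives a simplified view of what "asymmetric" means here and of the general shape of such a reward. The observation layouts, reward terms, and weights are assumptions for illustration; only the overall structure (relative poses and proprioception for the actor, privileged simulator state for the critic, position-tracking rewards) follows the description above.

```python
import numpy as np

def actor_observation(tire_pose_rel, goal_pose_rel, joint_pos, joint_vel):
    """Actor input: only quantities available on hardware at deployment time,
    i.e. relative poses plus proprioception (field layout is illustrative)."""
    return np.concatenate([tire_pose_rel, goal_pose_rel, joint_pos, joint_vel])

def critic_observation(actor_obs, tire_vel, contact_forces, friction, mass):
    """Critic input: the actor observation plus privileged simulator state,
    available only during training (asymmetric actor-critic)."""
    return np.concatenate([actor_obs, tire_vel, contact_forces, [friction, mass]])

def rolling_reward(torso_pos, ee_pos, tire_pos,
                   torso_target, ee_target, goal_pos,
                   w_torso=0.3, w_ee=0.3, w_goal=0.4):
    """Reward shaped toward desired torso / end-effector positions derived
    from the tire geometry and the goal (weights here are assumptions)."""
    r_torso = np.exp(-np.linalg.norm(torso_pos - torso_target))
    r_ee    = np.exp(-np.linalg.norm(ee_pos - ee_target))
    r_goal  = np.exp(-np.linalg.norm(tire_pos - goal_pos))
    return w_torso * r_torso + w_ee * r_ee + w_goal * r_goal
```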

The robot guides a tire along a circular path, demonstrating sustained control during an extended rolling session.

The learned rolling policy enables the robot to dynamically adjust its torso and arm position to maintain control of the rolling tire, preventing it from toppling while guiding it toward the goal. We address the sim-to-real gap through randomization of object properties including mass, friction, and shape during training. This learned approach produces behaviors that are robust across a distribution of dynamics, accounting for variations that would be difficult to model explicitly.
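A minimal sketch of this kind of randomization is shown below; the sampled ranges are assumptions for illustration, not the values used in training.

```python
import numpy as np

def sample_tire_dynamics(rng=np.random.default_rng()):
    """Resample object properties for each training episode so the policy is
    exposed to a distribution of dynamics (ranges are assumed, not the paper's)."""
    return {
        "mass_kg":  rng.uniform(10.0, 20.0),   # around the 15 kg real tire
        "friction": rng.uniform(0.4, 1.0),
        "radius_m": rng.uniform(0.30, 0.40),
        "width_m":  rng.uniform(0.15, 0.25),
    }
```

Each simulated instance draws new parameters at episode reset, so the policy never overfits to a single tire.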

Real-World Performance

The system achieves consistent performance on these challenging manipulation tasks. The tire uprighting sequence completes in an average of 5.9 seconds per tire (with a best time of 3.7 seconds), approaching the speed at which a person performs this task. This performance relies on dynamic manipulation and would not be achievable under the quasistatic assumptions that are pervasive in robotic manipulation. Under quasistatic assumptions, robot–object interactions are restricted to slow movements where the effects of acceleration can be safely neglected. Furthermore, the robot efficiently manipulates a 15 kg object, well above its peak lift capacity of 11 kg and its continuous lifting capacity of 5 kg for an object carried in the gripper.

Tire uprighting where the RL controller dynamically lifts the tire off the ground.

This result illustrates that a tight coupling between locomotion and manipulation expands the operational range beyond conventional pick-and-place systems. The learned locomotion abstraction also makes high-level control computationally tractable, freeing the controller to focus purely on task-level decisions about where and how to manipulate objects.

Current Limitations and Future Directions

This work contributes to progress in dynamic and forceful manipulation. However, the system currently relies on motion capture for state estimation, requires offboard computation, and generalizes only modestly across object sizes. Future directions include integrating perception-driven control, broadening applicability to diverse objects and robot embodiments, and enhancing the robustness of sequential task execution. These improvements will take us closer to enabling robots to perform tasks that demand significant physical interaction—such as industrial assembly, moving furniture, or handling construction materials—in real-world settings.

Authorship

This work builds upon the Judo package for sampling-based control and the ReLIC framework for flexible interlimb coordination. Contributors are listed below:

Maks Sorokin*, Jan Brüdigam*, Brandon Hung*, Stephen Phillips, Dmitry Yershov, Farzad Niroui, Tong Zhao, Leonor Fermoselle, Xinghao Zhu, John Z. Zhang, Duy Ta, Tao Pang, Jiuguang Wang, Simon Le Cléac’h 

ReLIC: Xinghao Zhu, Yuxin Chen, Lingfeng Sun, Farzad Niroui, Simon Le Cléac’h, Jiuguang Wang, Kuan Fang

Judo: Albert H. Li, Brandon Hung, Jan Brüdigam, Aaron D. Ames, Jiuguang Wang, Simon Le Cléac’h, Preston Culbertson

(* equal contributions)