Abstract

Cross-embodiment imitation learning enables policies trained on specific embodiments to transfer across different robots, unlocking the potential for large-scale imitation learning that is both cost-effective and highly reusable. This paper presents LEGATO, a cross-embodiment imitation learning framework for visuomotor skill transfer across varied kinematic morphologies. We introduce a handheld gripper that unifies action and observation spaces, allowing tasks to be defined consistently across robots. We train visuomotor policies on task demonstrations using this gripper through imitation learning, applying transformation to a motion-invariant space for computing the training loss. Gripper motions generated by the policies are retargeted into high-degree-of-freedom whole-body motions using inverse kinematics for deployment across diverse embodiments. Our evaluations in simulation and real-robot experiments highlight the framework’s effectiveness in learning and transferring visuomotor skills across various robots.

Cross-Embodiment Learning Pipeline

We introduce a cross-embodiment imitation learning framework that supports human demonstrations collected either by direct interaction or by robot teleoperation. Our framework uses the LEGATO Gripper, a versatile handheld grasping tool that ensures consistent physical interactions across different embodiments. During data collection, the LEGATO Gripper records its trajectories, grasping actions, and visual observations captured by its egocentric stereo camera. Visuomotor policies trained on demonstrations by humans or teleoperated robots using the tool can be deployed across various robots equipped with the same gripper. Motion retargeting enables the execution of trajectories on different robots without requiring robot-specific training data.
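As an illustration of what one recorded demonstration might contain, the sketch below groups the three quantities named above (tool trajectory, grasping action, stereo observation) into a simple record. All class and field names (`GripperStep`, `Demonstration`, `grasp`, `left_image`, and so on) are hypothetical and chosen for this example; the actual LEGATO data format is not described here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GripperStep:
    """One time step of a demonstration (hypothetical layout)."""
    timestamp: float                                 # seconds since recording start
    position: Tuple[float, float, float]             # tool position, metres
    orientation: Tuple[float, float, float, float]   # tool orientation as a unit quaternion (w, x, y, z)
    grasp: float                                     # grasping command, assumed 0.0 (open) to 1.0 (closed)
    left_image: bytes                                # encoded frame from the left stereo camera
    right_image: bytes                               # encoded frame from the right stereo camera


@dataclass
class Demonstration:
    """A sequence of steps collected by a human or a teleoperated robot."""
    task: str
    source: str  # assumed labels: "human" or "teleoperated_robot"
    steps: List[GripperStep] = field(default_factory=list)

    def append(self, step: GripperStep) -> None:
        # Reject out-of-order samples so the trajectory stays time-ordered.
        if self.steps and step.timestamp <= self.steps[-1].timestamp:
            raise ValueError("timestamps must be strictly increasing")
        self.steps.append(step)


if __name__ == "__main__":
    demo = Demonstration(task="Closing the lid", source="human")
    demo.append(GripperStep(0.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), 0.0, b"", b""))
    demo.append(GripperStep(0.1, (0.01, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), 0.5, b"", b""))
    print(len(demo.steps), "steps recorded for task:", demo.task)
```

Because the record refers only to the tool and not to any robot joint, the same structure could hold data from a human demonstrator or from a teleoperated robot.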

Handheld Gripper Design

The LEGATO Gripper is designed for both human demonstration collection and robot deployment. It features a shared actuated gripper with adaptable handles, ensuring reliable human handling and consistent grasping across robots while minimizing components. A human demonstrator can directly perform tasks by carrying the LEGATO Gripper in hand. The LEGATO Gripper is easily installable on various robots, securely held by their original grippers, and is ready for immediate use.

Human usage

Robot usage

Whole-body Motion Retargeting

Motion retargeting through IK optimization adeptly navigates the kinematic differences and constraints across robot embodiments, exploiting kinematic redundancy without requiring additional robot-specific demonstrations for deployment.
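To make the idea of retargeting through IK concrete, here is a minimal sketch that tracks the same tool-position target with two planar arms of different link counts, using damped least-squares inverse kinematics. This is a generic textbook technique on a toy 2D model, not the whole-body IK optimizer used by LEGATO; the function names, link lengths, damping value, and step limit are all assumptions made for illustration.

```python
import math


def forward_kinematics(q, link_lengths):
    """Return the (x, y) tip position of a planar serial arm."""
    x = y = angle = 0.0
    for qi, li in zip(q, link_lengths):
        angle += qi
        x += li * math.cos(angle)
        y += li * math.sin(angle)
    return x, y


def jacobian(q, link_lengths):
    """Rows (jx, jy) of the 2 x n tip-position Jacobian."""
    n = len(q)
    cumulative, angle = [], 0.0
    for qi in q:
        angle += qi
        cumulative.append(angle)
    jx, jy = [0.0] * n, [0.0] * n
    for i in range(n):
        # Joint i moves every link from i to the tip.
        for k in range(i, n):
            jx[i] -= link_lengths[k] * math.sin(cumulative[k])
            jy[i] += link_lengths[k] * math.cos(cumulative[k])
    return jx, jy


def retarget_tool_position(target, q0, link_lengths,
                           damping=1e-2, max_step=0.2, iters=300, tol=1e-6):
    """Damped least-squares IK: dq = J^T (J J^T + damping^2 I)^-1 e."""
    q = list(q0)
    for _ in range(iters):
        x, y = forward_kinematics(q, link_lengths)
        ex, ey = target[0] - x, target[1] - y
        err = math.hypot(ex, ey)
        if err < tol:
            break
        # Limit the task-space step so the linearization stays valid.
        scale = min(1.0, max_step / err)
        ex, ey = ex * scale, ey * scale
        jx, jy = jacobian(q, link_lengths)
        # Build the 2 x 2 matrix A = J J^T + damping^2 I and solve A w = e.
        a = sum(v * v for v in jx) + damping ** 2
        b = sum(u * v for u, v in zip(jx, jy))
        d = sum(v * v for v in jy) + damping ** 2
        det = a * d - b * b
        wx = (d * ex - b * ey) / det
        wy = (a * ey - b * ex) / det
        # Map back to joint space; redundant joints share the motion.
        for i in range(len(q)):
            q[i] += jx[i] * wx + jy[i] * wy
    return q


if __name__ == "__main__":
    target = (1.2, 0.9)  # one tool-position sample from a policy, assumed values
    for links in ([1.0, 1.0, 1.0], [0.5, 0.8, 0.6, 0.4]):
        q = retarget_tool_position(target, [0.3] * len(links), links)
        x, y = forward_kinematics(q, links)
        print(len(links), "links, residual:", math.hypot(x - target[0], y - target[1]))
```

Both arms have more joints than the two task dimensions, so the solver resolves the redundancy with the minimum-norm joint update. A whole-body formulation would additionally handle orientation, joint limits, and other robot-specific constraints.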

Real-Robot Deployment

We trained visuomotor policies on direct human demonstrations and successfully deployed them on the Panda robot system. Our method succeeded in 16 trials of the Closing the lid task, 13 trials of the Cup shelving task, and 14 trials of the Ladle reorganization task.

Closing the lid

Cup shelving

Ladle reorganization

Simulation Evaluation

On average, in cross-embodiment deployment, LEGATO outperforms BC-RNN, Diffusion Policy, and LEGATO (SE3), a variant of LEGATO trained only on SE(3), by 28.9%, 10.5%, and 21.1%, respectively. Notably, whereas the baselines achieved high success rates only on specific robot bodies, typically the Abstract embodiment used for training, LEGATO demonstrates consistent success across different embodiments.

Closing the lid

Cup shelving

Ladle reorganization

Citation

 @article{seo2024legato,
        title={LEGATO: Cross-Embodiment Imitation Using a Grasping Tool},
        author={Seo, Mingyo and Park, H. Andy and Yuan, Shenli and Zhu, Yuke and Sentis, Luis},
        journal={IEEE Robotics and Automation Letters (RA-L)},
        year={2025}
      }