D-REX: Differentiable Real-to-sim-to-real Engine for Learning Dexterous Grasping

Abstract 📝

Simulation provides a cost-effective and flexible platform for data generation and policy learning in robotics 🤖. However, bridging the gap between simulation and real-world dynamics remains a significant challenge, especially in physical parameter identification 🔍. In this work, we introduce a real-to-sim-to-real framework that leverages Gaussian Splat representations to build a differentiable engine, enabling object mass identification from real-world visual observations and robot control signals while simultaneously supporting manipulation policy learning ⚖️.

By optimizing the mass of the manipulated object, our method automatically builds high-fidelity, physically plausible digital twins 🌟. Additionally, we propose a novel approach for training force-aware grasping policies from limited data by transferring feasible human demonstrations into simulated robot demonstrations 🎯. Through comprehensive experiments, we demonstrate that our framework achieves accurate and robust mass identification across various object geometries and mass values 📊. The optimized mass values facilitate force-aware policy learning, yielding superior grasping performance and effectively reducing the sim-to-real gap ✨.

We present D-REX 🌟, a differentiable real-to-sim-to-real framework that enables 4D photorealistic rendering and physical simulation by identifying object mass from real-world visual observations and robot interaction data 🔍. D-REX reconstructs object geometry using Gaussian Splat representations and leverages a differentiable physics engine for end-to-end mass identification ⚖️. The identified mass is then used to enable force-aware policy learning from human demonstrations, supporting robust grasping and sim-to-real transfer in dexterous manipulation tasks ✨.

Method Overview 🚀

Overview of our method 🚀. Our approach consists of four components: (1) Real-to-Sim 🔄, (2) Learning from Human Demonstrations 👤, (3) Mass Identification ⚖️, and (4) Policy Learning 🧠. We begin by capturing videos of the scene and human demonstrations 📹. Robotic actions are then executed in both simulation and the real world to identify object mass via our differentiable physics engine 🔬. Lastly, a manipulation policy is trained using the demonstrations and identified mass 🎯.

Interactive 3D Workspace 🌟

Explore our interactive 3D workspace, which demonstrates the real-to-sim results. The viewer lets you examine the Gaussian Splat representations and see how our method reconstructs object geometry from real-world observations.

Mass Identification via Object Pushing

We conduct experiments on mass identification across diverse object geometries and identical geometries with varying densities. Our method accurately estimates mass in both settings, demonstrating robustness to shape and density variations.
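To make the idea of identifying mass from a pushing trajectory concrete, here is a minimal sketch. It is not the D-REX engine: it assumes a hypothetical 1D block pushed from rest by a known constant force against Coulomb friction, and it fits the mass by gradient descent on the squared error between simulated and observed positions. The function names, the friction coefficient, the push force, and the learning rate are all illustrative assumptions, and the "observed" trajectory is synthetic.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def simulate_push(mass, force, mu, times):
    """Toy model: positions of a block pushed from rest by a constant force."""
    accel = max(force / mass - mu * G, 0.0)  # block does not move if friction wins
    return [0.5 * accel * t * t for t in times]


def loss_and_grad(log_mass, force, mu, times, observed):
    """Squared trajectory error and its analytic gradient w.r.t. log(mass)."""
    mass = math.exp(log_mass)
    moving = force / mass - mu * G > 0.0
    simulated = simulate_push(mass, force, mu, times)
    loss, grad = 0.0, 0.0
    for t, x_sim, x_obs in zip(times, simulated, observed):
        err = x_sim - x_obs
        loss += err * err
        if moving:
            # d x / d log(mass) = -0.5 * (force / mass) * t^2
            grad += 2.0 * err * (-0.5 * force / mass * t * t)
    return loss, grad


def identify_mass(force, mu, times, observed, init_mass=0.2, lr=0.05, steps=500):
    """Gradient descent on log(mass), which keeps the estimate positive."""
    log_mass = math.log(init_mass)
    for _ in range(steps):
        _, grad = loss_and_grad(log_mass, force, mu, times, observed)
        log_mass -= lr * grad
    return math.exp(log_mass)


# Synthetic stand-in for a tracked real-world trajectory (true mass 0.5 kg).
times = [0.05 * k for k in range(1, 11)]
observed = simulate_push(0.5, 3.0, 0.3, times)
estimate = identify_mass(3.0, 0.3, times, observed)
print(f"estimated mass: {estimate:.3f} kg")
```

In this toy model the gradient is written by hand; a differentiable physics engine would supply the same quantity automatically for far richer contact dynamics. Optimizing the logarithm of the mass is a design choice of the sketch, used so that no update can produce a non-positive mass.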

We demonstrate object trajectory rendering using Gaussian Splats with optimized mass parameters. The simulation closely matches real-world dynamics, bridging the sim-to-real gap with high visual fidelity across multiple viewpoints.

Effectiveness of Force-Based Control through Grasping ⚖️

In our grasping experiments 🔬, we evaluate how incorporating force-based constraints conditioned on object mass influences sim-to-real performance 📊. This setup highlights the need for mass-aware force control and demonstrates the impact of accurate mass identification on policy success 🎯✨.
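As a rough illustration of why grasp force should depend on mass, the sketch below computes the normal force each fingertip needs so that friction can carry the object's weight. This is a textbook Coulomb-friction bound, not the force constraint used in D-REX; the function name, the friction coefficient, the contact count, the safety factor, and the example masses are all assumptions made for illustration.

```python
G = 9.81  # gravitational acceleration in m/s^2


def min_grip_force(mass, mu, n_contacts=2, safety=1.5):
    """Normal force per contact (N) so that friction supports the object's weight.

    Assumes identical frictional point contacts on a vertically held object;
    `safety` scales the bare slip threshold m * g / (mu * n_contacts).
    """
    if mass <= 0 or mu <= 0 or n_contacts <= 0 or safety <= 0:
        raise ValueError("mass, mu, n_contacts and safety must be positive")
    return safety * mass * G / (mu * n_contacts)


# Hypothetical light, medium and heavy objects (masses in kg).
for name, mass in [("light", 0.1), ("medium", 0.4), ("heavy", 1.0)]:
    force = min_grip_force(mass, mu=0.5)
    print(f"{name:>6}: {force:.2f} N per contact")
```

Because the required force scales linearly with mass, an error in the identified mass translates directly into a proportional error in the commanded grip force.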

Cookie Grasping

Cube Grasping

Lightbulb Grasping

Nutella Grasping

Spray Grasping

T Grasping

We evaluate our force-aware grasping policy across various objects with pre-grasp poses 🎯 and two post-grasp positions 📍, demonstrating that the policy achieves stable, secure grasps 🔒.

Our method consistently outperforms baselines across eight objects with diverse geometries and mass properties, demonstrating the robustness of our approach. While baseline performance tends to degrade on heavier objects, our method maintains high success rates with lower variance across all cases. These results show the effectiveness of our force-aware optimization in enabling stable and reliable grasping across a wide range of object characteristics.

Mass Policy Evaluation Results

Training on Light Object, Evaluation on All Masses

Train Light, Eval Light

Train Light, Eval Medium

Train Light, Eval Heavy

Training on Heavy Object, Evaluation on All Masses

Train Heavy, Eval Light

Train Heavy, Eval Medium

Train Heavy, Eval Heavy

Training on Medium Object, Evaluation on All Masses

Train Medium, Eval Light

Train Medium, Eval Medium

Train Medium, Eval Heavy

Effectiveness of Mass-Aware Policy

Policies perform well only when the training and evaluation masses match: the medium-mass policy succeeds on the medium object but fails on the lighter and heavier ones because its force is too high for the lighter object and too low for the heavier one. Mass mismatches likewise lead to unstable grasps for the other two policies. The results confirm this trend, with the highest success rate (80%) on the training mass, while performance drops to 40% and 30% on mismatched cases. These results highlight the importance of accurate mass conditioning for robust, reliable grasping.
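The mismatch effect can be sketched with a simple friction argument: a grip force tuned for one mass is excessive for a lighter object and insufficient for a heavier one. The example below is a hypothetical classifier, not the evaluation protocol used in our experiments; the function name, the safety factor, the `max_ratio` threshold for excessive force, and the masses are all illustrative assumptions.

```python
G = 9.81  # gravitational acceleration in m/s^2


def grasp_outcome(train_mass, eval_mass, mu=0.5, n_contacts=2,
                  safety=1.5, max_ratio=3.0):
    """Classify a grasp whose force was tuned for `train_mass` on `eval_mass`."""
    # Force per contact the policy applies, tuned for the training mass.
    applied = safety * train_mass * G / (mu * n_contacts)
    # Slip threshold per contact for the object actually being grasped.
    required = eval_mass * G / (mu * n_contacts)
    ratio = applied / required
    if ratio < 1.0:
        return "slip"          # too little force: the object slides out
    if ratio > max_ratio:
        return "excess force"  # far more force than needed (assumed limit)
    return "stable"


# Hypothetical light, medium and heavy masses in kg.
masses = {"light": 0.1, "medium": 0.4, "heavy": 1.0}
for eval_name, eval_mass in masses.items():
    outcome = grasp_outcome(masses["medium"], eval_mass)
    print(f"train medium, eval {eval_name}: {outcome}")
```

With these assumed values, only the matched case falls inside the stable band, which mirrors the qualitative trend reported above without reproducing the measured success rates.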

More Dexterous Tasks 🤖

Our D-REX framework extends beyond simple grasping to more complex dexterous manipulation tasks. These demonstrations showcase the versatility of our mass-aware force control approach in handling various real-world scenarios.

Stapler

Mouse

Refrigerator

Unseen Screwdriver

Large Screwdriver

These dexterous manipulation tasks demonstrate the versatility and robustness of our D-REX framework. By accurately identifying object mass and incorporating force-aware control, our method enables successful execution of complex manipulation tasks that require precise force application and coordination.