Deep Reinforcement Learning with Dense Reward Shaping and Domain Randomization for Sim-to-Real Transfer in Robotic Push Manipulation
Publisher
Nazarbayev University School of Engineering and Digital Sciences
Abstract
Deep reinforcement learning is a promising approach for teaching robots manipulation skills through training in simulation; however, transferring the learned policies to a real robot remains a major challenge. This thesis investigates sim-to-real transfer for planar push manipulation using a Soft Actor-Critic policy trained in MuJoCo simulation for a Franka Emika Panda robot. The work is organized as a three-condition ablation study that examines the individual effects of reward function design and domain randomization, followed by zero-shot deployment of the candidate policy on physical hardware.
The ablation compares a sparse reward, a dense progress-based reward, and a dense reward with domain randomization. Each condition was trained for 2,000,000 timesteps and evaluated over 100 episodes. Sparse and dense reward functions achieved the same success rate in simulation (79%), differing only in training convergence speed. This result is consistent with prior findings that sparse rewards can support policy learning when combined with efficient off-policy methods and large replay buffers in constrained manipulation tasks [16]. Adding domain randomization lowers the success rate by 12 percentage points to 67%, but produces the only policy suitable for real robot deployment.
Real robot evaluation over 20 episodes shows that the policy trained with domain randomization achieved an 80% success rate in straight-line configurations, which exceeds its own simulation result. This confirms that the sim-to-real gap is effectively bridged for aligned push configurations. The overall success rate across all configuration types is 50%; the main limitation is the policy's restricted ability to generalize laterally to diagonal pushes outside the training distribution.
The key finding of this thesis is that success rate in simulation is an unreliable indicator of real-world performance. The policy with the lowest success rate in simulation was the only candidate for real robot deployment, and on straight-line configurations it even outperformed its own simulation result. This shows that real robot evaluation is important and should be used alongside simulation benchmarks in robotics research. The result also provides empirical evidence of the trade-off between specialization and generalization inherent in domain randomization for contact-rich manipulation tasks.
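As an illustration only, the three ablation conditions named in the abstract (sparse reward, dense progress-based reward, and per-episode domain randomization) might be sketched in Python as follows. This is not the thesis implementation: the function names, the success radius, and the randomized parameters and their ranges are all hypothetical, chosen only to make the distinction between the conditions concrete.

```python
import math
import random

# Hypothetical success threshold in metres; not a value from the thesis.
SUCCESS_RADIUS = 0.05


def distance(a, b):
    """Planar Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def sparse_reward(obj_xy, goal_xy):
    """Return 1.0 only when the object lies within the success radius."""
    return 1.0 if distance(obj_xy, goal_xy) <= SUCCESS_RADIUS else 0.0


def dense_progress_reward(prev_obj_xy, obj_xy, goal_xy):
    """Return the reduction in object-to-goal distance over one step.

    Positive when the push moved the object closer to the goal,
    negative when it moved the object away.
    """
    return distance(prev_obj_xy, goal_xy) - distance(obj_xy, goal_xy)


def sample_randomized_physics(rng):
    """Draw one set of physics parameters for an episode.

    The parameter names and ranges are assumptions for illustration.
    """
    return {
        "friction": rng.uniform(0.5, 1.5),
        "object_mass_kg": rng.uniform(0.1, 0.5),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    goal = (0.5, 0.0)
    print(sparse_reward((0.1, 0.0), goal))
    print(dense_progress_reward((0.0, 0.0), (0.1, 0.0), goal))
    print(sample_randomized_physics(rng))
```

Under these assumptions, the sparse reward gives a learning signal only at success, the progress-based reward gives a signal on every step, and the randomized condition would call `sample_randomized_physics` at each episode reset before applying the values to the simulator.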
Citation
Abekenov, D. (2026). Deep Reinforcement Learning with Dense Reward Shaping and Domain Randomization for Sim-to-Real Transfer in Robotic Push Manipulation. Nazarbayev University School of Engineering and Digital Sciences
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States
