Deep Reinforcement Learning with Dense Reward Shaping and Domain Randomization for Sim-to-Real Transfer in Robotic Push Manipulation

Abekenov, Dalel

Files

Dalel_Abekenov_Thesis_Revised (1).pdf (6.99 MB)

Date

2026-04-28

Authors

Abekenov, Dalel

Publisher

Nazarbayev University School of Engineering and Digital Sciences

Abstract

Deep reinforcement learning is a promising approach for teaching robots manipulation skills through training in simulation; however, transferring learned policies to a real robot remains a major challenge. This thesis investigates sim-to-real transfer for planar push manipulation using a Soft Actor-Critic policy trained in MuJoCo simulation for a Franka Emika Panda robot. The work is organized as a three-condition ablation study that examines the individual effects of reward function design and domain randomization, followed by zero-shot deployment of the candidate policy on physical hardware. The ablation compares a sparse reward, a dense progress-based reward, and a dense reward with domain randomization. Each condition was trained for 2,000,000 timesteps and evaluated over 100 episodes. Sparse and dense reward functions achieved the same success rate in simulation (79%), differing only in training convergence speed. This result is consistent with prior findings that sparse rewards can support policy learning when combined with efficient off-policy methods and large replay buffers for constrained manipulation tasks [16]. Adding domain randomization lowers the simulation success rate by 12 percentage points to 67%, but produces the only policy suitable for real robot deployment. Real robot evaluation over 20 episodes shows that the policy trained with domain randomization achieved an 80% success rate in straight-line configurations, which exceeds its own simulation result. This indicates that the sim-to-real gap is effectively bridged for aligned push configurations. The overall success rate across all configuration types is 50%; the main limitation is the policy's restricted ability to generalize laterally to diagonal pushes outside the training distribution. The key finding of this thesis is that success rate in simulation is an unreliable indicator of performance in real-world experiments. The policy with the lowest success rate in simulation was the only candidate for real robot deployment, and in straight-line configurations it even performed better on the real robot than in simulation. This result shows that real robot evaluation is important and should be used alongside simulation benchmarks in robotics research. It also provides empirical evidence of the trade-off between specialization and generalization inherent in domain randomization for contact-rich manipulation tasks.
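
To make the three ablation conditions concrete, the following is a minimal illustrative sketch in Python of a sparse reward, a dense progress-based reward, and a per-episode domain randomization draw. It is not taken from the thesis: the function names, the 0.05 m success tolerance, the bonus weight, and the mass and friction ranges are all assumptions chosen for illustration only.

```python
import math
import random


def sparse_reward(obj_xy, goal_xy, tol=0.05):
    """Return 1.0 only when the object lies within `tol` of the goal.

    `tol` (metres) is a hypothetical success tolerance, not a thesis value.
    """
    return 1.0 if math.dist(obj_xy, goal_xy) <= tol else 0.0


def dense_progress_reward(prev_obj_xy, obj_xy, goal_xy, tol=0.05, bonus=1.0):
    """Reward the reduction in object-to-goal distance since the last step.

    A success bonus is added on top; the weighting is an assumption.
    """
    progress = math.dist(prev_obj_xy, goal_xy) - math.dist(obj_xy, goal_xy)
    return progress + bonus * sparse_reward(obj_xy, goal_xy, tol)


def sample_randomized_params(rng):
    """Draw one set of physics parameters for an episode.

    The parameter names and ranges are illustrative placeholders.
    """
    return {
        "object_mass_kg": rng.uniform(0.05, 0.5),
        "friction_coeff": rng.uniform(0.3, 1.2),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs draw the same parameters
    print(sample_randomized_params(rng))
    print(dense_progress_reward((0.0, 0.0), (0.1, 0.0), (0.5, 0.0)))
```

In this sketch the sparse condition would use only `sparse_reward`, the dense condition `dense_progress_reward`, and the third condition would additionally call `sample_randomized_params` at each episode reset before applying the values to the simulator.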

Keywords

Reward function design, Domain randomization, Sim-to-real transfer, PQDT_Master

Citation

Abekenov, D. (2026). Deep Reinforcement Learning with Dense Reward Shaping and Domain Randomization for Sim-to-Real Transfer in Robotic Push Manipulation. Nazarbayev University School of Engineering and Digital Sciences.

URI

https://nur.nu.edu.kz/handle/123456789/18813

Collections

02. Master's Thesis

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States
