Deep Reinforcement Learning with Dense Reward Shaping and Domain Randomization for Sim-to-Real Transfer in Robotic Push Manipulation

Abekenov, Dalel

dc.contributor.advisor	Shintemirov, Almas
dc.contributor.advisor	Rubagotti, Matteo
dc.contributor.author	Abekenov, Dalel
dc.date.accessioned	2026-06-01T11:40:43Z
dc.date.issued	2026-04-28
dc.description.abstract	Deep reinforcement learning is a promising approach for teaching robots manipulation skills through training in simulation, but transferring learned policies to a real robot remains a major challenge. This thesis investigates sim-to-real transfer for planar push manipulation using a Soft Actor-Critic policy trained in MuJoCo simulation for a Franka Emika Panda robot. The work is organized as a three-condition ablation study that examines the individual effects of reward function design and domain randomization, followed by zero-shot deployment of the candidate policy on physical hardware. The ablation compares a sparse reward, a dense progress-based reward, and a dense reward with domain randomization. Each condition was trained for 2,000,000 timesteps and evaluated over 100 episodes. Sparse and dense reward functions achieved the same success rate in simulation (79%), differing only in training convergence speed. This result is consistent with prior findings that sparse rewards can support policy learning in constrained manipulation tasks when combined with efficient off-policy methods and large replay buffers [16]. Adding domain randomization lowers the simulation success rate by 12 percentage points to 67%, but produces the only policy suitable for real robot deployment. Real robot evaluation over 20 episodes shows that the policy with domain randomization achieved an 80% success rate in straight-line configurations, which exceeds its own simulation result. This confirms that the sim-to-real gap is effectively bridged for aligned push configurations. The overall success rate across all configuration types is 50%; the main limitation is the policy's restricted ability to generalize laterally to diagonal pushes outside the training distribution. The key finding of this thesis is that success rate in simulation is an unreliable indicator of real-world performance. 
The policy with the lowest simulation success rate was the only candidate for real robot deployment, and in straight-line configurations it outperformed its own simulation result. This shows that real robot evaluation is important and should be used alongside simulation benchmarks in robotics research. The work also provides empirical evidence of the trade-off between specialization and generalization inherent in domain randomization for contact-rich manipulation tasks.
dc.identifier.citation	Abekenov, D. (2026). Deep Reinforcement Learning with Dense Reward Shaping and Domain Randomization for Sim-to-Real Transfer in Robotic Push Manipulation. Nazarbayev University School of Engineering and Digital Sciences
dc.identifier.uri	https://nur.nu.edu.kz/handle/123456789/18813
dc.language.iso	en
dc.publisher	Nazarbayev University School of Engineering and Digital Sciences
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	en
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject	Reward function design
dc.subject	Domain randomization
dc.subject	Sim-to-real transfer
dc.subject	PQDT_Master
dc.title	Deep Reinforcement Learning with Dense Reward Shaping and Domain Randomization for Sim-to-Real Transfer in Robotic Push Manipulation
dc.type	Master's thesis
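
The abstract contrasts a sparse reward with a dense progress-based reward for planar pushing. The record does not give the thesis's actual reward definitions, so the following is only a minimal sketch of the general idea: the function names, the 2D position tuples, the success radius, and the -1/0 sparse convention are all assumptions made for illustration.

```python
import math

# Hypothetical success radius in metres; not taken from the thesis.
GOAL_TOLERANCE = 0.05


def distance(a, b):
    """Euclidean distance between two planar (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def sparse_reward(obj_xy, goal_xy, tol=GOAL_TOLERANCE):
    """Assumed sparse form: 0 once the object is within tol of the goal, else -1."""
    return 0.0 if distance(obj_xy, goal_xy) <= tol else -1.0


def dense_progress_reward(prev_obj_xy, obj_xy, goal_xy):
    """Assumed dense form: how much closer the object moved to the goal this step."""
    return distance(prev_obj_xy, goal_xy) - distance(obj_xy, goal_xy)
```

Under these assumptions the sparse signal is informative only at the goal, while the progress term is positive whenever a step moves the object toward the goal and negative when it moves away, which is one plausible reason the two could differ in convergence speed while reaching the same final success rate.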

Files

Original bundle

Name: Dalel_Abekenov_Thesis_Revised (1).pdf
Size: 6.99 MB
Format: Adobe Portable Document Format
Description: Master's thesis

Collections

02. Master's Thesis