Deep Reinforcement Learning with Dense Reward Shaping and Domain Randomization for Sim-to-Real Transfer in Robotic Push Manipulation
| dc.contributor.advisor | Shintemirov, Almas | |
| dc.contributor.advisor | Rubagotti, Matteo | |
| dc.contributor.author | Abekenov, Dalel | |
| dc.date.accessioned | 2026-06-01T11:40:43Z | |
| dc.date.issued | 2026-04-28 | |
| dc.description.abstract | Deep reinforcement learning is a promising approach for teaching robots manipulation skills through training in simulation; however, transferring learned policies to a real robot remains a major challenge. This thesis investigates sim-to-real transfer for planar push manipulation using a Soft Actor-Critic policy trained in MuJoCo simulation for a Franka Emika Panda robot. The work is organized as a three-condition ablation study that examines the individual effects of reward function design and domain randomization, followed by zero-shot deployment of the candidate policy on physical hardware. The ablation compares a sparse reward, a dense progress-based reward, and a dense reward with domain randomization. Each condition was trained for 2,000,000 timesteps and evaluated over 100 episodes. Sparse and dense reward functions achieved the same success rate in simulation (79%), differing only in training convergence speed. This result is consistent with prior findings that sparse rewards can support policy learning when combined with efficient off-policy methods and large replay buffers for constrained manipulation tasks [16]. Adding domain randomization lowers the simulation success rate by 12 percentage points to 67%, but it produces the only policy suitable for real robot deployment. Real robot evaluation over 20 episodes shows that the policy trained with domain randomization achieved an 80% success rate in straight-line configurations, which exceeds its own simulation result. This confirms that the sim-to-real gap is effectively bridged for aligned push configurations. The overall success rate across all configuration types is 50%; the main limitation is the policy’s restricted ability to generalize laterally to diagonal pushes outside the training distribution. The key finding of this thesis is that success rate in simulation is an unreliable indicator of performance in real-world experiments. 
The policy with the lowest success rate in simulation was the only candidate for real robot deployment, and in straight-line configurations it even performed better on the real robot than in simulation. This result shows that real robot evaluation is important and should be used alongside simulation benchmarks in robotics research. It also provides empirical evidence of the trade-off between specialization and generalization inherent in domain randomization for contact-rich manipulation tasks. | |
| dc.identifier.citation | Abekenov, D. (2026). Deep Reinforcement Learning with Dense Reward Shaping and Domain Randomization for Sim-to-Real Transfer in Robotic Push Manipulation. Nazarbayev University School of Engineering and Digital Sciences. | |
| dc.identifier.uri | https://nur.nu.edu.kz/handle/123456789/18813 | |
| dc.language.iso | en | |
| dc.publisher | Nazarbayev University School of Engineering and Digital Sciences | |
| dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | en |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | |
| dc.subject | Reward function design | |
| dc.subject | Domain randomization | |
| dc.subject | Sim-to-real transfer | |
| dc.subject | PQDT_Master | |
| dc.title | Deep Reinforcement Learning with Dense Reward Shaping and Domain Randomization for Sim-to-Real Transfer in Robotic Push Manipulation | |
| dc.type | Master's thesis |
Files
Original bundle
- Name: Dalel_Abekenov_Thesis_Revised (1).pdf
- Size: 6.99 MB
- Format: Adobe Portable Document Format
- Description: Master's thesis