DSpace Repository

EXPLORING DATA DISTRIBUTION AND VALUE FUNCTION APPROXIMATION IMPACTS IN OFFLINE REINFORCEMENT LEARNING(RL): FROM GRIDWORLD ENVIRONMENTS

dc.contributor.author Tokayev, Kuanysh
dc.date.accessioned 2024-06-03T07:04:29Z
dc.date.available 2024-06-03T07:04:29Z
dc.date.issued 2024-04-23
dc.identifier.citation Tokayev, K. (2024) Exploring Data Distribution and Value Function Approximation Impacts in Offline Reinforcement Learning(RL): From Gridworld Environments. Nazarbayev University School of Engineering and Digital Sciences en_US
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/7718
dc.description.abstract In the emerging landscape of off-policy reinforcement learning (RL), challenges arise due to the significant costs and risks tied to data collection. To address these issues, there is an alternative path: transitioning RL from off-policy to offline, where learning relies on a fixed, pre-collected dataset. This stands in contrast to online algorithms, which are sensitive to changes in data during the learning phase. However, the inherent challenge of offline RL lies in its limited interaction with the environment, which results in inadequate data coverage. Hence, we underscore the convenient application of offline RL, 1) starting from the collection and preprocessing of a static dataset from online RL interactions, 2) followed by the training of offline RL models, and 3) culminating with testing in the same environment as the off-policy RL algorithm. Specifically, the dataset collection uses a uniform dataset gathered systematically via non-arbitrary action selection, covering all possible states of the environment. Furthermore, we incorporate Q-values into the static dataset, representing the action distribution across the state-action space. This allows the offline RL model to directly update its weights by comparing the learned model's Q-values with the collected Q-values. Using the proposed approach, the offline RL model employing a Multi-Layer Perceptron (MLP) achieves a testing accuracy that falls within 1% of the results obtained by the off-policy RL agent. Additionally, we provide a practical guide with datasets, offering tutorials on the application of offline RL in a Gridworld-based environment. en_US
dc.language.iso en en_US
dc.publisher Nazarbayev University School of Engineering and Digital Sciences en_US
dc.rights Attribution-NonCommercial 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc/3.0/us/ *
dc.subject Type of access: Restricted en_US
dc.title EXPLORING DATA DISTRIBUTION AND VALUE FUNCTION APPROXIMATION IMPACTS IN OFFLINE REINFORCEMENT LEARNING(RL): FROM GRIDWORLD ENVIRONMENTS en_US
dc.type Master's thesis en_US
workflow.import.source science


Except where otherwise noted, this item's license is described as Attribution-NonCommercial 3.0 United States