DSpace Repository

AN EXPLORATION OF VIDEO TRANSFORMERS FOR FEW-SHOT ACTION RECOGNITION


Show simple item record

dc.contributor.author Aikyn, Nartay
dc.date.accessioned 2024-05-24T11:46:16Z
dc.date.available 2024-05-24T11:46:16Z
dc.date.issued 2024-04-29
dc.identifier.citation Aikyn, N. (2024). An Exploration of Video Transformers for Few-Shot Action Recognition (thesis). Nazarbayev University School of Engineering and Digital Sciences en_US
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/7713
dc.description.abstract Action recognition is an essential task in computer vision with many applications in various fields. However, recognizing actions in videos using few examples, often referred to as Few-Shot Learning (FSL), is a challenging problem due to the high dimensionality and temporal complexity of video data. This work aims to address this problem by proposing a novel meta-learning framework that integrates a Video Transformer as the feature backbone. The Video Transformer can capture long-range dependencies and model temporal relationships effectively, thus enriching the global representation. Extensive experiments on benchmark datasets demonstrate that our approach achieves remarkable performance, surpassing baseline models and obtaining competitive results compared to state-of-the-art models. Additionally, we investigate the impact of supervised and self-supervised learning on video representation and evaluate the transferability of the learned representations in cross-domain scenarios. Our approach suggests a promising direction for exploring the combination of meta-learning with Video Transformers in the context of few-shot learning tasks, potentially contributing to the field of action recognition in various domains. en_US
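For readers unfamiliar with the episodic setup the abstract refers to, the sketch below illustrates one common way a few-shot episode can be evaluated on top of clip-level features from a video backbone: a prototype-style nearest-mean classifier in PyTorch. This is a minimal illustrative sketch only; the backbone stand-in, function names, and shapes are placeholder assumptions and are not taken from the thesis.

    # Illustrative sketch: a generic 5-way k-shot episode with a prototype-style
    # classifier over clip-level features. Placeholder code, not the thesis method.
    import torch
    import torch.nn as nn

    class VideoBackbone(nn.Module):
        """Stand-in for a Video Transformer that maps a clip to one embedding."""
        def __init__(self, embed_dim=512):
            super().__init__()
            # Placeholder: flatten the clip and project it. A real Video Transformer
            # would tokenize frames and apply spatio-temporal attention instead.
            self.proj = nn.LazyLinear(embed_dim)

        def forward(self, clips):                      # clips: (B, T, C, H, W)
            return self.proj(clips.flatten(1))         # (B, embed_dim)

    def episode_logits(backbone, support, support_labels, query, n_way):
        """Score query clips by distance to per-class mean (prototype) features."""
        s_feat = backbone(support)                     # (n_way * k_shot, D)
        q_feat = backbone(query)                       # (n_query, D)
        protos = torch.stack([s_feat[support_labels == c].mean(0)
                              for c in range(n_way)])  # (n_way, D)
        return -torch.cdist(q_feat, protos)            # higher = closer prototype

    # Toy usage: a 5-way 1-shot episode with random "clips" of 8 frames.
    backbone = VideoBackbone()
    support = torch.randn(5, 8, 3, 32, 32)
    support_labels = torch.arange(5)
    query = torch.randn(10, 8, 3, 32, 32)
    logits = episode_logits(backbone, support, support_labels, query, n_way=5)
    print(logits.shape)                                # torch.Size([10, 5])

In a meta-learning setting such as the one the abstract describes, many such episodes would be sampled during training so that the backbone learns representations that transfer to novel action classes from only a few labeled clips.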
dc.language.iso en en_US
dc.publisher Nazarbayev University School of Engineering and Digital Sciences en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/ *
dc.subject Type of access: Restricted en_US
dc.subject deep learning en_US
dc.subject human action recognition en_US
dc.subject few-shot learning en_US
dc.subject video transformer en_US
dc.subject self-supervised learning en_US
dc.subject cross-domain experiment en_US
dc.title AN EXPLORATION OF VIDEO TRANSFORMERS FOR FEW-SHOT ACTION RECOGNITION en_US
dc.type Master's thesis en_US
workflow.import.source science




Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States.