AN EXPLORATION OF VIDEO TRANSFORMERS FOR FEW-SHOT ACTION RECOGNITION
Publisher
Nazarbayev University School of Engineering and Digital Sciences
Abstract
Action recognition is an essential task in computer vision with applications in many fields. However, recognizing actions in videos from only a few examples, a setting known as Few-Shot Learning (FSL), is challenging due to the high dimensionality and temporal complexity of video data. This work addresses the problem by proposing a novel meta-learning framework that integrates a Video Transformer as the feature backbone. The Video Transformer captures long-range dependencies and models temporal relationships effectively, enriching the global representation. Extensive experiments on benchmark datasets demonstrate that our approach surpasses baseline models and obtains competitive results compared to state-of-the-art models. Additionally, we investigate the impact of supervised and self-supervised learning on video representation and evaluate the transferability of the learned representations in cross-domain scenarios. Our approach suggests a promising direction for combining meta-learning with Video Transformers in few-shot learning tasks, potentially contributing to action recognition across various domains.
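The abstract does not specify how the meta-learning framework classifies queries from few examples. A common metric-based choice in few-shot recognition is a prototypical-network-style episode: average the backbone's embeddings of the support clips per class, then assign each query clip to the nearest class prototype. The sketch below illustrates that episode structure only; the toy 2-D vectors stand in for Video Transformer features, and all function names are hypothetical, not taken from the thesis.

```python
import math

def euclidean_dist(a, b):
    # Distance between two embedding vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_prototypes(support_embs, support_labels, n_way):
    # Prototype for each class = mean of its support embeddings
    # (here, stand-ins for Video Transformer clip features).
    protos = []
    for c in range(n_way):
        vecs = [v for v, lbl in zip(support_embs, support_labels) if lbl == c]
        dim = len(vecs[0])
        protos.append([sum(v[d] for v in vecs) / len(vecs) for d in range(dim)])
    return protos

def classify_query(query_emb, protos):
    # Predict the class whose prototype is nearest to the query embedding.
    dists = [euclidean_dist(query_emb, p) for p in protos]
    return min(range(len(dists)), key=dists.__getitem__)

# Toy 2-way 3-shot episode with 2-D embeddings.
support = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # class 0
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]   # class 1
labels = [0, 0, 0, 1, 1, 1]
protos = class_prototypes(support, labels, n_way=2)
```

In a full pipeline, the support and query embeddings would come from the Video Transformer backbone, and meta-training would optimize the backbone so that episodes like this generalize to unseen action classes.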
Citation
Aikyn, N. (2024). An Exploration of Video Transformers for Few-Shot Action Recognition (thesis). Nazarbayev University School of Engineering and Digital Sciences
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States
