Benchmarking Federated Few-shot Learning for Video-based Action Recognition

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Abstract

Few-shot action recognition aims to train a model to classify actions in videos from only a few labeled examples, or "shots," per action class. The setting is useful in practice because labeled video data is scarce, but it is also challenging. Despite significant progress in few-shot learners, existing methods have three limitations. First, they do not fully exploit 3D feature extractors (e.g., 3D CNNs or Video Transformers) and so miss the spatiotemporal dynamics inherent in video data. Second, centralized training requires large datasets gathered in one place, which raises privacy concerns and introduces high storage and communication overheads. Third, models deployed in isolation cannot benefit from global prior knowledge derived from diverse real-world action samples. To address these issues, we propose FedFSLAR++, a federated learning framework for few-shot action recognition that employs 3D feature extractors. The framework enables collaborative training under federated settings, preserving privacy while optimizing communication and storage, and it learns meta-knowledge from a wide range of video data across heterogeneous clients. Within FedFSLAR++, we establish a unified benchmark that systematically compares feature extraction backbones, meta-learning paradigms, and federated aggregation strategies, an evaluation protocol currently missing from the literature. Specifically, we evaluate six 3D CNN and transformer-based models for rapid adaptation during meta-training, propose a hybrid extractor to enhance video representations, and explore multiple meta-learning and federated aggregation schemes. Extensive experiments on four action recognition datasets demonstrate the robustness and superior performance of FedFSLAR++. Our study lays a foundation for future research in video-based action recognition.

Citation

Tu Nguyen Anh, Aikyn Nartay, Makhanov Nursultan, Abu Assanali, Wong Kok-Seng, Lee Min-Ho. (2024). Benchmarking Federated Few-Shot Learning for Video-Based Action Recognition. IEEE Access. https://doi.org/10.1109/access.2024.3519254
