Benchmarking Federated Few-shot Learning for Video-based Action Recognition

Few‑shot action recognition aims to train a model to classify actions in videos using only a few examples, known as "shots," per action class. This learning approach is particularly useful but challenging due to the limited availability of labeled video data in practice. Although significant progress has been made in developing few‑shot learners, existing methods still face several limitations. Firstly, current approaches have not sufficiently leveraged the power of 3D feature extractors (e.g., 3D CNNs or Video Transformers), missing out on spatiotemporal dynamics inherent in video data. Secondly, centralized training requires large datasets, which raises privacy concerns and introduces high storage and communication overheads. Thirdly, isolated deployment of models fails to benefit from global prior knowledge derived from diverse real-world action samples. To address these issues, we propose FedFSLAR++, a federated learning framework for few‑shot action recognition that employs 3D feature extractors. Our framework enables collaborative training under federated settings—preserving privacy while optimizing communication and storage. It also allows learning meta-knowledge from a wide array of video data across heterogeneous clients. Within FedFSLAR++, we establish a unified benchmark to systematically compare components such as feature extraction backbones, meta‑learning paradigms, and federated aggregation strategies—an evaluation protocol currently missing in the literature. Notably, we evaluate six 3D CNN and transformer-based models for rapid adaptation during meta-training, propose a hybrid extractor to enhance video representations, and explore multiple meta-learning and federated aggregation schemes. Extensive experiments conducted on four action recognition datasets demonstrate the robustness and superior performance of FedFSLAR++. Our comprehensive study lays a solid foundation for future research in video-based action recognition.

Keywords

Benchmarking, Computer science, Shot (pellet), Action recognition, Artificial intelligence, Pattern recognition (psychology), Machine learning, Speech recognition, Class (philosophy), Chemistry, Organic chemistry, Marketing, Business

Type of access: open access

Citation

Tu Nguyen Anh, Aikyn Nartay, Makhanov Nursultan, Abu Assanali, Wong Kok-Seng, Lee Min-Ho. (2024). Benchmarking Federated Few-Shot Learning for Video-Based Action Recognition. IEEE Access. https://doi.org/10.1109/access.2024.3519254

URI

https://doi.org/10.1109/access.2024.3519254
https://nur.nu.edu.kz/handle/123456789/10272

Collections

Articles
