Benchmarking Federated Few-shot Learning for Video-based Action Recognition

dc.contributor.author: Nguyen Anh Tu
dc.contributor.author: Nartay Aikyn
dc.contributor.author: Nursultan Makhanov
dc.contributor.author: Assanali Abu
dc.contributor.author: Kok-Seng Wong
dc.contributor.author: Min-Ho Lee
dc.date.accessioned: 2025-08-26T11:25:32Z
dc.date.available: 2025-08-26T11:25:32Z
dc.date.issued: 2024-01-01
dc.description.abstract: Few-shot action recognition aims to train a model to classify actions in videos using only a few examples, known as "shots," per action class. This learning approach is particularly useful but challenging due to the limited availability of labeled video data in practice. Although significant progress has been made in developing few-shot learners, existing methods still face several limitations. Firstly, current approaches have not sufficiently leveraged the power of 3D feature extractors (e.g., 3D CNNs or Video Transformers), missing out on spatiotemporal dynamics inherent in video data. Secondly, centralized training requires large datasets, which raises privacy concerns and introduces high storage and communication overheads. Thirdly, isolated deployment of models fails to benefit from global prior knowledge derived from diverse real-world action samples. To address these issues, we propose FedFSLAR++, a federated learning framework for few-shot action recognition that employs 3D feature extractors. Our framework enables collaborative training under federated settings, preserving privacy while optimizing communication and storage. It also allows learning meta-knowledge from a wide array of video data across heterogeneous clients. Within FedFSLAR++, we establish a unified benchmark to systematically compare components such as feature extraction backbones, meta-learning paradigms, and federated aggregation strategies, an evaluation protocol currently missing in the literature. Notably, we evaluate six 3D CNN and transformer-based models for rapid adaptation during meta-training, propose a hybrid extractor to enhance video representations, and explore multiple meta-learning and federated aggregation schemes. Extensive experiments conducted on four action recognition datasets demonstrate the robustness and superior performance of FedFSLAR++. Our comprehensive study lays a solid foundation for future research in video-based action recognition.
dc.identifier.citation: Tu Nguyen Anh, Aikyn Nartay, Makhanov Nursultan, Abu Assanali, Wong Kok-Seng, Lee Min-Ho. (2024). Benchmarking Federated Few-Shot Learning for Video-Based Action Recognition. IEEE Access. https://doi.org/10.1109/access.2024.3519254
dc.identifier.doi: 10.1109/access.2024.3519254
dc.identifier.uri: https://doi.org/10.1109/access.2024.3519254
dc.identifier.uri: https://nur.nu.edu.kz/handle/123456789/10272
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.rights: Open access
dc.source: IEEE Access (2024)
dc.subject: Benchmarking
dc.subject: Computer science
dc.subject: Action recognition
dc.subject: Artificial intelligence
dc.subject: Pattern recognition
dc.subject: Machine learning
dc.title: Benchmarking Federated Few-shot Learning for Video-based Action Recognition
dc.type: article

Files

Original bundle

Name: 10.1109_ACCESS.2024.3519254.pdf
Size: 4.16 MB
Format: Adobe Portable Document Format (PDF)
