POSE2ACT: TRANSFORMER-BASED 3D POSE ESTIMATION AND GRAPH CONVOLUTION NETWORKS FOR HUMAN ACTIVITY RECOGNITION
Date
2023
Authors
Aimyshev, Dias
Publisher
School of Engineering and Digital Sciences
Abstract
The rise of deep learning has brought significant attention to two computer vision tasks: pose estimation and human activity recognition. Human activity recognition has various applications in IoT systems, while pose estimation is critical for motion tracking and prediction in virtual and augmented reality, robotics, and other fields. Although distinct, the two tasks are closely linked, and this study focuses on merging pose estimation, which generates body joint coordinates, with skeleton-based activity recognition, which operates on those joints. The study uses a vision transformer for 3D pose estimation, treating joints as spatial features and neighboring frames as temporal features, while graph convolution networks perform activity recognition on the resulting 3D skeleton, an approach that has produced state-of-the-art results. However, those results rely on 3D coordinates generated by motion capture systems, which limits their applicability and robustness. To overcome these limitations, the two models are merged into a single End2End network. The proposed approach is further enhanced by various data transformations, architectural modifications, and pre-training and fine-tuning of individual components. The research achieves a 90.3% cross-subject activity recognition accuracy on the NTU RGB+D test dataset, comparable to the state of the art while using generated 3D input, and outperforms other models that use 2D input by predicting 3D coordinates in the process.
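The pipeline the abstract describes can be illustrated with a minimal sketch, assuming a PyTorch implementation. This is not the author's code: the module names, layer sizes, the joint count (25, as in NTU RGB+D), and the 60-class output are illustrative assumptions; only the overall structure (spatial and temporal transformer blocks lifting 2D joints to 3D, a graph convolution classifier on the predicted skeleton, merged end to end) follows the abstract.

```python
# Minimal sketch of the Pose2Act idea described in the abstract.
# All hyperparameters and module names are illustrative assumptions.
import torch
import torch.nn as nn


class PoseTransformer(nn.Module):
    """Lifts 2D joints to 3D: spatial attention over joints within a frame,
    temporal attention over frames for each joint."""

    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Linear(2, dim)  # per-joint 2D coordinates -> tokens
        enc_s = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.spatial = nn.TransformerEncoder(enc_s, layers)   # joints as tokens
        enc_t = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(enc_t, layers)  # frames as tokens
        self.head = nn.Linear(dim, 3)   # regress 3D coordinates

    def forward(self, x):                                # x: (B, T, J, 2)
        B, T, J, _ = x.shape
        h = self.embed(x)                                # (B, T, J, D)
        h = self.spatial(h.reshape(B * T, J, -1)).reshape(B, T, J, -1)
        h = h.transpose(1, 2).reshape(B * J, T, -1)      # group by joint
        h = self.temporal(h).reshape(B, J, T, -1).transpose(1, 2)
        return self.head(h)                              # (B, T, J, 3)


class SkeletonGCN(nn.Module):
    """Classifies a 3D skeleton sequence: graph convolutions aggregate over
    the joint adjacency, then features are pooled over frames and joints."""

    def __init__(self, adj, num_classes=60, dim=64):
        super().__init__()
        self.register_buffer("adj", adj)                 # (J, J) normalized adjacency
        self.gc1 = nn.Linear(3, dim)
        self.gc2 = nn.Linear(dim, dim)
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, x):                                # x: (B, T, J, 3)
        h = torch.relu(self.adj @ self.gc1(x))           # neighborhood aggregation
        h = torch.relu(self.adj @ self.gc2(h))
        return self.fc(h.mean(dim=(1, 2)))               # pool over time and joints


class Pose2Act(nn.Module):
    """End-to-end merge: 2D joint sequences in, activity logits out, with the
    predicted 3D skeleton produced as an intermediate result."""

    def __init__(self, adj, num_classes=60):
        super().__init__()
        self.pose = PoseTransformer()
        self.act = SkeletonGCN(adj, num_classes)

    def forward(self, x2d):
        x3d = self.pose(x2d)
        return self.act(x3d), x3d


if __name__ == "__main__":
    J = 25
    adj = torch.eye(J)  # placeholder: real use needs the NTU skeleton adjacency
    model = Pose2Act(adj)
    logits, skel3d = model(torch.randn(2, 16, J, 2))  # 2 clips, 16 frames each
    print(logits.shape, skel3d.shape)  # (2, 60) and (2, 16, 25, 3)
```

A training recipe consistent with the abstract would pre-train each component separately (the transformer on 3D pose supervision, the GCN on skeleton labels) and then fine-tune the merged network end to end, so that gradients from the activity loss also refine the pose estimator.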
Keywords
pose estimation, human activity recognition, 3D skeleton
Type of access
Embargo
Citation
Aimyshev, D. (2023). Pose2Act: Transformer-based 3D Pose Estimation and Graph Convolution Networks for Human Activity Recognition. School of Engineering and Digital Sciences.