Abstract:
Human activity recognition is an active research topic in the field of computer vision.
Human action recognition is increasingly used in surveillance, smart-home and
healthcare applications. In general, human actions can be recognized from multiple
modalities such as appearance (RGB images), depth, optical flow and body skeletons.
The main advantage of skeleton data over the other modalities is that it is robust
to changes in motion speed, body scale, camera viewpoint and background
interference. For that reason, in this paper, I focus on
skeleton-based activity recognition. Starting with ST-GCN, graph convolutional networks
(GCNs), which model body skeleton data as spatiotemporal graphs, have achieved
strong results in skeleton-based action recognition. However, existing GCN-based
models consider only the XYZ coordinates of the bones and joints. In my model,
in addition to the XYZ coordinates, joint orientations expressed as quaternions
are used. Finally, a CNN model is used as a
late fusion method to combine the outputs of the individual models. Extensive
experiments on the widely used NTU-RGBD benchmark dataset demonstrate that the
proposed model is competitive with state-of-the-art models.
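
As a rough illustration of the spatiotemporal graph view of a skeleton sequence, the sketch below links joints by bones within each frame and links each joint to itself in the next frame. It is a minimal sketch under stated assumptions, not the proposed model: the 5-joint skeleton, the `BONES` list and all function names are hypothetical, and the quaternion helper only shows the unit-length normalization that orientation quaternions need before they can be used as joint features.

```python
from math import sqrt

# Hypothetical toy skeleton: 5 joints and 4 bones (NOT the NTU-RGBD joint layout).
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def node_id(frame, joint, num_joints=NUM_JOINTS):
    """Flatten a (frame, joint) pair into a single graph node index."""
    return frame * num_joints + joint


def build_spatiotemporal_edges(num_frames, bones=BONES, num_joints=NUM_JOINTS):
    """Return undirected edges of the spatiotemporal graph.

    Spatial edges follow the bones inside every frame; temporal edges
    connect each joint to the same joint in the following frame.
    """
    edges = []
    for t in range(num_frames):
        # Spatial edges: one per bone in frame t.
        for a, b in bones:
            edges.append((node_id(t, a, num_joints), node_id(t, b, num_joints)))
        # Temporal edges: joint j in frame t to joint j in frame t + 1.
        if t + 1 < num_frames:
            for j in range(num_joints):
                edges.append((node_id(t, j, num_joints), node_id(t + 1, j, num_joints)))
    return edges


def normalize_quaternion(q):
    """Scale a quaternion (w, x, y, z) to unit length.

    A zero quaternion is mapped to the identity rotation (an assumption
    made here for missing or invalid orientation readings).
    """
    norm = sqrt(sum(c * c for c in q))
    if norm == 0.0:
        return (1.0, 0.0, 0.0, 0.0)
    return tuple(c / norm for c in q)
```

For a clip of T frames this toy graph has 4T spatial edges and 5(T - 1) temporal edges. Each node can then carry its XYZ coordinates in one input stream and its normalized orientation quaternion in another, with the per-stream predictions combined afterwards by late fusion.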