Abstract:
Sensor data is used to monitor patient activity during rehabilitation and can also
be extended to controlling rehabilitation devices based on a person's activity.
Both wearable sensors and skeleton data extracted from video can serve this purpose.
Since these modalities share similarities, a unified solution can be presented, one that
focuses on effectively capturing the spatiotemporal dependencies in the data collected by these
sensors and efficiently classifying human activities. With the increasing complexity
and size of models, there is a growing emphasis on optimizing their efficiency in terms
of memory usage and inference time for real-time applications and mobile devices. There
is an opportunity to develop a novel unified framework that incorporates recent advancements
to enhance speed and memory efficiency, specifically tailored for Human
Activity Recognition (HAR) tasks. In line with this approach, we present GLULA, a
unique architecture for human activity recognition. GLULA combines gated convolutional
networks, branched convolutions, and linear self-attention to achieve an efficient
and powerful solution. Extensive experiments demonstrated its effectiveness on both
wearable-sensor data and skeleton-based datasets. Tests were conducted on five benchmark
IMU datasets: PAMAP2, SKODA, OPPORTUNITY, DAPHNET, and USC-HAD.
Our findings demonstrate that GLULA not only outperforms recent models in the literature
on the latter four datasets but also exhibits the lowest parameter count among
state-of-the-art models. For skeleton-based HAR, evaluations were performed
on the NTU RGB+D dataset, where GLULA achieved results comparable to recent work
in this field while being smaller and significantly faster.
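As a rough, hypothetical sketch of how the combination named above could look (an illustration only, not the paper's implementation; the module structure, layer sizes, and the ELU-based feature map are all assumptions), a gated convolution followed by linear self-attention might be written as:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvLinearAttention(nn.Module):
    """Illustrative sketch: a GLU-style gated 1-D convolution followed by
    linear self-attention; all sizes and choices here are assumptions."""

    def __init__(self, channels: int):
        super().__init__()
        # Gated convolution: one branch carries content, the other a sigmoid gate.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size=3, padding=1)
        # Projections for linear self-attention.
        self.q = nn.Linear(channels, channels)
        self.k = nn.Linear(channels, channels)
        self.v = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels), e.g. a window of IMU readings.
        h = self.conv(x.transpose(1, 2))            # (B, 2C, T)
        a, b = h.chunk(2, dim=1)
        h = (a * torch.sigmoid(b)).transpose(1, 2)  # GLU gating, back to (B, T, C)

        # Linear attention with the elu(.)+1 feature map (Katharopoulos et al.):
        # the softmax is replaced so cost grows linearly in sequence length T.
        q = F.elu(self.q(h)) + 1                    # (B, T, C)
        k = F.elu(self.k(h)) + 1
        v = self.v(h)
        kv = torch.einsum('btc,btd->bcd', k, v)     # (B, C, C), summed over time
        z = 1.0 / (torch.einsum('btc,bc->bt', q, k.sum(dim=1)) + 1e-6)
        return torch.einsum('btc,bcd,bt->btd', q, kv, z)

# Usage: process a 100-step window of 64-channel features (sizes are made up).
block = GatedConvLinearAttention(channels=64)
out = block(torch.randn(8, 100, 64))                # -> (8, 100, 64)
```

Replacing softmax attention with such a kernel feature map is what makes the attention cost linear rather than quadratic in window length, which is consistent with the memory- and speed-oriented goals stated above.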