Abstract:
Recently, Wireless Multimedia Sensor Networks (WMSNs) have been used extensively, and huge
amounts of data are generated daily. Many processes must be monitored in real
time, so raw data needs to be preprocessed, analyzed quickly, and stored on the
edge. Edge computation decentralizes the environment, which makes it highly
responsive, low-cost, scalable, and secure. WMSNs and edge computing are important
in areas such as healthcare, where the subject has to be monitored and analyzed
continuously. In this work, we propose a healthcare system for monitoring human
emotion through real-time speech emotion recognition (RSER).
Firstly, this project aims to analyze state-of-the-art SER approaches with respect
to execution time and their ability to run on constrained devices. Secondly, a new
approach based on this timing analysis will be proposed. Exploratory data analysis
will be performed on multiple training datasets, including the Ryerson Audio-Visual
Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional
Speech (EMO-DB), and IEMOCAP. Vocal-tract spectrum features and low-level acoustic
features (pitch and energy) will be extracted. Models will be trained and evaluated
with deep learning and machine learning algorithms, and the algorithms will be
ranked by their time, energy, and accuracy metrics. The experiment will then be
tested and evaluated on an embedded device (Raspberry Pi). Finally, a modified
model based on the algorithm analysis will be tested in three scenarios: processing
on the edge, processing at the sink, and streaming.