The project addresses the challenges posed by the vast volume of video data generated by Internet of Things (IoT) devices, especially surveillance cameras, by developing an edge-assisted human action recognition (HAR) system. Combining edge computing with deep learning techniques, including pose estimation and convolutional neural networks (CNNs), the system aims to deliver real-time HAR with minimal latency and reduced reliance on cloud resources. Key components include end devices for data capture, a cloud server for model training and management, and a web application for user interaction. This integration tackles the traditional challenges of latency, scalability, and efficiency in real-time, edge-assisted video analytics. The project progresses through stages such as dataset creation, pipeline development, and software architecture design, and demonstrates the practical effectiveness of these technologies in enhancing video surveillance systems.
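The edge-side pipeline described above (pose estimation on incoming frames, followed by CNN-based action classification over a window of poses) might be sketched as follows. This is a minimal illustration only: the function and constant names (`extract_pose`, `classify_window`, `WINDOW`, `ACTIONS`) are hypothetical placeholders, and both models are replaced by stand-in stubs rather than the project's actual trained networks.

```python
import numpy as np

NUM_KEYPOINTS = 17        # COCO-style body keypoints (assumption)
WINDOW = 16               # frames per recognition window (assumption)
ACTIONS = ["walking", "sitting", "waving"]  # illustrative label set

def extract_pose(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the edge pose-estimation model: returns (x, y) per keypoint.
    A real system would run a lightweight pose network on the device here."""
    h, w = frame.shape[:2]
    rng = np.random.default_rng(int(frame.sum()) % 2**32)
    return rng.uniform([0, 0], [w, h], size=(NUM_KEYPOINTS, 2))

def classify_window(poses: np.ndarray) -> str:
    """Stand-in classifier over a (WINDOW, NUM_KEYPOINTS, 2) pose sequence.
    A real system would feed this tensor to a trained CNN."""
    motion = np.abs(np.diff(poses, axis=0)).mean()  # crude motion-energy proxy
    return ACTIONS[int(motion) % len(ACTIONS)]

def recognize(frames):
    """Slide a fixed-size window over incoming frames and emit action labels,
    keeping all computation on the edge device."""
    buffer = []
    for frame in frames:
        buffer.append(extract_pose(frame))
        if len(buffer) == WINDOW:
            yield classify_window(np.stack(buffer))
            buffer.clear()

if __name__ == "__main__":
    # Simulated camera feed: 32 dummy frames -> two recognition windows.
    frames = (np.full((240, 320, 3), i, dtype=np.uint8) for i in range(32))
    for label in recognize(frames):
        print(label)
```

The point of the sliding-window design is that only compact pose tensors, not raw video, need to cross the device boundary, which is what keeps latency low and cloud traffic small in the architecture the abstract describes.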