Abstract:
Computer vision, an essential component of robotics, is an expanding field of research.
While camera technology has advanced substantially, conventional cameras still
exhibit limitations, such as motion blur and low dynamic range, that stem from
acquiring and outputting images as frame-based 2-dimensional arrays.
Event-based imaging addresses these bottlenecks, and event-based cameras have
consequently been gaining traction in robotics. In these cameras, each pixel operates
asynchronously, reporting changes in brightness independently rather than as part
of a full frame, which opens up numerous possibilities. Nevertheless, because the
technology is new, many applications remain unexplored, such as face detection and
facial landmark detection with event cameras.
Although there has been a surge of research into face detection using event cameras,
the lack of a comprehensive, annotated dataset of face bounding boxes and facial
landmarks in event streams has impeded progress in this field. This thesis aims
to bridge this gap by introducing the pioneering Faces in Event Streams (FES)
dataset, which comprises 689 minutes of event streams and is designed specifically
for detecting faces and facial landmarks directly in event-camera output.
To showcase the efficacy of the FES dataset, 12 models were developed and trained
to predict bounding box coordinates and facial landmarks, reaching an mAP50 score
exceeding 90%. Furthermore, the thesis research included work toward demonstrating
real-time face recognition with an event camera, using one of our pre-trained
models. The dataset and pre-trained models are publicly available for further
study at https://github.com/IS2AI/faces-in-eventstreams.