Abstract:
The state-of-the-art YOLOv4 object detector has already demonstrated efficient
inference (65 frames per second (FPS) on a Tesla V100) and relatively high accuracy on
the MS COCO dataset (43.5% mAP) in real-time mode. Moreover, the simplicity of training
and testing the model is another advantage for the machine learning community. The
ability to train the model as a unified system on a single graphics processing
unit (GPU) has, unsurprisingly, made it a milestone in the real-time object
detection field. This work reviews the fundamental and most recent academic
work in the field and proposes incremental research toward optimizing the
YOLOv4 architecture. We propose a model, named SAMD-YOLOv4, with a modified neck
structure that reduces the number of learnable parameters by decreasing the number of
filters in the 1×1 convolutions, which are followed by a spatial attention module and
dilated convolutional layers. We demonstrate that the method reduces the model's
complexity by 7.3% and its inference time by 6.9% with no effect on its precision.
In the chapters below, we provide experimental results and a comparative study of the
baseline YOLOv4 and our SAMD-YOLOv4. Furthermore, we present and analyze
TensorRT-based inference results.
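The two neck modifications named in this abstract rest on simple arithmetic: a k×k convolution has k·k·C_in·C_out weights, so fewer 1×1 filters shrink both that layer and the next one, while dilation widens a kernel's spatial reach without adding any weights. The sketch below illustrates this for a hypothetical block (a 1×1 bottleneck followed by a 3×3 convolution). All channel counts (512 in, 512 out, a bottleneck reduced from 256 to 128 filters) are illustrative assumptions, not the actual SAMD-YOLOv4 configuration; the spatial attention module adds weights of its own that are not counted here, and the toy reduction shown is unrelated to the 7.3% reported for the full model.

```python
# Hypothetical sketch: parameter and receptive-field arithmetic for a neck block.
# Channel counts below are illustrative assumptions, not SAMD-YOLOv4's actual values.


def conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Learnable weights of a k x k convolution.

    Bias is off by default because convolutions followed by batch normalization
    usually omit it. Dilation does not appear: it spaces out the kernel taps
    without adding weights.
    """
    return k * k * c_in * c_out + (c_out if bias else 0)


def effective_kernel(k: int, dilation: int) -> int:
    """Spatial extent spanned by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


C_IN = 512   # assumed input channels of the block
C_OUT = 512  # assumed output channels of the 3x3 layer


def block_params(bottleneck: int) -> int:
    """1x1 conv (C_IN -> bottleneck) followed by a 3x3 conv (bottleneck -> C_OUT)."""
    return conv_params(1, C_IN, bottleneck) + conv_params(3, bottleneck, C_OUT)


if __name__ == "__main__":
    base, reduced = block_params(256), block_params(128)
    print(f"bottleneck 256: {base} params")
    print(f"bottleneck 128: {reduced} params ({100 * (1 - reduced / base):.1f}% fewer)")
    for d in (1, 2, 3):
        span = effective_kernel(3, d)
        print(f"3x3 kernel, dilation {d}: spans {span}x{span} with {conv_params(3, 1, 1)} weights per channel pair")
```

Halving the bottleneck halves both convolutions in this toy block, while a dilation of 2 lets the same nine-weight 3×3 kernel cover a 5×5 area, which is why dilated layers can recover context lost by thinner layers at no parameter cost.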