Real-time Image Recognition System Based on an Embedded Heterogeneous Computer and Deep Convolutional Neural Networks for Deployment in Constrained Environments 2019-01-1045
Computer vision is the idea of giving machines the capacity to extract meaning from image frames. For decades it consisted mostly of laborious and complex techniques that performed poorly, which kept them out of “real-world” applications. With the advent of Deep Convolutional Neural Networks (DCNNs), computer vision systems have reached levels of accuracy that allowed them to gain ground in several industries, such as Manufacturing (automated quality inspection and risk surveillance) and Automotive (autonomous driving and driver assistance systems). A major challenge, however, lies in deploying computer vision systems that can perform in real time in environments (such as driverless cars) that impose constraints on energy supply, weight and space. This technical paper describes how a real-time embedded image recognition system was developed and how the required features were derived and specified in terms of functionality and performance. A minimum rate of 30 frames per second (fps) was identified as the real-time boundary. In addition, the paper explains how a modern System on a Chip (SoC) can support computation-intensive computer vision algorithms through hardware acceleration and heterogeneous computing. The paper also explores why DCNNs were selected as the target technology, and reviews the state-of-the-art topologies and their advantages and disadvantages in the context of an embedded application. Once an SoC, a DCNN topology and an acceleration technology had been decided on, Nvidia’s Jetson TX2 embedded computer was chosen as the evaluation board. The paper describes how the image recognition pipeline was configured, alongside the modules that were implemented, modified and reused. The test set-up consisted of a remote camera producing a video input stream and an HDMI monitor presenting the system’s output.
A trained model was used to benchmark the platform in terms of throughput and power consumption, and a series of optimizations was performed to improve the performance of the inference pipeline. Techniques such as reduced precision and batching were employed to obtain successive improvements. The different implementations cover a wide range of throughput, power consumption and energy efficiency. The best performance achieved was 47.7 fps at a resolution of 1080x720. The results demonstrate the scalability potential of the system across different configurations, which more powerful SoC platforms can exploit further in the future.
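The benchmark quantities named above (throughput, power consumption, energy efficiency) and the 30 fps real-time boundary are related by simple arithmetic, which the following minimal Python sketch illustrates. The function names are hypothetical and not taken from the paper. The abstract reports no power figures, so the power value is left as a user-supplied parameter and the energy functions are shown without a worked result.

```python
# Minimal sketch relating throughput, the real-time boundary and energy
# efficiency. Function names are illustrative assumptions, not the authors' code.

REAL_TIME_FPS = 30.0  # real-time boundary stated in the abstract


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def is_real_time(fps: float, boundary: float = REAL_TIME_FPS) -> bool:
    """True if the measured throughput meets the real-time boundary."""
    return fps >= boundary


def energy_per_frame_j(power_w: float, fps: float) -> float:
    """Energy spent per processed frame, in joules (watts / frames per second)."""
    return power_w / fps


def frames_per_joule(fps: float, power_w: float) -> float:
    """Energy efficiency expressed as frames processed per joule."""
    return fps / power_w


if __name__ == "__main__":
    best_fps = 47.7  # best throughput reported in the abstract
    print(f"budget at boundary: {frame_budget_ms(REAL_TIME_FPS):.2f} ms/frame")
    print(f"time per frame at best result: {frame_budget_ms(best_fps):.2f} ms")
    print(f"meets real-time boundary: {is_real_time(best_fps)}")
```

Calling `energy_per_frame_j` or `frames_per_joule` with a measured board power would allow configurations (for example, different precisions or batch sizes) to be compared on energy efficiency as well as throughput.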
Maycon Douglas da Silva Carvalho, Fabian Koark, Carl Rheinländer, Norbert Wehn
Invensity Inc, University of Kaiserslautern