Generation and Usage of Virtual Data for the Development of Perception Algorithms Using Vision 2016-01-0170
Camera data generated in a 3D virtual environment was used to train object detection and identification algorithms. Forty common US road traffic signs served as the objects of interest. The camera was first placed at a random position along the road in a virtual driving environment, at a height appropriate for a camera located on a vehicle’s rearview mirror; traffic signs were then placed randomly alongside the road in front of it. To best represent the real world, effects such as shadows, occlusions, washout/fade, skew, rotations, reflections, fog, rain, snow, and varied illumination were randomly included in the generated data. Images were generated at a rate of approximately one thousand per minute, and each image was automatically annotated with the true location of every sign within it, to facilitate both supervised learning and testing of the trained algorithms. A deep convolutional neural network with 8 hidden layers, 1.5 million free parameters, and 250,000 neurons was built, with configurations chosen for traffic sign classification, and was trained on this dataset. It achieved a cross-validation accuracy of 98% with stable k-fold validation energy. The network, trained using virtual images, was then tested on real-world images with promising results: it consistently classified signs that appear much smaller and farther away than those in the images it was trained on. The network also attempted to classify signs for which it had not been trained and, predictably, assigned such signs the most similar label.
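As an illustration of how random sign placement and automatic annotation can go together, the following is a minimal sketch, not the pipeline used in this work. It assumes an ideal pinhole camera, a square sign facing the camera, and hypothetical values for image size, focal length, sign size, camera height, and placement ranges; none of these are taken from the paper. Because the sign's position is known when it is placed, its bounding box follows directly from projection and needs no manual labeling.

```python
import random
from dataclasses import dataclass

# All constants are illustrative assumptions, not values from the paper.
IMAGE_W, IMAGE_H = 1280, 720   # image size in pixels
FOCAL_PX = 1000.0              # focal length expressed in pixels
SIGN_SIZE_M = 0.75             # side length of a square sign, in metres
CAMERA_HEIGHT_M = 1.3          # rough rearview-mirror height above the road


@dataclass
class Annotation:
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def project(x, y, z):
    """Pinhole projection of a camera-frame point to pixel coordinates.

    Camera frame: x to the right, y downward, z forward (metres).
    """
    u = IMAGE_W / 2 + FOCAL_PX * x / z
    v = IMAGE_H / 2 + FOCAL_PX * y / z
    return u, v


def sample_sign(rng, labels):
    """Place one sign at random beside the road and return its true box."""
    label = rng.choice(labels)
    z = rng.uniform(10.0, 40.0)            # distance ahead of the camera
    x = rng.uniform(2.0, 4.0)              # lateral offset to the right
    sign_height = rng.uniform(2.0, 3.0)    # sign centre above the road
    y = CAMERA_HEIGHT_M - sign_height      # y points down, so this is negative
    half = SIGN_SIZE_M / 2
    # Project the top-left and bottom-right corners of the sign face.
    x_min, y_min = project(x - half, y - half, z)
    x_max, y_max = project(x + half, y + half, z)
    return Annotation(label, x_min, y_min, x_max, y_max)


def generate_annotations(n, labels, seed=0):
    """Generate n randomly placed signs with ground-truth boxes."""
    rng = random.Random(seed)
    return [sample_sign(rng, labels) for _ in range(n)]
```

Calling `generate_annotations(1000, ["stop", "yield"])` would yield one ground-truth box per sampled sign; in a full renderer the same sampled pose would drive both the image and its label, and effects such as fog or occlusion would be applied to the image only.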