3D Scene Reconstruction with Sparse LiDAR Data and Monocular Image in Single Frame 07-11-01-0005
This also appears in
SAE International Journal of Passenger Cars - Electronic and Electrical Systems-V127-7EJ
Real-time reconstruction of a 3D environment annotated with semantic information is important for a variety of applications, such as obstacle detection, traffic scene comprehension, and autonomous navigation. Current approaches mainly rely on stereo vision, Structure from Motion (SfM), or mobile LiDAR sensors. Each has its own limitation: stereo vision has a high computational cost, SfM needs accurate calibration across a sequence of images, and an onboard LiDAR sensor provides only sparse points without color information. This paper describes a novel method for semantic segmentation of traffic scenes that combines a sparse LiDAR point cloud (e.g., from Velodyne scans) with a monocular color image. The key novelty of the method is the semantic coupling of the stereoscopic point cloud with the color lattice of the camera image, which is labelled by a Convolutional Neural Network (CNN). The method comprises three main processes: (I) perform semantic segmentation on the color image from the monocular camera using a CNN, (II) extract ideal surfaces and other structural information from the point cloud, and (III) improve the image segmentation with the extracted structures and label the point cloud with the image segments. The whole process operates on a single frame, and the output of the system is a labelled point cloud that can be used for construction of semantic object convex and for alignment between frames. We demonstrate the effectiveness of our system on the KITTI dataset, which provides sufficient camera and LiDAR data, and present qualitative and quantitative results indicating improved segmentation compared to methods that use only image data or only LiDAR data.
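
The last part of step (III), labelling the point cloud with the image segments, can be illustrated with a minimal sketch. The abstract does not say how points are associated with pixels, so the sketch assumes a standard pinhole projection: each LiDAR point is mapped into the image by a 3x4 projection matrix and takes the class label of the pixel it lands on. The function name `label_points`, the matrix `P`, and the toy label image are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def label_points(
    points: Sequence[Point],
    P: Sequence[Sequence[float]],
    label_image: Sequence[Sequence[int]],
    min_depth: float = 1e-6,
) -> List[Optional[int]]:
    """Assign each 3D point the label of the pixel it projects onto.

    Assumptions (not from the paper): P is a 3x4 matrix that maps
    homogeneous LiDAR coordinates directly to homogeneous pixel
    coordinates, and label_image[v][u] holds the class id of pixel (u, v).
    Points behind the camera or outside the image get None.
    """
    height = len(label_image)
    width = len(label_image[0]) if height else 0
    labels: List[Optional[int]] = []
    for x, y, z in points:
        # Homogeneous projection: [u*w, v*w, w] = P * [x, y, z, 1]^T
        uw = P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]
        vw = P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]
        w = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
        if w <= min_depth:
            labels.append(None)  # point is behind or on the image plane
            continue
        u = math.floor(uw / w)  # pixel column
        v = math.floor(vw / w)  # pixel row
        if 0 <= u < width and 0 <= v < height:
            labels.append(label_image[v][u])
        else:
            labels.append(None)  # projects outside the image
    return labels


# Toy example: focal length 1, principal point (1, 1), 3x3 label image.
P_toy = [[1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
labels_toy = [[0, 0, 0],
              [0, 1, 2],
              [0, 3, 4]]
cloud_toy = [(0.0, 0.0, 5.0), (5.0, 5.0, 5.0), (0.0, 0.0, -1.0), (10.0, 0.0, 1.0)]
print(label_points(cloud_toy, P_toy, labels_toy))
```

In this sketch, points that fall behind the camera or outside the image keep a `None` label. With real data, `P` would come from the camera and LiDAR calibration supplied with the dataset.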