Accuracy in detecting a moving object is critical to autonomous driving or advanced driver assistance systems (ADAS). By including the object classification from multiple sensor detections, the model of the object or environment can be identified more accurately. The critical parameters involved in improving the accuracy are the size and the speed of the moving object. All sensor data are to be used in defining a composite object representation so that it could be used for the class information in the core object’s description. This composite data can then be used by a deep learning network for complete perception fusion in order to solve the detection and tracking of moving objects problem. Camera image data from subsequent frames along the time axis in conjunction with the speed and size of the object will further contribute in developing better recognition algorithms. In this paper, we present preliminary results using only camera images for detecting various objects using deep learning network, as a first step toward multi-sensor fusion algorithm development. The simulation experiments based on camera images show encouraging results where the proposed deep learning network based detection algorithm was able to detect various objects with certain degree of confidence. A laboratory experimental setup is being commissioned where three different types of sensors, a digital camera with 8 megapixel resolution, a LIDAR with 40m range, and ultrasonic distance transducer sensors will be used for multi-sensor fusion to identify the object in real-time.