Autopilot Strategy Based on Improved DDPG Algorithm
2019-01-5072
Deep Deterministic Policy Gradient (DDPG) is a deep reinforcement learning algorithm. Because it performs well in continuous motion control, DDPG has been applied to self-driving. However, DDPG suffers from unstable training, low training efficiency, and slow convergence. To address these problems, an improved DDPG algorithm based on segmented experience replay is presented. Building on DDPG, segmented experience replay selects training experiences by importance according to the training progress, which improves the training efficiency and the stability of the trained model. The algorithm was tested in TORCS, an open-source 3D car racing simulator. The simulation results show that training stability is significantly improved compared with both the DDPG and DQN algorithms, and that the average return is about 46% higher than that of DDPG and about 55% higher than that of DQN.
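The abstract does not specify how segments are formed, how importance is measured, or how training progress changes the selection. The Python sketch below is one possible reading, not the authors' method. It assumes that importance is the absolute TD error, that stored transitions are ranked by importance and split into equal-size segments, and that sampling narrows from all segments to only the most important one as training progress goes from 0 to 1. `SegmentedReplayBuffer`, `add`, `sample`, and `progress` are hypothetical names introduced for illustration.

```python
import math
import random
from collections import deque
from typing import Any, List, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, done)
Transition = Tuple[Any, Any, float, Any, bool]


class SegmentedReplayBuffer:
    """Sketch of an importance-segmented replay buffer (assumed design).

    Importance is taken to be |TD error| (an assumption). Transitions are
    ranked by importance and split into `num_segments` equal-size segments.
    Early in training every segment is eligible; as `progress` approaches 1,
    sampling is restricted to the most important segments only.
    """

    def __init__(self, capacity: int, num_segments: int = 4, seed: int = 0):
        # Each entry is (importance, transition); the oldest entries are
        # evicted automatically once `capacity` is reached.
        self.buffer: deque = deque(maxlen=capacity)
        self.num_segments = num_segments
        self.rng = random.Random(seed)

    def add(self, transition: Transition, td_error: float) -> None:
        # Store the absolute TD error as the importance score (assumption).
        self.buffer.append((abs(td_error), transition))

    def sample(self, batch_size: int, progress: float) -> List[Transition]:
        """Draw a batch; `progress` is the fraction of training completed."""
        if not self.buffer:
            raise ValueError("cannot sample from an empty buffer")
        progress = min(max(progress, 0.0), 1.0)
        # Rank transitions from most to least important.
        ranked = sorted(self.buffer, key=lambda item: item[0], reverse=True)
        seg_size = math.ceil(len(ranked) / self.num_segments)
        # Number of eligible segments shrinks from num_segments to 1
        # as training progresses (assumed schedule).
        k = self.num_segments - int(progress * (self.num_segments - 1))
        eligible = ranked[: max(1, k * seg_size)]
        # Sample uniformly, with replacement, within the eligible segments.
        return [self.rng.choice(eligible)[1] for _ in range(batch_size)]
```

In a DDPG training loop, `progress` could be the current episode index divided by the total number of episodes, with each sampled batch fed to the usual critic and actor updates. Sorting on every call costs O(n log n); a practical implementation would more likely keep the segments incrementally ordered or use a sum-tree, as in prioritized experience replay.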