Finding Diverse Failure Scenarios in Autonomous Systems Using Adaptive Stress Testing 12-02-04-0018
This also appears in
SAE International Journal of Connected and Automated Vehicles-V128-12EJ
Identifying and eliminating failure scenarios is critical in the development of autonomous vehicle (AV) systems. However, finding such failures through real-world vehicle-level testing is difficult because system disengagements and accidents are rare. Simulation approaches have been proposed to supplement vehicle-level testing and reduce the costs associated with operating large fleets of autonomous test vehicles. While one can run more vehicles in simulation than in the real world, applying traditional Monte Carlo sampling techniques to find failures still amounts to an unguided search that wastes substantial computing resources. A more directed method than random sampling is needed to identify failure scenarios in a computationally efficient manner. Adaptive Stress Testing (AST) is a method that uses reinforcement learning (RL) paradigms to efficiently find failure scenarios in stochastic sequential decision-making systems. By iteratively exploring the action space and collecting rewards, AST aims to establish an optimal policy that generates a set of high-probability failure trajectories. However, the trajectories obtained through AST tend to lack diversity and converge to similar failure states. Given the range of possible accident scenarios an AV can face, such homogeneous failures are of limited value in validating a system’s roadworthiness. In this article, we present a method to enhance the expressiveness of the failure scenarios found using AST. By augmenting the reward function used by AST with domain-relevant information, we guide the solver to discover more diverse sets of trajectories. We present an implementation using Monte Carlo Tree Search (MCTS). To show the efficacy of our approach, we evaluate the failure trajectories obtained for a crosswalk scenario involving a vehicle and a pedestrian. We show that our implementation is able to find more diverse and domain-relevant failures when compared with baseline AST.
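The idea of augmenting the AST reward can be illustrated with a minimal Python sketch. The abstract does not give the form of the article's reward, so everything below is an assumption made for illustration: the baseline reward follows a commonly used AST shape (log-likelihood of the disturbance at each step, a penalty when an episode ends without a failure), and the diversity term is a hypothetical bonus proportional to the distance between a new failure state and the nearest previously found one. The names `baseline_reward`, `diversity_bonus`, `augmented_reward`, `archive`, `miss_penalty`, and `weight` are all invented here and are not taken from the article.

```python
import math


def baseline_reward(log_prob, is_failure, is_terminal, miss_distance,
                    miss_penalty=100.0):
    """Assumed AST-style reward, not the article's exact definition."""
    if is_failure:
        return 0.0  # failure reached: no further penalty
    if is_terminal:
        # Episode ended without a failure: penalize by how far it missed.
        return -miss_penalty - miss_distance
    # Ordinary step: prefer likely disturbances (log-probability <= 0).
    return log_prob


def diversity_bonus(failure_state, archive, weight=1.0):
    """Hypothetical bonus: distance to the nearest archived failure state."""
    if not archive:
        return 0.0
    nearest = min(math.dist(failure_state, past) for past in archive)
    return weight * nearest


def augmented_reward(log_prob, is_failure, is_terminal, miss_distance,
                     state, archive, weight=1.0):
    """Baseline reward plus the diversity bonus, applied only on failure."""
    reward = baseline_reward(log_prob, is_failure, is_terminal, miss_distance)
    if is_failure:
        reward += diversity_bonus(state, archive, weight)
    return reward


# One failure already found at the origin of a 2-D state space.
archive = [(0.0, 0.0)]
repeat = augmented_reward(-1.5, True, True, 0.0, (0.0, 0.0), archive)
novel = augmented_reward(-1.5, True, True, 0.0, (3.0, 4.0), archive)
print(repeat, novel)
```

In this sketch a failure that repeats an archived state earns no bonus, while a failure far from every archived state scores higher, which is one way a solver such as MCTS could be nudged away from converging on similar failure states. The choice of state representation and distance measure is where domain-relevant information would enter, and the article's actual choices may differ.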