A Framework for Benchmarking Feedback-Based Dynamic Data Collection Methods in Connected Vehicle Networks
2021-01-0184
Supervised, unsupervised, and active learning techniques can be used to develop prognostics and diagnostics in connected vehicle networks, where a wide variety of sensors are available for data collection. However, constraints imposed by the on-board equipment, the vehicle network, time, and human resources limit the amount of sensor data and labels available for machine learning. When neither prior information about the data distribution nor domain knowledge is available, it becomes challenging to collect a limited yet relevant set of data for training a machine learning model that meets the desired performance threshold. To tackle this challenge, techniques such as experimental design, feature selection, and active learning can be applied, and the data collection process can be turned into a closed-loop system in which new data collection decisions are made based on feedback from the data already collected. In this paper, an iterative design and evaluation procedure is considered for developing and deploying such feedback-based decision-making algorithms. To date, no simulation-based benchmarking framework capable of imposing real-world conditions is available for evaluating and testing the performance of these methods in a connected vehicle network setting. In this work, we propose a simulation platform that uses static datasets to simulate a dynamic data collection process and to provide comprehensive evaluations of feedback-based dynamic data collection decision-making algorithms. The platform provides means for the stepwise input of the desired dynamic data collection decisions and for the delayed return of the corresponding data, mimicking the real-world data collection procedure. Compliant experimental design, feature selection, and active learning strategies can be used within the framework to determine the data to be collected at each step.
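To make the idea of stepwise decisions with delayed data return more concrete, here is a minimal Python sketch of how a static dataset might be replayed as a dynamic collection process. It assumes that the dataset is a list of rows (samples) by columns (sensor features), that each step releases a fixed-size batch of new rows, and that the requested feature columns for that batch are returned after a fixed number of steps. The class `DelayedCollectionSimulator` and its parameters `batch_size` and `delay` are hypothetical illustrations, not the platform's actual interface.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence, Tuple

# One returned delivery: (row indices, {feature index: column values for those rows}).
Delivery = Tuple[List[int], Dict[int, List[float]]]


class DelayedCollectionSimulator:
    """Hypothetical sketch: replays a static dataset as a feedback-driven,
    stepwise data collection process with a fixed return delay."""

    def __init__(self, data: Sequence[Sequence[float]], batch_size: int, delay: int):
        self.data = data              # static dataset: rows = samples, columns = sensors
        self.batch_size = batch_size  # new rows released per step (assumption)
        self.delay = delay            # steps between a request and its data (assumption)
        self.step_count = 0
        self.next_row = 0
        # Queue of pending requests: (step at which data is ready, rows, features).
        self.pending: Deque[Tuple[int, List[int], List[int]]] = deque()

    def step(self, features: List[int]) -> List[Delivery]:
        """Submit this step's collection decision (which features to record for
        the next batch of rows) and return every delivery whose delay has elapsed."""
        end = min(self.next_row + self.batch_size, len(self.data))
        rows = list(range(self.next_row, end))
        self.next_row = end
        self.pending.append((self.step_count + self.delay, rows, list(features)))
        self.step_count += 1

        ready: List[Delivery] = []
        # A request made at step t becomes available at step t + delay.
        while self.pending and self.pending[0][0] < self.step_count:
            _, req_rows, req_features = self.pending.popleft()
            columns = {j: [self.data[i][j] for i in req_rows] for j in req_features}
            ready.append((req_rows, columns))
        return ready


if __name__ == "__main__":
    data = [[float(10 * i + j) for j in range(3)] for i in range(6)]
    sim = DelayedCollectionSimulator(data, batch_size=2, delay=1)
    print(sim.step([0]))     # []
    print(sim.step([1, 2]))  # [([0, 1], {0: [0.0, 10.0]})]
```

A decision-making algorithm under test would call `step` once per iteration, update its model with whatever data has arrived, and use that feedback to choose the next request; a fixed delay is the simplest way to model the lag between requesting data from vehicles and receiving it.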
We also provide an implementation example of a wrapper-based feature selection algorithm using greedy search and exploration components along with a random feature selector for comparative analysis.
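The paper's own implementation is not reproduced here. As an illustration only, the Python sketch below shows one common way to combine a wrapper objective, greedy forward search, and an exploration component, alongside a random feature selector used as a baseline. The exploration rule (epsilon-greedy: with probability `epsilon`, add a random unselected feature) is an assumption, as are the function names `greedy_explore_select` and `random_select` and the `evaluate` callback, which stands in for training and validating a model on a candidate feature subset.

```python
import random
from typing import Callable, FrozenSet, List

# Wrapper objective (assumption): maps a feature subset to a validation score,
# where higher is better. In practice it would train and evaluate a model.
Evaluate = Callable[[FrozenSet[int]], float]


def greedy_explore_select(n_features: int, budget: int, evaluate: Evaluate,
                          epsilon: float = 0.1, seed: int = 0) -> List[int]:
    """Hypothetical epsilon-greedy forward selection: adds one feature per step,
    either the best-scoring one (greedy) or a random one (exploration)."""
    rng = random.Random(seed)
    selected: List[int] = []
    for _ in range(min(budget, n_features)):
        candidates = [j for j in range(n_features) if j not in selected]
        if rng.random() < epsilon:
            # Exploration: add a uniformly random unselected feature.
            choice = rng.choice(candidates)
        else:
            # Greedy step: add the feature whose inclusion maximizes the wrapper score.
            choice = max(candidates, key=lambda j: evaluate(frozenset(selected + [j])))
        selected.append(choice)
    return selected


def random_select(n_features: int, budget: int, seed: int = 0) -> List[int]:
    """Random baseline: picks `budget` distinct features uniformly at random."""
    return random.Random(seed).sample(range(n_features), min(budget, n_features))


if __name__ == "__main__":
    weights = [0.1, 0.5, 0.3, 0.05]  # toy per-feature usefulness (made up)

    def toy_score(subset: FrozenSet[int]) -> float:
        return sum(weights[j] for j in subset)

    print(greedy_explore_select(4, 2, toy_score, epsilon=0.0))  # [1, 2]
    print(len(random_select(4, 2)))  # 2
```

In a closed-loop setting, each newly selected feature could be issued as a collection decision and `evaluate` recomputed on the data returned so far; the sketch collapses that loop into a single offline call for brevity. Comparing its learning curve against `random_select` under the same budget is what makes the random selector useful as a reference point.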