Methodology for Multi-Dimensional Surrogate Data Generation Useful for Practical Machine Learning
2019-01-1053
A practical method for supplementing an existing multi-dimensional dataset is described and reviewed. A common situation in the growing fields of Data Science and Machine Learning is that a multi-dimensional dataset has been obtained and the goal is to use it to develop and verify a predictive model. A robust, implementable method is presented for augmenting this original dataset with a surrogate (synthetic) one that is shown to have the same first four statistical moments (mean, variance, skewness, and kurtosis) as the original multivariate dataset. The method is tested on engineering design problems and shown to support machine learning methods (neural networks and multiple regression) that turn “no solution” problems into useful models with good predictive accuracy. While there is no substitute for real data that captures true systemic expectations, when no other means of acquiring data is available, this method is shown to extend datasets effectively for modeling purposes.
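The abstract's central check is whether a surrogate dataset reproduces the first four moments of the original. The following is a minimal sketch of that check, not the paper's generation method. It assumes the data are lists of numeric rows, uses population (biased) moment estimators, reports excess kurtosis, and compares each column independently. The function names `moments` and `moments_match` and the default tolerances are hypothetical and chosen only for illustration. Because it checks marginal moments per dimension, it does not verify cross-moments such as covariance between columns.

```python
import math
from typing import List, Sequence, Tuple


def moments(column: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (mean, variance, skewness, excess kurtosis) of one column.

    Uses population (biased) estimators; this choice is an assumption.
    """
    n = len(column)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(column) / n
    deviations = [x - mean for x in column]
    m2 = sum(d ** 2 for d in deviations) / n  # second central moment (variance)
    if m2 == 0:
        raise ValueError("column has zero variance")
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    m4 = sum(d ** 4 for d in deviations) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return mean, m2, skewness, excess_kurtosis


def moments_match(
    original: List[List[float]],
    surrogate: List[List[float]],
    rel_tol: float = 0.05,  # hypothetical tolerance, for illustration only
    abs_tol: float = 0.05,  # hypothetical tolerance, for illustration only
) -> bool:
    """Check column by column that the surrogate reproduces the first four moments."""
    n_cols = len(original[0])
    for j in range(n_cols):
        orig_col = [row[j] for row in original]
        surr_col = [row[j] for row in surrogate]
        for a, b in zip(moments(orig_col), moments(surr_col)):
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


if __name__ == "__main__":
    original = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
    shuffled = list(reversed(original))  # same values, so moments agree
    shifted = [[x + 10.0 for x in row] for row in original]  # mean changes
    print(moments_match(original, shuffled))  # True
    print(moments_match(original, shifted))   # False
```

The `abs_tol` term matters for moments that are near zero, such as the skewness of a symmetric column, where a purely relative tolerance would reject tiny floating-point differences.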