Techniques to Avoid Pitfalls in Empirical Modeling
1999-01-2045
Developing a mathematical model that adequately captures the interactions among system components is critical to understanding and controlling physical, chemical, or biological phenomena. This often involves developing a multivariate model that will be used to forecast future events. Once a model has been proposed, it must be validated to confirm that it forecasts future events adequately.
However, such empirical models are subject to a number of pitfalls, including overfitting, chance correlation, extrapolation, and lack of parsimony. In this paper, we describe the application of techniques to avoid these problems: stratified data sampling, cross-validation, summed independent variables, and the use of constraints on model complexity.
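As an illustration of the first technique, the following minimal sketch shows one way stratified data sampling might be done so that the holdout set spans the same range of a variable as the training set, which guards against validating a model only by extrapolation. The function name `stratified_split`, its parameters, and the equal-count binning scheme are assumptions made for this example; the paper's own procedure may differ.

```python
import random
from collections import defaultdict


def stratified_split(values, n_strata=4, holdout_fraction=0.25, seed=0):
    """Split sample indices into (train, holdout) lists.

    Hypothetical helper: samples are ranked by value, cut into
    `n_strata` equal-count bins, and a fixed fraction of each bin is
    drawn at random for the holdout set, so both sets cover every bin.
    """
    rng = random.Random(seed)
    order = sorted(range(len(values)), key=lambda i: values[i])

    # Assign each sample to a stratum according to its rank.
    strata = defaultdict(list)
    for rank, idx in enumerate(order):
        strata[rank * n_strata // len(values)].append(idx)

    train, holdout = [], []
    for members in strata.values():
        rng.shuffle(members)
        # Take at least one holdout sample from every stratum.
        n_hold = max(1, round(len(members) * holdout_fraction))
        holdout.extend(members[:n_hold])
        train.extend(members[n_hold:])
    return sorted(train), sorted(holdout)


if __name__ == "__main__":
    data_rng = random.Random(1)
    y = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
    train, holdout = stratified_split(y)
    print("train:", len(train), "holdout:", len(holdout))
```

Stratifying on rank rather than on fixed value intervals keeps the bins equally populated even when the variable is strongly skewed; stratifying on several variables at once would require a joint binning scheme not shown here.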
Although most of these techniques can be applied to any type of data model (e.g., linear, polynomial, nonlinear, or artificial neural network), we have studied their application to polynomial autoregressive models with exogenous variables (PARX). By using these techniques, we are able to validate parsimonious models with reduced risk of overfitting, extrapolation, or chance correlation. Applied to PARX models, the techniques allowed us to develop higher-order polynomial models that significantly reduce forecast errors relative to traditional linear autoregressive models.
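To make the comparison between linear and polynomial autoregressive models concrete, the sketch below fits a PARX-style model by ordinary least squares and scores it with cross-validation. Everything in it is an assumption made for illustration: the model form (all monomials up to a given degree in the lagged outputs and lagged exogenous inputs), the synthetic process in `simulate`, the contiguous-block folds, and the function names. It is not the authors' implementation.

```python
import itertools

import numpy as np


def simulate(n=300, seed=0):
    """Synthetic process (assumed): output depends on the squared input."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1.0, 1.0, n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.5 * y[t - 1] + 0.4 * u[t - 1] ** 2 + 0.05 * rng.standard_normal()
    return y, u


def parx_design(y, u, n_lags, degree):
    """Build the regression matrix and target vector.

    Each row holds a constant plus all monomials up to `degree` in
    y[t-n_lags..t-1] and u[t-n_lags..t-1]; the target is y[t].
    """
    rows = []
    for t in range(n_lags, len(y)):
        base = np.concatenate([y[t - n_lags:t], u[t - n_lags:t]])
        feats = [1.0]
        for d in range(1, degree + 1):
            for combo in itertools.combinations_with_replacement(range(len(base)), d):
                feats.append(np.prod(base[list(combo)]))
        rows.append(feats)
    return np.array(rows), y[n_lags:]


def cv_rmse(X, target, n_folds=5):
    """Root-mean-square one-step forecast error over contiguous k folds."""
    folds = np.array_split(np.arange(len(target)), n_folds)
    sq_err = 0.0
    for test_idx in folds:
        train_mask = np.ones(len(target), dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training folds only, then score the held-out fold.
        coef, *_ = np.linalg.lstsq(X[train_mask], target[train_mask], rcond=None)
        sq_err += np.sum((X[test_idx] @ coef - target[test_idx]) ** 2)
    return np.sqrt(sq_err / len(target))


if __name__ == "__main__":
    y, u = simulate()
    for degree in (1, 2, 3):
        X, target = parx_design(y, u, n_lags=1, degree=degree)
        print(f"degree={degree} terms={X.shape[1]} cv_rmse={cv_rmse(X, target):.4f}")
```

Because the number of monomials grows rapidly with degree and lag count, the cross-validated error, not the training error, is the quantity to compare across degrees; a cap on `degree` or on the number of retained terms is one simple form of complexity constraint.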