Statistical Characterization, Pattern Identification, and Analysis of Big Data
In the Big Data era, the ability to characterize data statistically and probabilistically, to identify data patterns, and to model and analyze the data is critical to understanding the data, finding its trends, and making better use of it. This paper first describes the fundamental probability concepts and several commonly used probability distribution functions, such as the Weibull distribution for spectrum events and the Pareto distribution for extreme/rare events. An event quadrant is then established based on the commonality/rarity and the impact/effect of probabilistic events. Level of measurement, which is key to the quantitative measurement of data, is also discussed within the framework of probability. The damage density function, a measure of the relative damage contribution of each constituent, is proposed. The new measure is shown to be capable of distinguishing between extreme/rare events and spectrum events.