Stonedahl F., Rand W. (2014) When Does Simulated Data Match Real Data?. In: Chen SH., Terano T., Yamamoto R., Tai CC. (eds) Advances in Computational Social Science. Agent-Based Social Systems, vol 11. Springer, Tokyo
Agent-based models can be calibrated to replicate real-world data sets, but choosing the best set of parameters to achieve this result can be difficult. To validate a model, the real-world data set is often divided into a training and a test set. The training set is used to calibrate the parameters, and the test set is used to determine if the calibrated model represents the real-world data. The difference between the real-world data and the simulated data is determined using an error measure. When using evolutionary computation to choose the parameters, this error measure becomes the fitness function, and choosing the appropriate measure becomes even more crucial for a successful calibration process. We survey the effect of five different error measures in the context of a toy problem and a real-world problem (simulating online news consumption). We use each error measure in turn to calibrate on the training data set, and then examine the results of all five error measures on both the training and test data sets. For the toy problem, one measure was the Pareto-dominant choice for calibration, but no error measure dominated all the others for the real-world problem. Additionally, we observe the counterintuitive result that calibrating using one measure may sometimes lead to better performance on a second measure than could be achieved by calibrating using that second measure directly.
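The calibration loop described above (an error measure on a training set serving as the fitness function for an evolutionary search, with a held-out test set for validation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the one-parameter toy model `y = a*x`, the RMSE/MAE error measures, and the simple (mu+lambda)-style loop are all assumptions chosen for brevity.

```python
import math
import random

random.seed(0)

# Hypothetical toy model: y = a * x. The "real world" is generated
# at a_true = 2.0 plus Gaussian noise.
def simulate(a, xs):
    return [a * x for x in xs]

# Two candidate error measures (the paper compares five).
def rmse(sim, obs):
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

def mae(sim, obs):
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)

# Real-world data, divided into a training set and a test set.
xs = list(range(1, 21))
obs = [2.0 * x + random.gauss(0, 0.5) for x in xs]
train_x, test_x = xs[:10], xs[10:]
train_y, test_y = obs[:10], obs[10:]

def calibrate(error, generations=30, pop_size=20):
    """Tiny evolutionary loop: the chosen error measure on the
    training set is the fitness function being minimized."""
    pop = [random.uniform(0.0, 4.0) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: error(simulate(a, train_x), train_y))
        parents = pop[: pop_size // 2]          # truncation selection
        children = [p + random.gauss(0, 0.1) for p in parents]  # mutation
        pop = parents + children
    return min(pop, key=lambda a: error(simulate(a, train_x), train_y))

# Calibrate with one measure, then evaluate the result on the test
# set under both measures, mirroring the paper's cross-evaluation.
best = calibrate(rmse)
print(best)                                    # should land near a_true = 2.0
print(rmse(simulate(best, test_x), test_y))    # test error under the calibration measure
print(mae(simulate(best, test_x), test_y))     # test error under a second measure
```

Cross-evaluating every calibrated parameter set under every error measure, as in the last two lines, is what exposes the counterintuitive result noted in the abstract: the measure used for calibration is not always the one it scores best on.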
Keywords: Agent-based modeling · Calibration · Genetic algorithms · News consumption · Web traffic