When Does Simulated Data Match Real Data?
Agent-based models can be calibrated to replicate real-world data sets, but choosing the best set of parameters to achieve this result can be difficult. To validate a model, the real-world data set is often divided into a training and a test set. The training set is used to calibrate the parameters, and the test set is used to determine if the calibrated model represents the real-world data. The difference between the real-world data and the simulated data is determined using an error measure. When using evolutionary computation to choose the parameters, this error measure becomes the fitness function, and choosing the appropriate measure becomes even more crucial for a successful calibration process. We survey the effect of five different error measures in the context of a toy problem and a real-world problem (simulating online news consumption). We use each error measure in turn to calibrate on the training data set, and then examine the results of all five error measures on both the training and test data sets. For the toy problem, one measure was the Pareto-dominant choice for calibration, but no error measure dominated all the others for the real-world problem. Additionally, we observe the counterintuitive result that calibrating using one measure may sometimes lead to better performance on a second measure than could be achieved by calibrating using that second measure directly.
Keywords: Agent-based modeling, Calibration, Genetic algorithms, News consumption, Web traffic
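The calibrate-then-cross-evaluate procedure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy model `simulate`, the three error measures (mean absolute error, root mean squared error, and maximum absolute error, which are not necessarily among the five measures the paper studies), the split of the series into a first and second half, and the exhaustive search over candidate parameters, which stands in for the evolutionary search used in the paper.

```python
import math
import random


def simulate(decay, steps):
    """Hypothetical toy model: exponential decay from 100 (not the paper's model)."""
    return [100.0 * math.exp(-decay * t) for t in range(steps)]


def mae(real, sim):
    """Mean absolute error between two equal-length series."""
    return sum(abs(r - s) for r, s in zip(real, sim)) / len(real)


def rmse(real, sim):
    """Root mean squared error between two equal-length series."""
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(real, sim)) / len(real))


def max_abs(real, sim):
    """Largest absolute pointwise difference."""
    return max(abs(r - s) for r, s in zip(real, sim))


# Illustrative measures only; the paper's five measures are not listed here.
MEASURES = {"mae": mae, "rmse": rmse, "max_abs": max_abs}


def calibrate(train, measure, candidates):
    """Return the candidate parameter with the lowest error on the training data.

    Exhaustive search is a stand-in for a genetic algorithm, where the
    error measure would serve as the fitness function.
    """
    return min(candidates, key=lambda d: measure(train, simulate(d, len(train))))


def cross_evaluate(real, split, candidates):
    """Calibrate with each measure in turn, then score with every measure."""
    train, test = real[:split], real[split:]
    results = {}
    for name, measure in MEASURES.items():
        best = calibrate(train, measure, candidates)
        sim = simulate(best, len(real))
        results[name] = {
            "param": best,
            "train": {m: f(train, sim[:split]) for m, f in MEASURES.items()},
            "test": {m: f(test, sim[split:]) for m, f in MEASURES.items()},
        }
    return results


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the synthetic "real" data is repeatable
    real = [v + rng.gauss(0, 2) for v in simulate(0.1, 40)]
    candidates = [i / 100 for i in range(1, 31)]
    for name, row in cross_evaluate(real, 20, candidates).items():
        print(name, row["param"], row["train"], row["test"])
```

Each row of the output corresponds to one calibration measure, and each row reports all measures on both data sets. Reading across rows is what lets one check whether a single measure dominates the others, or whether calibrating on one measure happens to score better on another.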
We thank Uri Wilensky for his support for F.S., and Northwestern’s Quest HPCC for providing computational resources for this work. We also acknowledge support from Google under the Google Marketing Research Award.