When Does Simulated Data Match Real Data?

Comparing Model Calibration Functions Using Genetic Algorithms
  • Forrest Stonedahl
  • William Rand
Conference paper
Part of the Agent-Based Social Systems book series (ABSS, volume 11)


Agent-based models can be calibrated to replicate real-world data sets, but choosing the best set of parameters to achieve this result can be difficult. To validate a model, the real-world data set is often divided into a training and a test set. The training set is used to calibrate the parameters, and the test set is used to determine if the calibrated model represents the real-world data. The difference between the real-world data and the simulated data is determined using an error measure. When using evolutionary computation to choose the parameters, this error measure becomes the fitness function, and choosing the appropriate measure becomes even more crucial for a successful calibration process. We survey the effect of five different error measures in the context of a toy problem and a real-world problem (simulating online news consumption). We use each error measure in turn to calibrate on the training data set, and then examine the results of all five error measures on both the training and test data sets. For the toy problem, one measure was the Pareto-dominant choice for calibration, but no error measure dominated all the others for the real-world problem. Additionally, we observe the counterintuitive result that calibrating using one measure may sometimes lead to better performance on a second measure than could be achieved by calibrating using that second measure directly.
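The calibrate-then-cross-evaluate procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name its five error measures, so the three used here (mean absolute error, root mean squared error, maximum absolute error) are stand-ins; `simulate` is a hypothetical linear toy model; the "real" data are synthetic; and `calibrate` is a deliberately minimal genetic algorithm (truncation selection plus Gaussian mutation) rather than the search tool used in the paper.

```python
import math
import random

def simulate(params, steps):
    """Hypothetical toy model (an assumption): a noiseless linear trend."""
    slope, intercept = params
    return [slope * t + intercept for t in steps]

# Stand-in error measures; the paper's five measures are not listed in the abstract.
def mae(real, sim):
    return sum(abs(r - s) for r, s in zip(real, sim)) / len(real)

def rmse(real, sim):
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(real, sim)) / len(real))

def max_abs(real, sim):
    return max(abs(r - s) for r, s in zip(real, sim))

MEASURES = {"mae": mae, "rmse": rmse, "max_abs": max_abs}

def calibrate(measure, steps, real, rng, pop_size=30, generations=40):
    """Minimal GA: keep the better half, refill with mutated copies of it.

    The chosen error measure is the fitness function (lower is better).
    """
    def fitness(p):
        return measure(real, simulate(p, steps))

    pop = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]
        children = [(s + rng.gauss(0, 0.2), i + rng.gauss(0, 0.2))
                    for s, i in parents]
        pop = parents + children
    return min(pop, key=fitness)

def cross_evaluate(seed=0):
    """Calibrate with each measure, then score the result with every measure."""
    rng = random.Random(seed)
    steps = list(range(20))
    # Synthetic "real-world" data (an assumption): noisy line, slope 2, intercept 1.
    real = [2.0 * t + 1.0 + rng.gauss(0, 0.5) for t in steps]
    train_t, test_t = steps[:14], steps[14:]
    train_y, test_y = real[:14], real[14:]

    table = {}
    for cal_name, cal_measure in MEASURES.items():
        best = calibrate(cal_measure, train_t, train_y, rng)
        for eval_name, eval_measure in MEASURES.items():
            table[(cal_name, eval_name)] = (
                eval_measure(train_y, simulate(best, train_t)),  # training error
                eval_measure(test_y, simulate(best, test_t)),    # test error
            )
    return table

if __name__ == "__main__":
    for (cal, ev), (train_err, test_err) in cross_evaluate().items():
        print(f"calibrated on {cal:8s} scored by {ev:8s} "
              f"train={train_err:.3f} test={test_err:.3f}")
```

Reading the printed table column by column shows whether any one calibration measure does at least as well as the others on every evaluation measure (the Pareto-dominance question the abstract raises), and whether calibrating on one measure ever beats calibrating directly on another.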


Keywords: Agent-based modeling; Calibration; Genetic algorithms; News consumption; Web traffic



We thank Uri Wilensky for his support for F.S., and Northwestern’s Quest HPCC for providing computational resources for this work. We also acknowledge support from Google under the Google Marketing Research Award.



Copyright information

© Springer Japan 2014

Authors and Affiliations

  1. Centre College, Danville, USA
  2. University of Maryland, College Park, USA
