When Does Simulated Data Match Real Data?

Stonedahl, Forrest; Rand, William

doi:10.1007/978-4-431-54847-8_19

Part of the book series: Agent-Based Social Systems (ABSS, volume 11)

Abstract

Agent-based models can be calibrated to replicate real-world data sets, but choosing the best set of parameters to achieve this result can be difficult. To validate a model, the real-world data set is often divided into a training and a test set. The training set is used to calibrate the parameters, and the test set is used to determine if the calibrated model represents the real-world data. The difference between the real-world data and the simulated data is determined using an error measure. When using evolutionary computation to choose the parameters, this error measure becomes the fitness function, and choosing the appropriate measure becomes even more crucial for a successful calibration process. We survey the effect of five different error measures in the context of a toy problem and a real-world problem (simulating online news consumption). We use each error measure in turn to calibrate on the training data set, and then examine the results of all five error measures on both the training and test data sets. For the toy problem, one measure was the Pareto-dominant choice for calibration, but no error measure dominated all the others for the real-world problem. Additionally, we observe the counterintuitive result that calibrating using one measure may sometimes lead to better performance on a second measure than could be achieved by calibrating using that second measure directly.
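
The calibrate-then-cross-evaluate procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy model `simulate`, the made-up "real" series, and the three error measures are hypothetical stand-ins rather than the five measures or the models studied in the chapter, and an exhaustive search over a grid of candidate parameter values is swapped in for evolutionary computation to keep the example short and deterministic.

```python
import math

def simulate(rate, steps):
    """Hypothetical toy model: saturating growth toward 100."""
    return [100.0 * (1.0 - math.exp(-rate * t)) for t in range(steps)]

# Illustrative error measures (assumptions, not the chapter's five).
def mean_abs_error(real, sim):
    return sum(abs(r - s) for r, s in zip(real, sim)) / len(real)

def rmse(real, sim):
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(real, sim)) / len(real))

def max_abs_error(real, sim):
    return max(abs(r - s) for r, s in zip(real, sim))

MEASURES = {"mae": mean_abs_error, "rmse": rmse, "max": max_abs_error}

def calibrate(real_train, measure, candidates):
    """Pick the candidate rate with the lowest training error.

    The error measure plays the role of the fitness function; grid
    search stands in for a genetic algorithm here.
    """
    steps = len(real_train)
    return min(candidates, key=lambda r: measure(real_train, simulate(r, steps)))

def cross_evaluate(real, split, candidates):
    """Calibrate with each measure in turn, then score with all measures."""
    train, test = real[:split], real[split:]
    results = {}
    for name, measure in MEASURES.items():
        rate = calibrate(train, measure, candidates)
        sim = simulate(rate, len(real))
        results[name] = {
            "rate": rate,
            "train": {m: f(train, sim[:split]) for m, f in MEASURES.items()},
            "test": {m: f(test, sim[split:]) for m, f in MEASURES.items()},
        }
    return results

# Made-up "real" observations: the toy model plus a fixed perturbation.
NOISE = [1.5, -2.0, 0.5, 3.0, -1.0, 2.0, -2.5, 1.0, -0.5, 2.0]
real = [v + n for v, n in zip(simulate(0.3, 10), NOISE)]
candidates = [i / 100 for i in range(5, 100, 5)]

results = cross_evaluate(real, split=5, candidates=candidates)
for name, row in results.items():
    print(name, row["rate"], row["train"], row["test"])
```

On the training set, each measure is by construction minimized by its own calibration, so any case where calibrating on one measure beats direct calibration on another can only show up in the test-set scores of this grid-search sketch.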

Notes

1. Our goal here is not to argue for the superiority of genetic algorithms for model calibration, but to examine the use of different error measures as fitness functions. We expect our findings to generalize to other metaheuristic search algorithms, but this should be confirmed in future work.
2. Our apologies to J.R.R. Tolkien.

Acknowledgements

We thank Uri Wilensky for his support for F.S., and Northwestern’s Quest HPCC for providing computational resources for this work. We also acknowledge support from Google under the Google Marketing Research Award.

Author information

Authors and Affiliations

Centre College, Danville, KY, USA
Forrest Stonedahl
University of Maryland, College Park, MD, USA
William Rand

Corresponding author

Correspondence to Forrest Stonedahl.

Editor information

Editors and Affiliations

Department of Economics, National Chengchi University, Taipei, Taiwan
Shu-Heng Chen
Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
Takao Terano
Department of Political Science and Economics, Waseda University, Tokyo, Japan
Ryuichi Yamamoto
Department of Economics, Tunghai University, Taichung, Taiwan
Chung-Ching Tai

About this paper

Cite this paper

Stonedahl, F., Rand, W. (2014). When Does Simulated Data Match Real Data?. In: Chen, SH., Terano, T., Yamamoto, R., Tai, CC. (eds) Advances in Computational Social Science. Agent-Based Social Systems, vol 11. Springer, Tokyo. https://doi.org/10.1007/978-4-431-54847-8_19

DOI: https://doi.org/10.1007/978-4-431-54847-8_19
Published: 18 April 2014
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-54846-1
Online ISBN: 978-4-431-54847-8
eBook Packages: Business and Economics; Economics and Finance (R0)
