Abstract
Software bug databases and benchmarks drive progress in automated software testing. In practice, real bugs are sparse relative to the amount of code, and extracting and curating them is labor-intensive, yet such collections are essential for developing new testing techniques. Over the past decade, several milestone bug databases have been constructed, advancing automated software testing research. However, there is still no real bug database or benchmark for game software, which has left game testing research largely stagnant. The absence of such a database and framework greatly limits the development of automated game testing techniques. To bridge this gap, we first collect and manually analyze real bugs at scale from 5 large commercial games, totaling more than 250,000 lines of code. Based on this, we propose GBGallery, a game bug database and extensible framework that enables automated game testing research. In its initial version, GBGallery contains 76 real bugs from 5 games and incorporates 5 state-of-the-art testing techniques for comparative study, serving as a baseline for further research. Using GBGallery, we perform large-scale empirical studies and find that automated game testing is still at an early stage and that new testing techniques for game software need extensive investigation. We make GBGallery publicly available to facilitate game testing research.
Notes
The full game versions are not released because of permission restrictions.
Acknowledgments
This work was supported in part by funding from the Canada First Research Excellence Fund as part of the University of Alberta’s Future Energy Systems research initiative, Canada CIFAR AI Chairs Program, Amii RAP program, the Natural Sciences and Engineering Research Council of Canada (NSERC No.RGPIN-2021-02549, No.RGPAS-2021-00034, No.DGECR-2021-00019), the Ministry of Education, Singapore under its Academic Research Fund Tier 1 (21-SIS-SMU-033), as well as JSPS KAKENHI Grant No.JP20H04168, No.JP21H04877, JST-Mirai Program Grant No.JPMJMI20B8, and JST SPRING Grant.
Additional information
Communicated by: Shaowei Wang, Tse-Hsun (Peter) Chen, Sebastian Baltes, Ivano Malavolta, Christoph Treude and Alexander Serebrenik
This article belongs to the Topical Collection: Collective Knowledge in Software Engineering
Cite this article
Li, Z., Wu, Y., Ma, L. et al. GBGallery: A benchmark and framework for game testing. Empir Software Eng 27, 140 (2022). https://doi.org/10.1007/s10664-022-10158-x
DOI: https://doi.org/10.1007/s10664-022-10158-x