
GBGallery: A benchmark and framework for game testing

Published in: Empirical Software Engineering

Abstract

Software bug databases and benchmarks are the wheels that advance automated software testing. In practice, real bugs occur sparsely relative to the amount of software code, so extracting and curating them is labor-intensive; yet such curation can be essential for fostering innovation in testing techniques. Over the past decade, several milestone bug databases have been constructed, pushing forward automated software testing research. To date, however, there is still no real-bug database and benchmark for game software, leaving current game testing research largely stagnant. The absence of such a database and framework greatly limits the development of automated game testing techniques. To bridge this gap, we first perform large-scale real-bug collection and manual analysis on 5 large commercial games, totaling more than 250,000 lines of code. Based on this, we propose GBGallery, a game bug database and extensible framework, to enable automated game testing research. In its initial version, GBGallery contains 76 real bugs from 5 games and incorporates 5 state-of-the-art testing techniques for comparative study, serving as a baseline for further research. With GBGallery, we perform large-scale empirical studies and find that automated game testing is still at an early stage, and that new testing techniques for game software should be extensively investigated. We make GBGallery publicly available, hoping to facilitate game testing research.
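The comparative study described above boils down to a common benchmark pattern: given a set of known real bugs and, for each testing technique, the subset of bugs it exposed, rank the techniques by bug-detection rate. The sketch below illustrates that pattern only; the function and variable names are hypothetical and do not reflect GBGallery's actual API.

```python
# Hypothetical sketch of a bug-benchmark comparison: rank testing
# techniques by the fraction of known bugs each one exposed.
# All names here are illustrative, not GBGallery's actual interface.

def detection_rate(known_bugs, exposed_bugs):
    """Fraction of the known bugs that a technique exposed."""
    known = set(known_bugs)
    if not known:
        return 0.0
    return len(known & set(exposed_bugs)) / len(known)

def rank_techniques(known_bugs, results):
    """results maps technique name -> iterable of exposed bug ids.

    Returns (name, rate) pairs sorted from best to worst detector.
    """
    rates = {name: detection_rate(known_bugs, bugs)
             for name, bugs in results.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    known = ["B1", "B2", "B3", "B4"]          # curated real bugs
    results = {
        "random_testing": ["B1"],             # exposed 1 of 4
        "rl_based": ["B1", "B3", "B4"],       # exposed 3 of 4
    }
    for name, rate in rank_techniques(known, results):
        print(f"{name}: {rate:.2f}")
```

Keeping the bug ids as opaque strings keeps the harness agnostic to how bugs are triggered, which is what lets heterogeneous techniques be compared on equal footing.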


Notes

  1. The full versions of the games are not provided due to permission restrictions.


Acknowledgments

This work was supported in part by funding from the Canada First Research Excellence Fund as part of the University of Alberta’s Future Energy Systems research initiative, Canada CIFAR AI Chairs Program, Amii RAP program, the Natural Sciences and Engineering Research Council of Canada (NSERC No.RGPIN-2021-02549, No.RGPAS-2021-00034, No.DGECR-2021-00019), the Ministry of Education, Singapore under its Academic Research Fund Tier 1 (21-SIS-SMU-033), as well as JSPS KAKENHI Grant No.JP20H04168, No.JP21H04877, JST-Mirai Program Grant No.JPMJMI20B8, and JST SPRING Grant.

Author information

Corresponding authors

Correspondence to Xiaofei Xie or Yingfeng Chen.

Additional information

Communicated by: Shaowei Wang, Tse-Hsun (Peter) Chen, Sebastian Baltes, Ivano Malavolta, Christoph Treude and Alexander Serebrenik

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Collective Knowledge in Software Engineering


Cite this article

Li, Z., Wu, Y., Ma, L. et al. GBGallery: A benchmark and framework for game testing. Empir Software Eng 27, 140 (2022). https://doi.org/10.1007/s10664-022-10158-x

