1 Introduction

It has become apparent that video games could be used to assess or develop skills such as communication, collaboration, and digital literacy (see, e.g., Sourmelis et al. 2017). Therefore, Petter et al. (2018) argued that employers should consider gaming-related experience and achievements in the hiring process and encourage applicants to share their gaming backgrounds on their résumés or during job interviews. Similar to voluntary work that signals social responsibility and awareness, or sports activities that demonstrate goal orientation and team spirit, an applicant’s history in gaming may indicate a variety of skills and attributes that are professionally valuable (Barber et al. 2017).

In fact, employers are increasingly interested in video games. Robert Half (2017), a US-based human resource (HR) consulting firm, surveyed more than 2500 CIOs about technology-related hobbies and activities that increase graduates’ appeal to technology employers, and 24 percent of the respondents cited video-gaming or game development. In addition, there is a growing interest in video games on the other side of the recruiting table, too, as applicants increasingly inform employers about their gaming experience and expertise to help them land a job. For example, Heather Newman, Director of Marketing and Communications for the School of Information at the University of Michigan, included on her résumé her experience with World of Warcraft, as several tasks involved in the game, such as managing guilds of hundreds of people and organizing large-scale raids, required skills that she believed also applied to the job (Rubenfire 2014; also see Barber et al. 2017). Such anecdotal evidence suggests that video games have something to offer to HR management, which is why companies from various industries have started to use them to identify and attract talent (see Fetzer et al. 2017).

However, while most HR professionals would probably agree that video games may be used for branding and recruiting purposes—for example, America’s Army, a first-person shooter game, is widely considered the most effective of all armed-service recruiting tools (Buday et al. 2012)—many are unsure about their usefulness in personnel selection, as whether and to what extent game-related data can predict future job performance remains unclear (Melchers and Basch 2022). As this paper argues, a strong case for the usefulness of video games in candidate assessment is that they may indicate intellectual and cognitive abilities (see Weidner and Short 2019), which has been confirmed not only for puzzle and brain-training games (e.g., Buford and O’Leary 2015; Quiroga et al. 2009) but also for several other game genres, such as action and casual games (see Quiroga and Colom 2020). HR research has repeatedly demonstrated that general intelligence is one of the best, if not the best, predictors of future job performance (see, e.g., Schmidt et al. 2016), so video games that indicate intelligence may meaningfully support companies’ assessment procedures. In particular, as video games enable “stealth assessments” during which candidates are less aware that they are being monitored and evaluated, game-based assessments could reduce test anxiety, prevent faking, and foster candidate engagement (see, e.g., Boot 2015; Fetzer 2015; Shute et al. 2016; Wu et al. 2022). As Fetzer et al. (2017, p. 297) put it, “candidates may become so immersed in the game that their true behaviors emerge, increasing the accuracy of the assessment, rather than being constrained or changed by social desirability and the propensity of candidates to second-guess their actions during employment assessment.”

Still, the assessment of job candidates based on video games has some limitations, one of which is that a video game can lose its ability to indicate intelligence when players become familiar with the game (see Quiroga et al. 2011). Therefore, companies typically gamify traditional assessment techniques or use self-developed, “serious” games for which applicants are not likely to be able to prepare. But even then, experience with other video games or with gaming hardware may bias the results, since gamers can be expected to have better mouse and keyboard control and better knowledge of the underlying game mechanics than inexperienced applicants do; ensuring fairness in the selection process is therefore a key challenge in using video games for assessment (see Bina et al. 2021). Accordingly, game-based assessments require the use of video games that are similarly attractive to different groups of applicants (Melchers and Basch 2022), and it is up to research to determine which games and game genres qualify for personnel selection and which ones can be used to demonstrate and develop which skills and abilities (see Petter et al. 2018). The relationship between gaming and intelligence has been studied for only some of the available games and game genres (see Quiroga and Colom 2020), and empirical studies on the criterion-related validity of game-based assessment are rare (see Bina et al. 2021; Melchers and Basch 2022).

Virtual reality (VR) games in particular deserve researchers’ attention (e.g., Sanchez et al. 2022; Weiner and Sanchez 2020). As Aguinis et al. (2001) argued, VR-based assessments may be more valid than some of the traditional assessment techniques, allow for higher levels of standardization and structure, and even enable simulations that were once not feasible, such as handling hazardous tasks or fictitious products. Despite these benefits, high equipment costs and negative side effects, such as cyber sickness, have long challenged the broader adoption of VR hardware (Valmaggia 2017), but now that VR technology has become affordable and has matured, many HR professionals have realized that it has much to offer for personnel selection. For example, Lloyds Banking Group (2021) uses VR to present candidates with situations that would not be feasible in conventional assessments, Accenture assesses graduates’ problem-solving skills in a virtual “Egyptian tomb,” and BDO Global is testing an Alice in Wonderland-themed VR game to assess candidates’ ability to judge cause and effect (Consultancy.uk 2019; also see Wohlgenannt et al. 2020).

Against this background, research is challenged to keep pace with HR practice, as few empirical studies, especially studies that use game-based approaches, have addressed VR technology’s usefulness in personnel selection (e.g., Sanchez et al. 2022; Weiner and Sanchez 2020). VR games differ considerably from conventional video games, as they are much more immersive and realistic and give players the feeling of physical presence in a virtual environment (Weiner and Sanchez 2020), so it remains to be determined whether extant findings from video-game studies are applicable to VR contexts. Accordingly, this proof-of-concept study uses the commercial VR game Job Simulator and the intelligence test BIS-4 and explores the relationship between game results and test results to discuss VR’s applicability to personnel assessment.

The paper proceeds as follows. Section 2 provides a rationale for game-based assessment and reviews studies that have investigated the relationship between gaming and intelligence. Section 3 explains the controlled laboratory study that was used for data collection and analysis, and Sect. 4 presents the results. Section 5 discusses implications and limitations, and Sect. 6 draws conclusions.

2 Background

2.1 Game-based recruitment and assessment

Researchers from various disciplines have repeatedly found that performance in video games correlates with skills that are professionally valuable. For example, tower-defense games like Plants vs. Zombies 2 could be used to assess problem-solving skills (Shute et al. 2016); massively multiplayer online role-playing games like World of Warcraft and EVE Online to assess leadership skills (Lisk et al. 2012); strategy video games like Sid Meier’s Civilization to assess managerial skills like organizing and planning (Simons et al. 2021); learning video games like Physics Playground to assess creativity (Shute and Rahimi 2021); and Xbox Kinect video games like Just Dance or Table Tennis to assess elderly people’s driving skills (Vichitvanichphong et al. 2016). In addition, while researchers have long studied the negative effects of video games, such as addiction or aggression, they have recently turned to possible positive outcomes and provided arguments for video games’ ability not only to indicate but also to develop professional skills. For example, multiplayer games like Rock Band and Halo 4 have been used in team-building activities to improve team cohesion and performance (Keith et al. 2018), and games such as Borderlands 2, Lara Croft and the Guardian of Light, Minecraft, Portal 2, and Warcraft III may foster players’ communication skills, adaptability, and resourcefulness (Barr 2017).

Accordingly, given the ever-growing skill gap that has emerged in practice, researchers have realized that video games may meaningfully inform personnel recruitment. For example, Barber et al. (2017) and Petter et al. (2018) proposed that employers encourage applicants to include their gaming accomplishments and experience in their résumés or to share them during job interviews, as finding digitally skilled candidates has become increasingly difficult. In addition, researchers have also argued that game-based recruitment practices can improve employers’ talent pools because the use of video games or game elements may increase companies’ attractiveness to applicants and foster diversity and engagement (see Bina et al. 2021). Since modern video games are typically played online and since players’ profiles are often public, video game platforms, rankings, and forums may further provide promising new data sources for scouting and recruiting, and employers may even create add-ons or “mods” for video games with which to recruit suitable candidates. For example, WibiData, a former software company, developed a Portal 2 mod with new puzzles, and players who solved all of the puzzles unlocked a special job application (Kuo 2013).

Perhaps the most promising application of video games in HR management is candidate assessment (see, e.g., Bhatia and Ryan 2018; Fetzer et al. 2017; Landers and Sanchez 2022; Melchers and Basch 2022), among other reasons because video games may allow companies to make inferences about candidates’ intellectual abilities (see Weidner and Short 2019). The assessment of intellectual ability plays an important role in personnel selection, as intelligence has long been established as one of the best performance predictors for diverse professions, especially for employees with no experience in the job (Schmidt et al. 2016). However, companies have grown reluctant to use intelligence tests (see Krause et al. 2014) because such tests are not popular among applicants (see Hausknecht et al. 2004), so game-based assessment could provide a useful supplement to personnel selection. In particular, as they are entertaining and fun, video games could help organizations to “augment their brand awareness, engage candidates and enhance positive perceptions of the company due to being at the cutting edge of technology, providing competitive advantage in the battle for talent” (Fetzer et al. 2017, p. 297).

2.2 Game-based intelligence assessment

Researchers’ interest in using video games to assess intellectual or cognitive abilities goes back to the 1980s, and the Space Fortress video game (e.g., Rabbitt et al. 1989) was probably the most notable and systematic attempt to study the relationship between gaming and human cognition (Boot 2015). Since then, several studies have provided empirical evidence that video games can indicate intelligence and cognition levels. For example, some of Nintendo’s puzzle and brain-training games, such as Train and Professor Layton, have been used to assess general intelligence (Quiroga et al. 2009, 2016); casual online games such as DigiSwitch and Sushi Go Round to assess working memory, perceptual speed, and fluid intelligence (Baniqued et al. 2013); puzzle platformers such as Portal 2 to assess problem-solving ability, spatial skills, and persistence (Shute et al. 2015); multiplayer online battle arenas like League of Legends to assess fluid intelligence (Kokkinakis et al. 2017); digital board games such as Taboo to assess abstract reasoning, spatial reasoning, and verbal reasoning (Lim and Furnham 2018); and sandbox games such as Minecraft to assess fluid intelligence and spatial ability (Peters et al. 2021). Perhaps most noteworthy, Quiroga et al. (2015) used a variety of video games, most of which were from Nintendo’s Big Brain Academy, to measure individual differences in general intelligence, and Quiroga et al. (2019) provided similar results using several genres of games other than brain-training games. Accordingly, grounded in a comprehensive overview of game-related research studies, Quiroga and Colom (2020, p. 651) recently provided strong arguments for the use of video games to measure intelligence, arguments that could also justify their use for assessment purposes (“Is it time to use video games for measuring intelligence and related cognitive abilities? Yes, it is.”).

However, despite these promising results, the use of video games to assess intellectual ability comes with some challenges, one of which is that their ability to indicate intelligence can decrease with practice. As Quiroga et al. (2011) argued based on a three-stage model of cognitive, associative, and autonomous learning (see, e.g., Ackerman and Cianciolo 2000), individual cognitive differences can explain the variability in game results only in the first of these stages, when one learns how to play the game; after that, perceptual speed (second stage) and psychomotor ability (third stage) come into play. The authors concluded that only for games that are sufficiently novel, complex, and inconsistent can players be kept from going beyond the cognitive stage to using abilities other than cognitive (Quiroga et al. 2011). Therefore, commercial video games cannot usually be used for assessment purposes, and companies typically use self-developed, “serious” games instead. Put simply, selection games should be played only once, since “there is a strong need to avoid contaminating the scores obtained with practice” (Fetzer et al. 2017, p. 305).

Having said that, even if companies use such serious games, previous gaming experience and expertise may still bias the results, so researchers have raised concerns that game-based assessments may favor certain groups of applicants (see, e.g., Bina et al. 2021; Leutner et al. 2021; Weidner and Short 2019). As Fetzer et al. (2017) argued, potential issues in game-based assessment have most commonly been associated with age and gender, and although the number of female and older gamers continues to increase, it is still important to minimize demographic differences in game-based assessment. For example, using a large dataset from the financial industry, Melchers and Basch (2022) recently found that male and younger applicants achieved overall higher scores in a game-based assessment than female and older applicants did—even though the female candidates performed significantly better in an adjunct assessment center. Accordingly, game-based assessments require the use of games and game genres that are similarly attractive to different groups of applicants as well as careful consideration of previous gaming experiences. For example, McPherson and Burns (2008) found that men completed the video game Space Code, a self-developed, computer-game-like test of processing speed, more quickly than women did, but when the authors controlled for current and previous computer gaming, the correlation dropped to almost zero. Similarly, Quiroga et al. (2016) found significant, game-related differences between two studies that used the Professor Layton game to assess intelligence, likely because participants in one study had higher levels of gaming experience than participants in the other, since the samples were otherwise quite homogeneous. Still, the available evidence is partly contradictory when it comes to gaming experiences in assessment contexts, which highlights the need to carefully design game-based assessments (see Fetzer et al. 2017). For example, Foroughi et al. (2016) found no significant differences between experienced and inexperienced gamers in their Portal 2 study of fluid intelligence, probably because they used self-designed puzzles (“chambers”) that did not (or only partly did) require hand-eye coordination and measured the number of puzzles solved rather than the time required to complete them. As the authors acknowledged, a different research design may have resulted in an advantage for experienced gamers unrelated to fluid intelligence, as gamers can be expected to have better mouse and keyboard control and maneuver in the game environment more quickly than non-gamers can (Foroughi et al. 2016).

In summary, to ensure fairness in the selection process, game-based assessment must ensure that previous gaming experience and expertise with both the video game used for assessment and video games in general do not bias the results. As the present study serves as a proof-of-concept, its research design is intended to address these and related challenges. In particular, the study uses a casual VR simulation game for assessment purposes. On the one hand, this genre is intuitive and similarly attractive to different groups of applicants, and on the other, few researchers have investigated whether and how VR games can be used to make inferences about intelligence.

3 Research design

3.1 Participants

We recruited university students for our study, so the participants had similar educational and social backgrounds, which reduced the need for extensive control variables. This pool of recruits was appropriate because the intelligence test we used was developed for relatively young subjects (i.e., teenagers and young adults) with at least an intermediate-school education and because students can be expected to have differing levels of experience with video games. We promoted participation in the study in lectures, via e-mail, and through posters and flyers. To avoid participants’ preparing for the study or being nervous about taking an intelligence test, we told them only that the study would assess VR games’ applicability in HR management and that they would be taking a paper-and-pencil assessment test. We asked only native German speakers to apply to ensure sufficient comprehension of directions and the tests themselves, and we compensated each participant with CHF 30.00. In addition, participants were entered into a drawing for one of three CHF 500.00 vouchers for purchases at a local electronics store.

One hundred twenty students volunteered to participate. As we had to ensure that none of them knew the VR game we used for the study, we excluded eight students who had experience with VR technology. (We did not disclose this exclusion criterion when we promoted the study to avoid recruiting students who were highly interested in VR and gaming and whose eagerness to participate could lead them to provide false information about their VR experience.) In addition, we excluded two students: for one of them, the video-clip application we used (Sect. 3.3) did not work, and the other spent considerably more time playing the VR game than any other participant and was thus identified as an outlier (Grubbs test: G = 3.68, U = 0.87, p value = 0.01).Footnote 1 Finally, seven students did not show up for their appointments, so our final sample consisted of 103 participants.
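For readers who want to reproduce this screening step in principle, the following sketch shows how a two-sided Grubbs test for a single outlier can be computed on recorded playing times. The data and function are illustrative assumptions and not the exact computation used in the study.

```python
import numpy as np
from scipy import stats

def grubbs_test(values, alpha=0.05):
    """Two-sided Grubbs test for a single outlier in an approximately normal sample."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    mean, sd = x.mean(), x.std(ddof=1)
    G = np.max(np.abs(x - mean)) / sd                       # test statistic
    # Critical value derived from the t-distribution (two-sided version)
    t_crit = stats.t.ppf(1 - alpha / (2 * n), df=n - 2)
    G_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return G, G_crit, G > G_crit

# Hypothetical playing times in minutes (illustrative values only)
playing_times = np.array([33.1, 29.8, 35.6, 31.2, 58.4])
G, G_crit, is_outlier = grubbs_test(playing_times)
print(f"G = {G:.2f}, critical value = {G_crit:.2f}, outlier detected: {is_outlier}")
```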

Table 1 provides descriptive information about the participants. Their average Age was 23.12 years, with a standard deviation of 2.88 years, and seventy of the participants (67.96%) were men. To account for their gaming experience and expertise, which could have influenced how well they coped with the VR game, we asked them to self-assess their gaming skills using a Likert scale from 1 (“very poor”) to 5 (“excellent”). The participants’ self-assessed Gaming Skills averaged 3.27 (“fair”), with a standard deviation of 0.84.

Table 1 Participants’ descriptive statistics

3.2 Materials and equipment

The VR game was run on a workstation that was set up specifically for the study. We used the PC version of the commercially successful game Job Simulator from Owlchemy Labs, which runs on Steam, a distribution platform for video games, and the first-generation HTC Vive VR system, which includes a headset, two wireless controllers, and two base stations that create a 360-degree virtual play area of up to 15 × 15 feet within which players can move. Job Simulator puts players into a futuristic world in which robots do all of the humans’ jobs but, for nostalgia and entertainment, the humans in the game can visit “job museums” to experience how it must have felt to work for a living. While this game, which can perhaps best be described as a casual simulation game, contains comic and satiric elements and confronts players with some unusual tasks, it also requires them to think out of the box and solve challenging problems, so we deemed it appropriate for our proof-of-concept study. In particular, we selected the Job Simulator game because there are only a few genres other than simulations that are similarly attractive to female and male players (see Lucas and Sherry 2004) and because the game has been found to highlight “the multitude of ways that the increasing proliferation of virtual reality technology could help to shape future job assessments” (Weidner and Short 2019, p. 155). Job Simulator features three types of simulations, one of which, Gourmet Chef, is a kitchen simulation that we used for data collection. The Gourmet Chef simulation confronts players with seventeen tasks, from preparing a sandwich to participating in a television cooking show, and requires them to complete each task before they can move on to the next. While successful approaches to completing some of the tasks differ, one solution was considered as good as another in the study. (Note that the Internet offers videos that illustrate the gameplay (e.g., YouTube 2016).)Footnote 2

Our study was grounded in the Berlin Intelligence Structure (BIS) model, which makes three assumptions: Intellectual achievement is a function of all intellectual abilities, albeit to different degrees; intellectual abilities can be classified into content-related and operative abilities; and intellectual abilities are hierarchically structured (Jäger et al. 1997). Therefore, the model distinguishes four operative components of intellectual ability—processing capacity, processing speed, memory, and creativity—and three content components of intellectual ability—figural ability, verbal ability, and numerical ability—and puts general intelligence at the top of the hierarchy (Fig. 1).

Fig. 1
figure 1

BIS model (Jäger et al. 1997, p. 5)

The BIS test is a validated instrument for measuring general intelligence and several more specific intellectual abilities, so researchers have used it in various studies (e.g., Beauducel and Kersting 2002; Bucik and Neubauer 1996; Süß and Beauducel 2005; Süß et al. 2002; Weis and Süß 2007). The current version of the test, BIS-4, is a paper-and-pencil test that was developed for subjects aged sixteen to nineteen with intermediate or high levels of education, but it can also be used for older subjects if they have sufficient educational backgrounds (Jäger et al. 1997). The test covers a broad range of intellectual abilities, as its development is grounded in a pool of more than 2000 exercises (Beauducel and Kersting 2002), and it can be customized for research purposes. In particular, its developers have made available a shortened test version, which we deemed appropriate for our proof-of-concept study: While the short version uses fifteen instead of forty-five exercises, it is still a valid instrument for measuring general intelligence and processing capacity, although it does not contain enough exercises to measure the other abilities (Jäger et al. 1997).Footnote 3

3.3 Procedure

We conducted a controlled observational laboratory study. There was no treatment, so the study design was correlational and the same for all participants, and we made individual appointments with the participants, each of which took approximately two and a half hours. At the beginning of these appointments, we informed the participants about the risks in using VR devices (e.g., epilepsy and dizziness) and about their rights during the study (e.g., the right to quit at any time for any reason without forfeiting the CHF 30.00 payment). We also told them that we would be taking video of their VR games (not video of themselves) and asked them to keep the study’s contents confidential until it was completed, so other participants could not prepare. Finally, to provide a performance incentive, we informed them that one of the shopping vouchers would be raffled among the ten participants who completed the VR game the most quickly, and all of them completed a short survey about their experience with video games to add to other information that had already been collected via e-mail.

After receiving this introductory information, the participants were given a written instruction sheet that explained the Job Simulator game and how to use the VR headset and controllers. In particular, the instruction sheet told them to complete the game as quickly as possible, so thoroughness was not required or helpful (since the Job Simulator game does not count points or anything similar). Participants confirmed that they understood these instructions and that they would try to complete the VR game as quickly as possible. We then conducted a series of vision and hearing tests and performed some simple motor tests with the participants, such as asking them to focus on an object or to grab and throw items in the virtual environment. After the participants signaled they were ready, they started the Gourmet Chef simulation.

To avoid the risk of injury, the room provided sufficient space for participants to move about when they played the VR game, and at least one researcher supervised the participants to ensure they did not fall over the cable that connected the VR headset with the desktop computer or collide with a wall. The supervising researcher used a detailed protocol that explained what information could be given to participants who asked for support and contained the exact wording of all audio instructions the participants could hear during the game. Apart from small technical and language problems that could be addressed quickly, all simulations ran smoothly and were consistent.

After completing the VR game, the participants had a short break during which they confirmed they did not feel unwell or dizzy, and then took the intelligence tests. As with the VR simulations, these tests were conducted individually to ensure that participants did not feel pressure from the presence of other participants and that they could ask questions. Two researchers, both of whom the psychologist on the research team had instructed, supervised the participants during the test. The verbal instructions given to the participants followed the test’s guidelines exactly, and time-keeping was strict. The researchers created detailed protocols on which they noted the times required to explain and perform the exercises and any special incidents, although none occurred.

3.4 Measures

The main purpose of our study was to determine whether the time required to complete the VR game correlated with the scores achieved in the intelligence test. Playing Time was measured in minutes and automatically calculated based on two unambiguous audio signals that sounded when the participants started and finished the game. The evaluation of the intelligence test’s results followed the test’s guidelines and its detailed instructions and standardized templates. As the structure of the BIS model is bimodal (Bucik and Neubauer 1996), the BIS-4 test considers that performance on each exercise is influenced not only by general intelligence but also by an operative factor and a content factor (Brunner and Süß 2005), so the test provides measures of General Intelligence, three content-related abilities (i.e., Figural Ability, Numerical Ability, and Verbal Ability), and four operative abilities (i.e., Processing Capacity, Processing Speed, Memory, and Creativity). However, as explained, we used the short test version for our study, which should be used only to measure general intelligence and processing capacity; still, we also calculated the number of points earned in those exercises that related to the other abilities, as we were not interested in diagnosing or certifying abilities at the individual level but in exploring and understanding patterns at the group level. While the validity of these measures is unclear, the points achieved in those exercises may still indicate what abilities could be valuable in playing the Job Simulator game and highlight areas for future research. Accordingly, Table 2 uses all seven intellectual abilities to show the order and type of exercises that are included in the short test version.

Table 2 Mapping of exercises with skills

After the participants completed the test, all pages of the test booklets were disassembled and copied, so the two assessors could evaluate each exercise independently using the evaluation protocols provided by BIS-4. In line with those instructions, one of the researchers evaluated all participants’ test results, and another researcher evaluated 52 randomly selected tests (i.e., more than 50%). The BIS-4 evaluation templates provided unambiguous sample solutions for nearly all exercises, so the assessors’ ratings were almost always consistent, and the failure/disagreement rate was well below the threshold of 1 to 2 percent per exercise. The more complex exercises that measured creativity based on either idea fluency (number of solutions) or idea flexibility (diversity of solutions) were all independently evaluated by both assessors. Because these exercises required creative solutions, the participants’ responses were less consistent than those for the other exercises, so the assessors created protocols to justify their ratings. Still, as they used detailed criteria catalogues and checklists, the level of agreement between the two assessors was also high for the three creativity exercises.Footnote 4 All participants completed the tests, so there were no missing values, and because there were no special incidents, measurement was straightforward.

4 Results

4.1 Descriptive results

Table 3 shows the descriptive results for the VR games and the intelligence tests. Ranging from 24.46 to 46.03 min, Playing Time averaged 33.42 min, with a standard deviation of 4.41 min. The total points achieved in the intelligence tests, which provided our measure of General Intelligence, varied between 1313 and 1648 points, averaged 1484.77 points, and had a standard deviation of 71.52 points. Processing Capacity varied between 507 and 687 points and had a mean (standard deviation) of 586.49 (34.87) points.Footnote 5

Table 3 Descriptive results

As explained, we further calculated the total number of points achieved in the exercises that related to intellectual abilities other than Processing Capacity, although these scores should not be interpreted as ability measures. The means (standard deviations) were as follows: Processing Speed: 310.54 (22.30); Memory: 290.09 (21.59); Creativity: 297.65 (20.89); Verbal Ability: 502.06 (31.05); Numerical Ability: 488.22 (34.41); and Figural Ability: 494.49 (30.51). (The point values for Memory, Creativity, and Processing Speed were generally lower than those for the content-related abilities because they were measured with fewer exercises (Table 2).)

4.2 Correlational results

To explore the associations between our measures, we started with an exploratory data analysis that examined how our Playing Time variable correlated with General Intelligence and Processing Capacity and with the number of points that participants achieved in the exercises that were related to the other intellectual abilities. We calculated both Pearson and Spearman correlation coefficients (Table 7), but these coefficients were similar, so we report only the Pearson coefficients r here. Following Kokkinakis et al.’s (2017) visualization, Table 4 shows the skill variables that were significantly related to the participants’ playing times, as well as histograms and scatterplots that illustrate these variables’ distributions. The scatterplots suggested weak, negative linear relationships between Playing Time and the other variables, and the histograms and a series of Shapiro-Wilk tests indicated that the variables could be treated as approximately normally distributed (even though the Playing Time and Figural Ability variables were slightly skewed).Footnote 6

Table 4 Correlational results
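A minimal sketch of this exploratory step is shown below; it assumes the per-participant scores are available in a CSV file, and the file and column names are our own illustrative assumptions.

```python
import pandas as pd
from scipy import stats

# Hypothetical data file; one row per participant, column names are assumptions.
df = pd.read_csv("participants.csv")

# Normality check for the dependent variable
W_pt, p_pt = stats.shapiro(df["PlayingTime"])
print(f"PlayingTime: Shapiro-Wilk p = {p_pt:.3f}")

skills = ["GeneralIntelligence", "ProcessingCapacity", "ProcessingSpeed",
          "Memory", "Creativity", "VerbalAbility", "NumericalAbility",
          "FiguralAbility"]

for col in skills:
    r, p_r = stats.pearsonr(df["PlayingTime"], df[col])       # linear association
    rho, p_rho = stats.spearmanr(df["PlayingTime"], df[col])  # rank-based check
    W, p_w = stats.shapiro(df[col])                           # normality check
    print(f"{col}: r = {r:.3f} (p = {p_r:.3f}), rho = {rho:.3f} (p = {p_rho:.3f}), "
          f"Shapiro-Wilk p = {p_w:.3f}")
```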

In summary, the correlation analysis showed several moderate to weak correlations. The association between Playing Time and General Intelligence was significantly negative (r = −0.257; p = 0.009), as was the relationship between Playing Time and Processing Capacity (r = −0.281; p = 0.004), so participants who required less time to complete the VR game demonstrated higher levels of general intelligence and processing capacity than those who required more time. In addition, Playing Time correlated with the number of points achieved in exercises that were related to Memory (r = −0.202; p = 0.040), Verbal Ability (r = −0.206; p = 0.037), and Figural Ability (r = −0.344; p < 0.001), but not with exercises related to Processing Speed, Creativity, or Numerical Ability, and we found no significant correlations between Playing Time and any of our control variables, except for Gaming Skills (r = −0.318; p = 0.001). Of the remaining control variables, only Gender correlated significantly with other variables: Age (r = 0.399; p < 0.001), Gaming Skills (r = 0.297; p = 0.002), Verbal Ability (r = −0.199; p = 0.044), and Numerical Ability (r = 0.279; p = 0.004) (Table 7).

4.3 Regression results

To explore further the associations between our main variables—that is, (1) between General Intelligence and Playing Time and (2) between Processing Capacity and Playing Time—we built on studies that have analyzed response times (e.g., van der Linden 2006). As response times are typically right-skewed with a long tail and cannot be negative, as is the case for our Playing Time variable (Table 4), we used the following log-normal regression models:

$$\text{Playing Time}_{i} \sim \text{Log Normal}(\mu, \sigma)$$
$$\text{Model 1:}\quad \mu = \alpha + \beta_{1} \times \text{General Intelligence}_{i} + \beta_{2} \times \text{Controls},$$
$$\text{Model 2:}\quad \mu = \alpha + \beta_{1} \times \text{Processing Capacity}_{i} + \beta_{2} \times \text{Controls},$$

where i indexes the subjects 1, …, N, and Playing Timei is the dependent variable, which is assumed to be log-normally distributed and described by a mean \(\mu\) (i.e., the linear model) and a standard deviation \(\sigma\). The associations between General Intelligencei and Playing Timei and between Processing Capacityi and Playing Timei are captured by β1, for which we expect a negative sign in both models, as higher ability levels should be associated with faster gameplay. Controls is a vector of variables that includes Age, Gender, and Gaming Skills.

We used Bayesian inference to estimate the statistical models (van der Linden 2006), which offers some advantages over more traditional methods (see Kruschke et al. 2012). For example, a Bayesian approach can incorporate existing knowledge from the literature as prior belief and update it using new data, which makes Bayesian models particularly suitable for small sample sizes. (Although our sample was comparatively large, small sample sizes have been identified as a shortcoming of laboratory video-game research (Unsworth et al. 2015).) For the models’ unknown parameters \(\alpha , {\beta }_{1}\), \({\beta }_{2}\) and \(\sigma\), we used the following, weakly informative priors from the literature (e.g., McElreath 2020): \(\alpha \sim Normal\left(\mu =3, \sigma =1.5\right), {\beta }_{\mathrm{1,2}}\sim Normal\left(\mu =0, \sigma =1\right),\) and \(\sigma \sim Exponential\left(\lambda =1\right)\).
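The paper does not state which software was used for estimation; purely as an illustration, the sketch below re-expresses Model 1 with controls (Model 1b) in PyMC, a Python library for Bayesian inference, using the priors listed above. The data file and column names are assumptions.

```python
import pandas as pd
import pymc as pm
import arviz as az

# Hypothetical data; one row per participant, column names are assumptions.
# Gender is assumed to be coded numerically (e.g., 0/1).
df = pd.read_csv("participants.csv")

with pm.Model() as model_1b:
    # Weakly informative priors as specified above
    alpha = pm.Normal("alpha", mu=3.0, sigma=1.5)             # intercept
    beta1 = pm.Normal("beta1", mu=0.0, sigma=1.0)             # General Intelligence
    beta2 = pm.Normal("beta2", mu=0.0, sigma=1.0, shape=3)    # Age, Gender, Gaming Skills
    sigma = pm.Exponential("sigma", lam=1.0)

    mu = (alpha
          + beta1 * df["GeneralIntelligence"].values
          + beta2[0] * df["Age"].values
          + beta2[1] * df["Gender"].values
          + beta2[2] * df["GamingSkills"].values)

    # Log-normally distributed playing times (in minutes)
    pm.LogNormal("PlayingTime", mu=mu, sigma=sigma,
                 observed=df["PlayingTime"].values)

    idata = pm.sample(2000, tune=1000, target_accept=0.9, random_seed=1)

print(az.summary(idata, var_names=["alpha", "beta1", "beta2", "sigma"]))
```

Model 2b follows the same pattern with Processing Capacity in place of General Intelligence.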

Table 5 shows the regression results for our first model, both without control variables (Model 1a) and with control variables (Model 1b).Footnote 7 The results confirm that General Intelligence and Playing Time are negatively associated (β1 = −0.0005***; 95%-CI [−0.0008, −0.0002]) and remain robust when control variables are added (β1 = −0.0004***; 95%-CI [−0.0007, −0.0001]). (Of the control variables, only the participants’ self-assessed Gaming Skills are significant (\({\beta }_{2}\) = −0.0441***; 95%-CI [−0.0682, −0.0190]).) As we used a log-normal model, these numbers can be interpreted as a percentage change, such that a one-point increase in General Intelligence reduces Playing Time by approximately 0.05 percent, a substantial effect given that the mean of General Intelligence was 1484.77. The effect size becomes evident in a sensitivity analysis, where an increase of 100 test points is associated with a decrease of approximately 1.7 playing minutes.

Table 5 Regression results for Model 1
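To make the percentage-change interpretation explicit, a rough check based on the rounded coefficient from Model 1a (and ignoring the small difference between the mean and the median of a log-normal distribution) reads

$$\exp \left({\beta }_{1}\times 100\right)=\exp \left(-0.0005\times 100\right)\approx 0.951,$$

that is, an increase of 100 points in General Intelligence multiplies the expected Playing Time by about 0.95, a reduction of roughly 4.9 percent. At the mean playing time of 33.42 min, this corresponds to approximately \(0.049\times 33.42\approx 1.6\) min, which is consistent with the sensitivity analysis reported above (the exact figure depends on the unrounded coefficient).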

Table 6 shows the regression results for our second model, both without control variables (Model 2a) and with control variables (Model 2b). The results confirm that Processing Capacity and Playing Time are also negatively associated (β1 = −0.0011***; 95%-CI [−0.0017, −0.0005]) and remain robust when control variables are added (β1 = −0.0009***; 95%-CI [−0.0015, −0.0003]). (Of the control variables, only the participants’ self-assessed Gaming Skills are significant (\({\beta }_{2}\) = −0.0407***; 95%-CI [−0.0671, −0.0145]).) Again, as we used a log-normal model, these numbers can be interpreted as a percentage change, such that a one-point increase in Processing Capacity reduces Playing Time by approximately 0.11 percent (while the mean of Processing Capacity was 586.49), so an increase of 100 points in exercises related to Processing Capacity is associated with a decrease of approximately 3.7 playing minutes.Footnote 8

Table 6 Regression results for Model 2

Finally, as part of our robustness checks, we ran linear regression models both with and without controls, which confirmed our results (Model 1a: β1 = −0.016***; 95%-CI [−0.025, −0.006]; Model 1b: β1 = −0.015**; 95%-CI [−0.024, −0.005]; Model 2a: β1 = −0.036***; 95%-CI [−0.055, −0.016]; Model 2b: β1 = −0.033**; 95%-CI [−0.052, −0.014]). That is, with each additional point that a participant achieved in the intelligence test (in exercises related to Processing Capacity), Playing Time decreased by 0.016 (0.036) minutes, so an increase of 100 test points was associated with a decrease of 1.6 (3.6) playing minutes in the linear models.
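As a sketch of such a robustness check, the linear specifications of Models 1b and 2b could be estimated with ordinary least squares as follows; the file and column names are again hypothetical and only illustrate the approach, not the authors' actual code.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data; one row per participant, column names are assumptions.
df = pd.read_csv("participants.csv")

# Linear robustness check corresponding to Model 1b (Playing Time in minutes)
ols_1b = smf.ols("PlayingTime ~ GeneralIntelligence + Age + Gender + GamingSkills",
                 data=df).fit()
print(ols_1b.summary())

# The analogous check for Model 2b swaps in Processing Capacity
ols_2b = smf.ols("PlayingTime ~ ProcessingCapacity + Age + Gender + GamingSkills",
                 data=df).fit()
print(ols_2b.params["ProcessingCapacity"])  # expected to be negative
```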

5 Discussion

Researchers have proposed the use of VR-based tools to assess cognitive abilities like working memory and attention (Climent et al. 2021), and it has been argued that VR technology may eventually replace traditional intelligence tests, as it offers new ways to analyze data and to simulate highly naturalistic scenarios (Koch et al. 2021). With the advent of consumer-grade VR headsets that are readily available for gaming and entertainment, researchers have also developed an interest in VR games and used them to train high-fidelity memory in older adults (Wais et al. 2021), to determine their impact on cognitive ability compared to 3D games (Wan et al. 2021), and to assess cognitive abilities like visual speed and accuracy, visual pursuit, and space visualization (Weiner and Sanchez 2020). However, compared to the rapidly growing stream of intelligence-related research on regular video games, VR games have received little research attention, and the available evidence is mixed. For example, Sanchez et al. (2022) used two commercial VR games (Project M and Richie’s Plank Experience) for assessment purposes and could provide only limited evidence for their reliability and validity.

Against this background, the objective of this proof-of-concept study was to (re-)assess whether VR games may be used to draw inferences about intellectual ability and to discuss their applicability for assessment purposes. A correlational analysis suggested that, on average, participants who completed the VR game Job Simulator more quickly than others had higher levels of general intelligence and processing capacity, and a regression analysis largely confirmed these results. In addition, we found that faster participants achieved overall higher scores on exercises that were related to memory, verbal ability, and figural ability, which could indicate that these abilities were helpful in playing the game (although these measures’ validity is questionable, as we used the short version of the BIS-4 intelligence test). While these results are promising, they also have some limitations that suggest topics for further research.

First, although our results are comparable to those of related work in the VR realm (e.g., Weiner and Sanchez 2020), the correlations we found were only moderate to weak, so our results do not confirm that VR games may replace traditional intelligence tests but suggest that they could be used for pre-screening candidates or as supplementary tools in predicting job performance. As other gaming studies have reported higher correlations (see Quiroga and Colom 2020), future research could explore whether the Job Simulator game (or similar games) can be used to assess intellectual ability more reliably by, for example, focusing only on tasks that are cognitively demanding and excluding those that require motor skills or luck (which probably plays a role in Job Simulator, as, for example, some of the items that participants had to search for were located in unusual places). However, researchers who plan to conduct similar studies should consider that we used the Job Simulator game for test purposes and do not recommend the use of commercial games for personnel selection. As explained, video games may lose their ability to indicate intelligence with practice, so HR professionals typically use self-developed, serious games instead, and the Job Simulator game is particularly (or maybe even only) challenging when it is played for the first time (as it is highly repetitive). Still, we believe that our results provide an important step toward clarifying the potential of VR games in assessing intelligence: If a simple, fun game like Job Simulator that was developed for entertainment purposes, and not for assessment purposes, can indicate intelligence and related abilities (albeit only to some degree), surely VR games have much to offer for personnel selection. Compared to the use of regular video games, VR games may amplify the various benefits that game-based stealth assessments offer, such as increased candidate engagement or reduced test anxiety, as they are much more immersive and realistic and can even make players forget about the real world (see Wohlgenannt et al. 2020).

Second, since fairness is a prerequisite in the selection process, researchers have raised concerns that game-based assessments may favor certain groups of applicants (see Weidner and Short 2019), which leads to another limitation of our study. In particular, candidates’ previous gaming experience and expertise may bias the results, so it has been argued that women and older applicants could have a disadvantage (see Fetzer et al. 2017), which has been confirmed by empirical research that used archival data from actual game-based assessments (Melchers and Basch 2022). While the number of female gamers continues to increase (Lopez-Fernandez et al. 2019), a broad survey from the Entertainment Software Association (2020) still suggests that fifty-nine percent of video-game players in the US are male and no older than thirty-four years, and that gamers’ preferences vary across groups (e.g., females most often play casual mobile games, which are typically not part of on-site assessments). While we did not have much age-related variance in our data, and although our female participants tended to rate their own gaming skills lower than our male participants did, our participants’ age or gender did not significantly influence how fast they completed the game—an interesting result that could confirm that the demographics of video gamers have changed during the past years (Fetzer et al. 2017) and/or that the genre we studied is similarly attractive to different groups of applicants (see Lucas and Sherry 2004). On the other hand, our results may also indicate that VR games are more intuitive to use than regular video games are (Zhang 2017), which would provide another argument for their applicability in personnel assessment. Accordingly, future research should explore whether experience with VR technology may be another issue for game-based assessment, as none of our participants had such experience. Aguinis et al. (2001) identified the need for applicants to be familiar with or trained in using VR technology as a potential drawback to its broad adoption in assessment practice, and Sophie Thompson, co-founder and Chief Operating Officer of VirtualSpeech, a UK-based VR education platform, observed that “it’s quite a jump from looking at a computer or phone and observing the digital world, to then becoming an active participant right in the middle of it” (Debusmann 2021).

Third, researchers should use more sophisticated questionnaires and tests to measure previous gaming experience and expertise, as we used only the participants’ self-assessment of their gaming skills in our regression models. In doing so, however, researchers should be aware that such measurement is not straightforward, as other game-related variables that we intended to use for our study turned out to be misleading. For example, we asked our participants how often and for how long they had been playing video games but omitted these measures from the analysis because they were not only unrelated to our main variables but also of questionable validity. Questions like “Approximately how often do you play video games?” and “For how many years have you been playing video games?” seem intuitive but are difficult to answer because, for example, skilled and experienced players who are currently abstaining from video-game play may answer the first question with “very rarely,” and the second question does not capture the difference between occasional and intensive video-game play. Against that background, Latham et al. (2013) suggested the use of screening video games or more objective measures like rankings, achievements, and awards, which are readily available on gaming platforms like Steam, to measure participants’ gaming experience and expertise. When using more comprehensive study set-ups and models, researchers should also consider variables other than those we have measured, including temperament, persistence, drive, motor skills, motivation (although the lottery offered in our study was intended to provide a performance incentive) and, in particular, cyber sickness, which has been identified as a major barrier to the broader use of VR technology (Tian et al. 2022). (We simply asked participants if they had felt unwell or dizzy during the game, which none of them confirmed.) Even height may have influenced our results: One participant was only 1.62 m (5′4″) tall, so they had to jump several times to reach some items in the VR game. (At the time we conducted our study, we were not aware that Owlchemy Labs had made available a “smaller human mode” for the Job Simulator game.)

Fourth, while our controlled laboratory study was designed to be reproducible for other researchers, future research could explore VR games’ applicability in more realistic, high-stakes selection contexts (e.g., Melchers and Basch 2022). Companies commonly use techniques other than intelligence tests—personality tests, work-sample tests, interviews, reference checks, and many more—to assess several skills and attributes that were outside the scope of this paper. For example, assessment centers are often used to assess management potential in terms of communication, drive, organizing and planning, problem-solving, influence, and awareness (Arthur et al. 2003), and employers are also interested in applicants’ personality traits (see, e.g., Weidner and Short 2019; Wu et al. 2022) and in forms of intelligence other than general intelligence, such as practical, emotional, and social intelligence (Lievens and Chan 2017). Therefore, even though general intelligence has been identified as one of the best predictors of work performance, future research could evaluate VR games’ usefulness in assessing these and related skills and abilities to provide a comprehensive picture of candidates’ aptitude. Such broader assessments may also require researchers to consider alternative approaches to collecting the game data, as we measured only the time required to complete the Job Simulator game. However, efficiency is only one of many ways to measure work performance, and even the Job Simulator game could be used to analyze qualitatively how thoroughly or resourcefully participants complete their tasks and/or to collect other types of data that were outside the scope of this article. For example, in a secondary study that may be presented elsewhere, we further analyzed the video clips that we took from the participants’ games to track how often and how fast they moved their heads during the game. A preliminary analysis of the motion data collected suggests that video clips—as they are typically shared on platforms like YouTube and Twitch—may also be used to draw inferences about intellectual abilities: On average, participants who moved their heads less frequently and faster also achieved better test results than did participants who looked around more frequently and at lower speed.

Fifth, the intelligence test we used may present another limitation. We used the short version of BIS-4 for our study, so we could measure only general intelligence and processing capacity. However, as part of our proof-of-concept study, we also compared the participants’ game results with the number of points they achieved in exercises that were related to abilities other than processing capacity, and though these numbers should not be interpreted as ability measures, they may guide similar studies that seek to clarify what intellectual abilities may be assessed with VR games. In any case, while we used a valid instrument to measure general intelligence and processing capacity, the validity of our other measures is questionable, so future research could build on and extend our results using the full test version. In addition, BIS-4 is an established instrument in the German-speaking research community, but most Anglo-American research on assessment has followed the Cattell-Horn-Carroll theory of cognitive abilities (see, e.g., Schneider and McGrew 2018), so future research could replicate our results using alternative measurement instruments to identify the intellectual abilities that may be assessed with VR games. Researchers have explored the relationships between BIS and other models, and our results suggest that fluid intelligence, which reflects the capacity to solve reasoning problems, may be assessed using VR games, as intellectual abilities like processing capacity and memory are related to fluid intelligence (see Beauducel and Kersting 2002). Finally, when studying the relationship between VR gaming and different facets of intelligence, researchers should also recruit more diverse samples, as our participants were of similar age, had similar backgrounds and no experience with VR technology, and were not randomly selected (but had to apply for our study).

6 Conclusion

Researchers have proposed the use of video games for assessment purposes to create a more pleasant test atmosphere, reduce test anxiety, and increase candidate engagement, among other reasons. VR games in particular deserve researchers’ attention in this regard, as they can simulate highly realistic, work-like environments and confront candidates with situations that were once not feasible. Since intelligence is one of the most widely used predictors of future job performance, our controlled laboratory study used a commercial VR game, an established intelligence test, and a sample of 103 university students to explore the relationship between the participants’ playing times and their intellectual abilities. We found that, on average, participants who completed the VR game more quickly than others also had higher levels of general intelligence and processing capacity and achieved more points on exercises related to memory, verbal ability, and figural ability. While researchers have raised concerns that game-based assessments could favor men and younger applicants, who are supposedly more experienced with video games, our participants’ age and gender did not significantly influence how well they coped with the VR game. However, since participants who assessed their gaming skills as higher tended to complete the game faster than others did, our study reinforces the need to study individual differences in game-based assessment. Furthermore, the correlations we found were only moderate to weak, so our results do not suggest that VR games should replace traditional intelligence tests but rather that they may be used to pre-screen candidates or as supplementary tools in predicting job performance.