1 Introduction

Significant research interest has emerged in investigating the positive influence of video games on cognitive abilities [25, 26, 28, 43, 55, 61, 69, 81, 87]. In this regard, most of the literature investigated whether video game players (VGPs) outperform non-video game players (NVGPs) in terms of contrast sensitivity [52], temporal processing [24], change detection [14, 27], selective visual attention [9, 45], apprehending multiple objects [39], peripheral and central vision [40], spatial resolution [41], attention skills [29], cognitive flexibility [17, 38], executive control [80], multiple object tracking, cognitive control [67], encoding speed [86], processing speed, visual short-term memory [62], general speeding of perceptual reaction [30], working memory [11, 18, 82], and even probabilistic inference [42]. Another significant body of literature considered video games as a training instrument for the improvement of various cognitive abilities, such as executive control [5], task switching, multiple object tracking [23], backward masking [53], visual skills [1], selective visual attention [70], visualization [54], attention [64], executive functioning [64, 68], cognitive functioning [13, 59], intellectual ability [10], navigation skills [19], physical functioning [59], and even contrast sensitivity [52]. The remaining literature considered video games as a therapeutic tool for cerebral palsy [58], cognitive impairment [65], short-term memory [60], and schizophrenia [20].

To measure the targeted cognitive abilities in the literature mentioned above, various observational [5, 39, 59, 82, 86], paper-pencil-based [5, 42, 59, 82], and computer-aided [1, 5, 9, 14, 18,19,20, 23, 24, 27, 29, 30, 39, 40, 45, 53, 59, 62, 67, 68, 70, 80, 82, 86] tests were administered (see Section 2). Most of these traditional tests are impractical for the following reasons. First and foremost, they can only be administered by a psychologist in a clinical setting, which makes the cognitive assessment process cost-ineffective due to the assessment fee, tiresome due to its boring and repetitive nature, and invasive for the subjects due to its subjective nature. Second, they do not consider time as an important factor in estimating their respective cognitive abilities, which makes them incapable of tracking minor changes over frequent assessments [51]. Therefore, there is a pressing need for cost-effective, exciting, and plausibly engaging techniques that can non-invasively measure cognitive abilities.

Since video games have now become a popular medium of leisure among all generations [4, 7, 31, 46, 50, 63, 71,72,73, 75, 83], this paper performs construct and concurrent validation of the following hypothesis: H = The game-based intelligent technique can substitute traditional cognitive assessment tests to non-invasively measure cognitive abilities. Accordingly, this article presents LAS, an intelligent technique that utilizes non-invasively collected game analytics to automatically evaluate a cluster of three cognitive abilities (i.e., VLTM, AC, and VSTMiCDP). A major motivation behind the selection of this cluster was its importance in human cognition [22, 33, 77, 78]. LAS employs BrainStorm as its first valuable component for the non-invasive collection of game analytics, which include accuracy (i.e., the number of correct and incorrect attempts) and efficiency (i.e., the time of each correct and incorrect attempt) [2, 3, 34, 35]. BrainStorm is a cross-generational game suite that consists of three brain games (BGs) (i.e., Picture Puzzle, Letter and Number, and Find the Difference), each dedicatedly designed in relation to the theoretical operationalization of a particular cognitive ability (i.e., VLTM, AC, and VSTMiCDP, respectively). The second major component of LAS is its kernel, which consists of three statistically driven models that interpret the nominative game analytics in a meaningful manner for evaluating the cluster of three targeted cognitive abilities. In the construct validation of LAS, an experimental group-based leave-one-out cross-validation (LOOCV) successfully categorized the targeted cognitive abilities across generations within the 5-point evaluation sphere (i.e., 0–1 = very bad, 1–2 = bad, 2–3 = fair, 3–4 = good, 4–5 = excellent). Furthermore, in concurrent validation, a control group-based assessment was highly correlated with the experimental group-based cognitive evaluation results.

The remainder of the paper is organized as follows. Section 2 discusses the existing literature. Section 3 simultaneously explains both components of LAS, namely BrainStorm and its statistically driven models for the evaluation of each targeted cognitive ability (i.e., VLTM, AC, and VSTMiCDP). Section 4 states the research methodology, including complete details about the quantitative studies, participant recruitment, experimentation, data collection, and analysis. Section 5 demonstrates the results. Section 6 provides concluding remarks.

2 Literature limitations and gaps

The domain under discussion is rich in terms of literature; however, most of the available literature is based on traditional test studies. In this section, the most relevant literature is incorporated to understand the area and background of the problem. This section is further divided into two paragraphs. The first paragraph encompasses the limitations and impracticality of traditional test studies, whereas the second paragraph highlights the lack of validation in game-based test studies.

As demonstrated in Table 1, scientists employed multiple observational, paper-pencil-based, and computer-aided tests for the assessment of a single cognitive ability. For example, reading span, paper unfolding, and figure reconstruction tests were administered in a clinical setting to assess visual working memory [82]. At the same time, as also demonstrated in Table 1, one traditional test was utilized to assess different cognitive abilities. For example, the N-back task was administered in a clinical setting to assess processing speed, visual short-term memory [62], working memory [18, 20], general speeding of perceptual reaction [30], executive control [5], task switching, and multiple object tracking [23]. The key reason for utilizing multiple and overlapping traditional tests to assess a single cognitive ability is the partial and overlapping association between the theoretical operationalization of cognitive abilities and the traditional tests. Because a single traditional test cannot comprehensively assess a targeted cognitive ability, and administering multiple traditional tests is infeasible given their cost-ineffective, tiresome, and invasive nature, a need arose to build cost-effective, exciting, and plausibly engaging techniques that could non-invasively measure cognitive abilities.

Table 1 Summary of the cited cognitive tests and their correspondingly assessed various cognitive abilities

The game-based test studies partially follow the footsteps of traditional test studies, where the theoretical operationalization of the targeted cognitive ability is considered a test bed for the selection of suitable test(s); they translate the theoretical operationalization of the targeted cognitive ability into their game design and aesthetics to offer a cost-effective, exciting, and plausibly engaging solution that operates non-invasively [8, 12, 21, 37, 47,48,49, 84]. Like [3], these studies recorded game analytics that primarily include task completion time, accuracy, and count to project the performance of a subject against a particular cognitive ability. It is worth mentioning that none of these studies proposed a technique to interpret the nominative game analytics comprehensively. Unlike the presented research, these studies lack both construct validation (i.e., categorizing the targeted cognitive ability within a certain evaluation sphere) and concurrent validation (i.e., reporting a relationship between the experimental and control group-based results to potentially substitute the traditional tests).

3 LAS

The proposed intelligent technique, named LAS (see Fig. 1), employs BrainStorm as its first valuable component for the non-invasive collection of game analytics [2, 3, 34, 35]. BrainStorm is a cross-generational game suite that consists of three BGs (i.e., Picture Puzzle, Letter and Number, and Find the Difference) (see Fig. 2) dedicatedly designed in relation to the theoretical operationalization of each cognitive ability (i.e., respectively as VLTM, AC, and VSTMiCDP). The second major component of LAS is its kernel which consists of three statistically driven models (see Fig. 3) to interpret the nominative game analytics in a meaningful manner for evaluating the cluster of three targeted cognitive abilities.

Fig. 1
figure 1

Architectural diagram of LAS

Fig. 2
figure 2

BrainStorm high-fidelity prototype. Note: the BrainStorm design has already been explained [35], and its significant cross-generational acceptability is already evident [2, 3, 34]

Fig. 3
figure 3

Proposed models

Both components of LAS are simultaneously demonstrated and explained for each of the targeted cognitive abilities in the following subsections. Each subsection explains the theoretical operationalization of a particular cognitive ability. Subsequently, it narrates the corresponding game scenario as well as its alignment with the preceding operationalization and the game analytics. Finally, it utilizes the equations that are comprehensively described in Table 2 to propose a model. For more clarity, the complete formation of each proposed model is also demonstrated in Fig. 3.

Table 2 Equations used in the proposed models

3.1 VLTM

3.1.1 Theoretical operationalization

VLTM is a two-step process that executes sequentially when visual objects are presented to the cognitive system, and the outputs of both steps may provide the basis on which objects are stored and retrieved. In the first step, signals from the retina are analyzed to extract visual features such as orientation, color, and so forth. These assist in creating a detailed representation of independent features that are closely related to the physical properties of the visual objects. In the second step, the representation of independent features is integrated into a representation of coherent objects, which leads to the phenomenal experience of a visual scene segregated into coherent objects [74, 79]. One possible method to evaluate the performance of an individual's VLTM is to examine the adequacy of retrieving stored information (i.e., accuracy) in relation to retrieval time (i.e., efficiency) against the data received from visual stimuli. Thus, the Picture Puzzle BG was employed to non-invasively collect the nominative analytics for VLTM evaluation.

3.1.2 Game scenario

The Picture Puzzle BG has 15 images of renowned places and personalities that appear in a sequence. Each time, the player must select the correct name from the provided choices (see Fig. 2a). The core mechanism of this BG compels the player's attention to receive the data from their visual source and pass it to active memory. Active memory then proceeds to fetch the precise information from declarative long-term memory, established previously in the memorization session.

A clue-card-based memorization session, ranging from 5 to 10 minutes, was organized for each participant prior to the gaming activity to learn the names of the distinguished places and personalities that could be asked about in the Picture Puzzle BG; however, it was noticed that, on average, only 10% of the places and personalities were already known to the participants.

3.1.3 Proposed model

The above-described theoretical operationalization of VLTM, in relation to the Picture Puzzle analytics, is translated into the proposed model by utilizing the equations in Table 2, as follows (see Fig. 3).

$$ {\boldsymbol{M}}_{\boldsymbol{VLTM}}={R}_{player\_correct\_attempt}\times {R}_{time\_of\_player\_correct\_attempt}\times {c}_1 $$
(9)

c1 = 0.91 is an adjustable constant, defined after trials, that confines the score of MVLTM to the 5-point evaluation sphere.
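Since the exact ratio definitions of Table 2 are not reproduced here, the following Python sketch illustrates only the general shape of Eq. 9 under stated assumptions: the accuracy ratio is taken as the fraction of correct attempts, the efficiency ratio as a normalized time saving, and a factor of 5 is included so the product spans the 5-point evaluation sphere. All function and parameter names are hypothetical.

```python
C1 = 0.91  # adjustable constant for M_VLTM, as stated in the text

def score_vltm(n_correct, n_total, t_correct, t_limit):
    """Sketch of Eq. 9. The ratio forms below are illustrative
    assumptions, not the definitions from Table 2."""
    r_correct = n_correct / n_total               # accuracy ratio (assumed form)
    r_time = max(0.0, 1.0 - t_correct / t_limit)  # efficiency ratio (assumed form)
    return 5.0 * r_correct * r_time * C1          # confined to the 5-point sphere

def band(score):
    """Map a score to the 5-point evaluation sphere labels."""
    labels = ["very bad", "bad", "fair", "good", "excellent"]
    return labels[min(int(score), 4)]
```

For example, 12 correct answers out of 15, with correct attempts taking 30 s against a hypothetical 120 s allowance, would yield a score of about 2.73, i.e., fair.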

3.2 AC

3.2.1 Theoretical operationalization

AC is a four-step process that sequentially comprises visualization, information gathering, articulation, and analysis to solve problems, whether complex or simple, through rational decision-making based on one's understanding [44]. A basic insight into an individual's AC can be deduced by examining problem-solving accuracy in any of its sub-domains in relation to time (i.e., efficiency). Thus, the Letter and Number BG was employed to non-invasively collect the nominative analytics for AC evaluation.

3.2.2 Game scenario

The Letter and Number BG has 10 incomplete series of numbers or letters that appear in a sequence. Each time, the player must identify its pattern to complete the series by selecting the correct letter or number from the provided options (see Fig. 2b). The core mechanism of this BG compels the player to sequentially execute information visualization, articulation, analysis, and decision-making based on their perception.

3.2.3 Proposed model

The above-described theoretical operationalization of AC, in relation to the Letter and Number analytics, is translated into the proposed model by utilizing the equations in Table 2, as follows (see Fig. 3).

$$ {\boldsymbol{M}}_{\boldsymbol{AC}}={R}_{player\_correct\_attempt}\times {R}_{time\_of\_player\_correct\_attempt}\times {c}_2 $$
(10)

Like c1, c2 = 0.83 is also an adjustable constant, defined after trials, that confines the score of MAC to the 5-point evaluation sphere.

3.3 VSTMiCDP

3.3.1 Theoretical operationalization

VSTMiCDP refers to the cognitive ability to store visual information for a few seconds so that it can be used to detect differences between the memory array and the test array. VSTMiCDP plays an important role in maintaining continuity across visual interruptions, such as eye movements and blinking [56]. One possible method to evaluate an individual's VSTMiCDP performance is to examine their visual search accuracy between two similar images (i.e., change detection) in relation to time (i.e., efficiency). Thus, the Find the Difference BG was employed to non-invasively collect the nominative analytics for VSTMiCDP evaluation.

3.3.2 Game scenario

The Find the Difference BG has 3 pairs of similar images of renowned places that appear in a sequence. Each time, the player must discover exactly 6 differences between each pair of images, displayed one after another (see Fig. 2c). The core mechanism of this BG compels the player to hold the visual information of the sample image (i.e., displayed for a few seconds on one side of the screen) in active memory and match it against the test image (i.e., displayed later on the other side of the screen) to identify differences between the sample frame and the test frame.

3.3.3 Proposed model

The above-described theoretical operationalization of VSTMiCDP, in relation to the Find the Difference analytics, is translated into the proposed model by utilizing the equations in Table 2, as follows (see Fig. 3).

$$ {\boldsymbol{M}}_{\boldsymbol{VSTMiCDP}}={R}_{player\_correct\_attempt}\times {R}_{time\_of\_player\_correct\_attempt}\times {R}_{player\_incorrect\_attempt}\times {c}_3 $$
(11)

Unlike MVLTM and MAC, the inclusion of Rplayer_incorrect_attempt in MVSTMiCDP is due to the variability of Nplayer_incorrect_attempt in the Find the Difference BG. In the first two proposed models, the value of Nplayer_total_attempt remains constant (i.e., 15 and 10, respectively); thus, with knowledge of Nplayer_correct_attempt, Nplayer_incorrect_attempt was deducible. Conversely, in MVSTMiCDP, Nplayer_incorrect_attempt is a variable, which also makes Nplayer_total_attempt variable. Thus, to fully monitor performance, it is equally important to explicitly consider Nplayer_incorrect_attempt along with Nplayer_correct_attempt.

Moreover, like c1 and c2, c3 = 0.65 is also an adjustable constant, defined after trials, that confines the score of MVSTMiCDP to the 5-point evaluation sphere.
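To make the role of the incorrect-attempt ratio concrete, the sketch below adds an explicit penalty term, since the total number of attempts in Find the Difference varies per player. As before, the ratio forms and the scale factor are illustrative assumptions, not the definitions from Table 2.

```python
C3 = 0.65  # adjustable constant for M_VSTMiCDP, as stated in the text

def score_vstmicdp(n_correct, n_incorrect, t_correct, t_limit):
    """Sketch of Eq. 11. Unlike Eqs. 9 and 10, the total attempt
    count is not fixed, so incorrect attempts enter explicitly.
    The ratio forms below are illustrative assumptions."""
    n_total = n_correct + n_incorrect             # varies per player in this BG
    r_correct = n_correct / n_total               # accuracy ratio (assumed form)
    r_incorrect = 1.0 - n_incorrect / n_total     # penalty for wrong taps (assumed form)
    r_time = max(0.0, 1.0 - t_correct / t_limit)  # efficiency ratio (assumed form)
    return 5.0 * r_correct * r_time * r_incorrect * C3
```

Under these assumptions, a player who finds all 18 differences with 6 wrong taps scores lower than one with the same correct count and no wrong taps, which is exactly the behavior the extra ratio is meant to capture.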

4 Research methodology

4.1 Research design

The organized empirical research consists of three independent, three-fold quantitative studies (QSs: QS1, QS2, and QS3). Each QS was undertaken by a distinct group of participants who acted as members of both an experimental group (i.e., in the first and second folds) and a control group (i.e., in the third fold). In the first fold, each participant went through the memorization phase for the Picture Puzzle BG. In the second fold, each participant played all three BGs of the BrainStorm game suite in single-player mode. Finally, in the third fold, traditional tests were administered to each participant to measure the cluster of three targeted cognitive abilities. These tests include the names-faces subtest of the memory assessment test (MAT) [85], the numerical sequences subtest of the differential aptitude test (DAT) [6], and spot the difference in cognitive decline (SDCD) [66], for VLTM, AC, and VSTMiCDP, respectively.

4.2 Participant recruitment

To dynamically examine the potential applicability of LAS, participants of three different age groups were recruited (see Table 3). Specifically, 23 children (13 male and 10 female, aged 10 to 15 years) were recruited for QS1, 20 younger adults (14 male and 6 female, aged 22 to 27 years) for QS2, and 20 older adults (7 male and 13 female, aged 40 to 45 years) for QS3, all on a volunteer basis. Recruitment was carefully carried out based on the criterion of sufficient gaming experience (i.e., habitual gameplay once a week or more). The reason for recruiting volunteers with sufficient gaming experience was to avoid the learning curve and its consequences on gaming performance during the QSs. An agreement on non-invasive game analytics collection was signed by the parents of the children, and by the younger and older adults themselves, at the time of recruitment.

Table 3 Details of participants

4.3 Experimental and control settings

Each QS intervention lasted 1 day for 8 hours, which included both experimental and control settings. Initially, in the experimental setup, computer-aided BG-playing activities were carried out in parallel under the observation of psychologists in a quiet room. On average, each participant took 20 minutes to complete their one-time gaming activity. The 19.5-inch touch screens were utilized for BG play to allow each participant to interact with the BGs with better visibility; the touch screens were fixed horizontally on the table, like tablets, to ease interaction. Later, in the control setup, three traditional tests were administered side by side by the psychologists in a quiet room. On average, each participant took 35 minutes to complete their one-time traditional test assessment. The first traditional test, MAT, requires the participant to learn the names of individuals portrayed in photographs. Following the delayed recall session of the learning trials, the participant is presented with photographs and asked to recognize the correct name from a brief list of alternatives. The second traditional test, DAT, requires the participant to identify the number sequence and solve the problem by identifying the missing numbers in the sequence. The third traditional test, SDCD, requires the participant to memorize the details of the first picture for 30 seconds, after which the first picture is taken away and the second picture is shown. The participant is then asked to identify as many differences as possible between the two sequentially presented pictures. To prevent the risk of bias and the potential influence of unknown variables in the QSs, participants who had completed the activity were not allowed to interact with participants who were waiting to start.

4.4 Data collection and analysis

The game analytics of each participant are non-invasively recorded and compiled separately for each BG activity to train and test the statistically driven models during the experimental group-based cross-generational evaluation. Similarly, the control group-based assessment results of each participant are compiled separately for the three targeted cognitive abilities.
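The text does not detail what is refit in each LOOCV fold; one plausible reading, sketched below, is that the adjustable constant is recalibrated on the remaining participants' raw score products and the held-out participant is then scored and banded. This is an illustrative assumption, not the paper's procedure; the calibration rule and function name are hypothetical.

```python
def loocv_bands(raw_scores):
    """Leave-one-out sketch: raw_scores are each participant's
    unscaled products (e.g., R_correct * R_time). Per fold, the
    constant is fit on n-1 participants so their best score reaches
    the top of the 5-point sphere (an assumed calibration rule)."""
    labels = ["very bad", "bad", "fair", "good", "excellent"]
    bands = []
    for i in range(len(raw_scores)):
        train = raw_scores[:i] + raw_scores[i + 1:]
        c = 5.0 / max(train)                  # fold-specific constant
        score = min(raw_scores[i] * c, 5.0)   # score the held-out participant
        bands.append(labels[min(int(score), 4)])
    return bands
```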

Statistical analysis is performed on the experimental and control group-based results compiled for each targeted cognitive ability to determine trends within the QSs. In the initial phase of the statistical analysis, the mean, standard deviation, and standard error are calculated. In the second phase, the Pearson correlation [15] is applied to the experimental and control group-based results of each targeted cognitive ability in every QS to determine their degree of correlation, where the correlation coefficient r indicates the direction and the effect size of the correlation. According to [15, 16], the effect size is low if the value of r lies between ±0.1 and ±0.3, medium if it lies between ±0.3 and ±0.5, and large if it lies between ±0.5 and ±1.0. Furthermore, the p-value is calculated in this phase to demonstrate the significance of the findings [15]. It is well established that correlation does not imply causation, yet this method has been used by a vast range of literature, including a research study named StudentLife [76]. In a real-world situation, it is almost impossible to discover the element(s) that have a causal relationship with another element, as there always exist unknown factor(s) that alter the causality between related elements. Hence, the motivation behind employing the correlation technique is not to discover a causal relationship but to comprehend the influence of one element in relation to the other(s), while conceding that the influence is not causal.
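The correlation step above can be reproduced with a few lines of standard-library Python; the Pearson r computation is the textbook formula, and the effect-size bands follow the thresholds cited from [15, 16]. The p-value, omitted here, would in practice come from a t-test on r (e.g., via a statistics library).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def effect_size(r):
    """Effect-size bands used in the text: low, medium, or large."""
    a = abs(r)
    return "low" if a < 0.3 else "medium" if a < 0.5 else "large"
```

For instance, a reported value such as r = 0.81 for VLTM falls in the large band under this rule.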

5 Results

A complete summary of the experimental group-based evaluation results of each QS is demonstrated in Table 4 and Fig. 4a-c. The column charts in Fig. 4a-c demonstrate the potential of the proposed models to successfully categorize the cluster of three targeted cognitive abilities of the cross-generational participants within a 5-point evaluation sphere, whereas the line charts demonstrate their potential to successfully estimate the mean level of the targeted cognitive abilities of the three age groups. These delineate the estimated mean level of younger adults' targeted cognitive abilities as higher than that of children (i.e., VLTM: 16.2%, AC: 6.2%, and VSTMiCDP: 10.6%) as well as that of older adults (i.e., VLTM: 10.8%, AC: 5.4%, and VSTMiCDP: 2.2%). Furthermore, to estimate the significance of the experimental group-based evaluation in relation to the control group-based assessment, the results of the statistical analysis are given in Table 4.

Table 4 Evaluation summary
Fig. 4
figure 4

Evaluation details

The experimental group-based evaluation results of QS1 illustrate the estimated mean level of the children's VLTM (\( \mu = 2.62, \sigma = 0.91, \sigma_{\overline{x}} = 0.19 \)) (i.e., higher than 60.9% and lower than 39.1% of the children's evaluation scores), AC (\( \mu = 2.72, \sigma = 0.79, \sigma_{\overline{x}} = 0.17 \)) (i.e., higher than 47.8% and lower than 52.2% of the children's evaluation scores), and VSTMiCDP (\( \mu = 1.86, \sigma = 1.19, \sigma_{\overline{x}} = 0.25 \)) (i.e., higher than 60.9% and lower than 39.1% of the children's evaluation scores) (see Table 4 and Fig. 4a). This indicates that the estimated mean levels of the children's VLTM and AC fall within the fair region of the 5-point evaluation sphere, whereas their VSTMiCDP falls in the upper bad region. Statistical analysis is performed to estimate the significance of the experimental group-based cognitive evaluation in relation to the control group-based assessment. This revealed a significant correlation, with a large effect size, between the experimental and control group-based results for VLTM (r = 0.81, p < 0.001), AC (r = 0.76, p < 0.001), and VSTMiCDP (r = 0.84, p < 0.001).

The experimental group-based evaluation results of QS2 illustrate the estimated mean level of the younger adults' VLTM (\( \mu = 3.43, \sigma = 0.71, \sigma_{\overline{x}} = 0.16 \)) (i.e., higher than 45.0% and lower than 55.0% of the younger adults' evaluation scores), AC (\( \mu = 3.03, \sigma = 0.78, \sigma_{\overline{x}} = 0.18 \)) (i.e., higher as well as lower than 50.0% of the younger adults' evaluation scores), and VSTMiCDP (\( \mu = 2.39, \sigma = 1.35, \sigma_{\overline{x}} = 0.30 \)) (i.e., higher than 65.0% and lower than 35.0% of the younger adults' evaluation scores) (see Table 4 and Fig. 4b). This indicates that the estimated mean levels of the younger adults' VLTM and AC fall within the good region of the 5-point evaluation sphere, whereas their VSTMiCDP falls in the middle fair region. Statistical analysis is performed to estimate the significance of the experimental group-based cognitive evaluation in relation to the control group-based assessment. This revealed a significant correlation, with a large effect size, between the experimental and control group-based results for VLTM (r = 0.78, p < 0.001), AC (r = 0.72, p < 0.001), and VSTMiCDP (r = 0.86, p < 0.001).

The experimental group-based evaluation results of QS3 illustrate the estimated mean level of the older adults' VLTM (\( \mu = 2.89, \sigma = 0.95, \sigma_{\overline{x}} = 0.21 \)) (i.e., higher than 60.0% and lower than 40.0% of the older adults' evaluation scores), AC (\( \mu = 2.76, \sigma = 0.89, \sigma_{\overline{x}} = 0.20 \)) (i.e., higher as well as lower than 50.0% of the older adults' evaluation scores), and VSTMiCDP (\( \mu = 2.28, \sigma = 1.24, \sigma_{\overline{x}} = 0.28 \)) (i.e., higher than 65.0% and lower than 35.0% of the older adults' evaluation scores) (see Table 4 and Fig. 4c). This indicates that the estimated mean levels of the older adults' targeted cognitive abilities fall within the fair region of the 5-point evaluation sphere. Statistical analysis is performed to estimate the significance of the experimental group-based cognitive evaluation in relation to the control group-based assessment. This revealed a significant correlation, with a large effect size, between the experimental and control group-based results for VLTM (r = 0.85, p < 0.001), AC (r = 0.78, p < 0.001), and VSTMiCDP (r = 0.71, p < 0.001).

6 Conclusion

The feasibility of substituting traditional cognitive assessment with a game-based approach was previously unclear. In this regard, this paper introduced LAS, an intelligent technique that utilizes non-invasively collected game analytics to automatically evaluate a cluster of three cognitive abilities (i.e., VLTM, AC, and VSTMiCDP). LOOCV is employed to demonstrate the construct validity of LAS, which successfully categorizes the targeted cognitive abilities within a 5-point evaluation sphere. Furthermore, the correlation technique is employed to demonstrate the concurrent validity of LAS, which reports a significant relationship between the experimental and control group-based results, suggesting that LAS could potentially substitute the administered traditional tests.

It is observed that the estimated mean level of children's VSTMiCDP falls in the upper bad region of the 5-point evaluation sphere. This finding was alarming until the estimated mean levels of younger and older adults' VSTMiCDP were found to lie within the fair region. Therefore, it is conjectured that, in contrast to the other targeted cognitive abilities, VSTMiCDP takes relatively longer to develop in most children, and by the time a child grows into a younger adult this ability has naturally improved. However, an independent investigation could be conducted to further test the validity of this conjecture.

The design principle of various commercially available BG solutions, such as Elevate [32], Lumosity [57], and Fit Brains [36], is somewhat similar to the theoretical operationalization approach of BrainStorm; however, unlike LAS, which has demonstrated the potential to independently evaluate the cluster of cognitive abilities, these applications only provide a training facility to periodically improve and monitor the performance of weaker brain aspects such as memory, brevity, speed, problem-solving, attention, and flexibility. This limitation of commercially available BG solutions makes LAS a first-of-its-kind technique.