A pilot study on the evaluation of cognitive abilities’ cluster through game-based intelligent technique

Numerous scientific studies measured cognitive abilities through administered tests. Most of these traditional tests are inappropriate for several reasons, such as cost-ineffectiveness, tiresomeness, and invasiveness. The need is to build cost-effective, exciting, and plausibly engaging techniques that could non-invasively measure cognitive abilities. This paper presents LAS, an intelligent technique that utilizes non-invasively collected game analytics to automatically evaluate the cluster of three cognitive abilities (i.e., Visual Long-term Memory (VLTM), Analytical Capability (AC), and Visual Short-term Memory in Change Detection Paradigm (VSTMiCDP)). The experimental group-based cross-generational cognitive evaluation in the game-based scenario established that the potential of the proposed technique is twofold: 1) It successfully categorizes the cluster of three targeted cognitive abilities within the 5-point evaluation sphere (i.e., 0-1 = very bad, 1-2 = bad, 2-3 = fair, 3-4 = good, 4-5 = excellent), and 2) It highly correlates with the results of a control group that are measured with three traditional cognitive tests.

To measure the targeted cognitive abilities in the literature mentioned above, various observational [5,39,59,82,86], paper-pencil-based [5,42,59,82], and computer-aided [1,5,9,14,18-20,23,24,27,29,30,39,40,45,53,59,62,67,68,70,80,82,86] tests were administered (see Section 2). Most of these traditional tests are inappropriate for the following reasons. First and foremost, they can only be administered by a psychologist in a clinical setting, which makes the cognitive assessment process cost-ineffective due to the assessment fee, tiresome due to its boring and repetitive nature, and invasive due to its subjective nature for the subjects. Second, they do not consider time as an important factor in estimating their respective cognitive abilities, which makes them incapable of tracking minor changes over frequent assessments [51]. Therefore, the current era needs cost-effective, exciting, and plausibly engaging techniques that could non-invasively measure cognitive abilities.
Since video games have now become a popular medium of leisure among all generations [4,7,31,46,50,63,71-73,75,83], this paper performs construct and concurrent validation of the following hypothesis: H = The game-based intelligent technique can substitute traditional cognitive assessment tests to non-invasively measure cognitive abilities. Accordingly, this article presents LAS, an intelligent technique that utilizes non-invasively collected game analytics to automatically evaluate the cluster of three cognitive abilities (i.e., VLTM, AC, and VSTMiCDP). A major motivation behind the selection of this cluster of cognitive abilities was its importance in human cognition [22,33,77,78]. LAS employs BrainStorm as its first valuable component for the non-invasive collection of game analytics, which include accuracy (i.e., the number of correct and incorrect attempts) and efficiency (i.e., the time of each correct and incorrect attempt) [2,3,34,35]. BrainStorm is a cross-generational game suite that consists of three brain games (BGs) (i.e., Picture Puzzle, Letter and Number, and Find the Difference), each dedicatedly designed in relation to the theoretical operationalization of a particular cognitive ability (i.e., VLTM, AC, and VSTMiCDP, respectively). The second major component of LAS is its kernel, which consists of three statistically driven models that interpret the nominative game analytics in a meaningful manner for evaluating the cluster of three targeted cognitive abilities. In the construct validation of LAS, an experimental group-based leave-one-out cross-validation (LOOCV) successfully categorized the targeted cognitive abilities of the cross-generational participants within the 5-point evaluation sphere (i.e., 0-1 = very bad, 1-2 = bad, 2-3 = fair, 3-4 = good, 4-5 = excellent). Furthermore, in concurrent validation, a control group-based assessment was highly correlated with the experimental group-based cognitive evaluation results.
The remaining paper is organized as follows. Section 2 discusses the existing literature. Section 3 simultaneously explains both components of LAS, which include BrainStorm and its statistically driven models for the evaluation of each targeted cognitive ability (i.e., VLTM, AC, and VSTMiCDP). Section 4 states the research methodology, including complete detail about the quantitative studies, participants' recruitment, experimentation, data collection, and analysis. Section 5 demonstrates the results. Section 6 provides conclusive remarks.

Literature limitations and gaps
The domain under discussion is rich in terms of literature; however, most of the available literature is based on traditional test studies. In this section, the most relevant literature is incorporated to understand the area and background of the problem. This section is further divided into two paragraphs. The first paragraph encompasses the limitations and impracticality of traditional test studies, whereas the second paragraph highlights the lack of validation in game-based test studies.
As demonstrated in Table 1, scientists employed multiple observational, paper-pencil-based, and computer-aided tests for the assessment of a single cognitive ability. For example, reading span, paper unfolding, and figure reconstruction tests were administered in a clinical setting to assess visual working memory [82]. Simultaneously, as also demonstrated in Table 1, one traditional test was utilized to assess different cognitive abilities. For example, the N-back task was administered in a clinical setting to assess processing speed, visual short-term memory [62], working memory [18,20], general speeding of perceptual reaction [30], executive control [5], task switching, and multiple object tracking [23]. The key reason for utilizing multiple and overlapping traditional tests to assess a single cognitive ability is the partial and overlapping association between the theoretical operationalization of cognitive abilities and the traditional tests. Given the inability of a single traditional test to comprehensively assess a targeted cognitive ability, as well as the infeasibility of administering multiple traditional tests due to their cost-ineffective, tiresome, and invasive nature, there was a need to build cost-effective, exciting, and plausibly engaging techniques that could non-invasively measure cognitive abilities.
The game-based test studies partially follow the footsteps of traditional test studies, where the theoretical operationalization of the targeted cognitive ability is considered a test bed for the selection of suitable test(s), thus translating the theoretical operationalization of the targeted cognitive ability into their game design and aesthetics to offer a cost-effective, exciting, and plausibly engaging solution that could operate non-invasively [8,12,21,37,47-49,84]. Like [3], these studies recorded game analytics that primarily include task completion time, accuracy, and count to project the performance of a subject against a particular cognitive ability. It is worth mentioning that none of the studies proposed a technique to interpret the nominative game analytics comprehensively. Unlike the presented research, these studies lack construct (i.e., categorizing the targeted cognitive ability within a certain evaluation sphere) as well as concurrent (i.e., reporting a relationship between the experimental and control group-based results to potentially substitute the traditional tests) validation.

Table 1 Summary of the cited cognitive tests and their correspondingly assessed cognitive abilities

Observational tests:
…: Cognitive Functioning [59]
Reaction Time and Plate Tapping Test: Physical Functioning [59]
Reading Span, Paper Unfolding and Figure Reconstruction: Visual Working Memory [82]

Paper-pencil-based tests:
Visual Perceptual Decision-Making Task: Probabilistic Inference [42]
Mental Rotation Test: Executive Control [5], Cognitive Functioning [59]
Trail Making, Cancellation, Matrix Reasoning, Digit Symbol Substitution and Letter Sets Test: Cognitive Functioning [59]
Directional Headings Test: Physical Functioning [59]
Plate Matching Test: Visual Working Memory [82]

Computer-aided tests:
Simultaneity Judgment and Temporal-order Judgment Task: Temporal Processing [24]
Image Change Detection Trial (i.e., for top-down approach): Change Detection [14]
Flanker-compatibility Test: Change Detection [27], Executive Functioning [68]
Sample Trial Sequence and Sample Search Display Task: Selective Visual Attention [9]
Multiple Object Tracking Task: Multiple Objects Apprehending [39], Selective Visual Attention [45], Task Switching, Multiple Object Tracking [23], Visual Skills [1]
Uniform Field of View Paradigm: Periphery and Central Vision [40], Selective Visual Attention [45,70], Visual Skills [1]
Perceptual Load Paradigm: Periphery as well as Central Vision [40]
Attentional Network Test: Attention [29], Visual Skills [1], Encoding Speed [86]
Dual-task Paradigm: Executive Control [80]
Task Switching Paradigm: Executive Control [5,80]
Attentional Blink Test: Multiple Object Tracking [67], Executive Control [5], Visual Skills [1]
Filter, Spatial Working Memory and Complex Verbal Span Task: Multiple Object Tracking as well as Cognitive Control [67]
Posner Letter Identity and Proactive Interference Task: Processing Speed, Visual Short-term Memory [62]
N-back Task: Processing Speed, Visual Short-term Memory [62], Working Memory [18,20], General Speeding of Perceptual Reaction [30], Executive Control [5], Task Switching as well as Multiple Object Tracking [23]
Simon and Variable Attention Test: General Speeding of Perceptual Reaction [30]
Stop-signal Paradigm: Working Memory [18], Executive Control [5], Task Switching as well as Multiple Object Tracking [23]
Wechsler Memory, Adult Intelligence Scale-revised: Working Memory [20]
Picture Span Task and Simple Object Span Test: Working Memory [82], Executive Control [5]
Lateral, Collinear and Orthogonal Masking Paradigm: Backward Masking [53]
Go/No-go Task: Executive Functioning [68]
Virtual, Physical Navigation and Drop Off Task: Navigation Skills [19]
Spatial Span Test: Cognitive and Physical Functioning [59]

LAS
The proposed intelligent technique, named LAS (see Fig. 1), employs BrainStorm as its first valuable component for the non-invasive collection of game analytics [2,3,34,35]. BrainStorm is a cross-generational game suite that consists of three BGs (i.e., Picture Puzzle, Letter and Number, and Find the Difference) (see Fig. 2), each dedicatedly designed in relation to the theoretical operationalization of a particular cognitive ability (i.e., VLTM, AC, and VSTMiCDP, respectively). The second major component of LAS is its kernel, which consists of three statistically driven models (see Fig. 3) that interpret the nominative game analytics in a meaningful manner for evaluating the cluster of three targeted cognitive abilities. Both components of LAS are simultaneously demonstrated and explained for each of the targeted cognitive abilities in the following subsections. Each subsection explains the theoretical operationalization of a particular cognitive ability. Subsequently, it narrates the corresponding game scenario as well as its alignment with the preceding operationalization and the game analytics. Finally, it utilizes the equations that are comprehensively described in Table 2 to propose a model. For more clarity, the complete formation of each proposed model is also demonstrated in Fig. 3.

Theoretical operationalization
VLTM is a two-step process that executes sequentially when visual objects are presented to the cognitive system, and both of its outputs may provide the basis on which objects get stored and retrieved. In the first step, signals from the retina are analyzed to extract visual features such as orientation, colors, and so forth. These assist in creating a detailed representation of independent features that are closely related to the physical properties of the visual objects. In the second step, the representation of independent features is integrated into the representation of coherent objects, which leads to the phenomenal experience of a visual scene that is segregated into coherent objects [74,79]. One of the possible methods to evaluate the performance of an individual's VLTM is by looking into their adequacy of retrieving the stored information (i.e., accuracy) in relation to retrieval time (i.e., efficiency) against the data received by visual stimuli. Thus, Picture Puzzle BG was employed to non-invasively collect the nominative analytics for VLTM evaluation.

Fig. 1 Architectural diagram of LAS

Game scenario
The Picture Puzzle BG has 15 images of renowned places and personalities that appear in a sequence. Each time, the player must select the name correctly from the provided choices (see Fig. 2a). The core mechanism of this BG compels the player's attention to receive the data from their visual source and pass it to the active memory. Active memory then proceeds and fetches its precise information from declarative long-term memory, established previously in the memorization session. A clue cards-based memorization session, ranging from 5 to 10 minutes, was organized for each participant prior to the gaming activity, to learn the names of the distinguished places and personalities that could be asked in Picture Puzzle BG; however, it was noticed that, on average, only 10% of the places and personalities were already known to the participants.

Fig. 2 BrainStorm high fidelity prototype: (a) Picture Puzzle, (b) Letter and Number, (c) Find the Difference. Translation of (a): Let's do the puzzle, Score, Eastern Tomb of Qing Dynasty, The Ming Tombs, The mausoleum of Genghis Khan. Note: the BrainStorm design is already explained [35], and its significant cross-generational acceptability is already evident [2,3,34].

Proposed model
The above-described theoretical operationalization of VLTM, in relation to Picture Puzzle analytics, is formed into a proposed model by utilizing the equations in Table 2 (see Fig. 3).
Fig. 3 Proposed models

Table 2 Equations used in the proposed models:
1. N_player_total_attempt = N_player_correct_attempt + N_player_incorrect_attempt: the overall attempts of the current player, that is, the sum of the total number of correct and incorrect attempts.
2. N_average_correct_attempt: the average number of correct attempts by all the other players (i.e., of the current player's genre).
3. N_average_incorrect_attempt: the average number of incorrect attempts by all the other players (i.e., of the current player's genre).
4. T_player: the average time taken by the current player for each correct attempt.
5. T_average: the average time taken by all the other players (i.e., of the current player's genre) for each correct attempt.
6. R_player_correct_attempt = N_player_correct_attempt / N_average_correct_attempt: the ratio between the current player's total number of correct attempts (i.e., the current player's accuracy) and the average number of correct attempts among all the other players (i.e., the overall accuracy of the current player's genre). This equation estimates the current player's accuracy vs. all the other players of the same genre. The higher, the better.
7. R_player_incorrect_attempt = N_average_incorrect_attempt / N_player_incorrect_attempt: the ratio between the average number of incorrect attempts among all the other players (i.e., the overall inaccuracy of the current player's genre) and the current player's total number of incorrect attempts (i.e., the current player's inaccuracy). This equation estimates the current player's inaccuracy vs. all the other players of the same genre. The higher, the better.
8. R_player_time = T_average / T_player: the ratio between the average time taken by all the other players for each correct attempt (i.e., the overall efficiency of the current player's genre) and the average time taken by the current player for each correct attempt (i.e., the current player's efficiency). This equation estimates the current player's efficiency vs. the other players of the same genre. The higher, the better.

c1 = 0.91 is the adjustable constant defined after trials to confine the score of M_VLTM within a 5-point evaluation sphere.
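The exact aggregation used in Fig. 3 is not reproducible from the text alone, so the following is a minimal sketch that assumes M_VLTM sums the three ratios described in Table 2 (accuracy, inaccuracy, and efficiency) and scales them by c1, clipping the result to the 5-point evaluation sphere. The function and parameter names are illustrative, not the paper's.

```python
# Hypothetical sketch of M_VLTM: the aggregation (sum of ratios scaled
# by c1, clipped to 0-5) is an assumption, not the paper's exact formula.

def m_vltm(player_correct, player_incorrect, player_time_per_correct,
           avg_correct, avg_incorrect, avg_time_per_correct, c1=0.91):
    """Score a player's VLTM on the 0-5 evaluation sphere (assumed form)."""
    r_correct = player_correct / avg_correct                  # accuracy ratio; higher is better
    r_incorrect = avg_incorrect / max(player_incorrect, 1)    # inaccuracy ratio; guard against /0
    r_time = avg_time_per_correct / player_time_per_correct   # efficiency ratio; higher is better
    score = c1 * (r_correct + r_incorrect + r_time)
    return max(0.0, min(5.0, score))

def label(score):
    """Map a score onto the paper's 5-point evaluation sphere."""
    for upper, name in [(1, "very bad"), (2, "bad"), (3, "fair"), (4, "good")]:
        if score <= upper:
            return name
    return "excellent"
```

For example, a player with 12 correct and 3 incorrect attempts, averaging 8 s per correct attempt, against genre averages of 10 correct attempts, 4 incorrect attempts, and 9 s per correct attempt, would land in the "good" band under these assumptions.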

Theoretical operationalization
AC is a four-step process that sequentially includes visualization, information gathering, articulation, and analysis to solve complex or uncomplicated problems by rational decision-making based on one's understanding [44]. A basic insight into an individual's AC can be deduced by looking into their problem-solving accuracy in any of its sub-domains in relation to time (i.e., efficiency). Thus, Letter and Number BG was employed to non-invasively collect the nominative analytics for AC evaluation.

Game scenario
The Letter and Number BG has 10 incomplete series of numbers or letters that appear in a sequence. Each time, the player must identify its pattern to complete the series by selecting the correct letter or number from the provided options (see Fig. 2b). The core mechanism of this BG compels the player to sequentially execute information visualization, articulation, analysis, and decision-making based on their perception.

Proposed model
The above-described theoretical operationalization of AC, in relation to Letter and Number analytics, is formed into a proposed model by utilizing the equations in Table 2 (see Fig. 3).
Like c1, c2 = 0.83 is also an adjustable constant defined after trials to confine the score of M_AC within a 5-point evaluation sphere.

Theoretical operationalization
VSTMiCDP refers to the cognitive ability that stores visual information for a few seconds so that it can be used to detect a difference between the memory and the test array. VSTMiCDP plays an important role in maintaining continuity across visual interruptions, such as eye movements and blinking [56]. One of the possible methods to evaluate the performance of an individual's VSTMiCDP is by looking into their visual search accuracy between two similar images (i.e., for change detection) in relation to time (i.e., efficiency). Thus, Find the Difference BG was employed to non-invasively collect the nominative analytics for VSTMiCDP evaluation.

Game scenario
The Find the Difference BG has 3 different pairs of similar images of renowned places that appear in a sequence. Each time, the player must discover exactly 6 differences between each pair of images, displayed one after another (see Fig. 2c). The core mechanism of this BG compels the player to hold the visual information of the sample image (i.e., displayed for a few seconds on one side of the screen) in their active memory to match it with the test image (i.e., later displayed on the other side of the screen) for identifying differences between the sample frame and the test frame.

Proposed model
The above-described theoretical operationalization of VSTMiCDP, in relation to Find the Difference analytics, is formed into a proposed model by utilizing the equations in Table 2 (see Fig. 3).
Unlike M_VLTM and M_AC, the addition of R_player_incorrect_attempt in M_VSTMiCDP is due to the variableness of N_player_incorrect_attempt in Find the Difference BG. This can be explained as follows: in the first two proposed models, the value of N_player_total_attempt remains constant (i.e., 15 and 10, respectively). Thus, with the knowledge of N_player_correct_attempt, N_player_incorrect_attempt was deducible. Conversely, in M_VSTMiCDP, N_player_incorrect_attempt is a variable, which also makes N_player_total_attempt a variable entity. Thus, to fully monitor the performance, it is equally important to explicitly consider N_player_incorrect_attempt along with N_player_correct_attempt. Moreover, like c1 and c2, c3 = 0.65 is also an adjustable constant defined after trials to confine the score of M_VSTMiCDP within a 5-point evaluation sphere.
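The distinction above (deducible vs. explicitly recorded incorrect attempts) can be sketched as follows; the helper function and dictionary are illustrative conveniences, not part of LAS itself.

```python
# Illustration of why Find the Difference needs the incorrect count recorded
# explicitly: Picture Puzzle and Letter and Number have fixed trial counts
# (15 and 10), so incorrect = total - correct; Find the Difference does not.

FIXED_TRIALS = {"Picture Puzzle": 15, "Letter and Number": 10}

def incorrect_attempts(game, correct, recorded_incorrect=None):
    """Return the incorrect-attempt count for a BG."""
    if game in FIXED_TRIALS:
        # Fixed total: incorrect attempts are deducible from correct ones.
        return FIXED_TRIALS[game] - correct
    # Variable total (Find the Difference): the incorrect count must be
    # recorded explicitly alongside the correct count.
    if recorded_incorrect is None:
        raise ValueError("incorrect attempts must be recorded for this BG")
    return recorded_incorrect
```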

Research design
The organized empirical research consists of three independent three-fold quantitative studies (QSs: QS1, QS2, and QS3). Every QS was undertaken by a distinct group of participants who simultaneously acted as members of an experimental group (i.e., in the first and second folds) as well as a control group (i.e., in the third fold). In the first fold, each participant went through the memorization phase for Picture Puzzle BG. Later, in the second fold, each participant played all three BGs of the BrainStorm game suite in single-player gaming mode. Finally, in the third fold, traditional tests were administered to each participant to measure the cluster of their three targeted cognitive abilities. These tests include the names-faces subtest of the memory assessment test (MAT) [85], the numerical sequences subtest of the differential aptitude test (DAT) [6], and spot the difference in cognitive decline (SDCD) [66], respectively, for VLTM, AC, and VSTMiCDP.

Participants' recruitment
To dynamically examine the applicability of LAS, participants of three different age groups were recruited (see Table 3). Specifically, 23 children (13 male and 10 female, aged 10 to 15 years) were recruited for QS1, 20 younger adults (14 male and 6 female, aged 22 to 27 years) for QS2, and 20 older adults (7 male and 13 female, aged 40 to 45 years) for QS3, on a volunteer basis. The recruitment of volunteers as participants was carefully carried out based on the criterion of sufficient gaming experience (i.e., habitual gameplay once a week or more). A reason behind the recruitment of volunteers with sufficient gaming experience was to avoid the learning curve and its consequences on gaming performance during the QSs. An agreement on non-invasive game analytics collection was signed by the parents of the children, as well as by the younger and older adults themselves, at the time of recruitment.

Experimental and control settings
Every QS intervention lasted 1 day for 8 hours, which included the experimental and control settings. Initially, in the experimental setup, multiple parallel computer-aided BG-playing activities were carried out under the observation of psychologists in a quiet room. On average, each participant took 20 minutes to complete their one-time gaming activity. 19.5-inch touch screens were utilized for the BG play to allow each participant to interact with the BGs with better visibility; the touch screens were horizontally fixed on the table so that participants could use them with ease, like tablets. Later, in the control setup, three traditional tests were administered side by side by the psychologists in a quiet room. On average, each participant took 35 minutes to complete their one-time traditional test assessment. The first traditional test, MAT, requires the participant to learn the names of individuals who are portrayed in photographs. Following the delayed recall session of the learning trials, the participant is presented with photographs and is asked to recognize the correct name from a brief list of alternatives. The second traditional test, DAT, requires the participant to identify the number sequence and solve the problem by identifying the missing numbers from the sequence. The third traditional test, SDCD, requires the participant to memorize the details of the first picture for 30 seconds, after which the first picture is taken away and the second picture is shown. The participant is then asked to identify as many differences as possible between the first and second pictures, which were presented sequentially. To prevent the risk of bias and the potential influence of any unknown variable in the QSs, participants who had completed the activity were not allowed to interact with participants who were waiting to start the activity.

Data collection and analysis
The game analytics of each participant were non-invasively recorded and compiled separately for each BG activity to train and test the statistically driven models during the experimental group-based cross-generational evaluation. Similarly, the control group-based assessment results of each participant were compiled separately for their three targeted cognitive abilities. Statistical analysis was performed on the experimental and control group-based results compiled for each targeted cognitive ability to determine trends within the QSs. During the initial phase of the statistical analysis, the mean, standard deviation, and standard error were calculated. In the second phase, the Pearson correlation [15] was applied to the experimental and control group-based results of each targeted cognitive ability in every QS to determine their degree of correlation, where the correlation coefficient r indicates the direction and the effect size of the correlation. According to [15,16], the effect size is low if the value of r lies between ±0.1 and ±0.3, medium if it lies between ±0.3 and ±0.5, and large if it lies between ±0.5 and ±1.0. Furthermore, the p-value was calculated in this phase to demonstrate the significance of the findings [15]. It is well established that correlation does not imply causation, yet this method has been used by a wide range of literature, including a research study named StudentLife [76]. It is almost impossible in a real-world situation to discover the element(s) that have a causal relationship with another element, as there always exist unknown factor(s) that alter the causality between the related elements. Hence, the motivation behind employing the correlation technique is not to discover a causal relationship but to comprehend the influence of an element in relation to the other(s) while conceding that the influence is not causal.

Results
A complete summary of the experimental group-based evaluation results of each QS is demonstrated in Table 4 and Fig. 4a-c. The column charts in Fig. 4a-c demonstrate the potential of the proposed models to successfully categorize the cluster of three targeted cognitive abilities of the cross-generational participants within a 5-point evaluation sphere, whereas the line charts in Fig. 4a-c demonstrate the potential of the proposed models to successfully estimate the mean level of the targeted cognitive abilities of the three different age groups. This delineates the estimated mean level of younger adults' targeted cognitive abilities as higher than that of children (i.e., VLTM: 16.2%, AC: 6.2%, and VSTMiCDP: 10.6%) as well as older adults (i.e., VLTM: 10.8%, AC: 5.4%, and VSTMiCDP: 2.2%). Furthermore, to estimate the significance of the experimental group-based evaluation in relation to the control group-based assessment, the results of the statistical analysis are given in Table 4. The experimental group-based evaluation results of QS1 illustrate the estimated mean level of the children's VLTM (μ = 2.62, σ = 0.91, σx̄ = 0.19) (i.e., above 60.9% and below 39.1% of the children's evaluation scores), AC (μ = 2.72, σ = 0.79, σx̄ = 0.17) (i.e., above 47.8% and below 52.2% of the children's evaluation scores), and VSTMiCDP (μ = 1.86, σ = 1.19, σx̄ = 0.25) (i.e., above 60.9% and below 39.1% of the children's evaluation scores) (see Table 4 and Fig. 4a). This specifies that the estimated mean level of children's VLTM and AC falls within the fair region of the 5-point evaluation sphere, whereas their VSTMiCDP falls at the upper bad region of the 5-point evaluation sphere. Statistical analysis is performed to estimate the significance of the experimental group-based cognitive evaluation in relation to the control group-based assessment. This revealed a significant correlation, with a large effect size, between the experimental and control group-based results of VLTM (r = 0.81, p < 0.001), AC (r = 0.76, p < 0.001), and VSTMiCDP (r = 0.84, p < 0.001).

Conclusion
The feasibility of substituting traditional cognitive assessment with a game-based approach was previously fuzzy. In this regard, this paper introduced LAS, an intelligent technique that utilizes non-invasively collected game analytics to automatically evaluate the cluster of three cognitive abilities (i.e., VLTM, AC, and VSTMiCDP). LOOCV is employed to demonstrate the construct validity of LAS, which successfully categorizes the targeted cognitive abilities within a 5-point evaluation sphere. Furthermore, the correlation technique is employed to demonstrate the concurrent validity of LAS, which reports a significant relationship between the experimental and control group-based results to potentially substitute the appointed administered tests.
It is observed that the estimated mean level of children's VSTMiCDP falls at the upper bad region of the 5-point evaluation sphere. This finding was alarming until the estimated mean level of younger and older adults' VSTMiCDP was reported within the fair region of the 5-point evaluation sphere. Therefore, it is assumed that, in contrast to the other targeted cognitive abilities, VSTMiCDP takes relatively longer to excel among most children, and by the time a child transforms into a younger adult, this ability naturally improves. However, another independent investigation can be conducted to further examine the validity of this assumption.
The design principle of various commercially available BG solutions such as Elevate [32], Lumosity [57], and Fit Brains [36] is somewhat similar to the theoretical operationalization approach of BrainStorm; however, unlike the demonstrated potential of LAS to independently evaluate the cluster of cognitive abilities, these solutions only provide a training facility to periodically improve and monitor the performance of the weaker brain aspects, such as memory, brevity, speed, problem-solving, attention, and even flexibility. Therefore, the above-mentioned limitation in commercially available BG solutions makes LAS a first-of-its-kind technique.




Table 3
Details of participants

Table 4
Evaluation summary