1 Introduction

Virtual environments offer a high-dimensional way to assess a user’s cognitive capacities for performing everyday activities. These assessments typically present more life-like stimuli and situations than lower-dimensional, two-dimensional assessments (i.e., surveys or computer-based tasks with static stimuli; Parsons and Duffield 2020). High-dimensional assessments provide benefits over lower-dimensional tasks, such as enhanced stimulus presentation, and may elicit more naturalistic user behavior. Virtual reality-based cognitive tasks balance higher-dimensional presentations of multiple sensory modalities with experimental control (Kothgassner and Felnhofer 2020). Virtual reality-based cognitive tasks can also be used to establish computational models with latent context variables that can be extracted using nonlinear modeling and utilized for adaptation to user performance (McMahan et al. 2021).

The Virtual Environment Grocery Store (VEGS) is an example of a high-dimensional virtual reality-based cognitive assessment developed via user-centered design (UCD) principles (Parsons 2012; Parsons et al. 2017; Virtual Environment Grocery Store 2012). UCD was incorporated throughout all phases of design to ensure that the developers and clinicians understood the user’s needs and expectations. An important user-centered design component was the focus on manipulating the cognitive load experienced by the user. Specifically, it was important that the user experience experimentally controlled sources of cognitive load (ecologically valid environmental distractors; strategy formation; problem-solving) rather than extraneous load (van Merriënboer and Ayres 2005).

While immersed in the VEGS, users interact with objects and virtual human avatars as they perform shopping tasks. Specific to the user-centered design was an emphasis on the cognitive measures that could be assessed by the VEGS, including learning, memory, navigation, and executive functions. As users navigate the virtual store, they perform a variety of tasks. First, users travel from the front of the store to the pharmacy, where they drop off a prescription with the virtual human pharmacist. The virtual human pharmacist gives the user a number for which the user must listen while shopping (see Fig. 1). There are also announcements broadcast over the public address system. To perform well, the user must pay attention to announcements, listen for their own number, and ignore other numbers while shopping (cognitive inhibition: ignoring other numbers). The user is instructed to gather items from a shopping list that they learned prior to immersion. Users are also instructed to navigate to an automated teller machine (ATM) after two minutes (time-based prospective memory).

Fig. 1
figure 1

Start of the Virtual Environment Grocery Store protocol

The VEGS also includes other tasks: navigating through the aisles; selecting and retrieving items from the shopping list; ignoring items that were not on the shopping list; and staying within budget. After hearing their prescription number broadcast, the user is to return to the virtual pharmacist and stand in line to pick up their prescription (event-based prospective memory). Following the user’s immersive experience in the VEGS, the user performs delayed free and cued recall of the VEGS shopping items. Recent attempts at psychometric validation of the VEGS have found it to have construct validity for assessing both older and younger adults in both high and low distraction conditions (Barnett et al. 2022, 2023; Weitzner et al. 2021). Of note, in the lower distraction conditions, the VEGS appears to be primarily a memory (episodic and prospective) assessment (Parsons and Barnett 2017). The addition of environmental distractors (e.g., additional avatars; more announcements; cell phones ringing) revealed that users’ performance was related to both memory and executive functioning measures (Parsons and McMahan 2017).

While these psychometric results are promising, there is a need for a virtual environment grocery store platform that adapts to the user’s performance. Adaptive virtual environments allow for a shift away from one-size-fits-all experiences toward individual user-centered designs (Scott et al. 2016; Shute and Towle 2018). Moreover, adaptive virtual learning environments can potentially lead to enhanced cognitive representations and better knowledge transfer to other contexts (Scott et al. 2016). Adaptive systems may infer user states to reduce cognitive load (Dorneich et al. 2016) and provide individualized training (Klasnja et al. 2015). Recent systematic reviews of adaptive VR-based training approaches suggest that optimized adaptive virtual environments will account for users’ capabilities, performance, and needs (Vaughan et al. 2016; Zahabi and Abdul Razak 2020).

Adaptive algorithms can be developed to tailor assessment and training to the user’s strengths and weaknesses (Reise and Waller 2009; Gibbons et al. 2008, 2016). Recently, several virtual reality-based cognitive assessments have used machine learning to develop classifiers and adaptive algorithms that can be used for personalized cognitive assessments (Alcaniz Raya et al. 2020; Asbee et al. 2023; Belger et al. 2023; De Gaspari et al. 2023; Kerick et al. 2023; Marín-Morales et al. 2018; McMahan et al. 2021; Tsai et al. 2021). Once established, these algorithms can be used to develop an adaptive virtual shopping platform that dynamically adjusts the complexity of stimulus presentations relative to the performance of the user. An adaptive version of the VEGS will allow for the assessment of the user’s limits as well as adapt to the user’s performance in a dynamic manner. For an adaptive assessment to change in response to the user, the system must first determine the user’s state. User states are typically determined using metrics such as discrete behaviors (Scott et al. 2016). These user metrics are used to apply decision rules that classify the user’s state. However, before implementing decision rules, it is important to use machine learning to develop performance classifiers. An initial step in the creation of an adaptive VEGS assessment is an examination of the performance of the classifiers. Examination of classifiers can be used to create optimal decision rules and inform test administrators of the general accuracy of the classification of the user’s state. In this paper, we compare the predictive ability of three machine learning classifiers: the Support Vector Machine, k-Nearest Neighbor, and Naïve Bayes.

2 Methods

The study received approval from a university committee for the protection of human subjects.

2.1 Participants

Study data were gathered and analyzed from 75 college-age students at a large university in the southwestern USA. The mean age was 21.07 years (range 18–40), and 53% of the participants were female. Education levels included high school degree, some college, and bachelor’s degree. The ethnicity distribution was N = 16 African American, N = 5 Asian, N = 20 Hispanic, N = 28 Caucasian, and N = 6 Other. Most participants (86.6%) were right-handed.

For all participants, the inclusion criteria were: aged 18 years or older, with normal (or corrected-to-normal) vision. Participants were excluded if they had a history of acute psychiatric condition(s), attention-deficit/hyperactivity disorder, or other Axis I psychopathology (diagnosed or suspected). Participants were also excluded if they had a history of epilepsy, intellectual disability (IQ < 70), and/or neurological impairments affecting cognitive and/or motor functioning. No participants were excluded.

All participants reported that they were comfortable with computers and rated their technology competency as experienced. There were no significant differences in age, sex, estimated full-scale IQ, or computer comfort. Hence, the sample was considered homogeneous.

2.2 Apparatus and measures

2.2.1 Procedure

The protocol for gathering data and the experimental sessions took place over a 90-min period. After the participant (i.e., user) arrived at the laboratory, they were briefed on the study’s procedures, potential risks, and benefits, and were told that they could choose not to participate. Before beginning the protocol (pre-immersion), participants signed a written informed consent form (approved by the university’s institutional review board) designating their approval to take part in testing and immersion in the virtual environment. Once informed consent was received, general demographic data were gathered, and participants responded to questions assessing their computer experience, comfort, and usage activities; their perceived level of computer skill (Likert scale: 1 = not at all skilled to 5 = very skilled); and the types of games they played (e.g., role-playing, eSports).

2.2.2 Virtual environment grocery store

The VEGS was run on the Windows 10 operating system using a computer with an Intel Core i7 processor (16 GB RAM) and an NVIDIA GeForce GTX 1080 graphics card. DisplayPort 1.2 was used for video output. While multiple head-mounted displays (HMDs) can be used with the VEGS, the HTC Vive (http://www.htcvive.com) HMD was used in this study. The HTC Vive uses an organic light-emitting diode (OLED) display with a resolution of 2160 × 1200 and a refresh rate of 90 Hz. Participant head position was tracked using embedded inertial measurement units, while the external Lighthouse tracking system corrected common tracking drift (60 Hz update rate). The VEGS includes a number of everyday shopping activities that have been found to be associated with cognitive performance on traditional (low-dimensional) neuropsychological assessments. For example, in both low and high distraction conditions, performance on the VEGS has been associated with performance on traditional measures of memory (Parsons and Barnett 2017). During high distraction conditions, participants’ performance on VEGS tasks is also associated with executive functioning (Parsons and McMahan 2017).

Prior to being immersed in the VEGS, participants took part in an encoding phase (i.e., learning a list of shopping items that they would shop for once immersed in the VEGS) and a familiarization phase (immersion into the virtual environment to experience the controllers). During the encoding phase, the participants (not yet immersed) were exposed to learning trials aimed at communicating the shopping items needed once immersed. Participants listened as the examiner read aloud 16 items (with an inter-stimulus interval of two seconds between items). Participants were not provided with a copy of the shopping list. Immediately following the examiner’s reading of the list, the participant was instructed to repeat the shopping items in any order. The participant’s immediate recall of items was recorded verbatim by a microphone and logged for each of the immediate recall trials (Trials 1–3). Following the encoding phase (but before taking part in the actual VEGS tasks), participants took part in a familiarization phase, during which they were immersed in the virtual environment and learned the controls, navigated the environment, and selected items from the shelves. The duration of the familiarization phase was determined by the participant’s reported comfort and prior experience with virtual reality platforms (ranging from 3 to 5 min). Before moving on to the testing phase, examiners made sure that the participant was adept at using the controls and answered any participant questions. Next, the participant was informed of the tasks to be completed during the testing phase: (1) the participant would travel to the pharmacy at the back of the store and click on the pharmacist to drop off a prescription; once they clicked on the pharmacist, they would receive a number to remember along with instructions; (2) the participant would listen for their number to be called (ignoring other numbers) while shopping for items from the shopping list (learned during the encoding phase); (3) the participant was instructed to watch the clock and go to the ATM after 2 min in the virtual environment (time-based prospective memory); and (4) once they heard their prescription “pick-up” number called, they were to return to the pharmacy and click on the pharmacist for pick-up (event-based prospective memory). Once the participant confirmed that the instructions were understood, the VEGS protocol began.

2.3 Data analytic considerations

MATLAB (version 9.2, MathWorks, Natick, MA, USA) was utilized for all analyses. Participant data that could be used as prediction variables for the machine learning algorithms were identified (see Table 1). Prediction variables were selected based upon the criterion that they could be used in real time in the adaptive environment to supply the machine learning algorithm with predictions of participant performance levels. Figure 2 shows the distribution of shopping-item (learned during the encoding phase) pick-up times for high performers and low performers.

Table 1 Machine learning predictor variable descriptions
Fig. 2
figure 2

Distribution of shopping items based upon high and low performance. Note: H = high performers; L = low performers

Knowing their performance levels allows the platform to adapt and optimize user experience. Once the prediction variables were identified, the descriptive statistics were calculated for each predictor (see Table 2) and box plots are presented in Fig. 3.

Table 2 Machine learning predictor variable descriptive
Fig. 3
figure 3

Box plots for predictor variables. Note: H = high performers; L = low performers

When looking at Table 2, it is important to note that the ranges of performance reflect both high- and low-performance categories. For example, “# of times looked shop list” ranges from high performance (looked at the shopping list in the VEGS one time) to low performance (looked at the shopping list 469 times, meaning that the participant constantly looked at the shopping list throughout the task). Likewise, for the timing variables, some participants were high performers who completed tasks quickly, while low performers took greater amounts of time. These variables were included in the model to establish classifiers for high and low performance.

Each participant was categorized as either a high performer or a low performer. Using the number of items that each participant found during the shopping phase, the mean was calculated (Mean = 7.5 items). A participant was assigned to the high-performer category if the number of items they found was larger than the mean and to the low-performer category if the total number of items found was smaller than the mean. The resulting category distribution was 37 high performers and 38 low performers.
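This mean-split labeling can be sketched in a few lines (the study’s analyses were run in MATLAB; the function and scores below are illustrative, not the study’s data):

```python
def label_performers(items_found):
    """Split participants into high ("H") / low ("L") performers around the mean items found."""
    mean_items = sum(items_found) / len(items_found)
    # Above the mean -> high performer; at or below the mean -> low performer
    return ["H" if n > mean_items else "L" for n in items_found]

# Made-up item counts for illustration only
scores = [3, 5, 7, 8, 9, 10, 6, 12]   # mean = 7.5
labels = label_performers(scores)
```

With a mean of 7.5, this toy input yields three high performers above the mean and five low performers at or below it, mirroring the roughly even 37/38 split reported above.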

(1) Support Vector Machine: The Support Vector Machine (SVM) utilizes a hyperplane to segment the data into two classes when classifying binary labeled data. The SVM trains using data belonging to both categories and maps them into a higher-dimensional space. The goal of the SVM is to create a hyperplane with a maximum distance between the two categories. SVM algorithms can utilize different kernels (linear, polynomial, and radial basis function) to build different hyperplanes. Once trained, the SVM assigns testing data to one of the two categories, determined by which side of the hyperplane the test data fall. The hyperplane is optimized by selecting the maximum margin between the hyperplane and the data; the SVM accomplishes this by transforming the data from input space to feature space. This study implemented a Type 1 (nu-SVM) classification with nu = 0.5 and a radial basis function kernel (gamma = 0.016). The maximum number of iterations was set to 1000 with a stop error of 0.001. Tenfold cross-validation was employed, randomly segmenting the data into 90% training and 10% testing folds.
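The study’s analyses were run in MATLAB; a rough scikit-learn equivalent of the nu-SVM configuration described above might look as follows (the synthetic data here are a stand-in for the 75 participants × 20 predictors, not the VEGS data):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import NuSVC

# Synthetic stand-in for 75 participants with 20 predictors each (not VEGS data)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (37, 20)),   # "high performers"
               rng.normal(2.0, 1.0, (38, 20))])  # "low performers"
y = np.array([1] * 37 + [0] * 38)

# nu-SVM with an RBF kernel, mirroring the reported settings
clf = NuSVC(nu=0.5, kernel="rbf", gamma=0.016, tol=0.001, max_iter=1000)

# Tenfold cross-validation: each fold holds out ~10% for testing
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_accuracies = cross_val_score(clf, X, y, cv=cv)
```

Averaging `fold_accuracies` gives the kind of overall correct-classification rate reported in Table 3.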

(2) Naïve Bayes: Based upon Bayes’ theorem, the Naïve Bayes (NB) classifier is well suited to circumstances in which the dimensionality of the inputs is high. One of the main advantages of NB is that it does not require a large training set. The NB classifier uses a calculated probability (see Eq. 1) that a data point belongs to a class, and it classifies by choosing the class with the highest probability. As a supervised learning algorithm, NB is efficient at calculating the probability that new data fit into a specific group. NB assumes that each predictor is independent of the other predictors. A feature vector is calculated for each category during the training phase; during the testing phase, the classifier uses maximum likelihood to place the data into the correct categories. In this study, tenfold cross-validation was utilized, segmenting the data into a training set comprising 90% of the sample and a testing set comprising 10% of the sample. A normal distribution was assumed for each predictor.

$$P\left( x_{i} \mid y \right) = \frac{1}{\sqrt{2\pi \sigma_{y}^{2}}}\,\exp \left( -\frac{\left( x_{i} - \mu_{y} \right)^{2}}{2\sigma_{y}^{2}} \right)$$
(1)
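Equation 1 is the per-predictor Gaussian likelihood that a Gaussian Naïve Bayes classifier multiplies across predictors (under the independence assumption) and scales by the class prior. A minimal illustrative implementation (not the study’s MATLAB code; class statistics below are made up):

```python
import math

def gaussian_likelihood(x, mu, sigma2):
    """Eq. 1: P(x_i | y) under a normal distribution with class mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def nb_scores(x_vec, class_params, priors):
    """Score each class as prior * product of per-predictor likelihoods (naive independence)."""
    out = {}
    for label, params in class_params.items():
        p = priors[label]
        for x, (mu, sigma2) in zip(x_vec, params):
            p *= gaussian_likelihood(x, mu, sigma2)
        out[label] = p
    return out

# Toy two-predictor example with hypothetical per-class (mean, variance) pairs
params = {"H": [(1.0, 0.5), (2.0, 0.5)], "L": [(4.0, 0.5), (6.0, 0.5)]}
priors = {"H": 0.5, "L": 0.5}
scores = nb_scores([1.2, 2.1], params, priors)
predicted = max(scores, key=scores.get)   # class with the highest probability
```

The observation [1.2, 2.1] lies near the "H" class means, so the "H" score dominates and "H" is predicted.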

(3) k-Nearest Neighbor: A supervised learning algorithm, k-Nearest Neighbor (kNN) uses location in feature space to determine data categorization. During the training phase, kNN stores each category’s data as feature vectors. When kNN is presented with new data, it calculates the distance (Eq. 2) to the stored examples and assigns the data to the nearer of the two categories. Uneven data distribution is one of the primary issues with kNN, as it can cause the algorithm to favor one category over the other. In this study, the kNN classifier implemented tenfold cross-validation, randomly segmenting the data into 90% training and 10% testing folds. Additionally, the distance measure was set to Cityblock (Manhattan).

$$D\left( a,b \right) = \sqrt{\sum_{i = 1}^{n} \left( b_{i} - a_{i} \right)^{2}}$$
(2)
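A minimal kNN sketch using the Cityblock (Manhattan) distance reported above (illustrative only; the study ran its analyses in MATLAB, and the toy training clusters below stand in for high/low performers):

```python
from collections import Counter

def cityblock(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(bi - ai) for ai, bi in zip(a, b))

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    dists = sorted(zip((cityblock(x, t) for t in train_X), train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy training set: two clusters standing in for high ("H") / low ("L") performers
train_X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = ["H", "H", "H", "L", "L", "L"]
pred = knn_predict(train_X, train_y, (1, 1), k=3)
```

The query point (1, 1) sits inside the "H" cluster, so all three nearest neighbors vote "H". If one cluster greatly outnumbered the other, the vote could be pulled toward the larger class, which is the uneven-distribution issue noted above.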

3 Results

The selected predictors (see Table 1) from the randomly chosen participants were used to categorize participants into high performers and low performers using a Support Vector Machine (SVM), a Naïve Bayes (NB) classifier, and a k-Nearest Neighbor (kNN) classifier (see Table 3). With 75 participants, the data were randomly segmented into 67 algorithm training samples and eight algorithm testing samples. Each sample contained 20 data points that were used as predictors for the machine learning algorithms.

Table 3 Machine learning classifier results

The strongest classifier was the SVM, which produced an accuracy rate of 88%. This was followed by kNN (86.7%); NB came in last with an accuracy of 76%. It is important to note, however, that the F-measures for kNN, SVM, and NB indicate that the dataset was symmetrical. kNN was better at correctly assigning low-performing participants, whereas the SVM was more balanced when assigning low- and high-performing participants but tended to favor high performers (see Fig. 4). This may be due to the kNN algorithm favoring low performers over high performers. NB performed the worst, producing a poor correct classification rate (76%), as seen in the confusion matrices in Fig. 5.
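Accuracy and the F-measure follow directly from a 2 × 2 confusion matrix. The helper below shows the computation with made-up counts (not the study’s matrices from Fig. 5):

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-measure from a 2x2 confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Hypothetical counts for illustration only
acc, prec, rec, f1 = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

When precision and recall are similar across classes, the F-measure sits close to accuracy, which is the sense in which near-matching F-measures indicate a symmetrical dataset.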

Fig. 4
figure 4

Comparison of classifiers results. Note. SVM = Support Vector Machines; kNN = k-Nearest Neighbor; NB = Naïve Bayes

Fig. 5
figure 5

A Confusion Matrix for SVM; B Confusion Matrix for NB; C Confusion Matrix for kNN. Note. SVM = Support Vector Machines; kNN = k-Nearest Neighbor; NB = Naïve Bayes

4 Discussion

This study developed machine learning classifiers for the Virtual Environment Grocery Store. While psychometric validity (Parsons and Barnett 2017; Parsons and McMahan 2017) and reliability (Barnett et al. 2022; Weitzner et al. 2021) of the VEGS have been shown, there is a need for a virtual environment grocery store platform that adapts to the user’s performance. This study compared three machine learning algorithms: Support Vector Machine (SVM), Naïve Bayes (NB), and k-Nearest Neighbor (kNN). These classifiers were compared for determining when the VEGS environment would need to adapt to a user. Results revealed that the SVM (88% correct classification) was the most robust classifier for identifying cognitive performance, followed closely by kNN (86.7%) and then NB (76% correct classification). While the SVM was better at balancing between lower- and higher-performing participants, it tended to favor high performers, whereas the kNN algorithm was better at assigning lower-performing participants. A hybrid model combining results from the SVM and kNN classifiers may therefore be best for the adaptive VEGS platform. These findings serve as an initial step toward developing decision rules that can be used for adapting the VEGS environment to the user in real time. These algorithms will be employed in a future version of an adaptive VEGS.

Based on data from the VEGS, the SVM classifier performed best, with a correct classification rate of 88%. When an SVM is used for classification, the algorithm attempts to maximize the margin (i.e., the distance between the hyperplane used for classification and the training data; Nobel 2006). SVMs with greater margins are believed to perform better (Bhavsar and Panchal 2012). A hyperplane with a large margin may emerge when the data are transformed into a higher-dimensional space, leading to the SVM algorithm’s high classification rate.

The results from testing the classifiers indicate that the SVM was stable at assigning participants’ performance but favored higher-performing participants, whereas the kNN algorithm was better at assigning lower-performing participants. One reason may be that higher performers had more consistent scores. Higher performers may have completed tasks in a more similar manner, for example, taking efficient routes and remembering a similar number of items. Low performers, however, may have retrieved items at more random intervals, creating a wider distribution of scores that overlapped with the higher performers’ scores. This may have made it more difficult to accurately identify higher performers using kNN, which classifies points based on neighbors with similar scores in the feature space.

The NB classifier did not perform well. A possible reason for this performance is that the dataset violates the “naïve” assumption: the NB classifier assumes that each predictor is independent (Arar and Ayan 2017). It is possible that the predictors are not completely independent, which would lower classification accuracy. For example, if a participant became distracted during the assessment, many of the time-based predictors would increase together.

Often a single algorithm is used for classification, but a hybrid model combining results from the SVM and kNN classifiers may be best for the adaptive VEGS platform. The hybrid system could compare the predicted classifications: if they agree, the system chooses that category; if they do not agree, the system uses the category in which it has the highest confidence. A study by Mohan and colleagues (2019) found that a hybrid approach to machine learning outperformed standard approaches to prediction.
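The proposed agreement-with-confidence-fallback fusion can be stated compactly (a sketch of the proposed hybrid rule, not an implemented system; labels and confidence values are hypothetical):

```python
def hybrid_classify(svm_label, svm_conf, knn_label, knn_conf):
    """Agreement-based fusion: keep the shared label; otherwise defer to the
    classifier reporting the higher confidence."""
    if svm_label == knn_label:
        return svm_label
    return svm_label if svm_conf >= knn_conf else knn_label

# Agreement: both say high performer -> "H"
agree = hybrid_classify("H", 0.9, "H", 0.7)
# Disagreement: kNN is more confident -> its low-performer label wins
disagree = hybrid_classify("H", 0.6, "L", 0.8)
```

This rule lets the kNN classifier’s strength with low performers override the SVM only when the kNN is the more confident of the two.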

An important step in the creation of an adaptive system is the creation of a classifier to use decision rules to accurately determine the user’s state. Except for NB, the results suggest that the ML classifiers were able to accurately identify the users’ states (i.e., high or low performance). Psychologists and clinicians have had some success using various machine learning-based classifiers to detect physical and mental health issues such as traumatic brain injury (Mitra et al. 2016), autism spectrum disorder (Omar et al. 2019), and post-traumatic stress disorder (Galatzer-Levy et al. 2017).

Adapting the VEGS via machine learning classifiers allows the system to better identify participants’ performance in real time and establish their ability level. This is important because participants perform tasks at various levels, with some users finding certain tasks easier or more difficult to perform. For example, some users have more experience or greater deficits than the average user. To perform this task in real time, classifiers (based on machine learning) and decision rules were developed. Machine learning offers a tool for personalized assessment and training of users. The machine learning classifiers were able to accurately identify user performance, but the study was not without limitations. The current work focused on high versus low performance; additional categories for classification may be included in the future. For example, psychophysiological measures could be added to determine when participants are experiencing frustration or high cognitive load. This could allow the adaptive system to provide assistance when these user states are identified.

The work presented in this research represents an initial step in the development of an adaptive virtual environment (AVE). The models implemented in this research utilize 20 predictors as an upper-level boundary to begin identifying the ideal predictors and the strongest classifier to implement within the AVE. However, not all of these predictors will be available at the start of the assessment. The adaptive environment would take this into account and continue to adjust as new data become available. Future work requires optimization of the framework to take delayed data into account in the decision-making process. Earlier approaches aimed at psychometric validation of the VEGS used the general linear model (Barnett et al. 2022, 2023; Weitzner et al. 2021; Parsons and Barnett 2017; Parsons and McMahan 2017). While the current research moves beyond earlier VEGS validation studies with healthy aging cohorts (with the exception of Barnett et al. 2023), several recent studies have applied virtual reality and machine learning to aging clinical cohorts (Bayahya et al. 2022; Cavedoni et al. 2020; De Gaspari et al. 2023; Stasolla and Di Gioia 2023; Tsai et al. 2021). Hence, there is a need for a machine learning-based VEGS approach applied to clinical populations.

Using the classifiers identified herein, the VEGS can categorize user performance for use in a future adaptive iteration of the VEGS. The adaptive VEGS system will use a set of decision rules to inform the system how to process each category. Within the VEGS, rules can be defined for instances when a low performer is identified on the currently performed task. For example, if the task is dropping off the prescription and the user is having difficulty (i.e., a low performer), the system could suggest a path for the user to take to better navigate to the pharmacist. Additionally, during the shopping task users may struggle to find items; the adaptive system could highlight products in the store that users have yet to pick up. If a user is categorized as a high performer for an extended duration, the system would adapt to make the current task more difficult, continuing until the user becomes a low performer. In the VEGS, difficulty can be increased by adding items that the user must find, or by adding tasks. In sum, an adaptive VEGS can use machine learning-based classifiers and decision rules to personalize the assessment and training of users. The development of these machine learning classifiers is a first step toward developing concise item pools that provide precision equal to or greater than norm-referenced paper-and-pencil tests at establishing ability levels (Gibbons et al. 2008). Adaptive virtual environments allow for a shift away from one-size-fits-all experiences toward individual user-centered designs (Scott et al. 2016; Shute and Towle 2018).
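The task-specific decision rules described above can be sketched as a simple lookup keyed by (task, classified state). The task names and interventions below are hypothetical illustrations; the actual rule set for the adaptive VEGS has not yet been implemented:

```python
# Hypothetical (task, performer_state) -> adaptation mapping for illustration only
RULES = {
    ("prescription_dropoff", "L"): "show_path_to_pharmacist",
    ("shopping", "L"): "highlight_remaining_items",
    ("shopping", "H"): "add_shopping_items",
}

def adapt(task, performer_state):
    """Look up the adaptation for the current task and the classified user state ("H"/"L")."""
    return RULES.get((task, performer_state), "no_change")
```

In a running system, the classifier output feeds `performer_state`, and the returned action (e.g., highlighting unfound items for a struggling shopper, or adding items for a sustained high performer) drives the environment change.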