Introduction

The human immunodeficiency virus (HIV) epidemic remains a critical public health issue worldwide, particularly in low- and middle-income countries (LMICs). While combination antiretroviral therapy (cART) improves immune function and lengthens life expectancy for persons living with HIV (PLWH), multiple lines of evidence document persistent cognitive impairment, known as HIV-associated neurocognitive disorder (HAND) [1,2,3,4,5,6]. This condition poses an ongoing threat to the health of PLWH, as it is known to negatively impact cART adherence and performance of activities of daily living [7, 8].

Published studies demonstrate that up to 50% of PLWH on cART experience HAND, including many who achieve viral suppression [9,10,11,12]. The condition is believed to result from HIV-induced structural and functional damage to the fronto-striatal circuitry of the brain [10, 13], which presents as a clinical phenotype of cognitive, motor, and behavioral changes [14]. The current research-based diagnostic categories for HAND are assigned in accordance with the 2007 “Frascati” criteria, which emphasize performance on comprehensive neuropsychological (NP) testing batteries [15]. Classifications range in severity from HIV-associated dementia (HAD) to mild neurocognitive disorder (MND) and asymptomatic neurocognitive impairment (ANI), the latter of which signifies impaired neuropsychological performance in the absence of identifiable functional deficits [16].
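In essence, the Frascati scheme is a rule-based classification over domain-level test performance and functional status. The sketch below is a simplified, illustrative rendering of how research groups commonly operationalize these categories, assuming domain z-scores relative to demographically corrected norms; it is not the published algorithm, which also requires excluding confounding conditions.

```python
def classify_hand(domain_z_scores, functional_decline):
    """Simplified sketch of Frascati-style HAND classification.

    domain_z_scores: dict of cognitive domain -> z-score
        (negative values = worse than the normative mean).
    functional_decline: "none", "mild", or "marked", based on
        self-report or proxy measures of everyday functioning.
    """
    mild = sum(z <= -1.0 for z in domain_z_scores.values())    # domains >= 1 SD below mean
    severe = sum(z <= -2.0 for z in domain_z_scores.values())  # domains >= 2 SD below mean

    if severe >= 2 and functional_decline == "marked":
        return "HAD"  # HIV-associated dementia
    if mild >= 2 and functional_decline == "mild":
        return "MND"  # mild neurocognitive disorder
    if mild >= 2 and functional_decline == "none":
        return "ANI"  # asymptomatic neurocognitive impairment
    return "no HAND"


print(classify_hand(
    {"speed": -1.3, "executive": -1.1, "memory": -0.4, "motor": 0.2},
    functional_decline="none"))  # -> "ANI"
```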

The neuropsychological phenotype typically described in HAND in the cART era is characterized by prominent difficulties in processing speed and executive function and by errors of memory retrieval (rather than encoding) [9, 17, 18]. Although comprehensive NP testing is the gold standard for HAND diagnosis, it is not easily accessible: it requires a trained neuropsychologist to administer a broad range of tailored tests that assess multiple cognitive domains, and it can last between 1 and 4 h [16]. Due to its time- and resource-intensive nature, this type of evaluation may be less feasible in communities where neuropsychologists are scarce, such as rural areas and low-income countries [19, 20]. Meanwhile, the COVID-19 pandemic has created additional barriers for patients to access such assessments. In response, the clinical care community has pivoted toward non-traditional and often remote assessment strategies, including tools that can assess cognition and mental health regardless of setting [21, 22].

In the HIV literature, digitalized cognitive testing first appeared in the early 1990s, and a variety of tests have been developed since then [23,24,25]. These computerized neuropsychological assessment devices (CNADs) offer a wide range of benefits: they usually provide automatically timed, scored, and reported results; can be administered by lay health workers in a much shorter time frame; are typically portable; and can be designed with novel technology to serve as more ecologically valid tests [23, 26]. Many CNADs require no other materials, and some can even be self-administered [27]. Such batteries have already been shown to correlate with traditional NP testing among healthy controls and to be valid screening mechanisms for cognitive impairment in mild cognitive impairment, substance use disorders, and multiple sclerosis [28,29,30,31].

While research into the implementation of CNADs for HIV has been predominantly shaped by work completed in high-income countries, these batteries could prove most useful in assessing HAND in LMICs, where HIV is more common and where meta-analyses have shown that MND and HAD are more prevalent [32,33,34]. However, there are inherent drawbacks to CNADs that currently limit their application in clinical settings. Digital tests are subject to technical hardware and software problems; they often lack normative data; and many are commercial products, which limits their possible utilization in LMICs. Moreover, to date, the sensitivity of these tests to detect impairment is generally moderate and lower than that of paper-and-pencil testing, or has been established only in very small samples. Thus, they are currently less than ideal for use as screening tools [35].

Within the HIV literature, a number of novel CNADs have been examined. The purpose of this review is to (1) describe these existing computerized batteries, (2) review published validity data as compared to traditional NP testing, and (3) discuss the current state of digital assessments for examining HAND.

Methods

We designed a comprehensive search strategy in consultation with a librarian, using the PubMed and PsycINFO databases to search for the following: “HIV” AND “cognitive impairment OR cognition OR HIV-associated neurocognitive disorder” AND “neuropsychological” AND “computer OR digital”. These searches yielded 81 and 54 articles, respectively. The abstracts of relevant articles were compiled and then reviewed for the use of digital neuropsychological testing in the setting of HIV. The articles that met these criteria were read, and their reference lists were further examined in order to compile a more inclusive review. Our review was limited to peer-reviewed, English-language articles published within the past 20 years, with no other restrictions, such as sample size or analysis type (Table 1). We identified eight CNADs that have undergone validity testing in the setting of HIV (Table 2).

Table 1 Systematic review of the literature to examine the validity of digital tools to detect cognitive impairment in the setting of HIV
Table 2 Brief overview of digital screening tools that have undergone validity testing to screen for cognitive impairment in the setting of HIV

Validity of digitalized versions of traditional neuropsychological testing batteries

The following CNADs are modeled after the gold standard paper-and-pencil neuropsychological tests and can be viewed as “traditional” computerized batteries.

Cogstate

Cogstate is the CNAD we identified as having the greatest number of publications validating its use for detecting HAND [36,37,38,39,40,41,42]. Cogstate offers a variety of commercially available tests across a number of domains, which can be administered as a customizable battery in 23 languages [42]. It has been used as a tool in clinical research in diseases that include multiple sclerosis, schizophrenia, and neuro-oncological conditions, in addition to HIV [30, 43, 44]. Cogstate batteries have also been created for specific diseases, such as Alzheimer’s disease and depression, and many of these batteries include adapted neuropsychological tests, such as the Finger Tapping Test [45]. They can be administered on a computer using keyboard and mouse input or on a tablet using touch input; however, there is evidence that performance differs depending on the device selected, such as faster performance on several measures when using the computer [46].

In the setting of HIV, Cogstate was first examined in 2006 in Sydney, Australia, in a small sample of healthy controls (n = 29), individuals with advanced HIV infection (n = 49; 55% with undetectable HIV RNA; 1 subject off ART), and individuals with AIDS dementia complex (ADC; n = 11; 27% with undetectable HIV RNA; 3 subjects off ART), a term for advanced cognitive impairment from previously used diagnostic criteria that is generally equivalent to the diagnosis of HAD in the contemporary “Frascati” criteria [15, 41, 47,48,49]. Investigators employed the following 10–15-min battery: simple reaction time, choice reaction time, complex reaction time, continuous performance, one-back working memory, matching, incidental learning, and associate learning. The battery had a sensitivity of 81% and a specificity of 70% for classifying PLWH as impaired or unimpaired, with moderate to severe cognitive impairment defined by NP testing (> 2 standard deviations below average on 2 of 14 NP testing measures). Correlations between Cogstate tests and similar conventional tests ranged widely (Pearson correlation r = 0.23–0.62, p < 0.05). An additional analysis examined Cogstate’s ability to distinguish individuals clinically diagnosed with ADC from those who were “non-demented,” a classification that included individuals with advanced HIV infection and milder or no cognitive impairment. Sensitivity and specificity in this analysis were 18% and 98%, respectively.
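For readers less familiar with these classification metrics, sensitivity and specificity are computed from the confusion matrix of screen results against the NP-testing reference standard. The short sketch below uses hypothetical counts, not the study’s data, purely to illustrate the arithmetic.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts for illustration only: 60 participants,
# 26 impaired on the NP reference standard, 34 unimpaired.
sens, spec = sensitivity_specificity(tp=21, fn=5, tn=24, fp=10)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
# -> sensitivity = 81%, specificity = 71% (close to the values reported above)
```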

Another study completed in Sydney, Australia, further examined Cogstate’s ability to detect ADC within a small sample (n = 20) as compared to PLWH without ADC (n = 20) [40]. All ADC participants were receiving ART, while treatment status for those without ADC was not reported; viral suppression and/or detectability was not reported for either group. Investigators employed the following 8–10-min battery: detection task, identification task, one-back task, and visual learning task. Pearson correlations between Cogstate and paper-and-pencil tests ranged from 0.70 to 0.81 (p < 0.001), although these analyses included data from other participant groups, including those with traumatic brain injury and schizophrenia, in addition to HIV. Sensitivity and specificity for detecting ADC were not reported.

Our review also identified four studies that analyzed Cogstate’s performance in less severely impaired HIV-infected groups. One group in St. Louis, MO, in the United States (U.S.) examined PLWH who had normal cognition (n = 24), mild impairment (n = 20), and moderate impairment (n = 2) based on a composite Global Deficit Score (GDS) created from traditional NP testing [39]. Of all subjects, 61% were virally suppressed (defined as less than 400 copies/mL) and 74% were on cART. Performance on Cogstate (employing the following 12–15-min battery: two simple reaction time tasks, choice reaction time, one-back test, monitoring test, and associate learning test) correlated weakly with formal testing, with the strongest correlations found between traditional NP tests and the simple detection task (r = 0.42–0.53, p < 0.05). Additionally, using a composite score created from five significant test parameters identified by regression analysis (accuracy and speed of the two simple detection tests, associate learning accuracy, monitoring task accuracy, and one-back test accuracy), 90% of individuals were correctly classified as cognitively impaired or not.
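Because several of the studies in this review define impairment as GDS ≥ 0.5, a brief sketch of how a Global Deficit Score is commonly derived may be helpful. The T-score-to-deficit-score bands below follow the commonly used convention; treat the exact thresholds as an assumption for illustration rather than a description of any single study’s scoring.

```python
def t_to_deficit(t_score):
    """Map a demographically corrected T-score to a deficit score (0-5).

    Bands assumed here (not taken from the studies above):
    T >= 40 -> 0 (no deficit) ... T < 20 -> 5 (severe deficit).
    """
    if t_score >= 40:
        return 0
    if t_score >= 35:
        return 1
    if t_score >= 30:
        return 2
    if t_score >= 25:
        return 3
    if t_score >= 20:
        return 4
    return 5


def global_deficit_score(t_scores):
    """GDS = mean deficit score across all tests; GDS >= 0.5 is the
    conventional cutoff for neurocognitive impairment."""
    deficits = [t_to_deficit(t) for t in t_scores]
    return sum(deficits) / len(deficits)


# Example: 10 test T-scores, two mildly and two moderately impaired
gds = global_deficit_score([52, 48, 45, 43, 41, 38, 36, 33, 31, 55])
print(round(gds, 2), "impaired" if gds >= 0.5 else "not impaired")  # -> 0.6 impaired
```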

More recent work in Sydney, Australia, in a sample of 53 PLWH (non-impaired (NI), n = 28; ANI, n = 6; MND, n = 14; HAD, n = 5; 80% of all participants had undetectable plasma HIV RNA, and 87% were on ART) and 22 HIV-uninfected controls who completed both gold-standard NP testing and Cogstate (employing the following 20-min battery: sustained attention, information processing speed, attention, working memory, verbal learning, and verbal memory) found a sensitivity of 76% and specificity of 71% for likely HAND, defined as GDS ≥ 0.5 [37]. When classifying HAD (n = 5) vs. MND (n = 14), Cogstate yielded a sensitivity of 100% and specificity of 98%. Moreover, further analysis showed that the criterion validity of the Cogstate-based screen was higher in the sample of PLWH when using the GDS (76% sensitivity, 71% specificity) than when using cognitive domain ratings, where each domain was given a z-score based on performance on a single test (72% sensitivity, 57% specificity) [36].

Lastly, one study in Kampala, Uganda (n = 181), found that Cogstate (employing the following 25-min battery: detection task, identification task, one card learning task, and one-back task) was feasible to administer as a tool to assess HAND in a resource-limited setting; however, sensitivity and specificity were 57% and 77%, respectively, when compared to traditional NP testing for a GDS ≥ 0.5 [38]. Within this study, 80% of participants were on ART, and the percentage with viral suppression was not reported.

NeuroScreen

NeuroScreen is another, more recently developed CNAD [50, 51•, 52, 53]. It runs on the Android operating system and can therefore be administered on touchscreen tablets or smartphones. NeuroScreen consists of 10 brief digitalized tasks used to assess individuals across six domains (i.e., processing speed, executive functioning, working memory, motor speed, learning, and memory). It can be administered in under 30 min, and, like Cogstate and many other digital exams, the results of this battery are made automatically and immediately available [50].

In a sample of PLWH in New York City, U.S. (68% with viral load < 100 copies/mL), which included participants with neurocognitive impairment (NCI) based on GDS ≥ 0.5 (n = 33) and PLWH without impairment (n = 11), the complete NeuroScreen battery yielded a sensitivity of 94% and a specificity of 64% [50]. In a larger sample (PLWH with NCI, n = 27; PLWH without NCI, n = 75) studied in Cape Town, South Africa, investigators examined performance of NeuroScreen based on (1) the sum of all individual test scores, (2) the sum of all individual test scores plus error scores from four tests, and (3) an abbreviated version (visual discrimination 1 and 2, trail making 1, and number span total) [51•]. HIV RNA data were available for 81 participants and undetectable in 91%; all participants had initiated ART at least 12 months prior. When compared to a gold standard of paper-and-pencil NP testing for a GDS ≥ 0.5, these measures yielded sensitivities and specificities of 82% and 75%, 81% and 81%, and 93% and 71%, respectively, when administered by lay health workers. Another study completed in Cape Town, South Africa, used NeuroScreen to assess processing speed in PLWH as compared to uninfected individuals, finding that a greater proportion of PLWH performed worse than uninfected individuals [52]. Lastly, construct validity for NeuroScreen was examined in New York City, NY, U.S., in people living with perinatally acquired HIV (PHIV) (n = 33) and people with perinatal HIV exposure without infection (PHEU) (n = 29) [53]. All PLWH were prescribed at least one HIV medication, and the median viral load was 46 copies/mL. In comparing NeuroScreen performance (employing the following battery: trail making 1, 2, and 3, visual discrimination 1 and 2, number span forwards and backwards, and number speed) to paper-and-pencil NP testing (specifically, trail making test A and B, digit span forwards and backwards), Pearson correlations ranged from 0.42 to 0.70 (p < 0.001).
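Construct validity in these studies is typically reported as a Pearson correlation between scores on a digital task and its paper-and-pencil counterpart. A minimal sketch follows, using synthetic paired scores rather than any study’s data; the pairing of a trail-making-style digital time with Trail Making Test A is only an illustrative assumption.

```python
from scipy.stats import pearsonr

# Synthetic paired completion times (seconds) for illustration only,
# e.g., a digital trail-making task vs. paper-and-pencil Trail Making Test A.
digital_times = [28, 35, 41, 30, 52, 47, 39, 33, 60, 44]
paper_times = [25, 38, 45, 29, 58, 43, 41, 30, 66, 40]

r, p = pearsonr(digital_times, paper_times)
print(f"r = {r:.2f}, p = {p:.4f}")  # strong positive correlation on these made-up data
```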

California Computerized Assessment Package (CalCAP)

A number of other CNADs modeled after traditional testing batteries instead focus on only one or two domains. For example, CalCAP is a 20–25-min series of brief reaction time tasks administered on a computer and designed to assess speed of information processing and psychomotor function [54]. It has been employed in a number of studies to assess reaction time in PLWH [55,56,57,58,59,60]. However, our review identified only one paper examining its validity for assessing these domains in PLWH. This study, based in San Diego, CA, U.S. (PLWH with NCI, n = 46; PLWH without NCI, n = 36; 81% had a detectable viral load; 70% were classified as having AIDS based upon clinical history and/or a CD4 cell count below 200; percentage on treatment was not reported), found that CalCAP had a sensitivity of 68% and a specificity of 77% in identifying those with GDS ≥ 0.5 based on traditional NP testing [54]. Correlations between traditional NP testing domains and CalCAP-mini subtests were also reported (Pearson correlation r = 0.22–0.43, p < 0.05).

Computerized Speed Cognitive Test (CSCT)

Similarly, another CNAD, the Computerized Speed Cognitive Test (CSCT), is also designed to measure information processing speed. This brief test takes about 90 s to complete, during which participants match stimuli presented at the bottom of the screen to a key of symbols presented at the top of the screen [61]. It is available on computer and touchscreen platforms. One study in Nice and Cannes, France (non-HAND, n = 19; HAND, n = 67; 98% had HIV RNA below 50 copies/mL at inclusion), showed a significant difference in CSCT z-scores between groups (mean (SD) = − 0.1 (1.0) vs. − 1.1 (1.6); p < 0.005). Using a cutoff of 47 correct responses, sensitivity and specificity of the CSCT were 81% and 53%, respectively, where HAND was classified from traditional NP testing as at least two cognitive domains 1 SD below the mean [61].

Additional Computerized Batteries

Another study, which also employed a computerized battery modeled after traditional testing (consisting of the following: reversal learning, emotion recognition, letter 2-back task, stop-signal task, flanker task, Corsi block test, and self-ordered spatial working memory task), did not report exact measures of the CNAD’s effectiveness in screening for HAND, making it difficult to assess its use [62].

Validity of non-traditional cognitive batteries

While the aforementioned CNADs have been developed to generally resemble the conventional NP model, others have taken advantage of technological advances that are inaccessible in traditional paper-and-pencil assessments. The following batteries employ features such as simulated or virtual realities, quick daily mobile-phone assessments, or other purportedly more ecologically valid assessments, in addition to traditional tests [57, 60, 63,64,65,66,67].

Computer Assessment of Mild Cognitive Impairment (CAMCI)

The Computer Assessment of Mild Cognitive Impairment (CAMCI) is a CNAD designed specifically for older individuals (60 years and older), who might not be comfortable using digital devices [60, 63, 64]. Similar to Cogstate, CAMCI is a commercial product, available in English, that can be administered in roughly 20 min on a tablet or computer using a digital pen, mouse, or touchscreen for input [68]. Normative data are available and come from a predominantly Caucasian sample of U.S. adults with an 8th grade education or higher. While it is composed of nine digitalized versions of traditional NP tests, CAMCI is unique in that it also employs a virtual environment task, in which the participant “drives” a car and is instructed to navigate through a series of intersections while running errands. This “virtual world” task is intended to be more ecologically valid, as it simulates cognitive performance in a virtual real-world setting, assessing the individual’s prospective memory, incidental memory, and decision-making [63].

One study in St. Louis, MO, U.S., examined CAMCI in a small sample (HIV-, n = 30; PLWH, n = 29; data on treatment status and HIV RNA not reported) with the goal of determining whether this battery could identify individuals who are likely impaired and may require more formal testing [63]. Using six tests from the battery, including the visual recognition component of the “virtual world” task, CAMCI had a sensitivity of 72% and specificity of 97% for detecting mild impairment as compared with normal and borderline test performance, determined by a global impairment rating from NP testing. In examining test–retest reliability, the investigators found a median correlation coefficient of 0.46 for a 24-week retest period [63]. This coefficient may be partially decreased by the clinical course of HAND itself, which is known to fluctuate over time [69].

Another study in Baltimore, MD, U.S., found significant differences in performance on subsets of the CAMCI functional tasks among PLWH compared to uninfected individuals. The sample included those without HIV (n = 38) and HIV-infected participants with normal cognition (n = 16), ANI (n = 37), MND (n = 22), and HAD (n = 39). Of the PLWH, 41% had a detectable viral load; the percentage on treatment was not reported. Two of the functional tasks distinguished between HIV-uninfected participants and PLWH, specifically the Errands Bank task (79 vs. 53%; p < 0.05) and the Errands Post Office task (79 vs. 56%; p < 0.01) [60]. Scores for the shopping list task differed among the five groups, decreasing from HIV-uninfected to PLWH with normal cognition to those with HAD (p = 0.02). Pairwise comparisons noted a significant difference between the HIV-uninfected participants and those with HAD. Many of these tasks were shown to correlate weakly with conventional measures of functional performance (Pearson correlation r = 0.19–0.38, p < 0.05) [60].

Novel Computerized Cognitive Assessment Device (NCAD)

Another CNAD identified in this review is the Novel Computerized Cognitive Assessment Device (NCAD). Developed through a partnership between Emory University and the Georgia Institute of Technology in the U.S., this battery uses a unique set of tools designed to create a distraction-free environment [65]. Specifically, the participant wears a headset unit with a video display and noise-canceling headphones while holding a handheld input piece with two buttons, which indicate “yes” or “no” responses. The participant then completes seven fully automated subtests, which are modified versions of established NP tests. Because it is completely self-contained, this specialized testing environment minimizes environmental distractors, including those that might come from the administrator. While it is not commercially available, the software is available at the request of the researcher. Preliminary data collected in Atlanta, GA, U.S. (PLWH with impairment, n = 27; PLWH without impairment, n = 12; 72% were undetectable and 74% were on treatment), show that the NCAD correlates with the mean composite neuropsychological score (Pearson correlation r = 0.59, p < 0.001) as well as with the Global Deficit Score (Spearman’s rho = − 0.36, p < 0.05). Based on a cutoff score of 75.44 for NCAD total subtest accuracy, the investigators found a sensitivity of 83% and specificity of 67% to detect impairment, where impairment was defined as at least two domains with scores > 1 SD below the mean. The area under the curve for NCAD total subtest accuracy was 0.756 (p = 0.012) [65].
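The NCAD analysis illustrates the usual workflow of choosing a cutoff on a continuous screen score and summarizing overall discrimination with the area under the ROC curve. The sketch below uses synthetic accuracy scores (not the study’s data) and an assumed cutoff of 75, purely to show how AUC, sensitivity, and specificity are derived from a screen score and a reference-standard label.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic data for illustration: label 1 = impaired on the NP reference standard.
rng = np.random.default_rng(0)
impaired = rng.normal(68, 10, 27)    # lower screen accuracy scores
unimpaired = rng.normal(80, 10, 12)  # higher screen accuracy scores
scores = np.concatenate([impaired, unimpaired])
labels = np.array([1] * 27 + [0] * 12)

# Higher accuracy means better performance, so lower scores indicate impairment;
# negate the scores so that the "positive" (impaired) class has higher values.
auc = roc_auc_score(labels, -scores)

# Sensitivity/specificity at an assumed cutoff of 75 on the original accuracy scale.
cutoff = 75
sens = np.mean(impaired < cutoff)     # impaired participants correctly flagged
spec = np.mean(unimpaired >= cutoff)  # unimpaired participants correctly passed
print(f"AUC = {auc:.2f}, sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```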

Ecological Momentary Assessments (EMAs)

As an alternative to one-session NP testing, some researchers are examining whether a different kind of CNAD, used intermittently, might produce more ecologically valid results. Known as ecological momentary assessments (EMAs), this type of battery has participants repeat short testing sessions on a smartphone-based application several times throughout the day for multiple days, with each session lasting about 3 min [66]. These assessments ask questions about the participant’s daily functioning and cognitive symptoms, as well as mood, socialization, and substance use. Results of one study in San Diego, CA, U.S., with 20 PLWH showed that EMA measures of mood correlated with laboratory-assessed Beck Depression Inventory-II items, including sadness (r = 0.57; p < 0.05), forgetfulness (r = 0.67; p < 0.05), and problems concentrating (r = 0.73; p < 0.05) [70]. In addition, digital versions of NP tests can be repeatedly administered in this fashion. For example, another study in San Diego, CA, U.S. (PLWH, n = 58; HIV-uninfected, n = 32; 95% on ART; percentage with suppression not reported), had participants complete a mobile color-word interference test (mCWIT) [67]. Based on the widely used and validated Stroop interference test, the mCWIT had participants say the color of written words aloud, with responses recorded by a smartphone for subsequent scoring by a trained researcher [67, 71]. The conventional Stroop interference trial time and mCWIT performance correlated (r = 0.63, p < 0.05), indicating that this kind of NP testing has promising utility. Performance was on average worse among PLWH compared to controls, showing its potential use in the setting of HIV. Notably, participants in this study, whose ages ranged from 50 to 74 years, completed 86% of the mobile cognitive tests, suggesting that this form of testing was well tolerated.

Benefits and limitations of the computerized batteries in screening for HAND

Overall, the above CNADs show great promise in advancing the field of neuroHIV as we develop more accessible assessment tools for HAND. One benefit shared by all of the CNADs described is the relatively short duration of administration (Table 2). Rather than undergo an hour- or hours-long battery, participants can complete all of the digital tests described in this review in under half an hour, and some require even less time, such as the CSCT, which can be administered in only 90 s. This suggests that CNADs may be particularly useful in clinical settings where time is limited. With further development, brief digital assessments could outperform existing paper-and-pencil screening tools, such as the International HIV Dementia Scale and the Montreal Cognitive Assessment, which have been shown to have poor performance characteristics for screening of HIV-related cognitive impairment [72, 73].

Moreover, these CNADs may prove to be most useful within LMICs specifically. For example, Cogstate and NeuroScreen have been administered in Uganda and South Africa, respectively, and in doing so have shown that this kind of computerized testing is not only feasible but also more accessible. These batteries can be administered in rural areas with limited access to infrastructure, equipment, and technical expertise. Many, including NeuroScreen, CalCAP, and CSCT, are free, which increases the feasibility of neuropsychological testing in these areas. Similarly, most CNADs can be administered by lay persons, which equips them to better serve low-resource communities that do not have easy access to psychometrists and neuropsychologists. While testing supervision has not been extensively researched, Cogstate performance has been shown not to differ significantly between self-administration and technician-supervised administration [74, 75]. This has major implications for understanding the benefits that mobile cognitive testing offers, especially in the wake of the current COVID-19 pandemic and the likely sustained shift toward telemedicine. Properly validated and effective tools would be a first step in a process toward advancing remote assessments.

Additionally, almost every CNAD described in this review offers researchers and clinicians instant access to scoring and results. These immediate results would provide some guidance in identifying individuals with poor performance that could be linked to poor outcomes, although full interpretation would still require access to a neuropsychologist. These batteries may also be able to examine brain function in novel ways, such as CAMCI’s virtual reality task and the EMA model of repeated daily testing, which can assess cognition in ways that more closely resemble everyday life, suggesting greater ecological validity.

However, there are limitations to the CNADs described here. At present, further work is necessary to create and validate a tool able to identify the milder forms of HAND, which are currently most prevalent among treated patients [76]. Given that HAD is much less common than milder impairment, CNADs must overcome the challenge of detecting these milder forms of impairment to be widely useful [34].

Another important consideration is that many of these validity studies do not use formal research criteria to classify HAND; thus, specificity for an HIV-related impairment may be lower. For example, the approach designed to detect ADC using Cogstate is less relevant today, since that classification is no longer used in the setting of treated HIV [48]. Similarly, the approach designed to detect NCI using NeuroScreen leaves room for other contributing factors, such as head injury, as the etiology of the cognitive performance deficits [51•]. Moreover, the vast majority of these validity studies do not employ sample sizes large enough to conclusively determine their ability to detect HAND. The most promising published results were for NeuroScreen, in a sample of only 102 PLWH [51•]. Similarly, the described studies examining CalCAP, CSCT, EMAs, and the NCAD each included fewer than 100 PLWH in their analyses, creating a degree of uncertainty about external validity. Access to regionally appropriate normative data also remains a challenge [35].

While CNADs are generally more accessible, it is important to acknowledge that some batteries, such as EMAs, may have limited feasibility in some settings. Testing that requires participants to have access to a smartphone throughout the day may be difficult in regions where smartphone ownership is less widespread or data plans are costly [77]. Similarly, though the CAMCI simulated reality task produces potentially useful measures, it may not be appropriate in communities in which errands such as driving to the bank are not familiar. The NCAD, too, may be difficult to employ in certain places due to the technology required to administer it.

Lastly, there are several CNADs that have been used in the HIV literature but remain to be validated, including Internet-based assessments, FePsy (The Iron Psyche), the Covert Orienting of Visual Attention Task (COVAT), and the Conners’ Continuous Performance Test (CPT). With additional research, these could all prove to be useful in screening for HAND [78,79,80,81,82,83,84].

Conclusions

Altogether, this review suggests that these computerized neuropsychological assessment devices remain in the early stages of development. While no study has yet directly compared the performance of one CNAD with another, Table 1 provides information on the sample sizes studied, sensitivity, and specificity, as well as availability and costs, allowing readers to consider these factors in their work. As of now, there is not enough evidence to determine whether these tools can supplant gold-standard paper-and-pencil testing for screening for HAND, especially as many of these studies are small and limited, despite some finding high sensitivity and specificity. However, the CNADs above have already proven to be useful research tools, with the potential to become clinically impactful with further development. Many already show promise of adequate construct validity when compared to traditional paper-and-pencil testing [85]. They offer novel and exciting ways to examine cognitive function and may be particularly useful in LMICs, where access to formal NP testing is limited and where more culturally appropriate tests need to be developed. In the wake of COVID-19, the examination of these devices is a critical first step in allowing researchers and clinicians to remotely and safely assess cognitive functioning in PLWH, which has the potential to improve adherence and, in turn, outcomes for PLWH.