Key Points

HIV infection has become a chronic illness when successfully treated with combined antiretroviral therapy (cART). The long-term health prognosis of aging with controlled HIV infection and HIV-associated neurocognitive disorder (HAND) remains unclear.

With a research focus on chronicity, pre-emptive documentation of episodes of mild neurocognitive dysfunction is needed to determine their long-term prognosis. This strategy would also seek to optimally represent the entire HAND spectrum in therapeutic trials to assess positive and/or negative treatment effects on brain functions.

No individual agent or group of antiretrovirals has unequivocally shown benefits for treating or preventing HAND in the cART era, but there are promising results, which we critically review in light of the increasing importance of chronicity effects.

Prospective randomized clinical trials should be the preferred approach for HIV neurology (neuroHIV) treatment studies, including optimized adaptive randomization approaches to balance HAND clinical categories in treatment arms.

1 Introduction

HIV infection has become a chronic illness when successfully treated with combined antiretroviral therapy (cART). The long-term health prognosis of aging with controlled HIV infection remains unclear, especially with regard to HIV-associated neurocognitive disorder (HAND).

Research on normal aging indicates that our understanding of neurocognitive functioning is best informed through life-span studies [1]. The same analytic framework is required when investigating chronic diseases, which by their nature interact with the aging process. Studies based on large samples and using longitudinal analyses are sorely lacking in HIV neurology (neuroHIV) research, leaving us with an incomplete understanding of the long-term course of HAND in the era of chronic HIV infection [2]. Furthermore, there are several factors unique to neuroHIV that need to be carefully considered alongside normal aging effects. Firstly, the clinical profile of HAND has changed with the introduction of cART, such that the majority of HIV-infected [HIV-positive (HIV+)] persons do not develop HIV-associated dementia (HAD), but rather a milder form of the disease detectable on standard neuropsychological testing. Secondly, current longitudinal cohorts investigating the effects of chronic HIV and/or aging are systematically biased by a survivor effect where most individuals who developed HAD in the pre-cART era have died. The survivor bias also “excludes” those who have died of other AIDS causes and may have developed HAD as they aged. The size of this effect cannot be quantified easily, especially when considering protective factors (e.g., cognitive resilience or resistance to the effects of aging on the brain) that could explain the comparatively low HAD incidence in more recently studied cohorts. Thirdly, prospective data in large samples (N > 1000) on the effect of HIV chronicity on brain functions are currently lacking at an international level, especially those focusing on HIV+ persons who received cART as a first-line treatment (rather than any pre-cART regimen). Moreover, in many countries these people are still fairly young (<50 years old), meaning that we are not yet at a stage where research on aging and HIV chronicity effects can begin in earnest. 
Nevertheless, it is important to start working towards a consensus on the best strategies for documenting HAND over the life span of HIV infection and how these data could in turn inform treatment guidelines for HAND.

In this review we suggest that, almost 20 years after the initial introduction of cART in 1996, a change in research focus is needed, with a greater emphasis on chronicity effects driving our research strategy, echoing the viewpoints of other experts in the field [3]. For this to succeed, we argue that pre-emptive documentation of episodes of mild neurocognitive dysfunction is needed to determine their long-term prognosis. This strategy would also seek to optimally represent the entire HAND spectrum in therapeutic trials to assess positive and/or negative treatment effects on brain functions.

Our perspective on this issue is informed by neurocognitive research in large prospective cohorts of aging and non-HIV dementia showing that pathological brain processes precede any symptoms by 20–30 years, and that disease expression varies depending on complex relations between age, cognitive/brain reserve, genotype and pathological burden [4, 5]. While the neurodegenerative causes of neurological deficits are different from the viral causes, the chronicity of HIV infection is a “game changer” [6]. Notably, it can be expected that with improved survival rates in the HIV population today, neurodegenerative pathology and low-grade neuroinflammation documented in chronic HIV infection will have time to accumulate in some patients [7]. Our perspective is also anchored in several key concepts of modern quantitative neuropsychology, the most important being that optimal assessment and definition of the baseline level of neurocognitive functioning is essential in order to reliably quantify neurocognitive change, and in turn the longer-term prognosis [8]. Finally, when considered together, these two propositions indicate that sampling for neuroHIV clinical trials will need to include patients falling along the entire HAND spectrum to optimally detect positive and/or negative treatment effects.

2 Clarifying Quantitative Neuropsychology Concepts

Because HIV treatment effect studies are in essence longitudinal, what is relevant for neuropsychological observational prospective studies is also relevant for randomized controlled trials (RCTs) of HIV treatment on brain functions.

In recent years, several neuroHIV researchers have begun to question the validity of the milder forms of HAND, which in the Frascati criteria are termed asymptomatic neurocognitive impairment (ANI) or, when functional decline is evident, mild neurocognitive disorder (MND) [9–11]. This debate is important because it indirectly affects whether such patients should be included in treatment studies. Questioning the existence of mild forms of neurocognitive impairment is understandable, especially in terms of the immediate clinical relevance and the issue of whether or not to inform patients [11]. However, psychometric, quantitative and clinical neuropsychology concepts should not be truncated in the process. In this section, we will therefore clarify some of those concepts and highlight the importance of their correct scientific definition and implementation, which can be used in designing neuroHIV RCTs. Moreover, we will highlight how mild HAND should be “re-conceptualized” in the context of chronic HIV infection and argue that we should focus on the long-term prognosis of such deficits rather than be primarily concerned with debates over their immediate clinical significance.

2.1 Test, Individual Neuropsychological Measure and Cognitive Domain

A neuropsychological test simply refers to the name or legal appellation of a test (e.g., Trail Making Test; TMT); a neuropsychological measure denotes the (often multiple) relevant outcome measures of a test (e.g., TMTA time to completion; TMTB time to completion); and a cognitive domain is an umbrella term for a set of related neuropsychological measure(s) (but not tests) that are combined on the basis of their unique correlation structure to form an independent cognitive construct, based on factorial analytic studies [12] yielding a model of normal cognitive functioning (e.g., TMTA primarily assesses speed of information processing, psychomotor speed and visual scanning; TMTB also assesses these skills along with aspects of executive functioning and working memory). The terminology confusion is unfortunately present in the Frascati criteria, which only refers to “tests” [13]. This has led to the criteria being wrongly applied in several studies, especially where more than one single neuropsychological measure (not test) is used per cognitive domain. Incorrect application of the HAND criteria has produced extravagant rates of low performance in HIV-negative (HIV−) control samples and neurocognitive impairment in HIV+ samples [14], and some have concluded that the Frascati criteria fundamentally “over-diagnose HAND”.

2.2 Implementation of the Frascati Criteria Using Z-Score Domains

The optimal implementation of the Frascati criteria as delineated in Antinori et al. [13] is dependent on three conditions: (i) a fairly large battery size (at least ten measures; 15 measures and five cognitive domains are recommended at the very minimum); (ii) the use of demographically corrected scores (e.g., age, gender, education, ethnicity), which we will define below; and (iii) the rating of impairment in cognitive domains as delineated in Woods et al. [15]. This paper explains the correct process for rating impairment when a cognitive domain is composed of one or more neuropsychological measures. However, this publication is based on demographically corrected T-scores and deficit scores, which are less commonly used by non-neuropsychologists. We have therefore provided their correct computation using more widely used z-scores in Table 1. While we provide this to improve the standardization of neuropsychological domain rating in HIV infection and implementation of the Frascati criteria, we urge research teams to involve neuropsychologists in the early stages of study planning to avoid computational or conceptual errors between different types of standard scores and impairment rating methods.

Table 1 Correct Frascati cut-offs for cognitive domains defined by one, two and three individual neuropsychological measures

2.3 Normal and Impaired Performance in Clinical Versus Normal Samples

Normative data are sometimes seen as a uniquely neuropsychological problem. However, all types of brain measurements (including biomarkers) are sensitive to non-disease effects, and in particular demographic effects. This issue requires particular consideration in neuropsychology as individual neuropsychological measures (typically administered as part of a test battery) have unique and complex relationships with demographic variables; for example, some have non-linear relationships with brain functions, while others are contextual in nature (e.g., socio-historical effects of ethnicity). The picture is further complicated by the high degree of inter-relatedness amongst demographic variables (e.g., ethnicity or geographical location is sometimes a proxy variable for more direct effects of education and socioeconomic status [16]). However, this is first and foremost a reflection of the brain’s complexity. “Good normative data,” that is, datasets based on a large sample size with well-identified demographic effects broadly representative of a group of people (usually a nation) for each neuropsychological measure is seen as a luxury because it is resource intensive. On the contrary, this approach is in fact less costly at a national level than many other scientific methods because benefits are cross-disciplinary. Importantly, acquisition of neuropsychological data in a healthy control group does not in and of itself constitute “good” normative data per se. Accurate quantification of demographic and/or socio-cultural effects is critical for an optimal norming process. Large samples are necessary to stabilize demographic and other effects relevant to normal performance. This is crucial both for ensuring representativeness (typically at a national level) and stabilizing inter-correlation within a test battery so that the factor loadings reflecting “normal” functioning across cognitive domains can be approximated as closely as possible. 
This is not to say that small- to medium-sized samples (N = 50–100) of HIV− persons cannot be used as a normative reference in HIV research; however, there are important conditions and limitations to their use. Indeed, if the Frascati criteria are to be applied optimally, then they should only be used in relation to a restricted and closely comparable HIV+ sample (preferably of similar size and, as a bare minimum, comparable for age and sex) [17]. To illustrate some of the issues associated with using a small HIV− control sample to assess the validity of the Frascati criteria, we specifically review a recent study by Meyer et al. [10], which analyzed the false-positive rates arising from different computations of the criteria in a Kenyan HIV− sample (N = 84) as well as a simulation sample. The demographic characteristics of the Kenyan sample were not presented, including key variables that we know dramatically influence the stability of normal neuropsychological performance in limited-resource settings [18, 19]. The study also does not report whether the tests were culturally adapted for the Kenyan sample, which makes it even harder to determine whether uncontrolled socio-demographic effects have affected performance. Under such circumstances the construct validity of a neuropsychological battery can be substantially reduced, resulting in a given test measuring construct(s) other than the cognitive function(s) that it is intended to measure. In this instance, some of the explanatory variance due to demographic factors is likely interfering with the test construct, meaning that in the context of correctly applying the Frascati criteria, the HIV− sample can only be utilized if compared with a closely matched HIV+ sample, as the criteria assume that test constructs will be similar for both samples. As further support for their arguments, the criteria were also tested in a somewhat vaguely defined simulated normal sample.
However, their computations assumed configurations and correlational structures amongst the test battery that generally fail to reflect the neuropsychology methods advocated in the Frascati criteria, and as such their conclusions serve only to reiterate existing psychometric knowledge gleaned from Classical Test Theory [20]. Even when the CNS HIV Anti-Retroviral Therapy Effects Research (CHARTER) study test battery was considered, at no time did the authors correctly compute the Frascati criteria, as they failed to take into account the neuropsychological measure/domain count specific to this test battery [21]. Their proposition to apply a cut-off at −1.5 SD/cognitive domain is in fact already in use if at least two neuropsychological measures are included in a cognitive domain, as detailed above. It would be interesting, however, if the simulation work were re-conducted after correctly applying the Frascati criteria, and possibly using the Global Deficit Score (GDS) method as they suggest (also see the later section on updating the Frascati criteria).

Strictly speaking, normative data include demographic corrections that have been carefully identified in the norming process [16]. This can be achieved using one of two approaches. One is to develop z-scores stratified by age and education ranges (and sometimes sex). This strategy is used by most test developers in samples that are rarely N > 300, except for some major test batteries [16]. The application of such z-scores is restricted to clinical samples with closely comparable demographics, as explained above [16]. A second, more sophisticated method is to create demographically corrected T-scores (another type of standard score), which represent a predicted value that is corrected for demographic effects using linear and non-linear analyses [22]. This method has been used in large samples (N > 1000) [22]. Importantly, this method actually eliminates demographic effects, while demographically stratified z-scores do not. This means that demographically corrected T-scores provide the closest approximation to the individual’s personal circumstances and therefore produce the most accurate disease-related effect. Performance in large normative datasets is typically distributed according to the Normal curve, especially when averaged across several neuropsychological measures (as in a cognitive domain). Performance at the lower tail of the distribution can be defined as impaired according to statistical criteria. In other words, in a non-clinical sample, this level of performance is not abnormal per se, but represents lower normal performance. As such, equating impaired performance in clinical and normal groups is not correct (another concept that was not operationalized correctly in the Meyer et al. study [10]). This is especially true when using demographically corrected T-scores because the level of impairment is primarily a reflection of a disease effect in the clinical sample. 
Importantly, because an increasing number of RCTs addressing prevention and treatment of HAND are likely to be conducted in low- and middle-income settings, funding for the establishment of normative data in these countries will be needed.
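The difference between a raw score and a demographically corrected T-score can be illustrated with a minimal sketch. It assumes a single demographic predictor (age) and a purely linear correction fitted in a hypothetical normative sample; published norms [22] use several predictors and non-linear terms, so the function names and data below are illustrative assumptions only.

```python
from statistics import mean, stdev

def fit_norms(ages, scores):
    """Fit score = intercept + slope * age in a (hypothetical) normative sample."""
    mean_age, mean_score = mean(ages), mean(scores)
    sxy = sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
    sxx = sum((a - mean_age) ** 2 for a in ages)
    slope = sxy / sxx
    intercept = mean_score - slope * mean_age
    # Spread of normal performance once the age effect has been removed
    residuals = [s - (intercept + slope * a) for a, s in zip(ages, scores)]
    return slope, intercept, stdev(residuals)

def corrected_t_score(raw, age, norms):
    """Express a raw score relative to what is expected at that age (T: mean 50, SD 10)."""
    slope, intercept, resid_sd = norms
    z = (raw - (intercept + slope * age)) / resid_sd
    return 50 + 10 * z

# Hypothetical normative data: performance declines with age
norm_ages = [20, 30, 40, 50, 60]
norm_scores = [60, 56, 50, 46, 38]
norms = fit_norms(norm_ages, norm_scores)

# The same raw score is read differently depending on the person's age
t_young = corrected_t_score(45, 20, norms)
t_old = corrected_t_score(45, 60, norms)
```

In this sketch the same raw score yields a lower T-score for a 20-year-old than for a 60-year-old, which is the sense in which the correction removes the demographic effect rather than merely stratifying by it.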

2.4 Sensitivity and Specificity to HAND, Cut-Offs and Battery Size

Sensitivity and specificity to brain-related disease effects on neuropsychological functions are bound together in an inverse relation. Therefore, to detect a mild level of clinical impairment, as in the case of milder HAND, an ~15 % cut-off of false positives has been proposed [13] (N.B., based on a fairly large battery of at least five cognitive domains and using demographically corrected T-scores). This cut-off yields the best compromise between specificity and sensitivity to HIV-related brain injury [23], and this is the central argument for its existence. Proposing to modify the trade-off by reducing the false positives rate close to zero [10] will systematically result in an almost total loss of sensitivity to mild neurocognitive deficits in HIV+ samples. Considering what we have explained above, namely that the false-positive rate in an HIV− sample is not the exact equivalent of a clinical sample, especially if demographic corrections have been applied, then it happens that the expected level of low normal performance in a clinical sample (the “actual” false positives) using Frascati criteria is really closer to 5 % than 15 %, something that was not adequately represented in Meyer et al. [10]. To further illustrate this point, we will use some of our Australian HIV− sample (N = 49) neuropsychological data (based on seven cognitive domains and 11 neuropsychological measures), reported in Cysique et al. [17]. This sample was closely comparable to the HIV+ sample (N = 90) in terms of standard demographic and lifestyle factors, indicating that despite the small sample sizes, the Frascati criteria can be applied in this context. We determined rates of the mildest level of impairment, requiring two impaired domains according to the following definition: if the cognitive domain is composed of one measure, <−1 SD = domain impaired; if the cognitive domain is composed of two measures, <−1 SD in measures 1 and 2. 
In our analysis, 15.5 % met this cut-off in the HIV+ group and 14.3 % in the HIV− group. However, on closer inspection, only 6/90 (6.6 %) HIV+ participants showed impairment in two domains between −1 and −1.5 SD. All other cases exhibited lower performance, indicating that only ~5 % of cases represent a non-clinically meaningful level of deficit. As we will outline further below, we recommend these cases be followed up as some of them may be on the path to decline [24–26].
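The domain rating rule used in this analysis can be sketched as follows. The sketch covers only what is stated above (one- and two-measure domains, a −1 SD cut-off, and at least two impaired domains for the mildest level of impairment); three-measure domains are deliberately rejected because their rule is not restated in the text. Function names and the example profiles are hypothetical.

```python
def domain_impaired(z_scores, cutoff=-1.0):
    """Rate one cognitive domain from the z-scores of its measures.

    One measure: impaired if it falls below the cut-off.
    Two measures: impaired only if both fall below the cut-off.
    """
    if not 1 <= len(z_scores) <= 2:
        # Rule for larger domains is not given in the text, so it is not guessed here
        raise ValueError("only one- and two-measure domains are handled in this sketch")
    return all(z < cutoff for z in z_scores)

def meets_mildest_impairment(domains, cutoff=-1.0, min_domains=2):
    """True if at least `min_domains` cognitive domains are rated impaired."""
    n_impaired = sum(domain_impaired(z, cutoff) for z in domains.values())
    return n_impaired >= min_domains

# Hypothetical profiles (z-scores per measure, grouped by domain)
profile_a = {"attention": [-1.2], "memory": [-1.1, -1.3], "motor": [0.2, -1.4]}
profile_b = {"attention": [-1.2], "memory": [-0.9, -1.3]}

result_a = meets_mildest_impairment(profile_a)  # attention and memory impaired
result_b = meets_mildest_impairment(profile_b)  # only attention impaired
```

Counting measures rather than domains (for example, treating the single low motor measure in the first profile as an impaired domain) is the kind of misapplication that inflates impairment rates.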

Meyer et al. [10] also propose to reduce the range of neurocognitive functions assessed at baseline for HAND diagnosis. HAND is a fundamentally evolving disease [27] owing to the impact of cART, and subtle changes in neuropsychological profiles have been noted between the pre-cART and cART eras [28, 29]. These changes may become even more pronounced when HIV+ persons reach their 70s, an age at which neurodegenerative processes often translate into neurobehavioral symptoms. Reducing the number of cognitive abilities assessed carries the risk of “missing the target,” particularly in HIV+ aging persons. If applied to the Frascati criteria, these decisions could have far-reaching consequences for neuroHIV research in general, for all HIV+ patients, especially as they age, as well as for treatment studies. Finally, what would be the consequences of artificially reducing our capacity to detect mild impairment in the context of chronic treated HIV infection? Given that the causes and prognosis of HAND are mostly unknown, the validity of biomarkers and neuroimaging studies would be reduced. The majority of HAND cases in cART-treated cohorts (50–70 %) [17, 21] would also be excluded from RCTs, including a proportion of MND cases if functional impact has not been evaluated in detail [30]. Again, we believe that these cases should be correctly characterized clinically and statistically using the Frascati criteria and included in longitudinal studies for monitoring and/or RCTs for evaluation of treatment effects.

2.5 Mild to Moderate Global Neurocognitive Impairment Does Not Constitute a Negligible Deficit

This argument has been historically demonstrated for various neurological and psychiatric disorders that are diagnosed on the basis of the assessment of neurocognitive functions, and is therefore not specific to HIV [31]. Evidence in non-HIV populations shows that such levels of neurocognitive deficit, sometimes on a single neuropsychological measure, are predictive of later deterioration [32]. Furthermore, it is well recognized that neuropathological changes preceding the onset of elderly dementia occur decades ahead and build up slowly over time [5]. Similar mechanisms can be expected in chronic HIV/HAND. More specifically, a history of compromised immunity (if cART was initiated late), and low-grade chronic HIV-related neuroinflammation (that can be present despite cART) are both likely to affect the trajectory to neurodegeneration, especially in those with general dementia genetic risk [33, 34]. This means that global mild to moderate levels of neurocognitive impairment may represent “the tip of the iceberg” in some patients compared to what is happening in the brain, especially if compensatory mechanisms (e.g., brain/cognitive reserve, coping strategies) are considered [35].

2.6 Confounds Versus Disease Effects

While selecting mild to moderate levels of global neurocognitive impairment as an initial cut-off for clinically relevant impairment has advantages in detecting the earliest organic form of brain injury, this cut-off level can also be sensitive (although this is not systematic [36, 37]) to other conditions such as psychological distress, learning disabilities, very low levels of education, alcohol and substance use disorders and uncomplicated hepatitis C (N.B., an optimal norming process will also reduce the effect of such confounds, especially education). This has led some neuroHIV researchers to suggest that HIV is not driving the neurocognitive impairment behind ANI and MND [11]. However, the reality is more complex. Based on clinical experience and research data, HIV and confounding neuropsychological factors tend to coexist in a complex manner, especially in the context of chronic disease [38, 39]. In fact, when they co-occur they more often interact and/or converge in the same person to worsen mild neurocognitive deficits rather than simply supplant them [40]. This is the most typical profile seen, except in cases of acute or very severe neurological or psychiatric disease [41], although such cases are usually excluded in current neuroHIV research. In fact, recent US [21] and African [42] data in high confound cases demonstrate very high impairment rates (>80 %). Moreover, another complication of chronic HIV is that HAND does not show a linear path to deterioration. With chronicity, phases of relapse and remission can be separated by years [43]. In this framework, only long-term longitudinal research can detect a link between episodes of ANI or MND. This type of research is still in its infancy in chronic HIV infection, and results thus far suggest that ANI is indeed predictive of future deterioration in cohorts with and without confounding neuropsychological conditions [24–26].
Another issue is acceleration of age-related co-morbidities in those with chronic HIV infection (primarily cardiovascular disease, which is a known risk factor for non-HIV dementia). Increased age-related co-morbidity burden has the potential to alter the profile of neurocognitive deficits in HAND either by accelerating HIV-related neurocognitive decline or by involving new cognitive deficits not typical of HIV-related brain injury. As with the confounding conditions delineated above, age-related co-morbidities are also likely to interact in a complex manner with existing HIV-related brain injury rather than simply supplant it. It is within this context that we can understand why arbitrarily deciding that ANI and MND have no prognostic value by excluding them from early detection could have damaging consequences for patients and the research field as a whole. In RCTs, the careful documentation of various confounds should be conducted a priori. Excluding high confound cases is advised when assessing HIV treatment effects, but exclusion of milder confounds should be based on a careful rationale so as to avoid creating totally unrepresentative groups. Finally, the newest adaptive randomization algorithms [44] may be used as a strategy to balance mild confounds between arms.
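To make the idea of balancing HAND categories and mild confounds between arms concrete, the following is a deliberately simplified sketch of covariate minimization. It is not the algorithm of [44]: allocation here is fully deterministic (ties go to the first arm), whereas practical schemes add a random element, and the factor names and participants are hypothetical.

```python
from collections import Counter

ARMS = ("A", "B")

def assign_arm(participant, allocations):
    """Choose the arm that minimizes total imbalance across the participant's factor levels.

    participant: dict mapping factor name -> level (e.g., {"hand": "ANI", "hcv": "no"})
    allocations: list of (participant, arm) pairs already assigned
    """
    imbalance = {}
    for candidate in ARMS:
        total = 0
        for factor, level in participant.items():
            # Count previously assigned participants sharing this factor level, per arm
            counts = Counter(arm for p, arm in allocations if p[factor] == level)
            counts[candidate] += 1  # hypothetically place the new participant
            total += max(counts[a] for a in ARMS) - min(counts[a] for a in ARMS)
        imbalance[candidate] = total
    # min() returns the first arm on ties, so the sketch stays deterministic
    return min(ARMS, key=lambda a: imbalance[a])

def allocate(participants):
    """Assign participants sequentially, in order of enrolment."""
    allocations = []
    for p in participants:
        allocations.append((p, assign_arm(p, allocations)))
    return allocations

# Hypothetical enrolment sequence
participants = [
    {"hand": "ANI", "hcv": "no"},
    {"hand": "ANI", "hcv": "no"},
    {"hand": "MND", "hcv": "yes"},
    {"hand": "MND", "hcv": "no"},
]
arms = [arm for _, arm in allocate(participants)]
```

With this enrolment sequence each arm receives one ANI and one MND participant. The design choice worth noting is that imbalance is summed over factors, so a trial team would still need to decide a priori which confounds enter the procedure and how they are weighted.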

2.7 Early Versus Late Detection

The argument for refocusing the field on more severe forms of HIV-related brain injury is perplexing if one considers the history of the HAND diagnostic nomenclatures [45–47]. All of the earliest terminologies incorporated mild to moderate levels of deficits that were pre-dementia, and even then, progression to dementia was not systematic, only more frequent than it is today, so many cases could have still been considered ANI/MND then. The Frascati criteria provide a more robust neuropsychology framework for their detection (see Fig. 1). The shift to early detection of brain dysfunction has now happened in all areas of the neurological sciences because neuroimaging, neuropathological and neurobiological data convincingly indicate that brain damage precedes evidence of neuropsychological deficit by decades [48]. In contrast, advocating a renewed focus on the more demented forms of HAND [9], when we all agree that they are now relatively rare, really misses the point. The potential consequences of such a strategy in the era of chronic HIV infection have not been sufficiently communicated to the broader HIV community and researchers wanting to lead clinical trials of ART effects on brain functions: (i) in terms of research, reasoning that the disease of interest is rare indirectly justifies lower funding of this area when more is needed to understand the long-term prognosis of such deficits; (ii) as far as patient care is concerned, it contradicts the views expressed by patients with mild HAND when given the opportunity to contribute to the debate [49]; (iii) clinically, there is evidence that with extensive and detailed assessment of everyday living, many ANI cases are in fact MND [30, 50]; (iv) it contradicts an emerging movement amongst patient advocacy groups for better recognition and destigmatization of early forms of cognitive impairment [51]; and (v) most relevant to our review, it conflicts with indications for early cART initiation/modification, which necessitates screening, assessment and monitoring of HAND.

Fig. 1 Correspondence between the 1988 ADC and the Frascati HAND diagnostic nomenclatures. The rationale for the overlap between the two nomenclatures is based on the evidence that cART has decreased the clinical severity of HAND [77, 78]. cART combined antiretroviral therapy, ANI asymptomatic neurocognitive impairment, MND mild neurocognitive disorder, HAND HIV-associated neurocognitive disorder, HAD HIV-associated dementia

2.8 Feedback to Patients

It has been stated without supportive evidence [9] that knowing one has some level of cognitive impairment causes distress for patients while the converse does not. From our clinical experience, the reality is more complex. Most patients can be categorized in one of four ways. The first group does not have cognitive problems but likes to be screened and reassured that everything is “normal.” The second group experiences cognitive difficulties without understanding why and is therefore quite distressed. Objectification of deficits relieves some of that distress because these patients can start to put strategies in place to compensate for these problems [52]. These two groups represent the majority of patients who participate in research studies. A third group of patients worry about HAND as a manifestation of an underlying co-morbid anxiety disorder. They tend to seek regular neuropsychological examinations (including as research participants), and are convinced “something is going on” despite evidence of normal functioning. We advise these patients be referred for anxiety treatment/therapy, followed up normally for their HIV disease and cognition and given continual reassurance. The fourth group does not seek neuropsychological testing or consciously avoids it. Some of these patients have co-morbid alcohol/substance use disorders and/or chronic psychiatric conditions. They often try to minimize their functional difficulties [53] and are often lost to follow-up in research studies if they participate at all. There is no straightforward way to engage these patients, but it would be advisable to offer them short screening rather than lengthy testing. Neuropsychological feedback also needs to take into account those circumstances.
Overall, the principal message that clinicians and researchers need to communicate to patients is that they may experience some degree of cognitive deficit living with the illness; however, it is unlikely to evolve to dementia if they remain stable on cART, and it is better to monitor these mild difficulties similar to other co-morbidities. Our failure to coherently deliver this message produces undue anxiety in too many patients. Finally, in the absence of clear therapeutic interventions for chronic HAND at present, aside from suppressing HIV viral load in the plasma and Cerebrospinal Fluid (CSF), it is highly recommended that tailored psychological care be offered (especially for those with a history of anxio-depressive symptoms), similar to standard of care in other chronic diseases (e.g., cancer) [54], and steps be taken to lower HIV- and non-HIV-related modifiable risks of cognitive dysfunction [55]. These kinds of preventative measures will in fact be less costly over the long term because they positively alter the course to deterioration [56].

2.9 An Update of the Frascati Criteria is Needed

The Frascati criteria should be updated primarily to describe guidelines for monitoring patients’ neurocognitive functioning in the context of chronic treated HIV infection. More research is needed to establish such guidelines. Indeed, the course of HIV-related mild neurocognitive difficulties remains unclear, and expecting all patients to conform to a single trajectory is unlikely to be correct (as elegantly demonstrated in a recent article based on the Multicenter AIDS Cohort Study [57]). Moreover, it is unclear at what level of decline patients should receive further clinical follow-up and/or undergo cART modification. Similarly, it is unclear when patients have fully recovered from a HAND episode. In updating the Frascati criteria, we should also (i) clarify both the correct method of computing the criteria, as outlined above, and the trade-offs in detecting mild forms of HAND in cross-sectional versus longitudinal studies; and (ii) include the GDS as an alternative computational method, as the GDS shows strong validity in small (minimum six individual neuropsychological measures) and large (greater than ten measures) test batteries [58], although when applied in a relatively small HIV− sample (<200), the level of “low normal” performance can still be empirically validated to reach a maximum of 15 % [58]. This means that the GDS cut-off of ≥0.50 can be slightly modified to obtain the best compromise between sensitivity and specificity without invalidating the Frascati criteria. The strength of the GDS resides in “weighting the neuropsychological data in a similar manner to clinical ratings by considering both the number and the severity of deficits in an individual’s performance throughout the test battery, giving relatively less weight to performances within and above normal limits” [58].
Secondly, “the cutpoint is roughly equivalent to averaging mild impairment on one-half of the component measures” [58] so that the GDS is slightly more conservative than the domain rating methods of the Frascati criteria. However, one caution in using the GDS is that as an average score, clinically significant impairment in one domain could yield an overall normal GDS. While this is unlikely to occur in chronic treated HIV infection, it may occur in older persons with HIV infection also developing neurodegenerative diseases. This is why we propose maintaining both impairment definition methods as they demonstrate excellent equivalence [58, 59], yet the comparatively simple computation of GDS should help to resolve the inconsistent and divergent implementation of the current Frascati criteria in our research field, as outlined above.
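To make the averaging property and the caution above concrete, the following is a minimal sketch of a GDS computation. It assumes the conventional conversion of demographically corrected T-scores to deficit scores (T ≥ 40 = 0, 35–39 = 1, 30–34 = 2, 25–29 = 3, 20–24 = 4, < 20 = 5), which is not spelled out in this article; the function names and example batteries are hypothetical and purely illustrative.

```python
from statistics import mean

def deficit_score(t_score):
    """Convert one demographically corrected T-score to a 0-5 deficit score.

    Assumed conventional mapping (not taken from this article):
    T >= 40 -> 0, 35-39 -> 1, 30-34 -> 2, 25-29 -> 3, 20-24 -> 4, < 20 -> 5.
    """
    if t_score >= 40:
        return 0
    if t_score >= 35:
        return 1
    if t_score >= 30:
        return 2
    if t_score >= 25:
        return 3
    if t_score >= 20:
        return 4
    return 5

def global_deficit_score(t_scores):
    """Average the deficit scores across all measures in the battery."""
    if len(t_scores) < 6:
        # The text cites six measures as the minimum validated battery size.
        raise ValueError("GDS sketch expects at least six measures")
    return mean([deficit_score(t) for t in t_scores])

def is_impaired(t_scores, cutoff=0.50):
    """Apply the GDS cut-off of >= 0.50 discussed in the text."""
    return global_deficit_score(t_scores) >= cutoff

# Hypothetical battery: mild impairment on half of six measures.
battery_mild = [38, 37, 36, 45, 50, 52]
# Hypothetical battery: one markedly impaired measure among ten normal ones.
battery_focal = [22, 50, 48, 52, 47, 55, 51, 49, 53, 46]

print(global_deficit_score(battery_mild), is_impaired(battery_mild))
print(global_deficit_score(battery_focal), is_impaired(battery_focal))
```

In this sketch the first battery reaches the cut-off through mild impairment on half of its measures, whereas the second stays below it despite one clearly abnormal score, which is the scenario in which the domain rating method remains a useful complement.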

3 Antiretrovirals, cART and Neuropsychological Functions in HIV Infection

Current HIV neuropathogenesis models developed from animal and autopsy data demonstrate that the trafficking of HIV-infected peripheral monocytes to the blood–brain barrier (BBB) where they accumulate as perivascular macrophages plays a primary role in HIV-related brain injury. Perivascular macrophages enter the central nervous system (CNS) via a “Trojan horse” mechanism and a major inflammatory response ensues [60]. Models of CNS invasion were conceived in the context of primary HIV infection when the BBB becomes increasingly porous (due to massive HIV replication), facilitating the infiltration of many cell types, including monocytes/macrophages, dendritic cells and CD4+ T cells [61]. This massive HIV seeding in brain tissue corresponds with the severity of HAD observed in untreated HIV+ persons. Reduction of the overall trafficking once on cART could explain why we now see mostly mild to moderate forms of HAND. But overall, in vivo and clinically relevant human data are needed to determine if this model is still relevant in long-term virally suppressed HIV infection. Studies from one group have produced findings in humans supporting the monocytes’ pathway [62] in some virally suppressed patients. But on careful inspection, their evidence is far stronger in cases of HAD than milder HAND and in non-treated or only recently virally suppressed patients, similar to our own findings [63].

An alternative non-exclusive model of current HIV neuropathogenesis is based on the increased HIV compartmentalization between the CNS and body. This model posits that HIV could replicate only in the CNS at low level or be associated with chronic neuroinflammation involving mechanisms that remain to be fully elucidated. In support of this model, CSF viral escape of HIV is more common than previously thought in plasma virally suppressed patients (see Ferretti et al. [64] for a recent review) and presents distinctive env strains compared to plasma [65], suggesting a high level of compartmentalization.

It is unclear at what level of chronic neuroinflammation/residual CNS viral replication HAND becomes neuropsychologically detectable. Yet, several studies have shown that HAND can develop and worsen despite viral suppression in both plasma and CSF in 10–30 % of cases [66, 67]. Furthermore, CSF and neuroimaging studies have demonstrated continued signal of intrathecal inflammation despite viral suppression [68–71]. CNS viral replication may happen in part because individual ART agents do not penetrate the BBB in sufficient concentrations and/or two individual agents with good CNS penetration when combined interact via complex CNS efflux mechanisms to reduce CNS entry. There may be even more complex mechanisms, which have recently been reviewed [72]. Finally, BBB impairment in some patients could still allow reasonable cART entry to the CNS, a mechanism proposed long ago by Brew [6], but this remains to be investigated systematically in well-defined HAND cohorts. Yet, evidence for chronic BBB impairment in HIV infection exists [73, 74]. This means that cART concentrations in the brains of historically and neurologically advanced patients should not be extrapolated to well-controlled patients without any prior history of advanced HIV-neurological disorder [75]. In other words, clinical and HAND characteristics of a sample should always moderate the interpretation of any RCTs examining cART and brain functions.

4 CPE, Limitations and Future

The CNS Penetration Efficiency rank score (CPE) established in 2008 and revised in 2010 [76] remains the focus of observational studies more than RCTs at the time of writing. In 2009 [77] and 2011 [78], our group published reviews on ART and neuropsychological functions highlighting the need for this treatment effect to be studied in RCTs. At the time, we also emphasized the need for sufficiently powered trials and developed statistical scenarios for different types of analyses and the required power to assist the development of future trials. Since then, results have been mixed, with several observational cross-sectional or prospective studies finding negative or no effect of CPE on cognitive functions or brain structural changes [79–82], while others found positive effects on specific neurocognitive functions and protective effects over time on HAND incidence and deterioration [83]. Finally, others have provided cumulative evidence that CPE is associated with greater CSF viral reduction [84]. Controversially, some retrospective studies have been conducted to assess CPE effects [85, 86], finding negative effects on HAD and advanced HIV-related neurological conditions. Prospective observational studies produce effects that are only indicative, as there is a large amount of bias that demands validation in RCTs [87]. However, the risk of bias is highest in retrospective studies as the treatment effect is not controlled while the definition of the outcome of interest is historical and outdated in the case of HAND. As explained above, there are unfortunately sub-optimal methods for defining HAND. Therefore, using a poor historical definition that varies between countries and fails to document neuropsychological confounds is one of the worst-case scenarios. Because of this and contrary to the dedicated editorial [88], the findings of Caniglia et al. [85], recently published in Neurology, are less than convincing. 
Selection [89] and ART channeling bias [90] could explain the entire study results as these are impossible to quantify or control using post hoc adjustments. The large sample size very likely yielded a large bias effect so that the CPE absolute risk was in fact unclear. For example, when zidovudine (AZT) and nevirapine were the unique primary treatment in rich-economy countries, doctors were well aware that cognitive improvements had only been demonstrated for those agents [91]. The same thing happened for abacavir, albeit transitionally as issues of cardiovascular toxicity had not reached the forefront yet [92]. Moreover, as judiciously noted by researchers and clinicians in resource-limited settings [90], AZT and nevirapine currently form part of the first-line treatment in low-income countries. Thus, the message of the Caniglia et al. study is confusing for countries with the highest HIV burden. This example highlights the ongoing need for well-designed RCTs on cART and brain functions including low-income countries, where observational treatment studies and a few RCTs have been successfully conducted (e.g., see [93–95] and the review [96] for a more global perspective). Altogether, retrospective studies must not be conducted to assess treatment effects on brain functions, especially now that we understand HAND as a spectrum that has shifted towards milder forms, which need to be adequately represented in RCTs.

One RCT [97] that has been conducted found no evidence for superior neurocognitive functioning in the high-CPE (n = 23) versus low-CPE (n = 19) arms. However, the study lacked statistical power as accrual was incomplete. The authors also noted several differences in baseline characteristics between the treatment arms that could have influenced neurocognitive performance (i.e., the high-CPE arm had numerically lower mean CD4, higher rates of hepatitis C co-infection, and showed a trend for poorer plasma virological suppression, potentially due to antiretroviral instability prior to enrolment). Finally, the patients were followed up at 16 weeks, but there is evidence that recovery from a HAND episode takes at least 42 weeks [98]. The trial demonstrates that accrual strategies have to be well-thought-out, that use of adaptive randomization is highly preferable, and that a larger trial including sites in low–medium-income countries is needed. Nevertheless, we propose here that a fundamental collaborative reworking of CPE itself is needed.

Indeed, some researchers have recently started to ask important questions on reconceptualizing CPE [99] following propositions from Brew and colleagues in 2009 [77]. One study suggested that cART genotypic susceptibility should be considered, validating that a CPE score accounting for this factor is associated with superior neurocognitive performance in a fairly large HIV+ sample [100]. Other findings relating to the potential toxicity of some ART as a factor in HAND neuropathogenesis suggest that a toxicity weighting should also be considered in the CPE score [101]. For example, this is particularly relevant for efavirenz, but results are not settled (see next section). Other factors that will need to be considered include putative brain mitochondrial toxicity (which remains understudied) along with peripheral toxicities (i.e., renal, cardiovascular, oxidative stress) that can lead to/worsen neurocognitive deficits in chronic HIV+ persons.

On the other hand, the cumulatively demonstrated capacity of high-CPE regimens to reduce CSF viral load more swiftly and/or to a greater extent than low-CPE regimens should not be discounted [78]. Assuming a direct linear effect between CPE and neurocognitive functioning is probably incorrect. It is more likely that several mediators including speed of viral decay in the CNS influence the extent of neurocognitive recovery, which itself depends on baseline level of impairment. This interpretation was strongly supported by our 2009 CPE study [98], albeit a non-RCT. This scenario has important implications for a chronic disease—early initiation of high-CPE regimens in patients at risk of HAND could reduce the likelihood (see first study [102] in next section) and severity of relapse. Finally, the reasoning that a group of antiretrovirals are better for brain functioning should not be abandoned as a potential rationale, but rather constructively and empirically improved. Additional supportive evidence is emerging from relatively novel single ART agents, as we will review below. This opinion is shared in the “treatment for HAND” section of a review on Conference on Retroviruses and Opportunistic Infections (CROI) 2015 findings [103].

5 Specific Agents or Classes of Agents

5.1 Efavirenz

There is an excellent review dedicated to efavirenz, neurocognitive functions and neuropsychiatric symptoms [104], and more recently, a systematic review calling for “large RCTs to determine if the neuronal toxicity induced by efavirenz results in clinically significant neurological impairment” [105]. Most recent non-RCT results have been presented at CROI 2015 and reviewed by Spudich and Ances [103], indicating yet more conflicting effects of the drug. Therefore, we focus on recent findings only in RCTs and a novel finding in an animal model study. Between 2010 and 2015, one RCT has been conducted and fully published [106]; another RCT was conducted and presented at CROI 2015.

The first pilot RCT was conducted in therapy-naïve patients defined as “neuro-asymptomatic,” probably on clinical grounds, and included 28 participants randomized to tenofovir/emtricitabine with efavirenz (arm 1), atazanavir/ritonavir (arm 2) or AZT/abacavir (arm 3). Importantly, it formed a sub-study of a larger trial [107], which showed that efavirenz and ritonavir-boosted atazanavir arms were equivalent in viral suppression and safety. The sub-study design probably explains why a more comprehensive baseline neuropsychological assessment was not included a priori. Improvement was noted across the board, but the efavirenz-containing arm showed the least improvement on the total composite CogState score and the processing speed cognitive domain. Without knowing patients’ baseline level of neurocognitive functioning, it is very difficult to interpret the results. This study clearly illustrates the need to determine HAND at baseline since trajectories for improvement or decline depend more strongly on this than any treatment effects. In other words, detection of any specific antiretroviral effect requires an RCT to be powered above and beyond the main cART effect as well as the practice effect. Also, given the importance of baseline performance in predicting HAND neurocognitive trajectories, it is advisable for adaptive randomization techniques to be implemented in future trials to ensure equal representation of the HAND spectrum across treatment arms. Another criticism of the study is that improvement was detected despite the patients being labeled “neuro-asymptomatic,” potentially indicating that some had ANI (perhaps even MND), while others were cognitively normal. Genuine cognitive improvement would not be expected in neurologically intact persons, so the “improvement” may solely reflect practice effects. However, we cannot be certain that was the case for this particular study [108]. 
The study ran for 48 weeks, with an intermediate time point at 24 weeks; the authors noted improvement over this period in different cognitive abilities. The study indirectly confirms that cART RCTs should be based on a medium- to long-term timeline (i.e., at least 52 weeks) to capture complex repair processes within the brain as well as normal fluctuations on repeated testing. The study also used CogState to assess neurocognitive change. While the test batteries developed for CogState were designed to minimize practice effects, this confound cannot be eliminated and is particularly pronounced in some cognitive domains (e.g., executive functioning) where the authors noted improvement at the longer time point. Thus, caution is warranted when interpreting improvement in executive functioning (or any functions for that matter) in terms of ART effects, especially when no attempt to control for potential practice effects has been made. Unfortunately, though, there is still ongoing debate as to the optimal approach for practice effect “extraction” in RCTs. Further consultation with biostatisticians and neuropsychologists is advised. This trial also illustrates another problem we have already outlined above, namely that if assessments lack sufficient comprehensiveness, improvement in some cognitive domains may be missed, as different neuropsychological measures behave differently on repeated testing, reflecting both test idiosyncrasies and specific brain functions.
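The adaptive randomization advocated above for balancing the HAND spectrum across arms can be illustrated with a simple minimization scheme. The following is a minimal sketch only, assuming a single balancing factor (baseline HAND category) and two arms; the arm labels, category labels, the probability `p_best` and the function name are hypothetical choices for illustration and do not describe the procedure of any trial cited here.

```python
import random

def assign_arm(category, allocations, arms, rng, p_best=0.8):
    """Assign one participant by minimization on baseline HAND category.

    allocations: dict mapping arm -> dict mapping category -> count so far.
    The arm currently holding fewer participants of this category is chosen
    with probability p_best (hypothetical value); ties are broken at random.
    """
    counts = {arm: allocations[arm].get(category, 0) for arm in arms}
    lowest = min(counts.values())
    preferred = [arm for arm in arms if counts[arm] == lowest]
    others = [arm for arm in arms if counts[arm] != lowest]
    if not others or rng.random() < p_best:
        arm = rng.choice(preferred)
    else:
        arm = rng.choice(others)
    allocations[arm][category] = allocations[arm].get(category, 0) + 1
    return arm

# Hypothetical enrolment stream with an uneven HAND spectrum.
arms = ("high_CPE", "low_CPE")
rng = random.Random(2015)
allocations = {arm: {} for arm in arms}
stream = ["normal"] * 20 + ["ANI"] * 12 + ["MND"] * 6 + ["HAD"] * 2
rng.shuffle(stream)
for category in stream:
    assign_arm(category, allocations, arms, rng)

for arm in arms:
    print(arm, dict(sorted(allocations[arm].items())))
```

A real trial would typically balance several factors at once (for example HIV disease markers and co-morbidities) and would fix the allocation rules in the protocol with statistical input.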

The second RCT was presented orally at CROI 2015 [102] and is yet to be published at the time of writing. We include it here because it was based on the largest ever sample for an HIV treatment RCT in the cART era and aimed to assess cART prevention of HAND. HIV+ therapy-naïve adults (N = 250) were randomized to either AZT-lamivudine-nevirapine or tenofovir-lamivudine-efavirenz. Comprehensive neuropsychological assessment determined that all were neurocognitively normal as per Frascati criteria using the GDS method. This is an important point given the primary study aim to assess incident neurocognitive impairment; if the authors had used a short battery (adopting the rationale that some domains should not be tested) or determined neurological status based on clinical grounds, reliability in the determination of this outcome would have been substantially weakened. Using standard regression-based change scores derived from a demographically comparable control group (methods that control for practice effects and regression towards the mean but add historical bias in an RCT), the authors found incident impairment at 96 weeks was greater in the tenofovir-lamivudine-efavirenz arm than in the other arm. However, a higher proportion of adverse events were noted in the AZT-lamivudine-nevirapine arm. These results support the rationale that not all antiretroviral regimens are equivalent for preventing HAND. CNS and peripheral toxicity cannot be excluded as a potential confounding factor while viral suppression was equivalent between regimens. In the anticipated publication, it would be important for the authors to detail the trajectories of individuals according to their baseline performance and whether particular neurocognitive functions were associated with decline in the efavirenz arm. Nonetheless, the study represents a strong proof-of-concept for larger implementation.
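The regression-based change scores mentioned above can be sketched as follows. This is an illustration only, assuming the simplest form of the method (a one-predictor linear regression of follow-up on baseline performance fitted in a control group); published norms usually add covariates such as age, education and retest interval. All data and function names are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean

def fit_control_model(baseline, followup):
    """Fit follow-up = intercept + slope * baseline in the control group.

    Returns (intercept, slope, see), where see is the standard error of
    the estimate (residual standard deviation with n - 2 degrees of freedom).
    """
    n = len(baseline)
    mean_x, mean_y = mean(baseline), mean(followup)
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, followup))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(baseline, followup)
    )
    see = sqrt(residual_ss / (n - 2))
    return intercept, slope, see

def change_z(patient_baseline, patient_followup, model):
    """Standardized difference between observed and predicted follow-up."""
    intercept, slope, see = model
    predicted = intercept + slope * patient_baseline
    return (patient_followup - predicted) / see

# Hypothetical control scores: the retest gain reflects practice effects.
control_baseline = [40, 45, 50, 55, 60]
control_followup = [44, 47, 53, 56, 62]
model = fit_control_model(control_baseline, control_followup)

# Hypothetical patient with an unchanged raw score at follow-up.
z = change_z(50, 50, model)
print(round(z, 2))
```

Because the control group improves on retest, a patient whose raw score is merely unchanged falls below the predicted follow-up score and obtains a negative z. This is how the method separates practice effects and regression towards the mean from genuine decline.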

Decloedt and Maartens [105] note that “several mechanisms exist to explain the observed efavirenz neurotoxicity, including altered calcium hemostasis, decreases in brain creatine kinase, mitochondrial damage, increases in brain proinflammatory cytokines and involvement of the cannabinoid system.” Moreover, another mechanism has recently been proposed which could have important implications for aging HIV+ persons. Indeed, Brown et al. [109] have found in a murine model (murine N2a cells transfected with the human “Swedish” mutant form of amyloid precursor protein) that efavirenz promotes β-secretase expression and increased Aβ1-40,42 via oxidative stress and reduced microglial phagocytosis. While preliminary, this finding suggests that some efavirenz effects could be long term, potentially explaining the conflicting research findings to date relating to the drug [104].

5.2 Fusion/Entry Inhibitor (Maraviroc) Intensification Studies

Maraviroc has good CSF penetration as well as anti-neuroinflammatory properties [110–113]. Preliminary data supporting a potential neurocognitive benefit of a maraviroc-intensified regimen were reported in a sub-analysis of a recent single-arm, open-label pilot study. A small subset (n = 6) of HIV+ participants with mild to moderate global neurocognitive impairment improved over 24 weeks [114]. In a slightly larger prospective, double observer-blinded, open-label pilot RCT [115] (all established HAND cases, virally suppressed in plasma and CSF: n = 9 maraviroc-intensified arm and n = 5 existing regimen control arm; assessed at baseline and 6 and 12 months on a five-cognitive domain battery), we found medium to large effect sizes favoring improved global neurocognitive performance in the maraviroc-intensified arm over time (after correcting for practice effects and using adaptive randomization for HIV factors and mild co-morbidities). Both studies support the need for larger RCTs.

5.3 Protease Inhibitors

Recently, several European observational studies have investigated whether protease inhibitor (PI) monotherapy/dual therapy differs from more traditional triple therapy in terms of neurocognitive outcomes [116–118] given they could be considered as alternative regimens (lower cost, less toxicity) in patients with chronic HIV infection who are otherwise well-controlled if they have sufficiently high genetic barriers to resistance [119]. For example, the largest observational cross-sectional study (N = 191) [117] assessing boosted lopinavir or darunavir as monotherapy versus triple ART found the mild to moderate neurocognitive impairment rate did not differ in otherwise low-confound and well-controlled patients. This finding was subsequently extended in a prospective observational cohort [120]. However, well-powered longitudinal RCTs are still needed to further determine the validity of these results. On the same topic, Spudich and Ances [103] highlighted the findings of Caramatti et al. [121], who showed in a multicenter, randomized, open-label trial that atazanavir/ritonavir (ATV/r) monotherapy (n = 28) and ATV/r triple therapy (n = 37) yielded similar neurocognitive improvement over time. Specifically, HAND prevalence at baseline was 66 % and this dropped to 37 % after 96 weeks, with no between-group differences. These results lend support to an earlier RCT (N = 200) [93], which similarly found no differences in neurocognitive impairment rates at baseline and 48 weeks between patients randomized to second-line lopinavir/ritonavir-based triple therapy versus lopinavir/ritonavir monotherapy, albeit using a reduced number of neurocognitive tests. This alternative treatment could be used in some well-controlled chronic HIV+ patients if it is found to not be associated with HAND incidence over the long term. However, larger RCTs are still needed to fully determine its safety impact on the CNS, particularly the risk of CSF viral escape.

5.4 Integrase Inhibitors

The integrase inhibitor raltegravir demonstrates reasonable CNS penetration [122, 123], albeit with large inter-individual variability [124]. Raltegravir is widely used in rich-economy countries, mainly as part of a second-line treatment. Raltegravir-intensification studies focusing on neurocognitive effects are unlikely to eventuate because of its widespread existing use in contributing to high-CPE regimens in chronically HIV+ patients [17]. Another promising agent in this regard is dolutegravir, which also shows decent CSF concentration [125]. Larger RCTs dedicated specifically to neurocognitive changes are needed for these agents, potentially as part of a modified CPE score given their widespread clinical use to date.

6 When Should cART be Initiated to Avoid Incident HAND?

At the time of writing, results of the large RCT Strategic Timing of Antiretroviral Treatment (START) had just been announced by the National Institutes of Health (NIH) [126]. This study is arguably the RCT needed to settle the debate over benefits versus adverse effects of early cART. The main study results provide a clear conclusion: “the initiation of antiretroviral therapy in HIV-positive adults with a CD4+ count of more than 500 cells per cubic millimeter provided net benefits over starting such therapy in patients after the CD4+ count had declined to 350 cells per cubic millimeter” [127]. Importantly, with more HIV+ patients starting treatment earlier, the clinical prevalence of HAND is likely to shift even further towards milder forms, emphasizing the need for early detection and at least passive monitoring using screening tools validated for longitudinal assessment. However, early cART is not without its long-term consequences in terms of cumulative toxicity and potential neuro/cardiotoxicity, as well as variable adherence level in different HIV populations. These will need to be carefully considered moving forward. Finally, the practical and financial complexities of an early cART implementation program have already been noted, especially at an international level. On the positive side, there is a strong argument for early ART as an indirect mode for reducing new HIV infection [127]. The START study included a neurology sub-study, for which the results are pending.

7 Future Research Directions

While neuropsychological assessment remains a very useful method to assess current neurocognitive functioning and determine the clinical relevance of neurocognitive impairment, HIV treatment studies should start to also include neuroimaging outcomes, when possible, in order to enhance the neurobiological validity and interpretation of HIV treatment effects. Not all neuroimaging outcomes are valuable for ART effects on brain functions, given the subtle nature of change observed. Structural magnetic resonance imaging (MRI) methods are probably insensitive as brain macro-structural changes (e.g., atrophy) are unlikely to be directly affected by treatment effects (at least over the time period that most RCTs are conducted). However, one particular method should be strongly considered as it measures the chemical mechanisms that putatively underlie treatment effects: magnetic resonance spectroscopy (MRS). Indeed, this in vivo imaging is quick, non-invasive and can provide key information regarding neurochemical abnormalities associated with HIV or treatment effects [128] during/prior to any neurocognitive evidence of HAND and even during primary HIV infection [129], as well as potential ART-related neurotoxicity [130], which needs to be seriously considered in longitudinal research. As with neuropsychological data, careful characterization of the baseline metabolite profile is very important for any RCT [131]. Also, demographic effects ought to be characterized. Determination of other technical conditions should ideally involve MRS experts, including voxel size/positioning depending on the regions of interest, type of signal reference, sequences for absolute/relative concentrations, signal fitting and reliability, various signal corrections as well as the timeline of measurements [132]. 
Finally, because MRS is relatively expensive (though industry-sponsored studies will likely afford the cost) and/or not all countries have access to MRI scanners and MRS expertise, not all RCTs may be able to include this method in the near future. However, recent technological improvement makes the acquisition of spectroscopic data quicker than ever; additionally, there is already a precedent for collaborative MRS neuroHIV treatment studies between countries with a participant cohort recruited in a low–middle-income country, demonstrating its future potential in such research [133].