Introduction

The human immunodeficiency virus (HIV) can induce neurocognitive (NC) impairment [1,2,3,4]. The umbrella term for the spectrum of NC disorders that present in people with HIV (PWH) is HIV-associated neurocognitive disorders (HAND) [5,6,7]. Even though advances in antiretroviral treatment have dramatically decreased the incidence of more severe forms of HAND, mild NC impairment persists among PWH [6,7,8,9,10].

Milder forms of HAND have been associated with impaired instrumental activities of daily living, employment difficulties, and a worse overall quality of life. Escalating degrees of NC impairment are also associated with higher mortality rates, lower adherence to complicated treatment regimens, and poorer health-related decision-making [11,12,13,14,15,16,17,18]. Given the functional consequences of HAND, it is vital to identify signs of NC impairment as early as possible. Early identification aids in the long-term clinical management of HAND, the initiation and adjustment of treatment regimens, and the monitoring of disease progression and treatment effects [8, 19, 20]. Moreover, diagnosing HAND early provides the opportunity for additional neuroprotective and psychosocial therapies that minimise NC decline, improve cognitive reserve, and ultimately improve quality of life [5, 8, 21].

South Africa is at the epicentre of the global HIV epidemic [22]. Therefore, it is imperative that culturally sensitive tools are identified to facilitate early detection of HAND in this vulnerable population. The HIV Neurobehavioral Research Center’s International Neurobehavioral Battery (HNRC Battery) is a comprehensive assessment tool sensitive to the NC effects of HIV [23]. It was initially developed for use in research settings and has been used successfully in South African HIV studies [24,25,26]. The HNRC Battery conforms to the Frascati recommendations published by Antinori et al. [5] and is sensitive to both cortical and subcortical patterns of NC impairment [27]. The extensive battery could be a viable option to aid in the early detection of HAND in the South African context [23]. However, like other extensive NC tests, the HNRC Battery presents various challenges in resource-limited settings. First, the administration of the HNRC Battery is time-consuming, taking an average of two hours to complete. This is problematic for resource-limited settings, like South Africa, that are often faced with time constraints and a lack of local expertise. Second, to our knowledge, no South African norms are currently available for this battery [25]. If South African norms are made available, an abbreviated version of the HNRC Battery could have utility in both research settings (i.e., for studies on HAND in South Africa) and clinical settings (e.g., regional and tertiary hospitals where referral to a neuropsychologist for further evaluation, when indicated, is possible).

Norms can be defined as the performance of a well-defined population that provides an empirical frame of reference for determining which test scores are “normal” or “typical” at a specific time point [28]. Different cultural environments emphasise or de-emphasise differing abilities based on ecological demands and situational relevance, which may affect the performance, administration, and interpretation of NC measures [19, 29,30,31,32,33]. Therefore, cultural aspects are important considerations when determining what constitutes a “normal” or “typical” NC test performance [5, 19, 33,34,35,36,37,38]. Without culturally appropriate norms, test scores derived from the HNRC Battery may result in significant errors in diagnosis (false positives and negatives) [19, 29, 33, 38, 39]. The present study sought to address this limitation by developing demographically corrected neuropsychological norms for the HNRC Battery in the South African context.

Methods

Study Design and Setting

This study was nested within a larger study that sought to enhance the practicality of the HNRC Battery for use in South African clinics. The study consisted of three distinct phases. The first phase entailed the development of demographically corrected South African norms for the HNRC Battery, which is described in this paper. These norms were then used in the second and third phases, which involved the development and validation of an abbreviated version of the HNRC Battery (see Spies et al. [40]). The study was cross-sectional in design and data collection ran from May 2016 to June 2019.

Recruitment

Using convenience sampling, HIV-negative South African adults were recruited from the Cape Metropolitan and Winelands areas in the Western Cape of South Africa. In addition, the study made use of secondary data collected from an ongoing longitudinal, prospective study (Ethics reference number: N07/07/153) [26].

NC performance can be influenced by several confounding factors, which hinder the validity of research focused on the effects of HIV on NC test performance [3, 19]. To ensure that the norms developed were not influenced by confounding variables, the present study used exclusion criteria consistent with prior NC norming studies in low- and middle-income countries (LMICs) (e.g., [41,42,43,44,45,46,47]). Specifically, participants were excluded if they met any of the following criteria: (a) a history of neurological disease (e.g., dementia, seizure disorders); (b) severe head injury resulting in loss of consciousness for more than 30 min; (c) prior neurosurgery; (d) a history of psychotic disorders; (e) current anxiety and mood disorders or high suicidality (as measured by the Mini International Neuropsychiatric Interview 7.0 [MINI 7.0] [48, 49]); (f) post-traumatic stress disorder (PTSD); (g) a history of learning disorders (e.g., dyslexia) or Attention Deficit/Hyperactivity Disorder (ADHD); (h) past or current chronic use of psychotropic medication; (i) current severe alcohol use disorder; (j) regular cannabis use in the last six months; and (k) drug abuse in the last 2 years, excluding cannabis. To ensure that participants were able to read and understand the informed consent documents and complete the neurocognitive assessment, the following exclusion criteria also applied: (l) an inability to read or write in either English, Afrikaans, or isiXhosa; and (m) formal education of fewer than 7 years.

The final sample included 500 volunteers who tested negative for HIV infection using a Rapid HIV-1 blood Test. Participants completed the HNRC Battery in one of the three official provincial languages of the Western Cape [50]: English (n = 200), isiXhosa (n = 150), or Afrikaans (n = 150). Each participant received a shopping voucher to the value of ZAR100 (about USD 7 at the time of the study) as a token of gratitude. Travel costs to the university were also reimbursed.

Procedure

Ethical clearance was obtained from the Health Research Ethics Committee of the Faculty of Medicine and Health Sciences of Stellenbosch University (reference number: S15/05/124).

Participants were recruited using four approaches: (1) advertisement on social media platforms (i.e., www.gumtree.co.za and www.facebook.com); (2) flyers posted on notice boards in shops, churches, and clinics; (3) an advertisement in a local community newspaper; and (4) snowball recruitment. The recruitment process is outlined in Fig. 1.

Fig. 1 Recruitment process flowchart. NC neurocognitive; PTSD post-traumatic stress disorder

Potential participants were initially screened for eligibility. Eligible participants were invited for assessment. Data were collected face-to-face in a once-off session in a private office on campus. Participants were fully briefed on the study details and provided written informed consent. Demographic characteristics (e.g., age, sex, race, and years of education) were captured using a self-report questionnaire. Participants were screened for current and lifetime psychiatric disorders using the MINI 7.0 [48, 49]. The HIV status of each participant was confirmed via a Rapid HIV-1 blood Test. Confirmation of HIV status, coupled with pre- and post-test counselling, was conducted either at a government clinic specialising in family planning or on-site by qualified study staff.

NC test administration was conducted by researchers (a doctoral student and a professional research nurse) who received standardised training in the administration and scoring of the HNRC Battery. Training was conducted at Stellenbosch University in face-to-face meetings and included several rounds of supervised “mock testing” and interrater reliability sessions. To ensure consistency across assessments, the battery administrators had to follow a structured instruction manual verbatim during each assessment. Test administrators were regularly monitored during the study. Training was provided by a research psychologist who had previously been trained and certified in the administration and scoring of the HNRC Battery at the HNRC, University of California, San Diego.

Neurocognitive Measures

The HNRC Battery typically takes 2–2.5 h to complete and is available in English, Afrikaans, and isiXhosa. Instructions for the NC battery were translated to Afrikaans and isiXhosa using standard techniques of forward and back translation. The battery consists of 17 individual test measures that evaluate seven cognitive domains known to be susceptible to the effects of HIV, i.e., learning, delayed recall, processing speed, attention/working memory, executive function, verbal fluency, and motor ability [3, 4, 7].

Learning and Delayed Recall

Immediate recall, learning rate, and delayed recall were measured using the Brief Visuospatial Memory Test-Revised (BVMT-R) [51] and the Hopkins Verbal Learning Test-Revised (HVLT-R) [52].

We used a modified version of the HVLT-R, adapting some of the semantic categories included in the original test to be culturally appropriate to the South African context. Precious stones such as “emerald”, “sapphire”, and “opal” are less known in South Africa. Therefore, the precious stones category was replaced with vegetables (bean, lettuce, corn, and potato). The words were also translated into Afrikaans and isiXhosa [26, 30].

The BVMT-R has demonstrated good interrater reliability, with reliability coefficients of 0.97 for the three learning trials, 0.98 for total recall, and 0.97 for delayed recall [51]. Test–retest reliability coefficients ranged from 0.60 to 0.84 for trial one to three, respectively [51]. The BVMT-R also has established construct validity [51]. The HVLT-R has acceptable reliability, with test–retest coefficients of 0.74 for total recall and 0.66 for delayed recall [52]. Interform reliability was also established, showing equivalent performance between different forms in both learning and delayed recall [52]. Furthermore, the HVLT-R demonstrated acceptable discriminant validity [52].

Processing Speed

Information processing speed was measured using two sub-tests of the Wechsler Adult Intelligence Scale-III (WAIS-III), namely the Digit Symbol and Symbol Search tests [53], together with the Trail Making Test A [54].

The WAIS-III has established good test–retest reliability coefficients (0.88 to 0.94) and face, content, criterion-related, and convergent validity [53]. The Trail Making Test A has shown test–retest reliability coefficients ranging from low (0.46) to high (0.94) [54].

Executive Function

The Colour Trails Test 1 and Test 2 [55], the Stroop Colour and Word Test [56], the computer version of the Wisconsin Card Sorting Test (WCST) [57], and the computerised version of the Halstead Category Test [58, 59] all measured executive function and abstraction [55,56,57].

Colour Trails Test 1 and Test 2 have demonstrated test–retest coefficients of 0.64 and 0.78, respectively. Content and convergent validity have also been established [55]. The Stroop test has established good test–retest reliability, with the three sub-tests obtaining reliability coefficients of 0.86, 0.82, and 0.73, respectively [56]. Test–retest reliability coefficients of the WCST ranged from 0.37 to 0.72 [57]. The Halstead Category Test has demonstrated high internal consistency (0.95) and a test–retest reliability ranging from 0.60 to 0.90 [57]. There is no difference in how the standard and computer versions of the Category Test are administered or how responses are recorded. The instructions given to the examinee are identical, as are the experiences of the examinees [60]. Further, there appear to be no statistically significant differences between the standard and computer versions when measuring total error scores, sub-test error scores, or Neuropsychological Deficit Scale scores [61].

Attention/Working Memory

The Wechsler Memory Scale-III (WMS-III) Spatial Span sub-test [58] and the Paced Auditory Serial Addition Task (PASAT): 50-item Short Form [62] were used to measure attention, concentration, and working memory [58, 62].

The WMS-III Spatial Span sub-test has demonstrated adequate internal consistency, generalisability coefficients, and test–retest coefficients ranging from 0.70 to 0.79 [63]. The PASAT has demonstrated very good test–retest reliability (0.73 to 0.96) and high internal consistency (0.90) [62].

Verbal Fluency

The Controlled Oral Word Association Test (COWAT)—FAS, the Category Fluency Test—Animals, and the Action/Verb Fluency Test are language tests included in the HNRC Battery to measure different types of verbal fluency [57]. The Afrikaans and isiXhosa versions of the COWAT were adapted. Specifically, the letters “F”, “A”, and “S” were replaced with “I”, “B”, and “S” in the isiXhosa version and “L”, “B”, and “S” in the Afrikaans version. In the isiXhosa translation, the selection of replacement letters was based on rank ordering the frequency of words in both an English and an isiXhosa dictionary. The isiXhosa words with a similar rank order to that of the English words beginning with the letters “F”, “A”, and “S” were selected. The same approach was used for the Afrikaans version of the verbal fluency tests [26, 30]. The COWAT (FAS) has demonstrated high internal consistency (0.83) and test–retest reliability coefficients (0.74) [57].

Motor Ability

The Grooved Pegboard Test [64] evaluates fine motor coordination and fine motor speed for both dominant and non-dominant hands [64]. The Grooved Pegboard Test has demonstrated test–retest reliability coefficients ranging from 0.67 to 0.86 [64].

Statistical Analysis

Data analysis was done using Statistica version 12 [65] and R software [66], in partnership with the HNRC and a statistician from the Centre for Statistical Consultation at Stellenbosch University.

First, a regression analysis was performed to identify demographic characteristics significantly affecting the raw NC scores for each test.

Next, prediction equations were generated using the “mfp” R package [67], and the “test2norm” R package [68]. In the first step of this normative procedure, the raw scores for each NC test were converted into normally distributed scaled scores (M = 10; SD = 3). A multiple fractional polynomial (MFP) method [69] was then used to generate predicted test scores for each participant. The demographic variables that accounted for significant variance in raw test scores (i.e., age, education, gender, race, and NC test language) were entered into this MFP model to control for the variance in NC performance accounted for by these characteristics. Next, residual scores were calculated by subtracting the predicted scaled scores of each participant from their respective scaled scores. The residual scores were then converted to demographically corrected T-scores (M = 50; SD = 10) [69].
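The conversion steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the study used the “mfp” and “test2norm” R packages, whereas this sketch substitutes a rank-based normalisation and an ordinary least-squares line with a single predictor (age) in place of the MFP model. The function names, the toy data, and the rescaling of residuals by their standard deviation are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, pstdev


def to_scaled_scores(raw, higher_is_better=True):
    """Map raw scores to normalised scaled scores (M = 10, SD = 3) via ranks.

    Assumption: a rank-based (quantile) normalisation; ties share a mean rank.
    """
    n = len(raw)
    signed = list(raw) if higher_is_better else [-x for x in raw]
    order = sorted(range(n), key=lambda i: signed[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and signed[order[j + 1]] == signed[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based mean rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [10 + 3 * std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def fit_line(x, y):
    """Ordinary least-squares intercept and slope (stand-in for the MFP model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def to_t_scores(scaled, age):
    """Residual = observed scaled score minus predicted; rescale to M = 50, SD = 10."""
    intercept, slope = fit_line(age, scaled)
    residuals = [s - (intercept + slope * a) for s, a in zip(scaled, age)]
    sd = pstdev(residuals)
    return [50 + 10 * r / sd for r in residuals]


# Hypothetical toy data: six participants, one test where higher raw scores are better.
ages = [20, 30, 40, 50, 60, 25]
raw_scores = [30, 25, 20, 15, 12, 18]
scaled = to_scaled_scores(raw_scores)
t_scores = to_t_scores(scaled, ages)
```

For timed tests, where a lower raw score reflects better performance, `higher_is_better=False` reverses the ranking before normalisation. In the study itself, the predicted scores came from a model containing age, education, gender, race, and test language rather than age alone.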

Finally, the impairment rates based on norms developed in the United States of America (U.S.) and the impairment rates based on the newly developed South African norms were compared. African-American norms were used for the self-described Coloured and Black South African samples and Caucasian norms were used for the self-described White South African sample [70,71,72,73]. To estimate the severity of “impairment”, demographically corrected T-scores (as determined by the U.S. norms and newly developed South African norms, respectively) were converted to deficit scores. In an imaging study that compared the accuracy of NC impairment classification methods in an HIV sample (i.e., Global Deficit Score [GDS], Frascati, and Meyer methods), the GDS criteria successfully detected brain abnormalities in an HIV-infected sample, supporting the continued use of this method in determining HIV-associated brain abnormalities [74]. T-scores were converted to deficit scores as follows: 0 (T-score ≥ 40) = normal cognition; 1 (T-score 35–39) = mild NC impairment; 2 (T-score 30–34) = mild-to-moderate NC impairment; 3 (T-score 25–29) = moderate NC impairment; 4 (T-score 20–24) = moderate-to-severe NC impairment; and 5 (T-score < 20) = severe NC impairment. The GDS was determined by averaging the deficit scores across all tests. Global NC impairment was assigned to participants with a GDS ≥ 0.50. The U.S. GDS and South African GDS were compared by calculating the proportion of neurologically intact, HIV-negative individuals defined as “impaired” by each set of normative equations.
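The deficit-score conversion and GDS classification described above can be sketched in a few lines of Python. The function names and the example T-scores are hypothetical, and the treatment of non-integer T-scores (each band is taken as a half-open interval, e.g. 35 ≤ T < 40) is an assumption, since the text lists integer bands only.

```python
def deficit_score(t_score):
    """Convert one demographically corrected T-score to a deficit score (0 to 5)."""
    if t_score >= 40:
        return 0  # normal cognition
    if t_score >= 35:
        return 1  # mild impairment
    if t_score >= 30:
        return 2  # mild-to-moderate impairment
    if t_score >= 25:
        return 3  # moderate impairment
    if t_score >= 20:
        return 4  # moderate-to-severe impairment
    return 5      # severe impairment


def global_deficit_score(t_scores):
    """Average the deficit scores across all tests in the battery."""
    return sum(deficit_score(t) for t in t_scores) / len(t_scores)


def is_globally_impaired(t_scores, cutoff=0.50):
    """Global NC impairment is assigned when the GDS is at or above the cut-off."""
    return global_deficit_score(t_scores) >= cutoff


# Hypothetical participant with T-scores on four tests.
example_t_scores = [50, 38, 33, 45]
example_gds = global_deficit_score(example_t_scores)
example_flag = is_globally_impaired(example_t_scores)
```

Because scores at or above 40 all map to zero, the GDS weights below-average performances only; strong results on some tests cannot offset deficits on others.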

Results

Demographic Characteristics

A total of 500 HIV-negative participants (age range = 18 to 63 years, mean age = 31.2, SD = 10.9 years) were evaluated. Most participants were women (n = 398; 79.6%) and self-identified as Black (n = 301; 60.2%). Most participants (n = 285; 57.0%) self-reported not completing secondary school education (grade 12) (M = 11.0, SD = 1.9), were unemployed (n = 373; 74.6%), and lived on an annual household income of below R20 000 (n = 329; 65.8%). isiXhosa was the home language spoken by most participants (n = 285, 57.0%).

The HNRC Battery was administered in English (n = 200), isiXhosa (n = 150), and Afrikaans (n = 150). Participants could complete the test battery in their preferred language. Most participants who completed the English battery did not list English as their native language (n = 158; 79.0%), but chose to take the battery in English, the primary language of learning and teaching (LoLT) in many South African schools [75, 76]. However, all participants who completed the battery in isiXhosa and Afrikaans listed isiXhosa and Afrikaans as their native languages, respectively.

A convenience sampling method was used throughout the study and language groups were not matched. The demographic characteristics of the three language sub-groups were compared to identify significant differences between the groups. Continuous variables were compared using analysis of variance (ANOVA) and significant differences in age (F(2, 497) = 18.68, p < 0.001) and education (F(2, 497) = 11.321, p < 0.001) were observed between language groups. Categorical variables were compared using chi-square tests of association. Significant differences between groups were found on gender (χ²(2, 500) = 21.86, p < 0.001), home language (χ²(2, 500) = 418.23, p < 0.001), race (χ²(2, 500) = 458.06, p < 0.001), marital status (χ²(2, 500) = 38.80, p < 0.001), household income (χ²(2, 500) = 72.64, p < 0.001), employment (χ²(2, 500) = 6.54, p = 0.038), and handedness (χ²(2, 500) = 13.47, p = 0.001).

Table 1 presents the demographic characteristics of the sample by language.

Table 1 Demographic characteristics of the sample based on language of administration of the HNRC battery

Demographic Influences on Raw Scores

Age, education, gender, race, and testing language accounted for significant variance in raw test scores. The percentage of variance in raw test scores uniquely accounted for by each of these demographic variables is presented in Table 2.

Table 2 Percentage of Variance in Raw Test-Scores adjusted for Demographic Variables

Race, age, and education had strong effects on NC performance. Age accounted for the largest percentage of variance explained in raw scores on tests of visual episodic memory and delayed recall (BVMT-R Total Score: 11.75%, F(1, 489) = 79.02, p < 0.001; and Delayed Recall: 9.23%, F(1, 489) = 59.05, p < 0.001), speed of information processing (WAIS-III Symbol Search test: 8.14%, F(1, 489) = 56.78, p < 0.001), abstraction/executive functions (Colour Trails 1: 6.62%, F(1, 489) = 42.57, p < 0.001; and 2: 4.70%, F(1, 489) = 30.82, p < 0.001), and motor function (Grooved Pegboard Test, dominant: 6.94%, F(1, 489) = 50.27, p < 0.001; and non-dominant hand: 4.98%, F(1, 489) = 30.49, p < 0.001). The results were all in the expected direction, with younger participants performing better.

Education had the strongest effect on raw score variance in tests of verbal episodic memory and delayed recall (HVLT-R Total: 4.52%, F(1, 489) = 31.32, p < 0.001; and Delayed Recall: 4.39%, F(1, 489) = 26.43, p < 0.001), measures of processing speed (WAIS-III Digit Symbol test: 14.72%, F(1, 489) = 124.84, p < 0.001; and Trail Making Test A: 4.44%, F(1, 489) = 27.60, p < 0.001), attention/working memory (WMS-III Spatial Span: 5.87%, F(1, 489) = 36.98, p < 0.001), abstraction/executive functions (WCST: 1.73%, F(1, 487) = 9.26, p < 0.001), reading fluency (Stroop Word Test: 6.06%, F(1, 488) = 37.05, p < 0.001), and language (COWAT—FAS: 6.46%, F(1, 488) = 36.94, p < 0.001; the Category Fluency Tests—Animal; 9.37%, F(1, 486) = 58.84, p < 0.001; and Action/Verb Fluency: 7.20%, F(1, 485) = 45.73, p < 0.001). Higher education was associated with a better performance.

Race was the strongest predictor of performance on tests of attention/working memory (PASAT: 9.15%, F(2, 489) = 27.38, p < 0.001) and abstraction/executive functions (Stroop Colour Test: 4.47%, F(2, 488) = 14.92, p < 0.001; Stroop Colour-Word Test: 6.42%, F(2, 488) = 21.46, p < 0.001; and the Halstead Category Test: 7.21%, F(2, 488) = 25.80, p < 0.001).

Minor statistically significant effects of gender were also observed on some of the NC tests. Specifically, women performed better on tests of verbal episodic memory and delayed recall (HVLT-R Total: 0.93%; F(1, 489) = 6.45, p = 0.011; and Delayed Recall: 1.04%, F(1, 489) = 6.26, p = 0.013), and one test of processing speed (WAIS-III Digit Symbol test: 1.68%, F(1, 489) = 14.24, p < 0.001). Men performed better on tests of abstraction/executive functions (Halstead Category Test: 2.56%, F(1, 488) = 18.32, p < 0.001; and WCST: 0.96%, F(1, 487) = 5.16, p = 0.023), attention/working memory (WMS III Spatial Span: 1.71%, F(1, 489) = 1.79, p = 0.001), processing speed (Trail Making Test A: 0.92%, F(1, 489) = 5.70, p = 0.017), and motor function (Grooved Pegboard Test: dominant hand; 0.95%, F(1, 489) = 6.91, p = 0.009).

Language of test administration had a significant effect on raw score variance in two tests measuring verbal fluency (Category Fluency Tests – Animal: 2.14%, F(2, 486) = 6.72, p = 0.001; and Action/Verb Fluency Test: 3.37%, F(2, 485) = 10.72, p < 0.001), one test of processing speed (WAIS-III Digit Symbol test: 1.00%, F(2, 489) = 14.24, p = 0.015), and one test of attention/working memory (PASAT: 1.18%, F(2, 489) = 3.54, p = 0.030).

Generation of the Prediction Equation

Table 3 summarises the raw score means and standard deviations obtained on each of the NC tests in the norming sample (n = 500). These raw scores were converted to normalised scaled scores (M = 10; SD = 3). The raw-to-scaled score conversions for each NC test are presented in Appendix 1. The formulas used to convert the NC scaled scores to demographically corrected T-scores are presented in Appendix 2.

Table 3 Raw test-scores (M, SD) for each NC test in the HNRC battery

Comparison with U.S. Norms

The U.S. T-scores (corrected for age, education, gender, and race) and the newly generated South African T-scores were compared based on the proportions of South African participants defined as “impaired” by each set of normative equations. The U.S. norms for Colour Trails 1 and 2 could not be accessed and were not included in this comparison. To estimate the severity of “impairment” in the norming sample, demographically corrected T-scores were converted to deficit scores. Deficit scores across all tests were averaged to compute the Global Deficit Score (GDS). Global impairment was assigned to participants with GDS ≥ 0.50. The impairment rates of the sample as estimated by the U.S. GDS and South African GDS are presented in Table 4.

Table 4 NC Test T-scores and Percentage of participants “Impaired” (T < 40) based on U.S. Norms and New South African Norms

The U.S. T-scores generated impairment rates ranging from 15.6% (Grooved Pegboard Test: dominant hand, T-score = 50.2) to 61.1% (Stroop Colour Test, T-score = 37.7). The Grooved Pegboard Test (dominant hand) was the only test that obtained an impairment rate of less than 16%, which is the expected prevalence based upon a 1-SD cut-off for defining “impairment” on individual test measures. More than half of the sample was classified by U.S. norms as impaired on eight of the individual NC tests: the Stroop Colour Test (61.1%); WAIS-III Digit Symbol test (56.4%); Category Fluency Test (Actions) (55.6%); Stroop Colour-Word Test (54.7%); PASAT (53.8%); Trail Making Test A (52.0%); Stroop Word Test (50.9%); and Halstead Category Test (50.3%). A global impairment rate of 62.2% was obtained across tests based on the U.S. GDS. In comparison, the newly developed South African norms generated impairment rates ranging from 13.6% to 16.6%, and the South African GDS evidenced a global impairment rate of 15.0%. When applying the South African norms, the Stroop Word Test was the only test with an impairment rate above 16% (16.6%).
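The expected prevalence of about 16% under a 1-SD cut-off follows from the normal distribution: if T-scores have M = 50 and SD = 10, the proportion below 40 is the standard normal probability below −1. A short Python check is sketched below; the function names and the example scores are hypothetical, and normality of the T-scores is an assumption.

```python
from statistics import NormalDist

T_MEAN, T_SD, CUTOFF = 50.0, 10.0, 40.0


def expected_impairment_rate(cutoff=CUTOFF, mean=T_MEAN, sd=T_SD):
    """Proportion of a normal T-score distribution falling below the cut-off."""
    return NormalDist(mean, sd).cdf(cutoff)


def observed_impairment_rate(t_scores, cutoff=CUTOFF):
    """Proportion of participants whose T-score on one test is below the cut-off."""
    return sum(t < cutoff for t in t_scores) / len(t_scores)


expected = expected_impairment_rate()                     # roughly 0.16
observed = observed_impairment_rate([35, 45, 50, 39])     # hypothetical scores
```

Comparing the observed rate for each test against this expected value is the logic behind judging the U.S. norms (rates up to 61.1%) as poorly calibrated for this sample and the South African norms (13.6% to 16.6%) as well calibrated.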

Discussion

In the present study, South African norms that corrected for age, education, gender, race, and test administration language were generated for the full HNRC Battery, a comprehensive battery that measures several NC domains sensitive to HIV-related impairment [23]. The HNRC Battery is widely used in international neuroHIV studies and norms for these tests have been developed in several LMICs, including Cameroon [45, 77], China [42], Zambia [44], and India [43]. To our knowledge, prior to this study, there were no South African normative data available for the full HNRC Battery.

When age, education, sex, and race-corrected U.S. norms for the HNRC Battery were applied to the performance scores of our sample of neurologically intact, HIV-negative individuals, a high impairment rate of 62.2% was observed. In contrast, using the standard 1-SD cut-off for defining “impairment”, the expected rate of 15.0% was observed for demographically adjusted South African norms. Given that NC tests evaluate abilities that are highly influenced by different historical, cultural, economic, and sociological environments [5, 19, 29, 31, 38], these results were expected and emphasise the need for country-specific NC norms. Similar conclusions were drawn in other international studies reporting significant differences in NC test scores across countries [33, 37, 41, 43, 44, 47].

The present study also identified several demographic effects (i.e., age, education, race, gender, and test administration language) that influenced the NC performance of participants. This finding is in keeping with other norming studies conducted in LMICs [37, 41,42,43,44,45,46,47, 77,78,79,80]. These demographic characteristics can all potentially influence NC test performance, thereby highlighting the general importance of controlling for demographic characteristics when developing norms for NC measures.

Demographic Factors in Norm Development

Age

Age had the strongest influence on some tests of visual learning and delayed recall (BVMT-R Total and Delayed Recall); processing speed (WAIS-III Symbol Search test); abstraction/executive functions (Colour Trails 1 and 2); and motor functioning (Grooved Pegboard Test, dominant and non-dominant hands), always with younger age being associated with better performance. Verbal fluency was the only NC domain that did not show any significant age effects. These age effects are not surprising as the same effects have been seen on other tests of these constructs. Studies on aging have typically found that cognitive change is part of the normal aging process for some NC abilities, such as memory, certain language and visuospatial skills, executive functions, and processing speed [81,82,83,84,85]. The considerable influence of age on NC test performance was reiterated in several norming studies conducted in LMICs, with the general pattern of NC test performance showing a significant decline with increasing age [37, 41, 43,44,45,46,47, 77,78,79,80].

Education

Also consistent with findings in the U.S. and other international settings, we found that higher education levels were associated with better NC test performance on almost all NC measures. The Grooved Pegboard Test (non-dominant hand), which assesses complex motor function, was the only test without significant education effects. Education best predicted scores on measures of verbal learning and delayed recall (HVLT-R Total and Delayed Recall); processing speed (Trail Making Test A and WAIS-III Digit Symbol test); attention/working memory (WMS-III Spatial Span); executive functions (Stroop Word Colour Test); and verbal fluency. Similar findings were reported in other norming studies conducted in LMICs [37, 41, 43, 44, 46, 77, 78, 80]. To some extent, these findings could reflect the skills developed through formal schooling, although years of formal schooling can reflect other advantages (e.g., SES and quality of education experienced) that are more difficult to quantify. Formal schooling, for example, refines linguistic skills through reading and writing, develops test-wiseness, and reinforces certain values that enhance the learning process, such as the importance of memorising, understanding, and achieving [85, 86]. Further, to the extent that opportunities for higher education are merit-based, more cognitively able youth are likely to eventually complete more education.

It should be noted, however, that the present study provided some control for literacy effects by excluding participants with less than 7 years of formal education. Therefore, the norms developed in this study cannot be generalised to individuals with very low levels of education. In 2017, it was estimated that 13.7% of South Africans aged 20 years or older had no formal education or a formal education of less than 7 years [87].

Furthermore, this study based educational levels on self-reported years of formal education. This does not consider variations in education quality [19, 32, 39]. Hestad et al. [44] controlled for the variation in education quality in Zambia by assessing the formal reading levels of participants using the Zambian Achievement Test (ZAT). The ZAT score contributed significantly to variations in NC test results, above and beyond the effect of years of education [44]. Future studies may need to employ similar strategies to control for differences in the quality of education in South Africa and other settings.

Gender

We observed minor gender effects. Specifically, women tended to perform somewhat better on measures of verbal learning and delayed recall (HVLT-R—Total and Recall) and processing speed (WAIS-III Digit Symbol test), while men performed better on measures of abstraction/executive functions (Halstead Category test and WCST); attention/working memory (WMS III Spatial Span); processing speed (Trail Making Test A); and motor function (Grooved Pegboard Test—dominant hand). Gender differences in NC functions were reported in several international studies [37, 77, 83, 88, 89] and were associated with genetics [90], functional and structural differences in the brain [90,91,92], and hormonal influences [90, 93]. However, certain societal and cultural factors, like educational opportunities and expectations, gender equality, and rates of gender-based violence, may also contribute to some of the gender-based variance in NC test performance [44, 94].

Race

Ethical considerations surrounding racial-norming in NC measures are widely debated [95]. Ethical challenges regarding demographic groupings based on race include (1) non-discrete socio-political definitions used to categorise racial groups; (2) existence of a large number of potentially different groups within racial categories; (3) existence of racial subcategories that are not psychologically homogeneous; and (4) non-scientific methods used to classify race in research settings, which mostly consist of self-report data [95,96,97]. Furthermore, the racial effects on NC test performance are likely influenced by complex socio-historical and socio-economic contexts. Therefore, these racial differences may not be globally generalisable and may be country-specific [44]. The development of separate norms could also perpetuate false perceptions regarding the relative abilities of different racial groups [95].

Nonetheless, the use of race as a norming category is highly relevant in the South African context given the country’s history of colonisation and apartheid. Historical policies that disempowered Black and Coloured communities in the apartheid era have left a legacy of social, economic, and educational inequalities across the South African landscape [98]. High levels of inequality are still apparent across previously oppressed racial groups, even though the post-apartheid government tried to eliminate these inequalities [99,100,101]. Therefore, for the time being, the impact of these inequalities on NC test performance cannot be ignored. If racial corrections are not applied in local norms, it may result in the misrepresentation of different racial groups and a high rate of misdiagnosis in the assessment of NC impairment [73, 95, 97], as highlighted in previous international norming studies [73, 78, 102, 103].

Language

Recent South African studies suggested that culturally adapted NC tests can perform equivalently when administered to multilingual adults in either English or isiXhosa [30, 46, 104]. We found that most NC tests did not show significant language effects, but minor language effects were observed on four NC tests. The strongest language effects were observed in two verbal fluency tests, i.e., the Category/Animal and Action/Verb Fluency Tests. Surprisingly, the PASAT-50 (measuring working memory) and WAIS-III Digit Symbol test (measuring processing speed) were also significantly influenced by test administration language, even though these tests do not rely on language proficiency.

Furthermore, the present study compared the native language of participants to the language of test administration to see whether language proficiency could account for the variance in test performance. No significant effects were observed, suggesting that NC test performance was not influenced by whether the test was done in the participant’s native language or not. These findings could possibly be attributed to the high exposure of urban South African communities to English. English is regarded as the country's lingua franca [105] and is the primary language used in government, business, and commerce [75, 105, 106]. Furthermore, it is widely used in media such as television [106].

Similar to race, language has historical links to inequalities in the South African context [75]. The recent Language-in-Education policy of the South African Department of Education aimed to eliminate these inequalities by promoting multilingualism in schools and developing and promoting native African languages as LoLT [75]. Despite this initiative, most South Africans still prefer English and not their home language as LoLT. These preferences are perpetuated by the belief that English is linked to better education and economic empowerment [75]. More research is needed to better understand the interactions between language, education quality, historical inequalities, and NC test performance in the South African context.

Provincial Differences

Data collection was limited to the Cape Metropolitan and Winelands areas in the Western Cape. We found two other South African studies that generated norms for tests that form part of the HNRC Battery. Robertson et al. [37] generated South African norms for a battery that included six tests from the HNRC Battery, using NC data collected from two South African provinces, KwaZulu-Natal and Gauteng. Van Wijk and Meintjes [79] collected data from several South African provinces to develop norms for the Grooved Pegboard Test. Both studies observed statistically significant regional variances in NC performance. These findings may be attributable to educational inequalities between different municipalities, cultural and socio-economic differences between sites, and different levels of urbanisation [37, 79]. These findings urge caution in the generalisation of normative data across South African provinces.

Study Limitations

This study has several limitations worth noting. First, the sample size (n = 500) was relatively small for a norming study. Research suggests that a normative data-set should include approximately 1 000 participants to minimise the confounding effects of outliers [47, 107]. While our sample size measured up well against other norming studies conducted in LMICs [37, 41,42,43,44,45,46,47, 77, 78, 80], the validity of these norms could be improved by the inclusion of a larger sample.

Second, the demographic distribution of the sample was not balanced. Approximately 80% of the sample were women, all participants were recruited from urban areas, and participants with fewer than 7 years of formal education were excluded. Furthermore, the sample was disproportionately young with two-thirds of participants being younger than 35 years of age. Children/adolescents (< 18 years) and older adults (> 65 years) were also excluded. The generalisability of the norms developed here could be improved through the inclusion of a sample with a more proportionate demographic distribution.

Furthermore, no formal literacy tests or reading comprehension tests were used to control for education quality. Variation in the quality of education may introduce bias; for example, less-literate participants may have struggled to understand and follow NC test instructions [19, 32]. Similarly, no formal tests of English proficiency were conducted even though 79.0% of the HIV-negative sample who completed the NC battery in English were not native English speakers. Language proficiency was judged informally based on the feedback of participants regarding their own language skills and their ability to fluently communicate with study staff in English during recruitment procedures. Future norming studies should aim to assess literacy and language proficiency through formal tests that are valid in their local cultural context.

We were not able to control for all possible confounders without severely compromising the sample size. Variables that could possibly influence NC test performance (e.g., perceived stress and stress reduction habits; dietary habits; exercise habits; and a history of mild head injuries [108,109,110,111,112]) were not controlled for in the present study, which is considered a limitation.

However, the same can be said for the stringent exclusion criteria used in the current study, which may limit generalisability to real-world PWH samples. Nevertheless, NC test performance is only one component of the process for defining HAND. Test results should be interpreted alongside contextual information regarding the individual’s estimated premorbid functioning, functional impairment, and co-morbidities [113, 114]. Possible co-morbidities and their effect on NC test performance should always be considered.

Finally, data were collected by different data collectors, yet interrater reliability was not measured. Nonetheless, to ensure consistency across assessments, standardised training in the administration of the battery was given to all administrators. All administrators were expected to follow a structured instruction manual verbatim during each assessment and were regularly monitored throughout the study.

Conclusion

In conclusion, this study provides much-needed South African NC norms that could aid both clinicians and researchers in a wide range of settings in the correct interpretation of NC test results, thereby empowering them to make decisions that are more informed and relevant to therapeutic interventions/pharmacologic treatments. This is especially important in South Africa considering the high HIV prevalence and the high rate of HIV comorbidity in individuals presenting to psychiatric and medical settings. Several demographic factors (i.e., age, education, race, gender, and test administration language) influenced NC performance, highlighting the need to control for demographic characteristics when developing NC test norms. South African norms for the HNRC Battery also differed significantly from published U.S. norms, highlighting the need for localised, country-specific normative data when interpreting NC performance.