Background

Dementia prevalence is estimated to rise considerably across Asia [1,2,3], as the elderly population is projected to increase from the current 10% to 24% of the total Asian population by 2050 [4,5,6,7]. This rise in dementia prevalence is of immediate medical and social concern for Asian countries, since cognitive decline is often associated with loss of independent function [8], loss of employment, and additional burden on family members and the healthcare system [9,10,11]. The economic burden of dementia, estimated at US$73 billion annually [12], is a major public health concern, especially for developing Asian countries. In the face of this dementia epidemic, it is crucial that clinicians are equipped with appropriate cognitive screening tools that can effectively detect dementia at an early stage [13, 14]. Early detection would allow interventions to slow the progression of dementia, give individuals and families more time to cope with this devastating illness, and provide a window for policy-makers to allocate much-needed resources.

Cognitive screening tools have proven to be simple, useful, and efficient in detecting early cognitive impairment [15, 16]. However, the diversity of languages across the world poses a significant barrier to using cognitive screening tools effectively and efficiently. In Asia, relatively low levels of education, the lack of a common regional language, and the existence of numerous dialects are major obstacles to using cognitive screening tools for the early diagnosis of dementia. Languages spoken in Asia fall into many different language families, including Indo-European, Sino-Tibetan, and Austronesian, with varying writing systems [17]. Under the writing-system classification of languages, Malay, which is widely used in Malaysia, Indonesia, and Singapore, and Tagalog, which is widely used in the Philippines, belong to the alphabetic group. Mandarin, widely used in Singapore, Malaysia, and Indonesia, belongs to the logographic group [18]. Thus, when cognitive screening tools originally developed in English, an alphabetic language, are translated into a logographic language, new test items are often required to replace items that cannot be translated. As a result, no universal screening tool can cater to the vast variety of languages spoken throughout Asia. Although most cognitive screeners were designed for English speakers, their translation and adaptation into Asian languages, while useful, often alters their original neuropsychological and psychometric constructs. There is evidence that cognitive screening tools modified or translated to meet local demands often overdiagnose cognitive impairment in non-English speakers [19].
Furthermore, the lack of standardized cognitive screening tools across Asian countries [20] prevents meaningful cross-cultural comparisons and poses major challenges for international clinical trials that use cognition as an outcome measure [21].

The Visual Cognitive Assessment Test (VCAT) is a visual-based cognitive screening tool designed to detect early cognitive impairment [22]. It is language neutral and can be applied simply to multilingual populations without translating the test content. The VCAT is a 30-point test that evaluates memory, executive function, visuospatial function, attention, and semantic knowledge (language). The test items for each cognitive domain are visual, using pictures and figures selected from the International Picture Naming Project and locally validated in older adults; only pictures that participants identified easily and accurately were used to develop test items. The memory domain includes immediate and delayed recall of a visual scenario as well as recall of shapes and objects, while the executive function items require patients to work out the mechanism of gear rotation, recognize patterns, and categorize pictures. The language domain contains one item on category fluency and one on picture naming. The visuospatial items test patients’ ability to perform spatial reconstruction and grid navigation. Lastly, the attention domain consists of a shape cancellation task. The VCAT was recently shown to be useful in a multilingual population in a single-center study [22], in which its diagnostic and discriminative validity was compared against the Mini Mental State Examination (MMSE) and the Montreal Cognitive Assessment (MoCA). Its performance exceeded that of the MMSE in detecting early cognitive impairment and was comparable to that of the MoCA, with the added advantage of requiring only a single version of the test, as no translation or adaptation is needed. The area under the curve (AUC) of the VCAT for detection of cognitive impairment was 0.933 (95% CI 0.901–0.964). The sensitivity and specificity of the VCAT for the diagnosis of cognitive impairment (MCI and mild AD) were 85.6% and 81.1%, respectively.

In this study, we evaluated the performance of the VCAT in a multinational, multicenter study of four linguistically diverse Southeast Asian populations and investigated the influence of language family and writing system on test performance in healthy controls (HC) and patients with cognitive impairment (CI).

Methods

Participants

This prospective, multicenter study was carried out across Singapore, Malaysia, Indonesia, and the Philippines. A total of 284 participants were recruited between January 2015 and August 2016: 138 in Singapore from the National Neuroscience Institute Specialist Outpatient Memory Clinic and the Singapore Longitudinal Aging Study, 67 from the memory clinic of Hasan Sadikin Hospital in Indonesia, 40 from the Division of Geriatric Medicine, University of Malaya in Malaysia, and 39 from Asian Hospital & Medical Center and Manila East Medical Center in the Philippines. Eligible participants were subjects with mild dementia of the Alzheimer’s disease (AD) type, mild cognitive impairment (MCI), or no cognitive impairment (healthy controls, HC). Dementia was diagnosed using the DSM-IV TR criteria [23], and AD using the NIA-AA criteria [24]. To ensure that only patients with mild dementia were recruited, a Clinical Dementia Rating (CDR) [25] score of 1 was required. MCI was diagnosed based on Petersen’s criteria [26]: subjects with MCI were required to have symptoms in one or more cognitive domains, remain independent in all instrumental activities of daily living, and have an MMSE score > 24 and a CDR score of 0.5. HC were required to have no cognitive symptoms, be independent in all instrumental activities of daily living, and have an MMSE score > 27 and a CDR score of 0. Only participants aged 50 years and older with at least 6 years of education were included. In our previous experience, 6 years of education is the minimum required for subjects to complete all of the cognitive assessments used in this study. Subjects with a Geriatric Depression Scale (GDS) score of 10 or more, suggestive of major depression, were excluded.
Participants were then classified into either the HC group or the CI group, which comprised both the MCI and mild AD subjects.
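The eligibility rules above can be summarized as a small decision procedure. The Python sketch below is illustrative only: the `Candidate` fields, the `diagnosis` labels, and the function names are hypothetical, and the clinical diagnoses (DSM-IV TR, NIA-AA, and Petersen criteria) are assumed to have been made beforehand by clinicians rather than computed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    age: int                 # years
    education_years: int     # years of formal education
    gds: int                 # Geriatric Depression Scale score (0-15)
    mmse: int                # Mini Mental State Examination score (0-30)
    cdr: float               # Clinical Dementia Rating global score
    cognitive_symptoms: bool
    independent_iadl: bool   # independent in all instrumental activities of daily living
    diagnosis: str           # hypothetical clinician label: "none", "MCI" or "AD dementia"


def diagnostic_group(c: Candidate) -> Optional[str]:
    """Return 'HC', 'MCI' or 'mild AD', or None if the candidate is not eligible."""
    # Criteria shared by all groups: age >= 50, >= 6 years of education,
    # and no suggestion of major depression (GDS < 10).
    if c.age < 50 or c.education_years < 6 or c.gds >= 10:
        return None
    if (c.diagnosis == "none" and not c.cognitive_symptoms
            and c.independent_iadl and c.mmse > 27 and c.cdr == 0):
        return "HC"
    if (c.diagnosis == "MCI" and c.cognitive_symptoms
            and c.independent_iadl and c.mmse > 24 and c.cdr == 0.5):
        return "MCI"
    if c.diagnosis == "AD dementia" and c.cdr == 1:
        return "mild AD"
    return None


def study_group(c: Candidate) -> Optional[str]:
    """Collapse MCI and mild AD into the cognitively impaired (CI) group."""
    group = diagnostic_group(c)
    if group is None:
        return None
    return "HC" if group == "HC" else "CI"
```

Candidates who satisfy none of the group-specific rules are returned as `None`, mirroring exclusion from the study.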

All participants were tested in a private and quiet environment by trained raters. Ethical approval was obtained from the Centralized Institutional Review Board in Singapore, University Malaya Medical Center Ethics Committee in Malaysia, Bandung Adventist Hospital Institutional Review Board in Indonesia, and Asian Hospital Institutional Review Board in the Philippines. Informed consent was obtained from all participants and all methods were performed in accordance with the journal’s guidelines and regulations.

Cognitive assessments

During each interview, the participant’s basic demographic information, including age, gender, race, years of education, and employment status, was collected. The MMSE [27], MoCA [28], VCAT [22], and GDS [29] were administered to each participant. The MMSE, MoCA, and VCAT were used to assess global cognitive performance, which includes memory, language, executive function, visuospatial abilities, and attention. Locally validated and translated versions of the MMSE and MoCA were used in the respective countries, while a single version of the VCAT (without translation or modification) was used across all countries. Each of these three scales is scored from 0 to 30 points, with lower scores indicating greater cognitive impairment. To ensure uniform administration and accurate scoring of the newly introduced VCAT, all local (Singapore) raters were trained face to face, while overseas raters were trained through video conferencing. Based on our previous work, we used VCAT cutoff scores of < 18 for detection of dementia and < 22 for MCI [22]. Lastly, the GDS, scored from 0 to 15 with higher scores reflecting more severe depression, was administered to assess symptoms of depression.
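As an illustration of how the two cutoffs quoted above partition the 0–30 range, the following Python sketch maps a total VCAT score to a screening category. The function name and category labels are hypothetical, the cutoffs are read as strict inequalities as written in the text, and the output is a screening flag rather than a diagnosis.

```python
def vcat_screen(score: int) -> str:
    """Classify a total VCAT score (0-30) against the cutoffs used in this study.

    A score < 18 falls below the dementia cutoff; a score from 18 to 21
    falls below the MCI cutoff only; a score >= 22 is above both cutoffs.
    """
    if not 0 <= score <= 30:
        raise ValueError("total VCAT score must lie between 0 and 30")
    if score < 18:
        return "below dementia cutoff"
    if score < 22:
        return "below MCI cutoff"
    return "above MCI cutoff"


for s in (15, 19, 25):
    print(s, vcat_screen(s))
```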

Effect of language on the VCAT

To study the impact of language differences on the VCAT and MoCA, participants were grouped under two systems of language classification. The first is the writing system, with two subtypes: alphabetic and logographic. English, Malay, and Tagalog are alphabetic, while Mandarin is logographic. The second is language family, with three subtypes: Indo-European, Sino-Tibetan, and Austronesian. English is an Indo-European language, Mandarin is a Sino-Tibetan language, and Malay and Tagalog are Austronesian languages. Language groups were compared within the HC group and within the CI group separately.
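Because each participant falls into exactly one subtype under each scheme, the two classifications can be represented as a simple lookup table. In the Python sketch below, `LANGUAGE_CLASSES` and `group_counts` are hypothetical names; the mapping reproduces the assignments described above, while the example participant list is invented.

```python
from collections import Counter

# Assignment of each administration language under the two classification schemes.
LANGUAGE_CLASSES = {
    "English":  {"writing_system": "alphabetic",  "family": "Indo-European"},
    "Malay":    {"writing_system": "alphabetic",  "family": "Austronesian"},
    "Tagalog":  {"writing_system": "alphabetic",  "family": "Austronesian"},
    "Mandarin": {"writing_system": "logographic", "family": "Sino-Tibetan"},
}


def group_counts(languages, scheme):
    """Count participants per subtype under one scheme ("writing_system" or "family")."""
    return Counter(LANGUAGE_CLASSES[lang][scheme] for lang in languages)


# Invented example: five participants tested in different languages.
example = ["English", "Malay", "Mandarin", "Tagalog", "Mandarin"]
print(group_counts(example, "writing_system"))
print(group_counts(example, "family"))
```

The writing-system scheme pools the Indo-European and Austronesian languages into a single alphabetic group, which is why the same participants form two groups under one scheme and three under the other.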

Statistical analyses

Statistical analyses were performed using SPSS version 21. Descriptive statistics were computed for demographic and cognitive data. For between-group comparisons, the chi-square test was used for categorical variables, and Student’s t test or the Wilcoxon–Mann–Whitney test for continuous variables. Further analyses using a general linear model (GLM) were performed to compare cognitive scores between diagnostic groups or language groups while adjusting for confounding demographic variables. Diagnostic performance was measured using the AUC. All statistical tests were two-tailed, with significance set at p < 0.05.
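The AUC has a convenient interpretation that links it to the Wilcoxon–Mann–Whitney test: for a score on which healthy participants are expected to score higher, the empirical AUC equals the probability that a randomly chosen HC participant outscores a randomly chosen CI participant, with ties counted as one half (equivalently, the Mann–Whitney U statistic divided by the product of the two group sizes). The Python sketch below computes this quantity directly from two lists of scores. It is not the SPSS procedure used in the study, and the function name and example scores are hypothetical.

```python
def auc_mann_whitney(scores_hc, scores_ci):
    """Empirical AUC for a test on which higher scores indicate healthier cognition.

    Compares every HC score with every CI score: a higher HC score counts 1,
    a tie counts 0.5, and the total is divided by the number of pairs.
    """
    if not scores_hc or not scores_ci:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for h in scores_hc:
        for c in scores_ci:
            if h > c:
                wins += 1.0
            elif h == c:
                wins += 0.5
    return wins / (len(scores_hc) * len(scores_ci))


# Invented scores: 8.5 of the 9 HC-CI pairs favour HC, so the AUC is 8.5 / 9.
print(auc_mann_whitney([25, 22, 20], [15, 20, 18]))
```

The pairwise loop is quadratic in sample size, which is unproblematic for a few hundred participants; larger datasets would typically use a rank-based formula instead.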

Results

The total sample of 284 participants comprised similar proportions of males (51.4%) and females (48.6%), with a mean age of 67.93 ± 8.79 years, and was made up of 52.8% Chinese, 27.1% Malays, 5.6% Indians, and 14.4% Filipinos and other races. Most participants were retirees (66.5%), and the mean duration of education was 11.51 ± 3.78 years.

There were 164 HC and 120 CI participants. The groups differed significantly in age (p = 0.008) (Table 1). On GLM analyses controlling for age, the two groups differed significantly on the MoCA (25.52 ± 3.37 vs 16.59 ± 5.75, p < 0.001), the total VCAT score (22.48 ± 2.50 vs 14.17 ± 5.05, p < 0.001), and all individual VCAT domain scores (Table 1). On all cognitive tests, the HC scored higher than the CI participants. GDS scores did not differ significantly between the two groups.
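The age-adjusted comparisons above come from a general linear model. As a rough illustration of what such an adjustment estimates, the Python sketch below uses NumPy's least-squares solver to fit score = intercept + group effect + covariate effects and returns the group coefficient, that is, the HC minus CI difference in mean score at fixed covariate values. It is a simplified sketch and not the SPSS model used in the study: it reports no standard errors or p values, and the function name and example data are hypothetical.

```python
import numpy as np


def adjusted_group_difference(score, is_hc, covariates):
    """Covariate-adjusted HC minus CI difference from an ordinary least-squares GLM.

    score:      (n,) cognitive test scores
    is_hc:      (n,) 1 for HC, 0 for CI
    covariates: (n,) or (n, k) demographic covariates, e.g. age
    """
    score = np.asarray(score, dtype=float)
    is_hc = np.asarray(is_hc, dtype=float)
    covariates = np.asarray(covariates, dtype=float).reshape(len(score), -1)
    # Design matrix: intercept column, group indicator, then covariates.
    design = np.column_stack([np.ones(len(score)), is_hc, covariates])
    coef, *_ = np.linalg.lstsq(design, score, rcond=None)
    return coef[1]  # coefficient on the group indicator


# Hypothetical noise-free data built as score = 30 + 6 * is_hc - 0.2 * age.
age = np.array([60, 65, 70, 75, 62, 68])
is_hc = np.array([1, 1, 1, 0, 0, 0])
score = 30 + 6 * is_hc - 0.2 * age
print(adjusted_group_difference(score, is_hc, age))
```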

Table 1 Demographic characteristics of HC and CI participants

For discriminating between HC and CI subjects, the AUC (95% CI) was 0.905 (0.870–0.940) for the total VCAT score and 0.916 (0.884–0.948) for the MoCA score (Fig. 1). At a VCAT cutoff score of 17, which is indicative of cognitive impairment [21], sensitivity was 92.1% and specificity was 74.2%.
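Sensitivity and specificity at a given cutoff follow directly from counting correctly classified participants in each group. The Python sketch below assumes that a score at or below the cutoff is called impaired (the text does not state whether the cutoff of 17 is inclusive); the function name and example scores are hypothetical.

```python
def sensitivity_specificity(scores_hc, scores_ci, cutoff):
    """Return (sensitivity, specificity) when a score <= cutoff is called impaired."""
    true_pos = sum(s <= cutoff for s in scores_ci)  # CI participants correctly flagged
    true_neg = sum(s > cutoff for s in scores_hc)   # HC participants correctly cleared
    return true_pos / len(scores_ci), true_neg / len(scores_hc)


hc_scores = [24, 20, 17, 22, 26]
ci_scores = [12, 17, 19, 15]
sens, spec = sensitivity_specificity(hc_scores, ci_scores, cutoff=17)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
```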

Fig. 1 Receiver operating characteristic curves: area under the curve (AUC) for discriminating between HC and CI subjects on VCAT and MoCA scores. MoCA Montreal Cognitive Assessment, VCAT Visual Cognitive Assessment Test

We analyzed language differences in the HC and CI groups separately to remove the influence of disease on MoCA and VCAT scores. Mean time to complete the VCAT was 10.37 ± 3.70 minutes in the HC group and 13.88 ± 6.18 minutes in the CI group. Under the writing-system classification, 116 HC were placed in the alphabetic group and 48 HC in the logographic group. Years of education (p < 0.001), gender distribution (p = 0.005), and race (p < 0.001) differed significantly between the two groups. On GLM analyses controlling for age, years of education, race, and employment, the logographic group scored significantly higher on the MoCA than the alphabetic group (27.02 ± 2.82 vs 24.91 ± 3.40, p = 0.001), but no differences were found for the total VCAT score (22.53 ± 3.53 vs 22.35 ± 3.47, p = 0.413) or any individual VCAT domain score (Table 2).

Table 2 Demographics, MoCA, and VCAT scores for healthy controls based on language classifications

Similar results were seen when the HC were classified by language family. On GLM analyses controlling for age, years of education, race, and employment, MoCA scores differed significantly among the three groups (Indo-European vs Sino-Tibetan vs Austronesian: 26.10 ± 2.76 vs 27.15 ± 2.77 vs 23.38 ± 3.50, p < 0.001), with the Sino-Tibetan group scoring highest, followed by the Indo-European and then the Austronesian group. However, no significant differences were observed among the groups for the total VCAT score (22.78 ± 3.34 vs 22.50 ± 3.37 vs 22.23 ± 3.79, p = 0.073) or its individual domain scores (Table 2).

Among the CI subjects, 90 were classified into the alphabetic group and 30 into the logographic group. Years of education (p < 0.001), race (p < 0.001), and employment status (p = 0.012) differed significantly between the two groups. GLM analyses showed no differences between the alphabetic and logographic groups on the MoCA (15.98 ± 5.98 vs 18.43 ± 4.61, p = 0.306), the total VCAT score (14.09 ± 5.27 vs 14.40 ± 4.38, p = 0.605), or the VCAT domain scores for memory, language, executive function, and attention (Table 3). However, when classified by language family, CI subjects differed significantly on the MoCA among the three groups (Indo-European vs Sino-Tibetan vs Austronesian: 18.86 ± 4.85 vs 18.48 ± 4.69 vs 14.15 ± 5.94, p = 0.002). The Indo-European and Sino-Tibetan groups had similar mean MoCA scores, and both scored higher than the Austronesian group. No differences were found for the total VCAT score (14.91 ± 5.54 vs 14.41 ± 4.35 vs 13.56 ± 5.07, p = 0.275) or any of its individual domain scores except the visuospatial domain (Table 3).

Table 3 Demographics, MoCA, and VCAT scores for cognitively impaired participants based on language classifications

Discussion

This study investigated the performance of the VCAT in a multinational, multilingual Southeast Asian cohort of HC and CI participants. The overall discriminative ability of the VCAT was comparable to that of the MoCA, with similar AUCs for discriminating between HC and CI participants. The study also explored the effect of language differences on the VCAT and MoCA by classifying participants according to both writing system and language family. In contrast to the distinct language-related differences in MoCA scores, the absence of between-group differences in total VCAT scores suggests that the VCAT is less likely to be influenced by the language of administration and hence will not require translation into the multiple languages spoken across Southeast Asia.

As shown in our previous study and in the present study, the VCAT as a cognitive screening tool shows satisfactory discriminative validity in differentiating CI participants from HC. A low VCAT score helps clinicians accurately identify patients with cognitive impairment who require further investigation. Therefore, the VCAT may improve the early identification and detection of cognitive impairment.

Across both language classifications, and despite the use of translated and validated versions of the MoCA in the different language groups, the language of test administration appears to continue to influence MoCA scores. Thus, a participant scoring 21 on the English MoCA might not be experiencing the same level of cognitive impairment as a participant scoring 21 on the Malay or Tagalog MoCA. Previous studies of the modified MMSE likewise showed differences in performance between English and French speakers, with different rates of dementia diagnosis in the two language groups [30, 31]. This could be attributable to the translation and adaptation of the original test, which gave rise to test items that varied in difficulty, discrimination, and psychometric properties across language groups [32]. This large variability in test performance poses a challenge for clinicians and researchers across Southeast Asia. If test scores differ across languages for the same well-defined cognitive disorder, clinicians cannot compare treatment responses across Southeast Asian countries. From a research perspective, correlating cognitive performance with biomarkers and conducting multicenter clinical trials with cognitive outcomes would therefore not be feasible in Southeast Asia.

In contrast, the total VCAT score showed no significant group differences under either the writing-system or the language-family classification. VCAT scores were stable between the logographic and alphabetic groups and among the Indo-European, Sino-Tibetan, and Austronesian groups. This could be attributable to the fact that the VCAT requires no translation and that its items were constructed so that the language of administration does not influence item scores. All test items were therefore kept constant regardless of the language of administration, which standardized test difficulty and retained the originally intended neuropsychological constructs. The total VCAT score and the memory, language, executive function, and attention domain scores did not differ significantly between language classification groups in either HC or CI subjects. Visuospatial scores, however, while not significantly different among language groups in the HC group, did differ significantly among CI subjects. Although the reason for this is not entirely clear, one explanation is that the visuospatial items require lengthy instructions, so that among CI subjects, who may have some impairment in language ability, language differences continue to affect visuospatial scores.

Besides the VCAT, several tests, including the Eurotest, Phototest, and Memory Alteration Test, have been designed to overcome the influence of education and culture in screening for dementia [33,34,35]. The Eurotest assesses cognition by evaluating a subject’s ability to handle money and has been shown to be effective in subjects with low levels of education and in those who are illiterate. Likewise, the Phototest, a short and simple paradigm based on picture identification, has been shown to be useful for detecting cognitive impairment among illiterate subjects.

A strength of this study is the inclusion of participants from four different countries, which allowed us to investigate the effect of factors such as race, language, and cultural background on the accuracy of the VCAT. However, limitations of this study should also be acknowledged. Firstly, this study included only participants from Asian countries and had a relatively small sample size, so the results may not be generalizable to non-Asian populations. Future studies should include larger samples, longitudinal cohorts, and non-Asian populations. Secondly, only the MoCA was included as a parallel comparison to investigate the diagnostic ability of the VCAT. To further assess the performance and reliability of the individual VCAT domains, future research should include domain-specific neuropsychological assessments in a multilingual population. As the subjects in this study had a minimum of 6 years of education, the usefulness of the VCAT in cohorts with lower education levels warrants further evaluation. Nevertheless, by comparing the performance of the MoCA with that of the VCAT, we showed that whereas the MoCA is language dependent, the VCAT is not. We also acknowledge the lack of inter-rater and test–retest reliability data for the VCAT in non-English languages, which we hope to address in the near future. Finally, while we demonstrated that the VCAT is useful for the diagnosis of cognitive impairment of the AD type, the language-free nature of its items, together with its visual format, would make the VCAT less useful for diagnosing frontotemporal dementia or the posterior cortical atrophy variant of AD, or for patients with significant visual impairment.

Conclusions

This study demonstrates that the VCAT is useful in a multinational, multiethnic, and multilingual Southeast Asian population. The test can help clinicians differentiate HC from CI patients, with lower VCAT scores associated with more severe cognitive impairment. Furthermore, in culturally and linguistically diverse regions such as Southeast Asia, screening tests such as the VCAT transcend language differences and can therefore support accurate detection of cognitive impairment. Similarly, international clinical trials involving participants from nations that speak different languages would benefit from a single standardized version of the VCAT for making meaningful cross-cultural comparisons. A standardized tool that can be used uniformly across populations to guide diagnosis could also help epidemiological studies describe the distribution and determinants of dementia prevalence worldwide more precisely.