Introduction

Writing is a recently developed human ability. It uses symbolic visual marks for communication that overcomes the boundaries of time and space. Writing is tightly linked to reading, which is the ability to decode written symbols to extract meaning. Writing is not essential for basic human survival. Hence, writing (and reading) is not used by all human cultures [1], and the worldwide spread of literacy is a feature of recent centuries [2], although writing and reading are essential skills in industrialized societies.

How brains adapt to new cognitive abilities, such as writing or reading, has been debated. Two different explanations have been proposed: the cultural recycling [1] and the neural reuse [3] hypotheses. The cultural recycling hypothesis [1] assumes a modular organization of the brain. New skills are acquired through recycling of existing related neural-cognitive abilities, creating new modules, that is, dedicated neural clusters for processing content-specific information. The modular approach assumes that modules are encapsulated and operate independently of related cognitive functions [4]. The cultural recycling hypothesis therefore predicts dissociations between cognitive abilities, with the assumption that word writing impairments cannot be predicted by other cognitive abilities/functions. A second prediction of the cultural recycling hypothesis is cultural invariance. The hypothesis postulates that the main constraint on a new cognitive module is the existing neurocognitive architecture that has been shaped through evolution. Therefore, the neurocognitive makeup of a new skill is culturally invariant.

The neural reuse hypothesis [3] is rooted in the connectionist view of brain organization. It suggests that cognitive/neuronal processes are adaptable and not content specific. New skills emerge through changes of connections within and between regions, rather than through structural changes and the creation of new dedicated modules. The tuning of neurocognitive functions is mediated through ongoing dynamic processes of synaptogenesis and pruning. The neural reuse hypothesis is silent on the question of cultural invariance. It can be deduced, however, that if an ability relies on different cognitive processes across cultures, as with different writing systems, it will be represented differently depending on the culture.

Writing is a multifaceted cognitive process, which includes, but is not limited to, linguistic processing and high-level motor control. With reference to the dominant cognitive writing model proposed by Goldstein [5], it has been suggested that the transformation from linguistic representations to motor output can be achieved via two routes: a phonological route or a lexical route [6,7,8]. The phonological route involves sound-to-letter conversion, while the lexical route transforms meaning into visual orthographic images. The routes can further be distinguished by the level of processing required: the phonological route is based on sequential processing of discrete parts, whereas the lexical route is based on holistic processing.

Different cultures have developed different writing systems, which can largely be divided into logographic and phonological systems. In logographic systems, the symbolic visual marks (graphemes) represent meaning. Examples of logographic graphemes are simplified Chinese characters and Arabic numerals. In phonological systems, graphemes represent sounds (i.e., phonemes), as in the English alphabet. These two types of writing system resonate with the dual-route model of writing described above. Crucially, in alphabetic writing systems, sound-to-motor conversion can be processed via both the phonological and lexical routes, whereas in character-based systems, the transformation can only be carried out via the lexical route.

Since alphabetic and logographic systems differ not only in morphology but also in the mapping between orthography, phonology, and semantics, the neurocognitive underpinning of writing is likely to be linguistically (and thus culturally) dependent [9]. One should also note that in logographic systems, the basic writing units of characters typically provide no clear phonological information (e.g., codes on how to articulate/pronounce the written symbols) [9]. For instance, both Mandarin and Cantonese writers use the same simplified character system, while the pronunciation of the words differs. Thus, simplified Chinese characters represent meaning, though characters may include phonetic components. In contrast, alphabetic languages, like English, rely on a grapheme-to-phoneme transformation using a serial horizontal axis of letter strings, e.g., left-to-right [10]. Therefore, an alphabetic writing system would include cognitive features that support phonology (e.g., mapping of sounds (phonemes) to visual signs (letters)) and spatial sequencing, though reading/writing irregular words requires lexical knowledge (e.g., of silent letters, as in “psychology”). It should also be noted that expert users of alphabetic systems can potentially read and write using the direct lexical-semantic route for words that are stored in the orthographic lexicon, as well as the phonological route.

Across cultures, writing and reading are assumed to rely on similar linguistic processes (the mapping of graphemes to words and meaning), but writing also requires fine motor abilities [6, 8]. Acquired writing impairments are classified as central agraphia when impairments are assumed to reflect linguistic deficits [11,12,13,14,15,16,17]; this is in contrast to peripheral agraphia when impairments are assumed to reflect deficits in the motor component of writing [11,12,13, 18].

Some case studies of pure agraphia have been reported in the literature [19,20,21]. Most of these cases can be considered agraphia without language impairments. Unfortunately, in most of these cases, patients’ abilities in fine motor control (e.g., praxis), attention, and other cognitive domains were not (or only minimally) considered. Moreover, many neuropsychological studies have shown high comorbidity of agraphia with aphasia [22, 23], alexia [24, 25], acalculia [26, 27], optic ataxia, perceptual impairments [20], and other motor or sensory deficits [28].

Processing of linguistic and numerical content is argued to be dissociated [1]. Our recent study [29] examined the neurocognitive architecture of writing using neuropsychological data from a large cohort of English-speaking stroke patients (N = 740). The study examined lesion-symptom mapping for writing words and numbers, with words representing a phonological writing system and numbers representing a logographic system. The abilities to write words and numbers were correlated (r = .63), and no patient showed a pure writing deficit for only words or only numbers. We observed weak neural dissociations between writing words and numbers, but no reliable dissociation was observed behaviorally. Irrespective of the writing content, the data showed high comorbidity of writing impairments with impairments in tasks that measure language abilities (e.g., picture naming, reading) and fine motor control (e.g., complex figure copy). We argued that the study challenged the hypothesis that content (numbers versus words) and writing system (logographic versus phonological) affect the neural processing that supports writing ability. However, the investigation had two limitations. First, it focused only on writing abilities in relation to language and fine motor tasks and did not account for deficits in other core cognitive functions. For example, the sluggish attentional shift hypothesis [30] argues that attention plays a key role in writing/reading. Second, it confounded content (numbers, words) with writing system. A comparison between word-based writing systems would offer better control for potential confounds of meaning and usage frequency.

In the current study, we revisited the neurocognitive architecture of writing across writing systems and writing cultures. Specifically, there are two research questions: (1) Can writing impairments be predicted by impairments in other cognitive abilities? (2) Does the writing system (culture) affect the cognitive makeup of writing?

We used data obtained with the Birmingham Cognitive Screen (BCoS) [31] from two large cohorts of stroke patients. Mandarin and Cantonese versions of the screen were used to assess patients from South China [32, 33]; an English version was used to assess patients in the West Midlands, UK [34]. The BCoS assesses cognition across five domains (numerosity, memory, language, praxis, and attention and executive function). It is specifically designed for stroke patients. It uses a user-friendly testing format (designed in particular for patients with aphasia and visual neglect) to increase inclusivity and minimize confounds from these common disabilities. Impairment cut-off scores for each task are based on the performance of age- and culturally matched controls.

Analysis was performed on each cohort separately. We first divided the stroke patients in each cohort into impaired and non-impaired groups based on their performance on the word writing task. Each patient’s profile was described using cognitive features (performance on the remaining BCoS tasks, as continuous measures) and demographic features (e.g., age and education). Based on a hybrid framework [35], features were ranked using the elastic net algorithm. The ranked features were then added incrementally as input to a linear support vector machine (SVM) to determine which features are essential for classifying impaired and non-impaired word writers.

Materials and Method

Participants

Two datasets were analyzed. The first dataset (CN) includes data from stroke patients tested in the BCoS study run in South China [32, 33]. The second dataset (UK) contains data from stroke patients tested as part of the BUCS study run in the UK [36].

Patients in both trials were recruited in acute and rehabilitation stroke units. The inclusion criteria were as follows: the patients (1) were within 3 months of clinically confirmed stroke; (2) were judged by the clinical team to be able to concentrate for at least 30 min; and (3) were able to provide a written consent to participate. Patients were excluded if they showed insufficient understanding of Chinese (Mandarin/Cantonese in the CN dataset) or English (in the UK dataset) to follow the instructions as verified by the orientation tasks. The study included ischemic and hemorrhagic strokes.

The studies were approved by the national and local NHS ethical committees in the UK, the STEM University of Birmingham ethical committee, and the local research ethical committee in China, the Guangzhou First People’s Hospital (China) ethical committee. All participants signed a written informed consent.

In the BCoS-CN study, a total of 315 patients were recruited from Guangzhou First People’s Hospital in China. Of the total, 71 patients were excluded from the current study since they did not complete all the tasks. The final CN dataset contains 244 patients. Based on the BCoS-CN standardized cut-off, 68 patients were classified as impaired in the word writing task, and 176 were considered performing within the range of the control sample.

In the BCoS-UK study (BUCS), 906 British patients were recruited from the West-Midlands area in England. A total of 405 patients failed to complete all the tasks and thus were excluded from the current study. The UK final sample included 501 patients. Based on the BCoS-UK standardized cut-off, 137 patients showed deficits in word writing, and 364 performed within the range of the control sample. Note that the UK dataset is a sub-sample of our previous study which reported data of 740 patients [29].

Table 1 shows the demographic and cognitive features of each group (with/without writing deficits) in each dataset (CN/UK).

Table 1 Demographic and cognitive features of each group (with/without writing deficits) in each included dataset (CN/UK)

Behavioral Measures

The cognitive profile of patients was assessed with the BCoS English version [31] and two Chinese versions [32, 33]. BCoS is a cognitive screen that measures performance across a broad range of domains: (1) language, (2) number skills, (3) memory, (4) action planning and control (praxis), and (5) attention and executive functions. Each domain is further composed of several sub-domains. For instance, the memory domain includes verbal and visual memory tests, following short and long delays. In total, there are 23 tasks, and some tasks produce multiple measures (e.g., the apple cancelation task has three measures: overall accuracy, asymmetric egocentric neglect, and asymmetric allocentric neglect). Please refer to Table 1 for the list of tasks and measures (features) used in this study to describe each patient’s profile. Importantly, the BCoS is designed to maximize the inclusion of stroke patients and the separation between cognitive abilities; this was done to reduce biases from co-occurring deficits. For example, to accommodate language deficits, it uses multimodal presentations and multiple-choice answers; to accommodate spatial attention deficits, it uses uncrowded vertical layouts. Language and spatial attention deficits are common following stroke. If not well controlled, they can affect performance on, and the measurement of, other abilities, producing spurious correlations.

Word writing was a spelling test. Participants heard a word and were asked to write it down. Spelling accuracy and the legibility of the writing were then assessed. In the BCoS-UK version, participants were asked to write four familiar words and one non-word. There were two exception words, one concrete (“scissors”) and one abstract (“although”); two regular words, one concrete (“mustard”) and one abstract (“thinking”); and one non-word (“troom”). Thus, both the lexical and the phonological writing routes were assessed.

In the BCoS-CN writing test, there were four characters representing real words in each language version, as it was not possible to write a non-word using Chinese characters. There were two concrete words (“物” and “眼” in the Cantonese version; “纹” and “眼” in the Mandarin version) and two abstract words (“怎” and “授” in the Cantonese version; “帮” and “怎” in the Mandarin version). The characters included different types of structure, namely left–right structure, and up-down structure.

For each correctly written/spelled word, the participant was awarded one point. The maximum score was 5 and 4 in the English and Chinese versions, respectively. If a participant failed to respond or complete any items, their score would be 0, on the condition that the examiner had judged that the participant understood the task and in principle could attempt to complete it.

The age-matched cut-off was based on normative control samples that were representative in terms of the distribution of education levels. In the BCoS-UK dataset, impaired participants were those who had made mistakes on at least three words (i.e., at least two real words). In the BCoS-CN trial, impaired patients were those who had made more than three mistakes (i.e., did not get any word correct). Note that the cut-off scores indicate that spelling errors, even in familiar words, were common in both cultures. Participants who did not meet the cut-off were classed as impaired, while those who did (i.e., performed within the expected range of the controls) were classed as intact.
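The cut-off rule described above can be written as a small function. This is a minimal sketch that assumes the thresholds exactly as stated in the text (UK: errors on at least three of five items; CN: errors on more than three of four items). The function name and the `version` labels are illustrative and are not part of the BCoS.

```python
# Maximum word-writing score per BCoS version, as stated in the text.
MAX_SCORE = {"UK": 5, "CN": 4}
# Smallest number of errors classed as impaired
# (UK: at least three errors; CN: more than three errors, i.e., four).
MIN_ERRORS_IMPAIRED = {"UK": 3, "CN": 4}


def is_impaired(score: int, version: str) -> bool:
    """Return True if a word-writing score falls below the stated cut-off."""
    max_score = MAX_SCORE[version]
    if not 0 <= score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    errors = max_score - score
    return errors >= MIN_ERRORS_IMPAIRED[version]
```

Under these assumptions, a UK participant with two of five items correct is classed as impaired, whereas a CN participant with one of four characters correct is classed as intact.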

Features Describing the Patient’s Cognitive and Demographic Profile

In total, 36 (CN) and 35 (UK) features were used. There were 31 cognitive features from the BCoS tasks and 4 demographic features (age, education, gender, and dominant hand). For the BCoS-CN dataset, we also added the tested language (Cantonese/Mandarin) as a demographic feature (see Table 1 for the list of features and sample descriptions). Note that all patients in the BCoS-CN dataset used the same writing system of simplified characters. All features were standardized before further analysis.

Data Analysis

A hybrid framework for data analysis is shown in Fig. 1. It consists of three steps: (1) elastic net–based feature selection, (2) consensus feature importance ranking, and (3) classification with a linear support vector machine (SVM), one of six classifiers evaluated, combined with incremental feature selection. The aim of the framework was to find a feature subset that identifies patients with and without writing impairment [35]. The code is accessible online (https://github.com/NicoYuCN/elasticnetFR).

Fig. 1 The proposed framework for stroke patient data analysis. It consists of feature collection or data splitting, Laplacian score–based feature ranking, linear SVM trained with input feature selection, and the performance evaluation. Note that features are added one by one, from the most to the least important

The backbone of the framework was the elastic net algorithm [37]. Elastic net combines the ridge and LASSO (least absolute shrinkage and selection operator) [38] regularization approaches in the context of linear regression. Consider a dataset \(\{(X_{i}, y_{i})\}_{i=1}^{n}\) with \(n\) cases, where each case \((X_{i}, y_{i})\) contains an input column vector \(X_{i} = (1, x_{1}, \ldots, x_{p})^{T}\) with \(p\) features and a paired outcome \(y_{i}\). Elastic net estimates the feature coefficients \(\beta = (\beta_{0}, \beta_{1}, \ldots, \beta_{p})^{T}\) by minimizing the objective function \(\min_{\beta} \left\{ \frac{1}{2n} \sum_{i=1}^{n} (y_{i} - X_{i}^{T}\beta)^{2} + \lambda \left( \alpha \|\beta\|_{1} + \frac{1-\alpha}{2} \|\beta\|_{2}^{2} \right) \right\}\), where \(\lambda\) and \(\alpha\) are hyperparameters. The parameter \(\alpha\) weights the ridge and LASSO regularization approaches (\(\alpha = 1\) is equivalent to LASSO; \(\alpha = 0\) is equivalent to ridge regularization). Ridge penalizes the sum of squared coefficients, whereas LASSO penalizes the sum of absolute coefficient values; through the LASSO term, elastic net can remove predictors by setting their coefficients to zero, which enables feature selection. In the case of correlated features, ridge assigns similar coefficients to the correlated features, while LASSO maximizes the coefficient of a single feature (the rest are zero, or nearly zero). Neither regularization approach makes any assumptions regarding the distribution of the data. As the features in the current study were correlated, \(\alpha\) was weighted to increase LASSO regularization. We report the results for \(\alpha = 0.75\); similar results were obtained when using \(\alpha = 0.90\) (see Supplementary Table 1). It should be noted that \(\lambda\) was tuned automatically in the training stage using fivefold cross validation.
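The elastic net fit described above can be sketched with scikit-learn. This is an illustration under stated assumptions rather than the authors' implementation: the data are synthetic, the binary outcome is regressed as 0/1, and scikit-learn's `l1_ratio` plays the role of \(\alpha\) while its `alpha` plays the role of \(\lambda\).

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (not patient data): 96 cases, 10 features.
# Only the first two features carry information about the binary label.
n_cases, n_features = 96, 10
X = rng.normal(size=(n_cases, n_features))
latent = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n_cases)
y = (latent > 0).astype(float)

# Standardize features before fitting.
X_std = StandardScaler().fit_transform(X)

# l1_ratio=0.75 mirrors alpha = 0.75 in the text; the penalty strength
# (lambda in the text, `alpha_` in scikit-learn) is tuned by fivefold CV.
model = ElasticNetCV(l1_ratio=0.75, cv=5, random_state=0)
model.fit(X_std, y)

# Features with a nonzero coefficient count as selected.
selected = np.flatnonzero(model.coef_ != 0)
print("chosen penalty strength:", model.alpha_)
print("selected feature indices:", selected)
```

Features whose coefficients are shrunk exactly to zero drop out of `selected`, which is the property the feature selection step relies on.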

To improve generalization and avoid overfitting, the first step of feature selection was repeated 100 times (\(N = 100\)). In each iteration, the data were randomly split into a training set and a testing set, with 70% of the cases used for training. The number of intact patients in the training set was set equal to that of impaired patients to avoid data imbalance. Thus, each CN training set included 48 out of 176 intact patients and 48 out of 68 impaired patients; the UK training sets included 97 out of 364 intact patients and 97 out of 139 impaired patients. Data from the remaining patients were used for testing. Each iteration \(i\) yielded feature coefficients \(\beta = (f_{i,1}, \ldots, f_{i,p})\), and a feature \(x_{j}\) with \(f_{i,j} \ne 0\) was defined as selected.
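The balanced split described above can be sketched in plain Python. The function name is hypothetical, and rounding the 70% share to the nearest integer is an assumption, chosen because it reproduces the 48 of 68 impaired CN patients reported in the text.

```python
import random


def balanced_split(labels, train_frac=0.7, seed=None):
    """Split case indices into a class-balanced training set and a test set.

    labels: sequence of 1 (impaired) or 0 (intact).
    The training set holds `train_frac` of the impaired cases and an equal
    number of intact cases; every remaining case goes to the test set.
    """
    rng = random.Random(seed)
    impaired = [i for i, y in enumerate(labels) if y == 1]
    intact = [i for i, y in enumerate(labels) if y == 0]
    n_train = round(train_frac * len(impaired))
    if n_train > len(intact):
        raise ValueError("not enough intact cases to balance the training set")
    train = rng.sample(impaired, n_train) + rng.sample(intact, n_train)
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test


# CN cohort sizes reported in the text: 68 impaired and 176 intact patients.
labels = [1] * 68 + [0] * 176
train_idx, test_idx = balanced_split(labels, seed=0)
print(len(train_idx), len(test_idx))
```

Calling the function with a different seed in each of the 100 iterations would give the repeated random splits; the test set stays imbalanced because it receives all remaining intact patients.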

The second step was ranking feature importance. It concerns the feature selection frequency across the 100 iterations, i.e., the number of times the feature coefficient was not equal to zero. A more frequently selected feature was considered more important. Thus, using the frequency as an indicator of importance, features were ranked in descending order (i.e., from the most to the least important).
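The frequency-based ranking can be sketched as follows. The function name, the toy coefficient values, and the feature labels are illustrative only; ties are kept in their original order, which is an assumption since the text does not say how ties were handled.

```python
def rank_by_selection_frequency(coef_runs, feature_names):
    """Rank features by how often their coefficient was nonzero.

    coef_runs: one list of coefficients per iteration, aligned with
    feature_names. Returns (ranked names, selection counts).
    """
    counts = {name: 0 for name in feature_names}
    for coefs in coef_runs:
        for name, value in zip(feature_names, coefs):
            if value != 0:
                counts[name] += 1
    # sorted() is stable, so tied features keep their original order.
    ranked = sorted(feature_names, key=lambda name: -counts[name])
    return ranked, counts


# Toy example with three iterations and three hypothetical features.
names = ["NumW", "Calc", "CFC"]
runs = [
    [0.4, 0.0, 0.1],
    [0.3, 0.2, 0.0],
    [0.5, 0.0, 0.0],
]
ranked, counts = rank_by_selection_frequency(runs, names)
print(ranked, counts)
```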

An in-depth analysis of feature selection algorithms was also carried out. We used 16 different feature selection algorithms and report four in supplementary material 2. The top-ranked features were almost identical across the various methods, suggesting a consensus on the relevant features that contribute to writing. Elastic net, which is reported here, was the best performing method when assessed in the third step.

The third step used a linear SVM with incremental feature selection to determine the optimal number of features for classifying impaired versus non-impaired writers. The ranked features were added, from the most to the least important, as input to the linear SVM. The area under the curve (AUC) was used to determine the best fitting model as a tradeoff between model complexity and prediction performance. This step was repeated 100 times; each time, the data were split into training and test sets using the same method as for the elastic net, with a training-to-test ratio of 7:3 for cases (impaired patients) and the number of intact patients in the training set matched to the number of cases.
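Incremental feature selection with a linear SVM can be sketched with scikit-learn as below. The data, the ranking, and the helper name `incremental_auc` are illustrative assumptions. Picking the feature count with the highest AUC is a simplification of the complexity versus performance tradeoff mentioned in the text.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC


def incremental_auc(X_train, y_train, X_test, y_test, ranked_idx):
    """Test AUC of a linear SVM as ranked features are added one at a time."""
    aucs = []
    for k in range(1, len(ranked_idx) + 1):
        cols = ranked_idx[:k]  # the k most important features
        clf = SVC(kernel="linear")
        clf.fit(X_train[:, cols], y_train)
        scores = clf.decision_function(X_test[:, cols])
        aucs.append(roc_auc_score(y_test, scores))
    return aucs


# Synthetic stand-in data (not patient data): 200 cases, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

ranked_idx = [0, 1, 2, 3, 4]  # hypothetical ranking, most to least important
aucs = incremental_auc(X[:140], y[:140], X[140:], y[140:], ranked_idx)
best_k = int(np.argmax(aucs)) + 1
print("AUC per feature count:", np.round(aucs, 3), "best k:", best_k)
```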

An in-depth analysis of classifier algorithms was also carried out using the selected top features. Five other classifiers were used to classify impaired versus intact writers: K-nearest neighbor (KNN), linear discriminant analysis (LDA), radial basis function–based SVM (rbf-based SVM), an artificial neural network with one hidden layer of 32 nodes (NN32), and an artificial neural network with two hidden layers of 32 and 16 nodes (NN32_16). These were evaluated and compared to investigate the effect of the machine learning classifier on prediction performance.
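The six classifiers could be set up in scikit-learn roughly as follows. This is a sketch under assumptions: apart from the hidden layer sizes given in the text, every hyperparameter is a scikit-learn default or an arbitrary choice (for example `max_iter=1000`), and the data are synthetic.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def build_classifiers(seed=0):
    """Return the six classifier types named in the text, keyed by label."""
    return {
        "linear SVM": SVC(kernel="linear"),
        "KNN": KNeighborsClassifier(),
        "LDA": LinearDiscriminantAnalysis(),
        "rbf-based SVM": SVC(kernel="rbf"),
        "NN32": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                              random_state=seed),
        "NN32_16": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000,
                                 random_state=seed),
    }


# Synthetic stand-in data (not patient data), split 70/30.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

accuracy = {
    name: clf.fit(X[:84], y[:84]).score(X[84:], y[84:])
    for name, clf in build_classifiers().items()
}
print(accuracy)
```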

Finally, we examined in detail the cases in which the model failed to classify patients correctly. For each patient, we counted the number of times the linear SVM misclassified them. In particular, we were interested in whether some patients were consistently misclassified (in at least 75% of the models). If so, their cognitive profiles were investigated further.
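The check for consistent misclassification can be sketched as below. The patient identifiers and counts are invented for illustration, and the sketch assumes that the 75% criterion is computed over the iterations in which a patient was part of the test set.

```python
def consistently_misclassified(error_counts, test_counts, threshold=0.75):
    """Return IDs of patients misclassified in at least `threshold` of the
    iterations in which they were in the test set."""
    flagged = []
    for pid, n_tested in test_counts.items():
        if n_tested == 0:
            continue  # never tested, so no error rate can be computed
        rate = error_counts.get(pid, 0) / n_tested
        if rate >= threshold:
            flagged.append(pid)
    return flagged


# Hypothetical counts: times in the test set and times misclassified.
test_counts = {"P01": 30, "P02": 28, "P03": 40}
error_counts = {"P01": 30, "P02": 5, "P03": 30}
flagged = consistently_misclassified(error_counts, test_counts)
print(flagged)
```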

Results

Behavior Profiles

The demographic and cognitive features of each group are presented in Table 1. For each feature, we computed a two-sample t-test (or chi-square test) to assess the difference between the intact and impaired patient groups in each culture. Because of the imbalanced sample sizes, we did not assume equal variance. After a Bonferroni correction, only P values below 0.0014 (0.05/36) were considered reliable.
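The corrected threshold and an unequal-variance (Welch) t-test can be sketched with SciPy. The group sizes follow the CN cohort, but the scores are synthetic and the means and standard deviations are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

N_TESTS = 36   # number of features compared
ALPHA = 0.05   # uncorrected significance level
threshold = ALPHA / N_TESTS  # Bonferroni-corrected threshold, about 0.0014

# Synthetic scores for one feature (not patient data).
rng = np.random.default_rng(0)
intact = rng.normal(loc=10.0, scale=2.0, size=176)
impaired = rng.normal(loc=7.0, scale=3.0, size=68)

# equal_var=False requests Welch's t-test, which does not assume
# equal variances in the two groups.
t_stat, p_value = stats.ttest_ind(intact, impaired, equal_var=False)
reliable = p_value < threshold
print(f"threshold={threshold:.4f}, t={t_stat:.2f}, p={p_value:.2e}, reliable={reliable}")
```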

Table 1(A) shows that, across both cohorts, the prevalence of writing impairment was not related to gender, handedness, age, or the tested language (in the Chinese cohort). Patients who were impaired in writing had fewer years of education than their counterparts in both cohorts. Table 1(B, C, and D) presents the comparison between patients impaired and not impaired in word writing on all cognitive features. In most of the cognitive tests (17/32), patients with word writing deficits performed worse than those without deficits.

Figure 2 presents the correlation matrix (Pearson’s correlation coefficient, r) between all cognitive and demographic features in each cohort. Word writing (rightmost column, WW) was part of a cluster that showed moderate to strong correlations with tasks that assess linguistic, numerical, verbal memory, and auditory attention abilities and constructive apraxia. The strongest correlation of word writing was with the ability to write numbers (r = 0.71 in the CN cohort and 0.60 in the UK cohort). Three other clusters of highly correlated features were found among tasks assessing praxis, attention, and spatial biases in attention. Please note that for most measures in the correlation matrix, higher values indicate better performance. The exceptions (depicted in blue shades) are age (top row), where increased age is associated with worse performance, and the allocentric and egocentric neglect scores (alloN, egoN; the whitish/bluish middle columns), where higher values indicate more severe impairments.

Fig. 2 Correlation matrix for the continuous features of the CN dataset (left) and the UK dataset (right). The features were organized as in Table 1. Acronyms: Demographic (DEM): AGE and education (EDU) in years; picture naming (PIC NAM), sentence construction (SC); sentence reading (SR), non-word reading (NonR) and instruction comprehension (InstruC). Number domain (NUM): number reading (NumR), number writing (NumW), calculation. Memory domain: personal orientation (PerI), Time and Space (TS) immediate story memory recognition test and delayed recognition of the story (MdR) and task recognition test (TR). Praxis domain: multi-step object use (MST), gesture production (GP), gesture recognition (GR), imitation of meaningless gesture (GMI), complex figure copy (CFC). Attention and Executive function (Att & Exe Fun); Visual (VIS) Apple cancelation (AppleC), egocentric visual neglect (egoN), allocentric visual neglect (alloN), Visual extinction left and right unilateral (VeLU, VeRU), left and right bilateral (VeLB, VeRB); Tactile (TACT): tactile extinction left and right unilateral (TeLU, TeRU), left and right bilateral (TeLB, TeRB); auditory attention task (AA) and the Birmingham rule finding task (BrF), and finally the writing task (WW)

Elastic Net–Based Feature Ranking

The features were ranked according to their importance (selection frequency) in descending order, as presented in Table 2. Since feature selection via elastic net depended on the training dataset, the number of selected features (weights not equal to 0) was not fixed and could change across iterations.

Table 2 Feature ranking results in two groups

Cognitive Features Predicting Writing Deficits Across Cohorts

The feature ranking procedure revealed that the abilities to write numbers and to manipulate them (in the number calculation task) were important for predicting word writing deficits across cohorts. In the number writing task, examinees were required to write two multi-digit numbers and three prices using the appropriate format (e.g., including the currency symbol). In the number calculation task, an examinee was asked to compute four operations (addition, subtraction, multiplication, and division) between two numbers (one or two digits). Participants were allowed to use paper and pencil for the calculation.

Performance in the auditory sustained attention task was also crucial for identifying writing deficits in stroke patients across cultures. The prevalence (importance) of auditory sustained attention was higher in the UK sample than in the CN sample. The auditory attention task used six high-frequency words, three as targets and three as distracters. Each word was repeated nine times, giving a list of 54 words. An examinee listened to the spoken words, which were delivered at a relatively slow pace, with variable rates of 2–4 Hz. They were asked to tap whenever they heard one of the three target words (e.g., “no”) and to avoid tapping for all other words (e.g., “yes”). The task assesses examinees’ ability to sustain attention for about a minute. It also assesses the ability to control attention by ignoring semantically related and equally frequent distracters and staying focused on the targets.

Surprisingly, word reading, as assessed in the sentence reading task, was discriminative in only a small proportion of the models in both cultures. In this task, an examinee is presented with two complex sentences and is asked to read them.

Unique Cognitive Features Predicting Writing Deficits in Each Cohort

In the China dataset, impairment in writing characters was related to performance on the complex figure copy task, the visual and tactile extinction tasks, and the immediate and delayed memory tasks, as well as to the overall ability to comprehend instructions. (1) In the complex figure copy task, patients are required to copy a figure using a pencil. They are evaluated on the ability to correctly draw and place 47 features. The task assesses constructive apraxia. It is also sensitive to the ability to control a pencil and shift attention in space. (2) In the tactile extinction task, an examinee closes their eyes and must detect whether the left hand, the right hand, or both hands are tapped by the examiner. Failing to detect tapping on the right hand, when presented on its own or bilaterally, was an important feature for predicting writing impairment. This failure reflects deficits in somatosensory processing of the right hand, which could be linked to right-hand hemiparesis or right tactile neglect. (3) The memory task is a story recall. Examinees hear a story and are asked to recall (or correctly recognize in a forced-choice test) 14 units of information from the story at two time points: immediately after the story and 20 min later. In the China dataset, immediate and delayed memory failures were diagnostic features for writing impairment. (4) The ability to comprehend instructions is based on the number of times the examiner had to repeat task instructions and on the examiner’s overall evaluation at the end of the screen, on a three-level scale (poor, moderate, good). (5) In addition, the tested language (Mandarin or Cantonese) and years of education were also diagnostic features for word writing impairment in the China dataset.

In the UK dataset, non-word reading and picture naming were diagnostic features, as was patients’ time and space orientation. (1) In the non-word reading task, examinees are asked to read six pronounceable non-words. This task assesses the ability to use phonological knowledge for reading. Even though there is a non-word reading task in the China dataset (where characters are combined to create a non-existing/meaningless word), it was not identified as a discriminative feature for the ability to write words. (2) In the picture naming task, examinees are asked to name 14 line drawings. (3) In the time and space orientation task, examinees are asked to provide (or select correctly from given options) information regarding the current time, date, and place.

It should be noted that four other feature ranking methods [39] were evaluated on the datasets (supplementary materials 2). The results highlight similar ranking orders as that of the elastic net–based feature ranking.

Linear SVM-Based Prediction of Stroke Patients

Figure 3 shows the prediction performance of intact and impaired stroke patients. Using the China dataset, the model achieved good performance using a single feature (number writing). It reached the peak AUC value when using the top 3 features (AUC, 0.85 ± 0.06; ACC, 0.89 ± 0.03; SEN, 0.81 ± 0.10; SPE, 0.90 ± 0.27), including number writing and calculation tasks, and complex figure copy.

Fig. 3 The classification performance according to the top-ranked feature input. The error-bar plots present the results when using the most important features and a linear SVM for the classification of intact and impaired stroke patients on the CN (left) and the UK dataset (right). The horizontal axis indicates the number of top-ranked features, and the vertical axis shows the values of the evaluation metrics. Acronyms: Num W, number writing; CFC, complex figure copy; edu, education year; V EXT, visual extinction; LB, left bilateral; T EXT, tactile extinction; RU, right unilateral; RB, right bilateral; LANG, tested language; MEM IREC, memory immediate recognition; MEM DREC, memory delayed recognition; Sent R, sentence reading; AUD ATT, auditory attention accuracy; Inst Com, instruction comprehension; Non R, non-word reading; Pic Nam, picture naming; TS, time and space

Using the UK dataset, the top 4 features provided the best prediction (AUC, 0.79 ± 0.04; ACC, 0.83 ± 0.03; SEN, 0.74 ± 0.09; SPE, 0.84 ± 0.03). The features were the number writing and calculation tasks, non-word reading, and auditory sustained attention.

The sensitivity on both datasets was close to 80%, indicating that the model wrongly predicted around 20% of impaired patients as intact. We therefore explored these misclassified cases further to determine whether the model consistently failed with specific patients or whether the misclassifications were distributed and not systematic.
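The relation between sensitivity and the share of impaired patients predicted as intact can be made explicit with a short sketch. The labels below are toy values (1 = impaired, 0 = intact), not study data, and the function name and the "MISS" key are illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and miss rate for binary labels,
    where 1 marks an impaired patient and 0 an intact patient."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "ACC": (tp + tn) / len(pairs),
        "SEN": tp / (tp + fn),   # impaired patients correctly detected
        "SPE": tn / (tn + fp),   # intact patients correctly detected
        "MISS": fn / (tp + fn),  # impaired predicted as intact = 1 - SEN
    }


# Toy example: 10 impaired (8 detected) and 20 intact (18 detected).
y_true = [1] * 10 + [0] * 20
y_pred = [1] * 8 + [0] * 2 + [0] * 18 + [1] * 2
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the sensitivity is 0.8, so 20% of impaired cases are predicted as intact, which mirrors the pattern described above.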

SVM Classification Errors Analysis

For each case, we summarized how often it was misclassified when it was partitioned into the testing set. Supplementary materials Table 3 provides a descriptive summary of the cognitive and demographic profiles of patients who were correctly and wrongly classified.
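One way to tally such per-case error frequencies is sketched below. The data layout (one list of test-set results per iteration), the function names, and the 95% threshold used as the default are assumptions made for illustration; the toy records are invented.

```python
from collections import defaultdict


def misclassification_rates(iterations):
    """Fraction of test-set appearances in which each case was misclassified.

    `iterations` is a list with one entry per repeated partition; each entry
    lists (case_id, true_label, predicted_label) for the cases that fell
    into the testing set in that iteration.
    """
    tested = defaultdict(int)
    wrong = defaultdict(int)
    for test_results in iterations:
        for case_id, true_label, predicted_label in test_results:
            tested[case_id] += 1
            if predicted_label != true_label:
                wrong[case_id] += 1
    return {case_id: wrong[case_id] / tested[case_id] for case_id in tested}


def consistently_misclassified(rates, threshold=0.95):
    """Case ids whose error rate reaches the (assumed) threshold."""
    return sorted(c for c, r in rates.items() if r >= threshold)


# Toy records invented for illustration (1 = impaired, 0 = intact).
iterations = [
    [("p1", 1, 0), ("p2", 1, 1), ("p3", 0, 0)],
    [("p1", 1, 0), ("p3", 0, 1)],
    [("p1", 1, 0), ("p2", 1, 1)],
]
rates = misclassification_rates(iterations)
flagged = consistently_misclassified(rates)
print(rates, flagged)
```

Counting only the iterations in which a case was actually tested keeps the rate comparable across cases that were drawn into the testing set different numbers of times.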

In the China dataset, 9 (out of 68) impaired word writers were consistently misclassified (1 in 95% and 8 in 100% of the iterations). A further 9 (out of 176) intact word writers were consistently falsely classified as impaired (> 95% of experiments). We report here the results for the misclassified group (impaired patients wrongly predicted as intact). The results for the falsely classified group are reported in supplementary material 3.

Specifically, the 9 misclassified cases in the China dataset demonstrated no impairments or only mild deficits in the diagnostic tasks. Eight wrote numbers without error, and one patient scored 3/5 on this task. None of the nine patients was classed as impaired in the number calculation task: six obtained the maximum score and three made one error. Six copied the figure with accuracy matching control performance, while three showed mild impairments. Formal tests confirmed that patients in the misclassified group showed significantly milder deficits than their correctly classified counterparts on all three cognitive features (T(76) > 6.44, P < 0.001). The falsely classified group showed the opposite pattern, with poorer performance on the three diagnostic tasks than their correctly classified counterparts (T(184) > 4.13, P < 0.001).
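A group comparison of this kind can be sketched with a two-sample t statistic. The text does not state which variant of the t-test was used, so the pooled-variance (Student) form below is an assumption, and the scores are invented toy values rather than study data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's two-sample t statistic (pooled variance) and its df."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Toy task scores invented for illustration (higher = better performance).
misclassified = [5, 5, 4, 5, 4]   # impaired writers predicted as intact
correct = [1, 2, 0, 3, 2]         # impaired writers predicted as impaired
t_value, df = pooled_t(misclassified, correct)
print(f"T({df}) = {t_value:.2f}")
```

A large positive statistic in this toy setting would indicate that the misclassified group scored higher, that is, showed milder deficits, on the diagnostic task.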

We further examined the co-occurrence of other cognitive symptoms in the misclassified group. All but one patient demonstrated deficits in two or more additional tasks. The most common co-occurring deficit in this group was visual neglect, as measured by the asymmetry score of the apple cancelation task. A spatial attention deficit was observed in five of the nine patients (> 50%), and 30% of patients were impaired in immediate and delayed memory and in sustained attention. As for other tasks, at least two patients were impaired in picture naming, non-word reading, bilateral tactile stimuli detection, and the Birmingham rule finding task. It should be noted that the analysis procedure identified most of these tasks as relevant diagnostic features in a small percentage of models (1–7%). One patient was impaired only in word writing and was classified as intact in all other tasks; he was a right-handed 56-year-old man with 10 years of education.

In the UK dataset, 28 (out of 137) patients were frequently misclassified (> 98% of testing), and 38 (out of 364) were falsely classified as impaired writers (> 95% of testing). We focus here on the misclassified patients and report detailed analyses of the falsely classified patients in the supplementary material.

Like the misclassified CN patients, the misclassified UK patients showed no impairments or only mild deficits across all diagnostic tasks. Notably, 18 misclassified patients performed well on the number writing task: the group median was the maximum score (5 correct), and only one patient wrote just 2 numbers correctly. Fourteen patients made no errors on the non-word reading task, with the median being the maximum score (6); eight made no errors on the number calculation task, with the median being a single error (3/4); and 11 made no errors on the auditory attention task, with the median being a single error (53/54). Formal analysis confirmed that the misclassified patients' deficits were milder on the four diagnostic tasks than those of their correctly classified counterparts (T(164) > 5.13, P < 0.001). The falsely classified patients again showed the opposite pattern, with poorer performance on the four diagnostic tasks than their counterparts (T(401) > 6.06, P < 0.001).

When the co-occurrence of deficits in the misclassified group was examined, a pattern similar to the China dataset was observed. All but one patient showed deficits in at least two additional tasks. The most common (> 50%) co-occurring deficits were the presence of egocentric visual spatial neglect (14/24), deficits in sentence reading (14/24), and picture naming (13/24).

The patient who was not impaired in any other task was a 78-year-old, right-handed woman with 10 years of education. She had a lesion to the left central sulcus which affected her legs, but she was able to use her hands. Her writing deficit stemmed from phonological errors in three of the four real words. She made no errors when spelling non-words.

Classifiers Comparison

Besides linear SVM, five other classifiers (K-nearest neighbors, linear discriminant analysis, radial basis function–based support vector machine, one-hidden-layer neural network, and two-hidden-layer neural network) were used to examine the effect of the choice of machine learning classifier on prediction performance. Taking linear SVM as the baseline, linear discriminant analysis achieved comparable performance, whereas the other classifiers performed worse. The results are presented in supplementary materials 4.

Discussion

This study aimed to explore the cognitive makeup of word writing across different writing systems (logographic in China and phonological in the UK). To achieve this aim, we combined multivariate analysis approaches to identify diagnostic demographic and cognitive profiles that can differentiate between stroke patients with and without word writing deficits. The experimental results indicated that number writing and calculation abilities predicted word writing impairments across both cultures. The ability to copy a complex figure predicted impairment in writing simplified characters, while the ability to read non-words and to sustain attention predicted the ability to write words in English.

For both cohorts, the models consistently misclassified around 20% of the cases. Not surprisingly, these cases (patients with impaired word writing ability) showed mildly impaired to intact skills in the culturally specific diagnostic tasks. An analysis of the profiles of these misclassified cases suggested that writing impairments in most of these cases were related to spatial attention and/or other language-related deficits. Only two patients, one in each dataset, were impaired solely on the word writing task.

We start by discussing in detail the cognitive features that overlap across the two cohorts and represent culture-invariant aspects of writing, before moving to differences between cultures, which represent culture-dependent features. We consider diagnostic features based on the proposed framework. We then discuss the cognitive characteristics of patients who were consistently misclassified. We end by discussing the theoretical implications of the current findings and the limitations and methodological considerations of the study.

Cognitive Features Associated with Writing Abilities Shared Across Cohorts

Writing deficits were associated with a large variety of cognitive features. In accordance with the BCoS internal structure, the elastic net framework identified features that represent processing in all five cognitive domains: attention, language, praxis, memory, and numbers. Using the linear support vector machine, we further minimized the number of critical features needed to predict writing. Specifically, across writing systems, number writing and calculation were crucial for classifying post-stroke impairment versus non-impairment in word writing. Number writing was consistently an important predictor in the two cohorts: it was selected as a differential feature 100% of the time, and performance on number writing correlated strongly with word writing (Fig. 2).

It is not surprising that number writing is important for predicting word writing deficits. Like writing words, writing numbers is a multifaceted cognitive skill, which taps into speech comprehension, meaning-to-symbol conversion, retrieval of visual symbols from memory, and fine motor control, including the ability to use a pen and to coordinate visual and hand movements. The mechanisms of word and number writing overlap substantially in the visual-motor processing skills that support the output, and potentially in the processing of the input as well.

Writing numbers can be considered a logographic writing system (conversion of units of meaning to symbols), since numbers do not contain any phonological cues and can be read in all languages to reflect the same meaning. Both numbers and Chinese characters use symbols or logograms to represent words and meaning (e.g., 1 = one). This explains why using this single feature led to a decent classification result in the China dataset. Here, we argue that proficient readers/writers who use phonological (alphabetic) systems often rely on a direct route from meaning to graphemes for highly frequent and familiar words. This may explain the large overlap between writing numbers and writing words in the UK dataset.

Surprisingly, very few previous studies have directly compared writing numbers and writing words/characters. Studies on developmental cognitive impairments have provided some evidence that reading/writing words and numbers use overlapping cognitive abilities [40, 41]. Chen and colleagues [27] reported that in the UK sample, half of the patients who were impaired in writing words also demonstrated deficits in writing numbers. However, none of them showed pure writing impairments (i.e., impaired word and number writing, but intact performance in all other tasks). Voxel-based morphometry analyses [27] also revealed a large overlapping neural network across writing words and numbers, though some dissociations in the neural substrates associated with writing numbers and words were also reported.

Intact calculation ability was also a diagnostic feature for writing words in users of both character (logographic) and alphabetic (phonological) systems. This accords with reports of comorbidities of acalculia and agraphia in the literature [38, 42,43,44,45,46]. The high degree of overlap between word writing and numerical abilities, evident in neuropsychological cases across cultures, challenges theoretical models that suggest a double dissociation of language (reading/writing) and numerical abilities. For example, the neuro-recycling hypothesis [1] postulates that numerical abilities are supported by a module in the right inferior parietal cortex, while word reading ability is supported by a module in the left temporal cortex.

The abilities to stay alert, sustain attention, and ignore distractors, as assessed in the auditory attention task [47], were a diagnostic feature for writing in English and in simplified Chinese characters. This feature was more diagnostic in the English cohort, where it was identified as one of the four key features. The ability to stay alert and sustain attention is a basic resource for maintaining a cognitive set [48, 49]. We speculate that this ability was less prominent in the China dataset because most of the variance in sustained attention (that contributed to the ability to write words) was explained by performance on the complex figure copy task (which also requires a relatively prolonged period of alertness and focus on the task). The auditory sustained attention and complex figure copy tasks showed a moderate correlation. Hence, the use of elastic net regularization may have magnified the differential contribution of sustained attention versus complex figure copy in the different cohorts.

It was surprising that reading, as measured by the ability to read two long sentences, was identified as predictive of word writing deficits in only 2% of the experiments (iterations), across both cultures. Performance on the sentence reading task was also not identified as a diagnostic feature when the linear SVM learned to classify impaired versus non-impaired writers (presumably because it was ranked as a low-importance feature following the elastic net procedure). These results are in line with our previously reported analysis of the partially overlapping UK data [29]. Using principal component analysis, we showed that beyond a general component associated with all motor and language tasks, a smaller specific component that included writing words and sentence reading explained only 3.4% of the variability.

Intuitively, reading and writing should involve similar linguistic processes in reverse order, as both require a mapping between graphemes (visual marks) and meaning. Counter to this intuition, the current results support the view that reading and writing represent two separate language systems [50]. Specifically, it has been shown that in typical development, spelling and handwriting predict word reading but less so vice versa [51]. Early neuropsychological reports suggested that reading deficits (termed alexia) could be dissociated from writing deficits (termed agraphia), though there is still an ongoing debate on how dissociated these two functions are [52].

Can the weak diagnostic value of the sentence reading task for writing impairments be related to the motor element of writing? Agraphia is typically divided into central agraphia (assumed to reflect deficits in the linguistic properties of language) and peripheral agraphia (assumed to reflect properties of high-level motor control) [29]. There is a lack of evidence in the literature regarding the prevalence of the two types of agraphia following stroke. In our previous paper [29], we provided a breakdown of errors for a sub-sample of the UK cohort. In particular, only 15% of patients failed the writing task due to illegible writing. As expected, poor writing quality correlated with the ability to use the hands in the gesture and figure copy tasks. In other words, in that study, around 85% of patients produced legible writing but made mistakes in the letters they used. We therefore believe that motor deficits associated with peripheral agraphia are unlikely to account for sentence reading being a weak diagnostic feature of writing impairments.

Central agraphia can be further divided into subtypes (e.g., agraphia with non-fluent aphasia, agraphia with fluent aphasia, agraphia with conduction aphasia) depending on comorbidity with alexia (reading deficits). The BCoS is specifically designed to be aphasia friendly [31], such that patients can complete the battery even if they have difficulty expressing themselves in speech. It is possible that difficulty speaking hindered performance on the reading task but not on the writing task.

Taken together, our results suggest that reading and writing are supported by partially distinct cognitive mechanisms that go beyond differences in the low-level sensory-motor elements of each function.

Cognitive Features Associated with Writing Abilities in Different Writing Systems

Differences between writing systems emerged in several diagnostic features for word writing impairments following stroke. The ability to copy a complex figure was diagnostic for writing impairments with simplified Chinese characters, a logographic system, while the ability to read non-words and sustained attention were diagnostic for writing impairments with the English alphabet, a phonological system.

Overall performance on the complex figure copy task was among the top-ranked predictors in the Chinese group. Complex figure copy assesses constructive apraxia, that is, impairment of the ability to create a complex structure, in this case through drawing. The complex figure contains many visual details (47 in the current task), and copying it relies on the ability to correctly organize, place, and position these line strokes on the page. In this respect, it requires a level of attention to the visual configuration of lines/strokes similar to that needed when writing Chinese characters.

Complex figure copy was not identified in the proposed framework as a diagnostic feature for writing words in the UK dataset. This result is surprising and appears to contradict our previous reports [29]. Chen and colleagues [29] analyzed comorbidities in a more inclusive sample of the UK dataset (740 patients). The authors reported that over 80% of patients who had deficits in writing numbers and words were impaired in copying the complex figure. A principal component analysis confirmed that the ability to use and control a pen is a latent component of writing and drawing, differentiating it from the use of the hand when processing gestures. The pen writing component showed a moderate correlation with the intelligibility of patients' writing [29]. It is likely that in the current analysis of the UK dataset, number writing captured the variance associated with the ability to use a pen, making complex figure copy a redundant feature.

Non-word reading was a diagnostic feature in the UK but not in the China dataset. This finding accords with the differences between the two writing systems. Unlike writers of Chinese characters, English writers use phonological-to-graphological transformations when spelling words. The ability to transform between graphemes and phonemes is essential for English writers and readers. Non-word reading was designed precisely to assess this ability.

Misclassified Patients

The current study showed that, using a relatively small number of features, one could distinguish between patients with and without impaired word writing skills. In terms of the performance of the proposed framework, sensitivity was poorer than specificity. Further analysis indicated that 13% and 17% of cases in the China and UK datasets, respectively, were frequently misclassified by the framework. The main reason for failing to classify these cases was that they showed mild-to-normal performance in the culture-specific diagnostic tasks. We note that patients who showed writing abilities within the expected healthy range but were consistently falsely classed as impaired tended to perform poorly on the diagnostic tasks (see supplementary materials 3 for details). This demonstrates a limitation of analyses that are based on heterogeneous groups.

The analysis of misclassified cases showed that deficits in visual spatial attention commonly co-occurred with word writing deficits in both writing systems. This was evident as an abnormal asymmetry score in the apple cancelation task. Interestingly, the apple cancelation task measures two types of visual spatial neglect, allocentric and egocentric neglect [53]. Patients who were misclassified showed primarily egocentric visual neglect. Similarly, visual and tactile extinction were characteristic of patients who were misclassified. This finding suggests that spatial attention may be the key cause of the writing deficits in a minority of impaired writers (~ 10%). Note that this patient group also performed poorly on other tasks. For example, around 50% of the misclassified UK patients had deficits in sentence reading and picture naming. In the China dataset, 25% of cases showed deficits in memory and sustained attention.

Finally, a single patient in each dataset was identified as having only writing impairments, without notable deficits in any other task. It is difficult to make inferences regarding this observation. Can we infer that these two cases present a pure word writing deficit? We believe this is unlikely. A more likely explanation is an undiagnosed developmental disorder, such as dyslexia, dysgraphia, or dyspraxia. For example, the estimated prevalence of developmental dyslexia is 10% in the UK (https://www.nhs.uk/conditions/dyslexia/) and 2–12% in China [54]. Thus, it is likely that at least 3–4 individuals in the current sample would have had developmental dyslexia.

Theoretical Implication–How Does the Brain Represent New Skills

The recycling hypothesis [1] predicts that the acquisition of a new skill would be represented by a module that is content specific and culturally invariant. Our findings challenge this view and provide support for the alternative reuse hypothesis [3]. We showed that writing impairment was predicted by abilities which involve processing of different content (e.g., writing numbers, number calculation, auditory sustained attention). We further showed that the diagnostic features varied across cultures. Deficits in writing English words were predicted by impairment of the phonological processes measured using reading, while deficits in writing Chinese characters were predicted by constructive apraxia, measured as the ability to copy a complex figure. Finally, the analysis of misclassified cases revealed further heterogeneity in the neurocognitive makeup of writing words. This suggests that a new skill, such as writing, is unlikely to develop based on evolutionary constraints and more likely emerges in a flexible way based on individual experience, leading to idiosyncratic brain organization in each person.

Limitation and Methodological Considerations

We assessed deficits in the current study using the BCoS, which adopts a shallow but broad approach to cognitive screening. The shallow aspect means that a specific ability is assessed using a limited number of items. In the case of word writing, BCoS-UK assessed four words and one non-word, and BCoS-CN assessed the writing of four words. The broad aspect of BCoS provides a relatively detailed profile of cognition, which is not limited to one domain, thus providing a powerful research tool to assess prevalence and comorbidity of deficits in a large and representative patient population.

The use of BCoS cannot replace a formal clinical diagnosis of a known syndrome, such as agraphia, as it does not adhere to formal diagnostic criteria and has a relatively small number of trials per task. One may argue that our results reflect different components of writing deficits and do not support full and direct conclusions about agraphia symptoms. The small number of items per test section also means that the data were susceptible to measurement noise. In addition, the data were extracted from a cognitive screen used in a clinical setting, rather than from a well-controlled lab-based experiment. Each task in the BCoS, like any task in a real-life or experimental context, draws on multiple processing capacities. Unlike a lab-based experiment, a clinical screen does not include well-controlled conditions to isolate the affected cognitive functions. Therefore, task-based interpretation of the specific cognitive processes needed for writing should be made with caution.

The impairment cut-offs derived from the matched controls set a stringent threshold for classifying word writing impairments. In the UK dataset, patients who made more than three errors were considered impaired; this means that a patient who wrote more than half of the words inaccurately (3/5) could still be classified as intact. In the China cohort, only patients who did not write any word correctly were classified as impaired. In the absence of premorbid data, we had to rely on the distribution of this ability in the matched healthy population. The cut-offs indicate that individuals with poor writing abilities are common in the population. It is likely that word writing ability is not as well integrated into the population's behavioral repertoire as assumed. This also highlights the importance of how controls are selected in lab-based experiments. If controls are selected based on their intact ability to perform a task, this leads to artificially clean data, which would inflate the prevalence of a diagnosed impairment following stroke.

The sample size of the word writing deficits group was relatively small, which likely increased the variability across individuals (small samples tend to be more variable). The number of cases (impaired writers) was smaller than the number of controls (intact writers). To account for this imbalance, the training set always included an equal number of impaired and unimpaired word writers. However, it is possible that the smaller impaired set biased the representation of the "true" cognitive makeup of writing impairments. Reassuringly, despite the small sample size, the study results were roughly replicated across the two cohorts, showing a large degree of overlap in the identified discriminative features.
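A class-balanced training set of this kind can be drawn, for example, by randomly undersampling the larger class. The sketch below is one possible realization and not necessarily the procedure used in the study: the function name, the seed, and the choice of 50 cases per class are hypothetical, while the group sizes (68 impaired, 176 intact) are those reported for the China dataset.

```python
import random


def balanced_split(labels, n_per_class, seed=0):
    """Draw a training set with equal numbers of impaired (1) and intact (0)
    cases; all remaining cases form the testing set."""
    rng = random.Random(seed)
    impaired = [i for i, y in enumerate(labels) if y == 1]
    intact = [i for i, y in enumerate(labels) if y == 0]
    if n_per_class > min(len(impaired), len(intact)):
        raise ValueError("n_per_class exceeds the size of the smaller class")
    train = rng.sample(impaired, n_per_class) + rng.sample(intact, n_per_class)
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test


# Group sizes as reported for the China dataset; 50 per class is hypothetical.
labels = [1] * 68 + [0] * 176
train_idx, test_idx = balanced_split(labels, n_per_class=50)
print(len(train_idx), len(test_idx))
```

Repeating the draw with different seeds yields the repeated partitions over which classification performance and per-case error frequencies can then be summarized.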

The current study used a data analysis framework that combined elastic net and linear SVM [35]. In the case of word writing, the most straightforward way to identify impairment is to assess this skill directly (asking a patient to write). The purpose of the current study was not to develop a model that correctly classifies impairment, but instead to use a model to help identify the cognitive makeup of writing. The framework allowed us to identify critical cognitive features based on individuals' profiles, rather than on a group average profile.

Classification based on linear SVM was superior to other classification algorithms (see Supplementary material 4). It is unclear why it outperformed classification based on neural networks. The latter are designed to be more biologically plausible and hence are assumed to provide a better model of human behavior. One possibility lies in the nature of the features. First, the network input comprised fewer than a handful of features (three for the CN and four for the UK datasets). This was done to enable a fair comparison with the linear SVM. Second, the features were quite complex in terms of the cognitive processes they represented. Each feature represented a task, and each task is composed of multiple processes. For example, complex figure copy [29] and picture naming [55] rely on a large network of brain regions. Thus, it could be that tasks involving the integration of complex features do not operate using neural network–type models, or at least not using the simple neural network architectures tested here.

Classification results for the China dataset were superior to those for the UK dataset. It is unclear why this was the case. It is possible that the UK dataset was more heterogeneous. Relative to the China dataset, the UK dataset was double in size and included more variability in word writing scores (impaired patients could score 0, 1, or 2), whereas the China dataset included only zero scores. It could also be that, because redundant routes support word writing in phonological writing systems such as English, there is a higher potential for variability in impairment and compensation.

Neuropsychological research is typically based on case studies. These extensively assess patients who present unique impairment profiles, often with clear double dissociations between abilities. Patients with pure cognitive deficits are rare. This is because neurological damage is often not restricted to a specific functional region. But it could also mean that the brain does not have a modular structure. Statistical analyses that aim to optimize explained variability across a large cohort of patients may bias or mask effects that are rare. Specifically, in the context of classification, it is unlikely that rare cases can inform a group-based classification, as they would not be consistently represented in both the training and test sets. To partly mitigate the lack of sensitivity to rare cases, we also considered in detail the cognitive profiles of misclassified cases.

Conclusions

The current study showed that impairments in the ability to write words are tightly linked to the ability to write and manipulate numbers, independently of the writing system. The ability to write simplified Chinese characters was linked to the ability to copy a complex figure while attending to its global and local configuration, while the ability to use the English alphabet was related to the ability to convert graphemes to phonemes, as assessed by the non-word reading task. In around 10% of patients, writing impairment may be linked to visual spatial attention deficits. The study showed that word writing is a multifaceted process that depends on distributed cognitive abilities. We also demonstrated the usefulness of combining feature selection and machine learning to answer theoretical cognitive questions, with potential for clinical translation.