Digital pen technology for conducting cognitive assessments: a cross-over study with older adults

Many digitalized cognitive assessments exist to increase reliability, standardization, and objectivity. Particularly in older adults, digitalized cognitive assessments can lead to poorer test results if participants are unfamiliar with the computer, mouse, keyboard, or touch screen. In a cross-over design study, 40 older adults (age M = 74.4 ± 4.1 years) completed the Trail Making Test A and B with a digital pen (digital pen tests, DPT) and a regular pencil (pencil tests, PT) to identify differences in performance. Furthermore, the tests conducted with a digital pen were analyzed manually (manual results, MR) and electronically (electronic results, ER) by an automated system algorithm to determine the possibilities of digital pen evaluation. ICC(2,k) showed a good level of agreement between PT and DPT for TMT A (ICC(2,k) = 0.668) and TMT B (ICC(2,k) = 0.734). When comparing MR and ER, ICC(2,k) showed an excellent level of agreement in TMT A (ICC(2,k) = 0.999) and TMT B (ICC(2,k) = 0.994). The frequency of pen lifting correlated significantly with the execution time in TMT A (r = 0.372, p = 0.030) and TMT B (r = 0.567, p < 0.001). A digital pen can be used to perform the Trail Making Test, as the results showed no difference attributable to the type of pen used. With a digital pen, the advantages of digitized testing can be used without having to accept the disadvantages.


Introduction and overview
In geriatrics, neuropsychological assessments are used to measure cognitive abilities and to detect changes in cognitive functioning (Tuokko & Hadjistavropoulos, 1998). There is a wide range of cognitive assessments testing neurobehavioral disorders in memory, language, emotions, attention, perception, executive functions, or visuospatial skills (Minagar, Finney, & Heimann, 2015). With advancing digitalization, the possibilities for digitalized cognitive assessments using a computer or tablet have expanded. Many digitalized assessments exist, in particular for the early detection of changes in the elderly and in patients with mild cognitive impairment (Wild, Howieson, Webbe, Seelye, & Kaye, 2008; Woo, 2008). The advantages of digitalized assessments are described extensively in the literature. These advantages include an increase in reliability, objectivity, and standardization (Sternin, Burns, & Owen, 2019). Automated administration, scoring, and interpretation of data, as well as convenient data storage, are further advantages of digitalized assessments (Cernich, Brennana, Barker, & Bleiberg, 2007; Sternin et al., 2019). Furthermore, digitalized tests can record additional data such as response times at the millisecond level, thereby providing more detailed insight into the patient's capabilities. Less is said about the disadvantages of digitalized cognitive assessments. Some authors state that clinicians should use tests on a computer or a tablet with caution and with consideration of potential technical complications (Bracken, Mazur-Mosiewicz, & Glazek, 2019; Cernich et al., 2007). Cernich et al. focus primarily on possible technical problems in hardware, software, peripherals, the display, connections, and bandwidth, as well as program considerations (Cernich et al., 2007).
However, the challenges are not only technical; especially for older adults, a computer with mouse, keyboard, and number pad, or possibly a touchscreen, can be intimidating or unfamiliar in the beginning. Therefore, initial training is necessary (Fortuny & Heaton, 1996). A study by Weber et al. showed that low acceptance of digitalized assessments and a more negative attitude toward computers correlate with poorer test results in attention tasks (Weber, Fritze, Schneider, Kuhner, & Maurer, 2002). Visual impairment or age-related vision loss, as well as cognitive or motor impairment, may also cause relevant problems that are unrelated to the ability being tested (Silverberg et al., 2011). Bauer et al. also stated that the results can be distorted if patients have to use their non-dominant hand to manipulate a mouse or a touchscreen, e.g., in hemiparetic patients. They conclude that it cannot be assumed that the results of a paper-pencil test are equal to computerized test results (Bauer et al., 2012). Since computerized assessments showed only moderate correlations with paper-pencil tests (Silverberg et al., 2011), new norm values for classifying results have to be collected. The generation of new norm values is associated with high effort and extensive costs.
To benefit from the advantages of digitalization in cognitive testing (e.g., automatic scoring, additional information) without having to accept the disadvantages (e.g., unfamiliarity), digital tools other than computer, tablet, mouse, and touchscreen can be a solution. A digital pen digitalizes all notes written with the pen on normal paper (more information in section Digital pen technology). The positive effects of digital pen technology have already been investigated in learning (Boyle & Joyce, 2019) and as a method to record homework activity (Rawson, Stahovich, & Mayer, 2017). In the study by Rawson et al., a digital pen was used to automatically and reliably record homework activity in order to find a connection with academic achievement (Rawson et al., 2017). Further concepts describe the use of a digital pen as an intuitive assistance tool for persons with dementia to improve communication, for example, when writing or answering emails (Prange, Sandrala, Weber, & Sonntag, 2015). In the demo paper of Prange et al., a digital pen was used that streams its data via Bluetooth directly to a server, while the dementia patients write with a seemingly normal pen on paper with an invisible dot pattern. The authors of the demo paper also point out that the cognitive status of a participant can affect pen holding and (fine) motor movements (Prange et al., 2015).
The possible advantages of digitalized cognitive assessments (Cernich et al., 2007; Sternin et al., 2019) and the advantages of a digital pen (Boyle & Joyce, 2019; Rawson et al., 2017) have already been examined in various studies. The use of a digital pen in cognitive testing can help to take advantage of digital cognitive assessments without the disadvantages of unfamiliarity or lack of acceptance. The aim of the presented study was to show whether digital pen technology could be used to conduct cognitive assessments with older people, in contrast to regular paper-pencil execution. Therefore, three hypotheses were generated:
H1: The execution time in TMT A and TMT B is not influenced by the type of pen (duration DPT = duration PT).
H2: The electronically measured execution time (ER) correlates significantly with the manually measured execution time (MR).
H3: The additional, electronically measured parameters (number of pen lifts, errors, omitted circles, all circles hit, correct order) correlate significantly with the execution time.

Study design and process
To test the hypotheses, 40 participants were included in the presented cross-over design study. Inclusion criteria were a minimum age of 65 and signed informed consent. Exclusion criteria were severe cognitive disorders, mental diseases, severe auditory, visual, linguistic, sensory, or motor limitations, chronic pain, or having a legal representative. The participants completed the Trail Making Test A (TMT A) and the Trail Making Test B (TMT B) (Reitan, 1992), one of the most widespread assessments for the general examination of brain function (Tischler & Petermann, 2010). In the presented study, we used the TMT version of the CERAD (Consortium to Establish a Registry for Alzheimer's Disease). Participants had to connect numbers (TMT A) or alternating numbers and letters (TMT B) in ascending order without lifting the pen from the paper. The required amount of time (execution time) represents the test result in TMT A and B. If a participant made a mistake, the study personnel corrected it immediately by pointing to the error and the solution; this was done quickly and without pausing the timing. Therefore, the number of errors affects the participant's score only in that the correction of errors is included in the completion time for the task. The number of errors was not documented separately. Participants used their dominant hand for the execution. All participants completed the cognitive tests twice: once with a pencil (pencil tests, PT) and once with a digital pen (digital pen tests, DPT), in both cases on paper. Therefore, the surface on which the test was performed did not affect the test results. Trained study personnel gave the instructions in PT and DPT. The execution time in DPT was measured by the study personnel (manual result, MR) and by an automated system algorithm (electronic result, ER).
To decrease sequence effects, half of the participants started with the pencil and the other half with the digital pen (sex-stratified, Fig. 1). There was a washout phase of approx. 30 min between PT and DPT. During this phase, participants completed a self-developed questionnaire to collect socio-demographic data and a questionnaire to record technology commitment (Neyer, Felber, & Gebhardt, 2012). The self-developed questionnaire included questions on age, marital status, education, income, and health status (Table 1). The questionnaire for recording technology commitment comprises 12 statements. Four statements each relate to the acceptance of technology (e.g., "I am very curious about new technical developments"), technology competence (e.g., "In dealing with modern technology I am often afraid to fail"), and technology control (e.g., "Whether I am successful in the application of modern technology depends to a large extent on me"). The participants rated their agreement with each statement on a 5-point Likert scale (from "do not agree at all" to "agree completely") (Neyer et al., 2012).
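The sex-stratified counterbalancing described above can be sketched as follows. This is only an illustration: the paper does not state how participants were allocated within each stratum, so the random shuffle, the function name `assign_orders`, the labels "PT-first" and "DPT-first", and the input format are all assumptions.

```python
import random
from collections import defaultdict


def assign_orders(participants, seed=0):
    """Assign each participant to a test order, balanced within each sex.

    participants: list of (participant_id, sex) tuples (hypothetical format).
    Returns a dict mapping participant_id to "PT-first" or "DPT-first".
    Random allocation within strata is an assumption, not the authors' method.
    """
    rng = random.Random(seed)
    by_sex = defaultdict(list)
    for pid, sex in participants:
        by_sex[sex].append(pid)

    assignment = {}
    for ids in by_sex.values():
        rng.shuffle(ids)
        half = len(ids) // 2
        # First half of the shuffled stratum starts with the pencil,
        # the remainder starts with the digital pen.
        for pid in ids[:half]:
            assignment[pid] = "PT-first"
        for pid in ids[half:]:
            assignment[pid] = "DPT-first"
    return assignment


# Hypothetical sample mirroring the study size: 20 women and 20 men.
sample = [(i, "f") for i in range(20)] + [(i, "m") for i in range(20, 40)]
orders = assign_orders(sample)
```

With an even number of participants per stratum, each order receives the same number of women and men, which matches the 50% female share reported for both groups.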

Digital pen technology
As a digital pen, participants used the Neo SmartPen N2 (https://www.neosmartpen.com). Weighing 22 g, the pen is slightly heavier and thicker than a usual pencil. A small infrared camera within the pen, which recognizes the unique pattern of dots printed on each page, captured all written or drawn information. This enables a direct digitalization of user input in real time (Fig. 2). Accordingly, in preparation of this study, the blank forms for the TMT A and TMT B were printed on paper with the almost invisible dot pattern.

Data analysis
All data and test results were analyzed using SPSS. The PT and DPT results and the MR and ER were compared using paired-samples t tests, Pearson correlation, and intraclass correlation (ICC(2,k); two-way random effects, absolute agreement). According to Cohen, Pearson correlation coefficients higher than 0.10 indicate a weak correlation, values higher than 0.30 a moderate correlation, and values higher than 0.50 a high linear correlation (Cohen, 1988). According to Cicchetti, ICC values less than 0.40, between 0.40 and 0.59, between 0.60 and 0.74, and higher than 0.75 indicate poor, fair, good, and excellent reliability, respectively (Cicchetti, 1994).
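For readers who want to reproduce the agreement measure outside SPSS, the following is a minimal sketch of ICC(2,k) computed from the mean squares of a two-way ANOVA, using the standard formula (MSR − MSE) / (MSR + (MSC − MSE) / n). The function names and the input layout (one row per participant, one column per measurement condition) are assumptions for illustration, and the handling of values between 0.74 and 0.75 in `cicchetti_label` is a choice made here, since the cited bands leave that gap open.

```python
from statistics import mean


def icc_2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, average measures.

    ratings: list of rows, one per participant; each row holds k measurements
    (e.g., [PT time, DPT time]). Hypothetical input layout.
    """
    n = len(ratings)          # number of participants
    k = len(ratings[0])       # number of measurements per participant
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-participants mean square
    msc = ss_cols / (k - 1)                 # between-conditions mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (msc - mse) / n)


def cicchetti_label(icc):
    """Map an ICC value to the reliability bands cited from Cicchetti (1994)."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:   # assumption: values in the 0.74-0.75 gap count as "good"
        return "good"
    return "excellent"
```

Applied to the values reported in this study, `cicchetti_label(0.668)` and `cicchetti_label(0.734)` fall into the "good" band, and `cicchetti_label(0.999)` into the "excellent" band.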

Sample
Forty older adults (age M = 74.4 ± 4.1 years, range 67-85 years) were included in the cross-over design study.
Half of them were female. Most of the participants were well-educated (57.5% high-level education), married (57.5%), and right-handed (95.0%). There were no statistical differences in socio-demographic data and technology commitment between participants who performed PT first (n = 20, age: 74.4 ± 3.7 years, 50% female) and participants who performed DPT first (n = 20, age: 74.3 ± 4.6 years, 50% female), with the exception of income (t(39) = − 2.014, p = 0.05). Table 2 reports the mean values and standard deviations of PT and DPT for TMT A and TMT B, the differences, and the 95% confidence intervals of the average differences. T test showed no significant differences between PT performance and DPT performance for TMT A (t (39) Table 2). Within the group of subjects who first performed the tests with a pencil, strong positive correlations between PT and DPT were found for TMT A (r(18) = 0.527, p = 0.017) and TMT B (r(18) = 0.915, p < 0.001). Within the group of subjects who first performed the tests with a digital pen, strong positive correlations between PT and DPT were found for TMT A (r(18) = 0.627, p = 0.003) and TMT B (r(18) = 0.783, p < 0.001). The fundamental agreement between the results of PT and DPT for TMT A and TMT B is also shown in the Bland-Altman plots, where the differences between DPT and PT are plotted against the mean of the two measurements (Fig. 4). Two (5.0%, TMT A) and four (10.0%, TMT B) data points are outside the limits of agreement.
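The Bland-Altman quantities mentioned above can be sketched as follows. The paper does not state how the limits of agreement were computed; the sketch assumes the conventional definition (mean difference ± 1.96 standard deviations of the differences). The function name `bland_altman` and the example times are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(dpt_times, pt_times):
    """Compute Bland-Altman statistics for paired execution times.

    Assumes the conventional limits of agreement: bias +/- 1.96 * SD of the
    paired differences (DPT minus PT). Returns the plotted points as well as
    the indices of data points lying outside the limits.
    """
    diffs = [d - p for d, p in zip(dpt_times, pt_times)]
    means = [(d + p) / 2 for d, p in zip(dpt_times, pt_times)]
    bias = mean(diffs)
    sd = stdev(diffs)                      # sample standard deviation
    lower = bias - 1.96 * sd
    upper = bias + 1.96 * sd
    outside = [i for i, d in enumerate(diffs) if d < lower or d > upper]
    return {
        "points": list(zip(means, diffs)),  # x = mean of both tests, y = difference
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "outside": outside,
    }


# Hypothetical execution times in seconds, for illustration only.
result = bland_altman([41.0, 38.5, 52.0, 47.5], [40.0, 39.5, 51.0, 48.5])
```

The share of points outside the limits is then `len(result["outside"]) / len(result["points"])`, which corresponds to the percentages reported for TMT A and TMT B.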

Comparison of performance in pencil and digital pen test
In addition, participants were asked whether they believed that the type of pen influenced the test results. Almost all participants (95.0%) answered the question in the negative.   Table 3). The Bland-Altman plot (Fig. 5) shows that three data points (7.0%) each in TMT A and TMT B are outside of the agreement limits.

Additional parameters of ER
The algorithm was able to analyze additional parameters of the test execution (e.g., pen lifts and errors). Despite the instruction not to lift the pen from the paper during the test, participants did so on average 4 times (TMT A) and 6 times (TMT B). The frequency of pen lifting correlated significantly with the execution time in TMT B (r(37) = 0.561, p < 0.001): the more often participants lifted the pen from the paper when connecting the circles, the more time they needed. The number of (automatically detected) errors correlated significantly with the execution time in TMT B (r(37) = 0.336, p = 0.036), but not in TMT A (r(37) = 0.289, p = 0.074). The number of omitted circles showed no significant association with the execution time. Furthermore, there were no mean value differences in the execution time between participants who hit all circles and those who did not (Fig. 3a, Table 4).
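How such parameters might be derived from pen data and related to the execution time can be sketched as follows. The paper does not describe the evaluation algorithm or the pen's data format, so the stroke representation (one pen-down and pen-up timestamp per stroke), all function names, and the "negligible" label for coefficients of 0.10 or below are assumptions made for illustration.

```python
from math import sqrt


def count_pen_lifts(strokes):
    """Number of pen lifts, assuming each lift separates two consecutive strokes.

    strokes: chronological list of (pen_down_s, pen_up_s) tuples
    (hypothetical format, timestamps in seconds).
    """
    return max(len(strokes) - 1, 0)


def execution_time(strokes):
    """Time from the first pen-down to the last pen-up, in seconds."""
    return strokes[-1][1] - strokes[0][0]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_label(r):
    """Classify |r| using the thresholds cited from Cohen (1988)."""
    size = abs(r)
    if size > 0.50:
        return "high"
    if size > 0.30:
        return "moderate"
    if size > 0.10:
        return "weak"
    return "negligible"  # label chosen here; the text gives no name for this band


# Hypothetical recording of one test: three strokes, hence two pen lifts.
strokes = [(0.0, 1.0), (1.5, 2.0), (2.25, 3.0)]
lifts = count_pen_lifts(strokes)
duration = execution_time(strokes)
```

Under the cited thresholds, the reported coefficient for pen lifts in TMT B (0.561) falls into the "high" band and the coefficient for errors in TMT B (0.336) into the "moderate" band.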

Discussion
The present study investigated the influence of the pen on the test results of older participants in TMT A and B, as well as the potential of electronic evaluation, to answer the question of whether a digital pen can be used for conducting the Trail Making Test. The first hypothesis, that the execution time is not influenced by the type of pen, is confirmed by the present study. The study shows no significant differences and good agreement in test results between PT and DPT. Even though the instruction for TMT states that a pencil should be used for TMT performance and the digital pen is heavier and thicker than a pencil, the test results were almost the same. The influence of pen design on drawing and writing, though not on cognitive test performance, was investigated in a study by Goonetilleke et al. The authors showed that speed and writing ability were not influenced by pen shape or pen size (Goonetilleke, Hoffmann, & Luximon, 2009). That result is additionally supported by the subjective assessment of the participants, who did not believe that the test result was influenced by the type of pen. Minor differences were found in the time taken for a drawing, which increased when the pen size decreased. Since the size of the pencil and the digital pen used in the present study were nearly identical, this difference could not be demonstrated in the results. The findings by Silverberg et al. that sensory or cognitive abilities can lead to problems when conducting digitized assessments (Silverberg et al., 2011) cannot be confirmed by the present study.
The second hypothesis, that the electronically recorded execution time corresponds to the manually measured execution time, can also be confirmed. This is in accordance with a study by Dahmen et al., which was likewise based on a digitalized version of the TMT (Dahmen, Cook, Fellows, & Schmitter-Edgecombe, 2017). In that study, the authors showed that the predicted digital TMT scores correlate significantly with clinical digital test scores. Dahmen et al. also investigated several additional features (besides time to completion and number of errors), such as timing features (e.g., average pause duration and average lift duration) and mobility features (e.g., number of pauses, number of lifts, and pressure). The results showed that the inclusion of all parameters (mobility and timing) does not provide the best prediction of test results. Similarly, the present study showed that not all additionally recorded parameters correlate with the test result. Only the number of pen lifts and the number of errors showed a significant association with the duration of the test in TMT B.
Bracken et al. postulate that new standard values must be generated for the application of digital or computerized test procedures, which means considerable effort (Bracken et al., 2019). This does not seem necessary for the use of digital pen technology, because the method of execution is almost identical to the paper-pencil version.

Limitations
Within the presented study, we investigated only a small, homogeneous sample of well-educated, healthy participants. Although participants with severe cognitive, auditory, and visual impairments were excluded from the study, there may be differences in the cognitive and sensory abilities of the subjects. These abilities were not considered in this study. It is possible that our findings will not apply to more heterogeneous samples, especially to participants with severe cognitive disorders. Due to the healthy sample, almost all participants scored average or above average in the TMT, so we could not determine whether the observed differences in the mean values would misclassify some patients. Furthermore, the number of errors made was not recorded by the study personnel, so it could not be compared between the manual and the electronic results. Problems that can occur when using technical systems, such as failure of the technology, the need for recharging, connection problems, or the higher price (in comparison to a pencil), were not considered in the present study.

Conclusion
A digital pen can be used to perform the Trail Making Test, as it has been shown that there are no differences in results due to the type of pen used. The parameters recorded in the paper-based version (time, errors) are easily measured digitally, reducing measurement errors and minimizing the influence of the rater. To conclude, the advantages of digitalized cognitive assessment can be used without suffering the disadvantages. Not all additional parameters correlated with the test results in the presented study, so they should be considered with caution when evaluating the results.
Funding Open Access funding enabled and organized by Projekt DEAL. This study was funded within the project InteraKT by the Federal Ministry of Education and Research Germany (16SV7769). The authors declare that they have no financial or personal relationships that could influence (bias) their work.

Compliance with ethical standards
Ethical standards All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent Informed consent was obtained from all individual participants included in the study.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.