MobileCogniTracker

As the population ages, cognitive decline is becoming a worldwide threat to older adults’ independence and quality of life. Cognitive decline involves problems with memory, language, thinking and judgement, thus severely compromising multiple aspects of people’s everyday life. Diagnosis of cognitive disorders is currently performed through clinical questionnaire-based assessments, which are typically conducted by medical experts once symptoms appear. Digital technologies can help provide more immediate, pervasive and seamless assessment, which could, in turn, allow for much earlier diagnosis of cognitive disorders and decline. In this work, we present MobileCogniTracker, a digital tool for facilitating momentary, seamless and ubiquitous clinically validated cognitive measurements. The proposed tool implements digital cognitive tests in the form of multimedia experience sampling questionnaires, which can run on a smartphone and can be scheduled and assessed remotely. The tool further integrates digital cognitive experience sampling with passive smartphone sensor data streams that may be used to study the interplay of cognition and physical, social and emotional behaviours. The Mini-Mental State Examination, a clinical questionnaire extensively used to measure cognitive disorders, has been implemented here to showcase the possibilities offered by our tool. A usability test showed that the tool is usable for performing digital cognitive examinations and that cognitively unimpaired persons in the relevant age group are capable of taking such a digital examination. A qualitative expert-driven validation also shows high agreement between the digital and pencil-and-paper versions of the test.


Introduction
The World Health Organization has recently concluded that circa 15% of adults over 60 suffer from a cognitive disorder (World Health Organization 2016) and that 47.5 million people are affected by dementia, with 7.7 million new cases every year (World Health Organization 2017). Mild cognitive impairment (MCI) is an intermediate stage between the expected cognitive decline of normal ageing and the more serious disorder of dementia. MCI is a particularly relevant stage, as it involves cognitive changes that are serious enough to be noticed by the person experiencing them or by people close to them, but not severe enough to interfere with daily life activities or independent function. As a consequence, most applications for treatment and diagnosis target MCI or early-stage dementia, since first symptoms can generally be spotted more easily and treatment at these stages can often slow cognitive decline, thus allowing patients to retain control over their lives for as long as possible (Spitzer and Williams 1998).
Cognition, including its development and evolution, has become a relevant subject of study in multiple domains such as psychology, neuroscience, linguistics, computer science, mathematics, ethology and philosophy. In a broad sense, cognition refers to any process with some bearing on the functioning of the mind (Reed 2012). Cognition can be more specifically defined as "the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses" (Oxford Dictionaries 2017). The study of cognitive abilities or skills is therefore highly relevant, as they are essential for accomplishing any daily living task. While a wide variety of cognitive abilities can be described, the abilities most commonly differentiated are perception, attention, memory, language skills, visuospatial processing, and executive function. Table 1 shows a summary of these cognitive abilities, the main tasks they are used for and some examples of daily activities for which they are typically necessary.
According to Bermúdez (2014), "the guiding idea of cognitive science is that mental operations involve processing information, and hence that we can study how the mind works by studying how information is processed". Thus, the analysis of information processing can be used to identify anomalies in an individual's regular cognitive functioning, for example, by observing that it takes them longer than usual to think of a word or to recall a person's name. This principle is leveraged in cognitive screening tests such as the Mini-Mental State Examination (MMSE), a popular questionnaire-type test extensively used in clinical and research settings to diagnose cognitive disorders based on the observation of an individual's mental performance. The original MMSE, as defined by Folstein et al. (1975), consists of a number of tasks, including questions and problems addressing the time and place of the test, repeating lists of words, arithmetic operations, language use and comprehension, and basic motor skills. The test starts with the orientation section, with a maximum score of ten points, which assesses spatial and temporal orientation. Here the individual should correctly identify the current time (year, season, date, day, and month) and location (state, county, town, hospital, and floor). This section is followed by a registration task, where subjects are asked to repeat the names of three unrelated objects, for a maximum score of three points. Thereafter, attention and calculation are assessed by asking participants to subtract seven serially, starting from one hundred, five times, or alternatively to spell the word "world" backwards, for a maximum of five points. Subjects are then asked to recall the three objects introduced during the registration section, for a maximum of three points. Next, the individual is asked to name two given objects or to write a sentence of their choosing to assess language aspects, which is scored with up to two points.
This task is followed by a repetition section, after which subjects are requested to read and follow a specific command. Finally, some complex commands are given to the user, for example, copying a drawing of two intersecting pentagons, bringing the language section to a total of nine points. Based on the answers given to the test, the examiner can in principle assess the level of consciousness of the subject, from alert to coma (Folstein et al. 1975). The maximum achievable score is 30 points, and a score of 24 or higher is generally taken to indicate normal cognitive functioning (Copeland et al. 2002; Folstein et al. 1983). Below this threshold, scores can indicate severe (≤ 9 points), moderate (10-18 points) or mild (19-23 points) cognitive impairment (Mungas 1991).
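The scoring scheme above can be summarised in a short sketch. Assuming the section maxima described in the text (10, 3, 5, 3 and 9 points, summing to 30) and the cut-offs of Mungas (1991), the Python helpers below map a total score to an impairment band. The serial-sevens scorer is an assumption: it checks each answer against the participant's previous answer, which is one common paper-based convention but is not specified in the text. All names are illustrative.

```python
# Maximum score per MMSE section, as described in the text (sums to 30).
MMSE_SECTIONS = {
    "orientation": 10,            # time (5 items) and place (5 items)
    "registration": 3,            # repeat three unrelated object names
    "attention_calculation": 5,   # serial sevens or "world" backwards
    "recall": 3,                  # recall the three registered objects
    "language": 9,                # naming, repetition, commands, copying, etc.
}


def serial_sevens_score(answers: list, start: int = 100, step: int = 7) -> int:
    """One point per correct subtraction, for at most five answers.

    Assumption: each answer is checked against the participant's previous
    answer, so a single slip does not invalidate the remaining items.
    """
    score = 0
    previous = start
    for answer in answers[:5]:
        if answer == previous - step:
            score += 1
        previous = answer
    return score


def classify_mmse(total: int) -> str:
    """Map a total MMSE score to the bands of Mungas (1991)."""
    max_total = sum(MMSE_SECTIONS.values())
    if not 0 <= total <= max_total:
        raise ValueError(f"MMSE total must lie between 0 and {max_total}")
    if total >= 24:
        return "normal"
    if total >= 19:
        return "mild"
    if total >= 10:
        return "moderate"
    return "severe"


# Example: one slip (80 instead of 79) costs a single point.
print(serial_sevens_score([93, 86, 80, 73, 66]))  # 4
print(classify_mmse(21))                          # mild
```

Keeping the section maxima in one table lets the total range check follow from the same definition that a digital test would use to score each section.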
Cognitive tests like the MMSE are typically conducted in the presence of a specialist. The role of the specialist, before and during the test, is normally limited to explaining the structure of the test to the participant and guiding them through the examination. The answers are processed and assessed afterwards. These tests require no specialised equipment or training for administration, which makes them easy to use (Harrell et al. 2000). However, the need for a person to introduce the test and collect the answers considerably constrains how often these tests can be administered. As a consequence, cognitive assessments are performed at only a few points in time, mostly upon clinical prescription once first symptoms appear. Moreover, the presence of an expert can affect the individual's normal cognitive responses, hence introducing some level of bias in the results. In light of these limitations, we propose MobileCogniTracker, a new digital tool that implements cognitive tests as multimedia experience sampling questionnaires, which can run on a smartphone and can be configured and scheduled remotely. The tool further integrates digital cognitive experience sampling with multiple smartphone sensor data streams that can be used to study the interplay of cognition and physical, social and emotional behaviours. All the collected data are securely communicated via the internet and made available to the expert through a server for post-processing and analysis, thus avoiding the need for the individual to visit the clinician's office or any dedicated facility.
The key contributions of this work can be summarised as follows:
• MobileCogniTracker: a digital cognitive experience sampling tool. We identify the requirements posed by traditional clinical cognitive assessment tests and design a digital tool that operates on commonplace mobile devices. The tool facilitates the creation, administration and completion of different question- and problem-type tasks in the form of experience sampling method (ESM) questionnaires that can be performed in any location or context as people go about their lives. MobileCogniTracker is fully modular and configurable, so that the expert can define the sequence and schedule in which the tests are prompted to the user. The tool supports a variety of new ESMs that go beyond simple text questions or checkboxes to cognitively relevant drawing or object-interaction methods. MobileCogniTracker also democratises the execution of cognitive assessment tests by supporting different interaction channels (e.g., audio, voice, text).
• An integration of MobileCogniTracker with passive multimodal mobile sensing. We move beyond standalone apps by interfacing the cognitive experience sampling method with a popular mobile instrumentation framework. MobileCogniTracker facilitates the collection not only of active user responses but also of other types of passive data generated during users' daily interaction with their mobile devices. This passive data can help validate the contextual information registered during the tests (e.g., date or location) and support the study of the interplay between cognition and other components of human behaviour (e.g., physical activity or social interaction).
• A realisation of MobileCogniTracker based on the full implementation of the MMSE. We analyse the characteristics of the MMSE, one of the most widely used cognitive assessment tests, and identify the main challenges posed by its translation to a mobile digital platform. We develop a digital version of the MMSE targeted at Android mobile devices.
• A study of the usability and reliability of MobileCogniTracker. We conduct a preliminary user evaluation of the usability of the tool. This evaluation is primarily intended to determine whether users encounter any difficulties while taking the test on a mobile device. We do not evaluate in this work the clinical validity of the MMSE, which has already been established in prior work (Tombaugh and McIntyre 1992). However, we do perform an expert-based validation of the proposed digital version with respect to the pencil-and-paper method.
The remainder of the paper is organised as follows. Section 2 presents an overview of the state-of-the-art in digital mobile sensing and assessment of cognitive disorders. Section 3 describes the requirements, design choices and implementation of the proposed mobile cognitive tracking tool. Section 4 presents the usability and reliability study setup, methods and results, which are further discussed in Sect. 5. The main conclusions of this work are summarised in Sect. 6.

Related work
Diverse digital mobile tools have been proposed in the past to measure and treat cognitive disorders. A major part of these tools are commercial apps aimed at assessing and treating cognitive disorders through so-called "brain games". For example, Sea Hero Quest (T-mobile 2017) is a mobile serious game that assesses cognitive functioning by analysing the user's spatial navigation skills. The player navigates a ship through sea mazes, fires flares and photographs sea monsters, challenging memory, spatial recognition and orientation (Aškić et al. 2016; Morgan 2016). Lumosity (Lumos Labs 2017) is an app consisting of several mobile games that assess the functioning of different cognitive abilities. The app builds on several design principles, namely targeting (specific cognitive abilities involved in daily tasks), adaptivity (tasks of varying difficulty), novelty (non-over-learned exercises to drive nervous system remodelling), engagement (positive encouragement to stimulate brain learning and processing), and completeness (the full spectrum of cognitive abilities used in daily activities). Some of the games implemented by Lumosity include tracking and remembering different fish moving across the screen to train visual attention and working memory, comparing visual and verbal information from familiar faces to exercise associative memory, or identifying hidden rules in a card game to train mental flexibility and working memory (Hardy and Scanlon 2009). MindMate (NHS 2017) is another mobile gaming application that aims to empower not only people with dementia but also their families and carers. This app provides a set of interactive games to stimulate the user's cognitive abilities through problem-solving and memory-training activities. MindMate also explores the use of self-reporting methods to keep track of some daily activities. Tapbrain (Kang et al. 2016) is a serious game consisting of thirteen mini-games to stimulate brain exercise and four mini-games to induce physical activity. The objective of Tapbrain is to stimulate cognitive brain functions through triggers that both stimulate the brain and induce physical movement. Although most of these applications claim to help improve some cognitive abilities, based on the general assumption that mental exercise can improve cognitive functioning, only a few are backed by clinical evidence. As a result, some of these apps have been found to make deceptive claims and have faced legal action (Federal Trade Commission 2016).

Smartphone passive sensing has been explored in research to measure daily behaviours, which could in principle give insight into cognitive disorders. For example, various works have exploited the smartphone's inertial sensors, GPS and microphone to detect indoor and outdoor physical activities (Ouchi and Doi 2012; Banos et al. 2015; Hur et al. 2017), which may relate to a person's cognitive state (Hayes et al. 2008; Hagler et al. 2010). Similarly, Bluetooth scans, photo captures and ambient audio recordings have been used to measure levels of sociability (Lane et al. 2011; Vu et al. 2015), which may be linked to cognitive functioning (Akl et al. 2016). Some sophisticated approaches even measure physiological parameters such as heart rate and breathing rate through the smartphone's accelerometer using ballistocardiography (Hernandez et al. 2015), which may be used to analyse cognitive stress (McDuff et al. 2016).

The direct application of smartphones to the measurement of cognitive functioning is, however, a fairly uncharted area. Very few studies exist, and they mainly focus on the measurement of attention. In Brown et al. (2014), the authors recorded mobile phone usage, including messages, social media and internet navigation, for fifteen users over approximately three months.
The analysis of these data allowed them to identify potential moments of filling or killing time, or breaks, normally associated with boredom. Mobile phone interaction (e.g., amount and types of apps used), context (e.g., light levels) and demographics are also combined with machine learning techniques in Pielot et al. (2015) to automatically spot such boredom situations. The results of this work show that the recency of communication, usage intensity, time of day, and demographics are the categories of features that best identify situations where attention is scarce. Subjective and objective smartphone-based assessments of alertness and fatigue are used in Abdullah et al. (2016) to study the influence of chronotype and time of day on performance. Subjective assessments include questions on alertness and fatigue, as well as on recent activities influencing levels of alertness and fatigue, such as the consumption of caffeine, exercising, or napping. Alertness is measured on the smartphones using an independent application implementing the so-called "Psychomotor Vigilance Task". This work shows that alertness can vary by approximately 30% depending on time of day and circadian rhythms. The authors also showed that daylight saving time, hours slept, and stimulant intake can influence alertness. When more alert, participants checked their phones more frequently but for shorter periods of time, while during low alertness, participants engaged in more sustained use.
Despite the relevant progress made by the above and related works, current unobtrusive measurements focus on a single cognitive ability and are thus unable to provide a comprehensive overview of the cognitive state comparable to that offered by clinical tests. Hence, these techniques cannot yet be used in isolation to diagnose cognitive impairment. Conversely, traditional clinical tests do not allow for the continuous assessment of cognitive state that could be achieved by monitoring sensor data through unobtrusive measurements. Therefore, a technical approach combining clinical tests with unobtrusive measurements of behaviour is considered necessary in order to find and validate new measurements for more reliable cognitive assessment.

MobileCogniTracker
Developing new cognitive assessment methods based on unobtrusive mobile sensing data requires reliable digital tests that exploit the diagnostic capabilities of well-established clinical procedures. In that vein, we propose MobileCogniTracker, a digital tool that complements and extends the potential of existing mobile passive sensing platforms for measuring people's cognitive functioning. MobileCogniTracker implements an innovative experience sampling approach that helps automate and objectify the collection of clinical-grade cognitive data. In the following, we describe the requirements, design choices and implementation of the proposed mobile cognitive tracking tool.

Requirements
We use the MoSCoW prioritization technique (Clegg and Barker 1994) to elicit the requirements of MobileCogniTracker. The tool must work on modern mobile devices. Tablets might be the preferred choice given their typically large screens, which resemble the format of the questionnaire-type handouts used in clinical assessments. However, the tool must also support the realisation of cognitive assessment tests on smartphones, since these are both more widely available to users and more frequently used for passive sensing. It must be possible to create different sections, each containing separate tasks, as is customary in clinical cognitive tests. These sections and tasks are typically organised around a specific cognitive ability, so separating them facilitates proper administration, reusability and sharing among tests. The tool must present the information in the form of schedulable experience sampling methods. The specialist must be able to specify remotely when the application should present the test questions to the user. The expert must also be able to specify separate schedules for the different test sections. In this way, the tests can be partitioned into various parts, possibly measuring different cognitive abilities, which are administered at different times according to the study or user preferences. This requirement is also highly relevant for studying both temporal and contextual effects on test performance. The data must be stored on a secure server for further analysis. This is a crucial requirement to prevent any hazardous situation or malicious use of the collected data by unauthorised third parties. Finally, the system must allow for future extensibility, specifically the integration of new unobtrusive sensors and experience sampling modalities.
In clinical cognitive tests, the majority of answers involve the user talking to the specialist. For other questions, users are asked to write or draw their answers. Therefore, different input methods should be supported, allowing users to write, draw or speak their answers freely depending on the question. Voice input can be achieved through speech-to-text functionality, which would allow several questions to closely resemble the traditional testing scenario and avoid typing issues for users with reduced motor coordination. Furthermore, the tool should also include text-to-speech functionality, so that instructions are read aloud to the user in order to avoid misinterpretation and improve accessibility for people with mild visual impairment. The tool could also allow specialists to change the order in which the test sections and tasks are presented, so that test subjects are less likely to remember which sections or tasks will appear next. Together with the capability to schedule specific test components, this feature could allow sections or tasks to be easily replaced by other similar ones, thereby avoiding a learning effect when subjects take the test several times.
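The optional reordering and task-substitution requirement could be realised with a seeded shuffle, as in the Python sketch below. The `Section` type, the notion of task variants and the `plan_administration` function are hypothetical and do not describe MobileCogniTracker's actual code. Seeding the generator per administration keeps each plan reproducible, so the expert can later see which order and variants a participant received. Dependent sections, such as MMSE registration and recall, would need an ordering constraint that this sketch omits.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Section:
    """Hypothetical test section with interchangeable task variants."""
    name: str
    variants: List[str]  # must be non-empty, e.g. alternative word lists


def plan_administration(sections: List[Section], seed: int) -> List[Tuple[str, str]]:
    """Return (section name, chosen variant) pairs in a shuffled order.

    A per-administration seed makes the plan reproducible, so the expert can
    reconstruct which order and variants a participant was shown.
    """
    rng = random.Random(seed)
    order = list(sections)  # copy, so the study definition is not reordered
    rng.shuffle(order)
    return [(section.name, rng.choice(section.variants)) for section in order]


study = [
    Section("orientation", ["orientation-A"]),
    Section("attention", ["serial-sevens", "world-backwards"]),
    Section("naming", ["pen-watch", "key-cup"]),
]
print(plan_administration(study, seed=42))
```

Storing only the seed alongside each administration would be enough to regenerate the exact plan, which keeps the stored study data small.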

MobileCogniTracker architecture
The technical architecture of MobileCogniTracker is shown in Fig. 1. The mobile device is the core entity, which communicates with the other main entities, i.e., the user and the server (expert). Through the server, the expert sets the study properties, both in terms of the tests to be taken by the user and the schedule for their administration. These properties are stored in a configuration file that is then communicated to the mobile application, which automatically updates its local configuration so as to ensure proper operation in the absence of an internet connection. At the scheduled time, the app pushes a notification and awaits the user's response. Once the user taps the notification, the corresponding cognitive test, i.e. question(s) and/or task(s), is presented to the user. Questions and tasks can be read on the screen or spoken aloud for the user's convenience through the text-to-speech functionality natively supported by the mobile operating system. The user's answers, which can come in different modalities, namely text, voice and drawings, are stored temporarily on the mobile device. These answers, and possibly other mobile sensor data, are periodically synced with the server to make them available to the experts for further analysis.
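The offline-first behaviour described above (a cached configuration, and answers held locally until they can be synced) might be modelled as follows. This is a hypothetical Python sketch: `LocalStore`, its methods and the `upload` callback are illustrative names, not part of AWARE or MobileCogniTracker. The key design choice is that pending answers are cleared only after a successful upload, so a failed sync loses nothing.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LocalStore:
    """Hypothetical on-device state: cached study config plus unsent answers."""
    config: Dict = field(default_factory=dict)
    pending: List[Dict] = field(default_factory=list)

    def update_config(self, fetched: Optional[str]) -> None:
        # None means the server was unreachable: keep the last known
        # configuration so scheduled tests still fire offline.
        if fetched is not None:
            self.config = json.loads(fetched)

    def record_answer(self, task: str, answer: str) -> None:
        self.pending.append({"task": task, "answer": answer})

    def sync(self, upload: Callable[[List[Dict]], bool]) -> int:
        """Upload pending answers; clear them only if the upload succeeds."""
        if not self.pending:
            return 0
        batch = list(self.pending)
        if not upload(batch):
            return 0  # keep everything for the next attempt
        self.pending.clear()
        return len(batch)


store = LocalStore()
store.update_config('{"tests": ["MMSE"]}')
store.update_config(None)               # offline: previous config is kept
store.record_answer("orientation_year", "2017")
print(store.sync(lambda batch: False))  # 0, answer still pending
print(store.sync(lambda batch: True))   # 1, answer delivered
```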

Cognitive experience sampling methods
This section presents the set of experience sampling methods we have developed for the realisation and collection of the cognitive tests. According to the requirements identified above, different methods for giving instructions and capturing answers must be provided. Plain text (Fig. 2a) is used for defining the scope of a given task, describing the instructions to be followed by the user, and marking the start or end of the test. This view is also useful in cases where users are requested to remember some information, e.g., some given names, as part of the current or a future task. Text (Fig. 2b) and numerical (Fig. 2c) inputs are used for answering questions such as those asking for the current date or location. These types of input are commonly used on mobile devices and can be easily adapted to each user so as to maximise accessibility, e.g., by enlarging the font or display size through magnification. Some cognitive tests require copying given objects (Fig. 2e) or sketching concepts (Fig. 2d), both of which involve drawing. Fairly large canvases are provided for such drawings, allowing individuals to use their fingertips as a sort of pen. Digital pens, available for some brands, can also be used. Some tasks involve the repetition of a given piece of text (Fig. 2f). In such cases, voice is used as input, which, in combination with speech-to-text functionality, makes it possible to transcribe the user's answer automatically. This approach makes it possible to capture, in the form of text, speech in any supported language and, in principle, to translate it automatically into any other. Both text and voice can be used to name a given object that is presented to the user through an embedded picture (Fig. 2g). This type of view is particularly suited to tasks involving the recognition of items. Finally, we also developed an experience sampling method to realise n-step command tasks.
These commands tend to involve manual handling, which poses special challenges for implementation on mobile devices. For example, in the pencil-and-paper version of the MMSE, participants have to take a sheet of paper in their right hand, fold it in half, and place it on the floor. The relevance of this task lies not in its physical aspect but in remembering three different instructions and executing them. Therefore, similar tasks involving analogous steps can in principle be developed. Users are first presented with the instructions, e.g., arranging some circles in a specific order depending on their colour, followed by the interaction space where they can perform the task (Fig. 2h).
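For the circle-arrangement variant of the n-step command, the final layout has to be turned into a string answer, which an expert (or a future analysis step) then compares with the instructed order. The Python sketch below assumes the layout is captured as a left-to-right list of colour names and scores one point per correctly placed circle. Both the representation and the scoring rule are hypothetical, loosely mirroring the one-point-per-step scoring of the paper-based three-stage command.

```python
import json
from typing import List


def encode_arrangement(final_order: List[str]) -> str:
    """Serialise the final left-to-right circle colours as a string answer."""
    return json.dumps(final_order)


def score_arrangement(instructed: List[str], answer: str) -> int:
    """Hypothetical rule: one point per circle placed where instructed."""
    final_order = json.loads(answer)
    if sorted(final_order) != sorted(instructed):
        raise ValueError("answer must contain exactly the instructed circles")
    return sum(1 for want, got in zip(instructed, final_order) if want == got)


target = ["red", "green", "blue"]
answer = encode_arrangement(["green", "red", "blue"])
print(answer)                             # ["green", "red", "blue"]
print(score_arrangement(target, answer))  # 1
```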

Implementation
MobileCogniTracker has been developed using Android Studio version 2.3 and tested on Android versions […]. It builds on the AWARE mobile instrumentation framework (Ferreira et al. 2015). The motivation for choosing this framework is twofold: (1) it provides a client-server mobile framework that supports the collection of unobtrusive passive sensor data; and (2) it is licensed under the Apache Software License 2.0, which allows changes and extensions to the core code. XML files are serialised using the Simple XML serialisation framework for Java (version 2.7.1).
The tool uses a client-server approach, which is enabled through AWARE. Experts can easily set up a study on the AWARE server through a web-based dashboard. Here, for example, the specialist can define the type of mobile data to be recorded on the user's device, e.g., acceleration, battery usage or phone call logs, to name a few. Users can then join a study by simply scanning a QR code through the AWARE mobile app. Once running, the app periodically sends the collected data to the server over WiFi or 4G. We refer the reader to Ferreira et al. (2015) for additional details on the characteristics of the AWARE framework.

Fig. 2 Developed cognitive experience sampling dialogues. a Welcome message, task description, instructions. b Orientation task with text input. c Orientation task with numeric input. d Complex task with drawing input. e Repetition task with image-based instructions and drawing input. f Repetition task with voice input using text-to-speech functionality. g Object naming task with image-based instructions and text-based input. h N-step command task with movable objects input.
AWARE also supports basic ESMs, which act as a sort of plugin or extension to the default set of available sensor types. These ESM questionnaires are executed remotely and can be scheduled using the web dashboard or from within a plugin. AWARE provides several ESM types, including free text, radio buttons, checkbox, Likert scale, quick answer, scale, and numeric types. Each ESM consists of a title, the instruction text, the submit button text, and the user answer, which is encoded as a string. Additionally, it is possible to specify how long the notification should remain active and how much time the user has to answer the question. The development of MobileCogniTracker thus consisted of changes to the core of the AWARE framework. MobileCogniTracker extends AWARE to support the scheduling and construction of ESM-based tests provided in XML form. This is one main advantage of our tool with respect to similar approaches, since the tests can be fully customised both in terms of content and schedule without recompiling the application. MobileCogniTracker can create an ESM questionnaire and set a schedule based on a definition in an XML file that follows the XML schema, which is defined as follows. Each test is defined through one or more components, and each component can consist of one or more tasks and/or questions. Namely, a component consists of a name (<name>) and the task to be performed (<task>). The task is composed of the question(s) to be asked to the user (<Question>), an optional score given to that question (<score>), the type(s) of experience sampling elements (<ESM_Type>) and the specific instructions given to the user (<Instructions>).
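To illustrate how a test definition following this schema can be read programmatically, the Python sketch below parses a trimmed copy of the clock drawing definition from Listing 1 with the standard-library `xml.etree.ElementTree` module. The tool itself uses Simple XML for Java; this is not that code. The element names are taken from Listing 1 (XML is case-sensitive), and the `Task` and `TestComponent` dataclasses are illustrative.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Trimmed copy of the clock drawing definition in Listing 1.
CDT_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<TestDefinition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:noNamespaceSchemaLocation="TestDefinition.xsd">
  <name>Clock Drawing Test</name>
  <text2speech>true</text2speech>
  <Component>
    <name>CDT Component</name>
    <task>
      <Question>Draw a clock face at 11:10</Question>
      <score>10</score>
      <Aware>
        <ESM_Type>ESM_DRAW</ESM_Type>
        <Instructions>Please draw a clock face at 11:10</Instructions>
      </Aware>
    </task>
  </Component>
</TestDefinition>"""


@dataclass
class Task:
    question: str
    score: Optional[int]  # <score> is optional in the schema
    esm_type: str
    instructions: str


@dataclass
class TestComponent:
    name: str
    tasks: List[Task]


def parse_test(xml_bytes: bytes) -> Tuple[str, bool, List[TestComponent]]:
    """Return the test name, the text-to-speech flag and its components."""
    root = ET.fromstring(xml_bytes)
    name = root.findtext("name", default="").strip()
    tts = root.findtext("text2speech", default="false").strip().lower() == "true"
    components = []
    for comp in root.findall("Component"):
        tasks = []
        for task in comp.findall("task"):
            score_text = task.findtext("score")
            tasks.append(Task(
                question=task.findtext("Question", default="").strip(),
                score=int(score_text) if score_text else None,
                esm_type=task.findtext("Aware/ESM_Type", default="").strip(),
                instructions=task.findtext("Aware/Instructions", default="").strip(),
            ))
        components.append(TestComponent(comp.findtext("name", default="").strip(), tasks))
    return name, tts, components


name, tts, components = parse_test(CDT_XML)
print(name, tts, components[0].tasks[0].esm_type)  # Clock Drawing Test True ESM_DRAW
```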
A simplified example of a test definition following this schema is given in Listing 1. It shows the XML file for the clock drawing task, a classical clinical cognitive assessment test (Royall et al. 1998). In this example, the name and a short description of the test are provided; the text-to-speech functionality is enabled to read the instructions aloud; the question or instruction is defined, as well as the experience sampling type, here a drawing input similar to those shown in Fig. 2; finally, the ESM is set to activate on Mondays at 11:10. The question and score of the task are only relevant for reference to the pencil-and-paper version of the test.
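A weekly schedule entry (weekday, hour, minute), such as the Monday 11:10 trigger in this example, determines when the next notification fires. The following Python sketch computes the next trigger time for such an entry. It is an assumption about how a schedule could be evaluated, not MobileCogniTracker's scheduler, and it ignores time zones and daylight saving transitions.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]  # index matches datetime.weekday()


def next_trigger(now: datetime, weekday: str, hour: int, minute: int) -> datetime:
    """Earliest time at or after `now` that matches a weekly schedule entry."""
    days_ahead = (WEEKDAYS.index(weekday) - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < now:  # the slot is today but has already passed
        candidate += timedelta(days=7)
    return candidate


# Monday 1 May 2017, 12:00: today's 11:10 slot has passed, so next week fires.
print(next_trigger(datetime(2017, 5, 1, 12, 0), "Monday", 11, 10))
# 2017-05-08 11:10:00
```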
Listing 1: The clock drawing test digitised by following the test definition schema with a schedule that triggers every Monday at 11:10.

```xml
<?xml version="1.0" encoding="utf-8"?>
<TestDefinition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:noNamespaceSchemaLocation="TestDefinition.xsd">
  <name>Clock Drawing Test</name>
  <short_name>CDT</short_name>
  <description>Participants are asked to draw a clock face.</description>
  <text2speech>true</text2speech>
  <Component>
    <name>CDT Component</name>
    <task>
      <Question>Draw a clock face at 11:10</Question>
      <score>10</score>
      <Aware>
        <ESM_Type>ESM_DRAW</ESM_Type>
        <Title>Clock Drawing Test</Title>
        <Instructions>Please draw a clock face at 11:10</Instructions>
      </Aware>
    </task>
  </Component>
  <Schedule>
    <id>ScheduleName</id>
    <hour>11</hour>
    <minute>10</minute>
    <weekday>Monday</weekday>
  </Schedule>
</TestDefinition>
```

The responses to the cognitive ESMs are stored in a central SQL database, whose structure is shown in Table 2. Among other fields, the database contains 'device_id', which unequivocally and anonymously identifies each device taking part in the study; 'esm_json', which stores the specific question from the cognitive tasks that was executed; and 'esm_user_answer', where the answer is collected. Non-text-based answers, such as drawing, copying and rearranging circles, are first converted into strings before being sent to the server. In particular, for the drawing and copying tasks, the user-drawn image is encoded as a base64 string, which allows it to be easily stored in the database. The voice-enabled keyboard, i.e. the speech-to-text functionality available on most Android phones, is offered as an alternative option for users to input their answers.
It should be noted that, at this stage, the system does not perform any analysis, identification or categorisation of user input whatsoever. Therefore, as is the case for the pencil-and-paper version, a clinical expert is required to analyse the results a posteriori.
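The base64 encoding of drawings mentioned above can be illustrated with Python's standard `base64` module. In this sketch the image bytes stand in for whatever the drawing canvas exports, and the row layout, with keys modelled on the device id, ESM JSON and user answer fields described for Table 2 (underscored spellings assumed), is a simplification rather than the actual database schema. Because base64 is lossless, the expert can recover the exact image for manual scoring.

```python
import base64
import json
from typing import Dict


def drawing_to_answer(image_bytes: bytes) -> str:
    """Encode raw image bytes (e.g. a PNG exported from the canvas) as text."""
    return base64.b64encode(image_bytes).decode("ascii")


def answer_to_drawing(answer: str) -> bytes:
    """Recover the original image bytes so the expert can view the drawing."""
    return base64.b64decode(answer, validate=True)


def make_row(device_id: str, esm: Dict, answer: str) -> Dict[str, str]:
    # Simplified, assumed row layout; the real table has further fields.
    return {
        "device_id": device_id,
        "esm_json": json.dumps(esm),
        "esm_user_answer": answer,
    }


fake_png = b"\x89PNG\r\n\x1a\n" + bytes(8)  # placeholder bytes, not a real image
row = make_row("device-123", {"esm_type": "ESM_DRAW"}, drawing_to_answer(fake_png))
assert answer_to_drawing(row["esm_user_answer"]) == fake_png  # lossless round trip
```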

Study setup
A usability study was conducted to evaluate how MobileCogniTracker is perceived by end users. A total of 26 participants of diverse gender, age, education level and employment status were recruited (Fig. 3). The participants were evenly distributed between two relevant age groups: older adults (65+) and (young) adults (<65). MobileCogniTracker is primarily targeted at older adults, who are more prone to developing cognitive impairment and are, as such, the main candidate users of this tool. Thus, we selected a group of 13 seniors aged 65 years or above (Fig. 4). Although this test is mostly used with older adults, we also considered it of interest to evaluate how (young) adults perceive the tool. Hence, a group of the same size, 13 (young) adults, was also included in this evaluation (Fig. 5).
All participants reported having no cognitive impairment to their knowledge. A preliminary cognitive screening was beyond the scope of this first evaluation. The study was conducted at the University of Twente (Netherlands). In view of the observational nature of the experiment and the healthy condition of the participants, ethics approval was deemed unnecessary by the competent committee. Written informed consent was obtained from all participants for the collection of the data and the publication of this case report and any accompanying figures.
Each participant was arbitrarily provided with one of four smartphones (Google Pixel, Samsung Galaxy S7, LG G5 or Huawei P9), which are relatively similar in size and functionality. MobileCogniTracker was installed on the smartphones beforehand. The MMSE was chosen for this evaluation because it exercises most of the developed experience sampling methods. The test was scheduled for a given point in time and automatically announced to the participant through a notification. Participants were instructed to tap the smartphone notification to start the test. The phone then opened the first dialogue box with the first section of the digitised MMSE. After completing the tasks of a given section, the user was automatically taken to the next one, similar to the way the MMSE is administered in its pencil-and-paper format. The usability test was performed in a single day.
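The weekly trigger defined by the <Schedule> element of Listing 1 could be interpreted as in the Python sketch below. It parses the element with the standard-library ElementTree module and computes the next time a notification would be raised. We assume the schedule repeats every week at the given weekday, hour and minute in local time. The function names are hypothetical, and MobileCogniTracker or its underlying framework does not necessarily schedule notifications this way.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Index i corresponds to datetime.weekday() == i (Monday is 0).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# The <Schedule> element from Listing 1.
SCHEDULE_XML = """
<Schedule>
  <id>ScheduleName</id>
  <hour>11</hour>
  <minute>10</minute>
  <weekday>Monday</weekday>
</Schedule>
"""


def parse_schedule(xml_text: str) -> tuple[int, int, int]:
    """Return (weekday index, hour, minute) read from a <Schedule> element."""
    node = ET.fromstring(xml_text.strip())
    weekday = WEEKDAYS.index(node.findtext("weekday").strip())
    return weekday, int(node.findtext("hour")), int(node.findtext("minute"))


def next_trigger(now: datetime, weekday: int, hour: int, minute: int) -> datetime:
    """Earliest datetime strictly after `now` that matches the weekly slot."""
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:  # this week's slot has already passed
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    slot = parse_schedule(SCHEDULE_XML)
    print(next_trigger(datetime(2024, 1, 3, 9, 0), *slot))  # 2024-01-08 11:10:00
```

Evaluated on Wednesday 3 January 2024 at 09:00, the Listing 1 schedule yields Monday 8 January 2024 at 11:10. Evaluated exactly at a Monday 11:10, it moves on to the following week, so the same slot is never triggered twice.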
It should be noted that participants did not receive a training session or any additional information apart from the study aims as described in the informed consent. Although the medical expert typically instructs the person before administering the MMSE, we did not want to influence the performance of the user unless necessary, so as to fairly understand the limitations that MobileCogniTracker may have when administering this or similar tests. Although no specific training sessions were held in our study, we consider it entirely reasonable to explain the test at the point of need.
An initial assessment of the reliability of the digital tool was also performed by an independent clinical expert (a psychiatrist). The expert had ample experience in the use of pencil-and-paper tools for the assessment of mental disorders and was highly experienced in the use of the MMSE for the regular screening of patients. The expert was familiar with the use of smartphones and was

Methods
The System Usability Scale (SUS) (Brooke 1996; Lewis and Sauro 2009) was used for the evaluation. The SUS is a ten-item questionnaire used to evaluate the usability of a system. Each item is answered on a five-point scale ranging from "strongly disagree" to "strongly agree". The SUS is easy to administer, performs reliably on small sample sizes, and can effectively differentiate between usable and unusable systems. The SUS questions are as follows:
(Q1) I think that I would like to use this system frequently.
(Q2) I found the system unnecessarily complex.
(Q3) I thought the system was easy to use.
(Q4) I think that I would need the support of a technical person to be able to use this system.
(Q5) I found the various functions in this system were well integrated.
(Q6) I thought there was too much inconsistency in this system.
(Q7) I would imagine that most people would learn to use this system very quickly.
(Q8) I found the system very cumbersome to use.
(Q9) I felt very confident using the system.
(Q10) I needed to learn a lot of things before I could get going with this system.
The SUS results were evaluated using the formula described in Fig. 6, where n denotes the question number and Q_n the score given to question n. The evaluation conducted in this study also aimed to gain some understanding of the users' willingness to answer questions from the tool on a more or less frequent basis. Thus, a few questions were asked in addition to those established by the SUS. These additional questions were asked at the end of the SUS questionnaire, so they could not influence the earlier answers. The questionnaire results were analysed using the statistical software SPSS version 24. The data were tested for normality with the Shapiro-Wilk test (Shapiro and Wilk 1965). For the expert-based evaluation of the tool, a semi-structured interview was conducted after the expert had carried out the test. Both general and specific questions were asked about how closely the digital version resembles the pencil-and-paper version. The expert gave an opinion on each task individually.
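Fig. 6 is not reproduced in this text. The standard SUS scoring rule (Brooke 1996), which we assume is the one shown there, works as follows. It subtracts 1 from the answer to each positively worded (odd) item and subtracts the answer to each negatively worded (even) item from 5. It then scales the sum to the 0-100 range:

```latex
\mathrm{SUS} = 2.5 \left[ \sum_{n \in \{1,3,5,7,9\}} \left(Q_n - 1\right) + \sum_{n \in \{2,4,6,8,10\}} \left(5 - Q_n\right) \right],
\qquad Q_n \in \{1, 2, 3, 4, 5\}
```

Each item thus contributes between 0 and 4 points, so the bracketed sum lies between 0 and 40. For instance, a respondent answering 3 ("neutral") to every item obtains 2.5 × (5 × 2 + 5 × 2) = 50.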

Results
The responses to the SUS questions are summarised for the group of older adults (Fig. 7), the (young) adults (Fig. 8) and all participants (Fig. 9). Around 70% of the participants would use the tool frequently (Q1), irrespective of age. Only 20% of older adults and 23% of (young) adults deem the tool unnecessarily complex (Q2). In fact, more than 60% of the senior participants strongly agree that the tool is easy to use (Q3), a view shared by 85% of (young) adults. While only one third of (young) adults consider the support of a technical person necessary to use the tool (Q4), around half of the senior participants would use such support. There is a broad positive consensus among both (young) adults and older adults that the functions of the tool are well integrated (Q5). Circa 70% of older adults and 85% of (young) adults find the tool consistent (Q6). More than 60% of older adults find that learning to use the tool is very quick (Q7), a perception shared by approximately 70% of the (young) adults. Almost none of the senior participants find the tool cumbersome to use (Q8), whereas half of the (young) adults do. Around 45% of the older adults felt very confident and another 45% somewhat confident while using the system (Q9), while only 10% of the (young) adults reported not feeling confident. Finally, a minority of 25% of participants consider it necessary to learn a lot of things before being able to use the tool (Q10).
Overall, the mean ± standard deviation SUS scores are 73.08 ± 18.09 for the older adults group (Fig. 10), 69.42 ± 16.93 for the (young) adults group (Fig. 11), and 71.25 ± 17.27 for the all-ages group (Fig. 12). All mean values are thus above 68, which is defined as the average (Brooke 1996). For the participants aged <65, the Shapiro-Wilk test for normality yielded p = 0.455, and for the participants aged 65 and above p = 0.065. Therefore, at a significance level of α = 0.05, normality cannot be rejected for either group, and a normal distribution of the SUS scores can be assumed. The Shapiro-Wilk test for the all-ages scores yielded p = 0.363. Since this exceeds α = 0.05, the null hypothesis of normality is not rejected, and a normal distribution of the SUS scores can also be assumed.
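The per-participant scores and the group summaries above can be computed from raw answers with a few lines of Python. This sketch assumes answers coded 1 to 5 in questionnaire order (Q1 first). It also assumes the sample standard deviation (n - 1 denominator), the usual convention in statistical packages such as SPSS. The function names are our own. The Shapiro-Wilk test is not re-implemented here; in Python it is available as `scipy.stats.shapiro`.

```python
from statistics import mean, stdev


def sus_score(answers: list[int]) -> float:
    """SUS score (0-100) from ten answers coded 1-5, in order Q1..Q10."""
    if len(answers) != 10 or any(a not in range(1, 6) for a in answers):
        raise ValueError("expected ten answers coded 1 to 5")
    positive = sum(a - 1 for a in answers[0::2])  # Q1, Q3, Q5, Q7, Q9
    negative = sum(5 - a for a in answers[1::2])  # Q2, Q4, Q6, Q8, Q10
    return 2.5 * (positive + negative)


def summarise(scores: list[float]) -> str:
    """Mean ± sample standard deviation (n - 1 denominator), two decimals."""
    return f"{mean(scores):.2f} ± {stdev(scores):.2f}"


if __name__ == "__main__":
    # Made-up answer sheets for illustration only; not study data.
    sheets = [[4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
              [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
              [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]]
    scores = [sus_score(s) for s in sheets]
    print(scores, summarise(scores))  # [75.0, 100.0, 50.0] 75.00 ± 25.00
```

Both age groups contain 13 participants, so the all-ages mean must equal the average of the two group means. This holds for the reported values: (73.08 + 69.42) / 2 = 71.25. No such shortcut applies to the standard deviation, which also depends on the gap between the group means.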
In addition to the SUS, participants were asked about their smartphone use, their willingness to answer questions from the tool and their perception of the number of questions asked during a full test. The results are shown in Figs. 13, 14 and 15 for the age groups 65+, <65 and all ages, respectively.
Around 40% of the older adults reported using their smartphones every day, roughly the same proportion as the seniors who rarely use or do not own a smartphone (Fig. 13a). Everyday smartphone use is more common among (young) adults, with more than 60% daily users and 30% occasional users (Fig. 14a). No statistically significant relationship could be found between […]. The frequency with which participants would be willing to answer questions from the tool varies between age groups. Very few older adults (Fig. 13b) would answer multiple questionnaires a day, a quarter would do so once a day, and almost half would take the questionnaire once a month. Fewer than 24% would not use the app at all. Relatively similar results are obtained for (young) adults (Fig. 14b): nearly a quarter of participants would answer one questionnaire per day and almost 40% would take one every week, with slightly more than 15% not willing to use the tool.
Users considered the number of questions asked in the digitised version of the MMSE (Figs. 13c, 14c) to be appropriate, with a large majority (more than two thirds) fairly satisfied with the number of questions, irrespective of age. Some users even remarked informally that they had expected a much longer questionnaire.
From the semi-structured interview with the clinical expert, we collected a number of observations, outlined next. According to the expert, orientation tasks such as questions on the participant's current location or the date "do not make a big difference with respect to the paper version, and do not pose in principle additional cognitive load to the user." Similar observations apply to the registration and recall tasks, in which a set of objects is listed and then recalled by the participant. Attention and calculation tasks are considered "quite straightforward". The expert shared some concerns regarding the feasibility of the language tasks: "I have some concerns regarding the effect that the keyboard autocorrect function can play in the naming of objects. This function appears to suggest different options that can influence the decision of the patient." Conversely, for the language tasks involving commands in which patients normally interact with a physical object (e.g. folding a piece of paper and placing it on the floor), the expert considered the digital approach "a fairly simple and effective way to assess if the patient correctly interprets the instruction given to her." On a more general note, the text-to-speech functionality was seen as a useful feature for facilitating comprehension of the action to be carried out: "Sometimes I have to read the questions for my patients […], thus having them on audio seems to me very relevant. I cannot judge though how patients will feel about a device telling them to do things."
Similarly, the speech-to-text capability was highly valued by the expert: "I was somewhat intrigued with how the tool could register the voicing of a given sentence. The speech recognition functionality works surprisingly good". The expert also expressed some uncertainty about the use of the tool outside the clinic: "I have the feeling that not being in presence of the medical expert could have a relevant impact on the outcomes of the test. This has not been explored before to the best of my knowledge and it seems like a good asset to this tool." All in all, the expert was "quite satisfied with the fluency [of the tool] in the execution of the tests and very interested in using it in the clinical practice."

Discussion
The usability evaluation showed, in general, a good level of satisfaction with most aspects. A majority of users reported MobileCogniTracker to be simple to use, easy to learn and coherent.
We did not find significant age-related differences to report. Yet, somewhat surprisingly, (young) adults rated the tool lower than older adults did. The histogram for the group of (young) adults (Fig. 11) shows a large spread of scores within that group. During the test, we got the impression that some people in that group were evaluating the app based on whether they would be interested in using it. Since they were all cognitively fit and did not expect to develop a cognitive impairment in the near future, this group seemed less interested in the app than the older adults group. Conversely, many in the older adults group reported liking the app because they believed it could bring them great benefits. We also had the impression that some people in this group were more likely to compare it to a face-to-face examination of cognitive impairment, whereas the (young) adults group was rather inclined to compare the app to their experience with other apps.
A few participants needed the help of a technical person to start the tests. This was mainly because some people were not familiar either with Android phones (e.g., iPhone users) or with newer versions of the operating system. In those few cases, users were assisted beforehand so that they could carry out the test.
Participants also found the different experience sampling approaches to be well integrated and straightforward to use. However, some users experienced difficulties with the drawing and copying tasks. These tasks were tested using the finger as input, whereas in the original clinical questionnaires they are performed with a pencil. The evaluation showed that it is sometimes difficult, even for non-impaired subjects, to achieve precise results. Therefore, a stylus should be the preferred input method for these tasks where available. Similarly, the use of tablets or phablets could facilitate the drawing tasks. Although users did not experience difficulties with the provided smartphones, which are of a generous yet standard size, they anticipated potential difficulties when using MobileCogniTracker on smaller devices. These findings are in line with the requirements and design principles considered in this work.
Participants perceived the number of questions to be appropriate. MobileCogniTracker is intended for long-term monitoring, and as such, the frequency with which questions and tasks are administered plays an important role in the acceptance of and engagement with the tool. However, this frequency largely depends on how rapidly cognitive ability may change, as well as on the importance of the task. MobileCogniTracker facilitates the scheduling of questions and tasks, which can be planned separately and spread over the course of a day, week and/or month. User preferences could be combined with the requirements posed by the clinical tests so as to maximise the efficacy of the test. Hence, some questionnaires or tasks could be triggered when the user is available, or the test could be shortened so that users can answer in a minimally interruptive manner. During the evaluation, some participants mentioned that they perceived the MMSE questions as "too easy" and expected more challenging questions to measure cognitive performance. Users may thus need to be challenged according to their specific age and cognitive state, and personalised testing may result in a more enjoyable and engaging experience. MobileCogniTracker was not aimed at replacing existing clinically validated cognitive procedures, but at enabling them digitally in order to facilitate their continuous, opportunistic and ubiquitous administration. This, however, opens up an interesting research area at the intersection of cognitive assessment, personalisation and context-awareness, in which tests and contents are not simply digitised but also tailored to the preferences or life events relevant to each individual.
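A test can be split into separately scheduled parts while still respecting the requirements of the clinical test, as the Python sketch below shows. The grouping is a hypothetical example, not a configuration used in the study, and the names `MMSE_BLOCKS` and `plan_blocks` are our own. Registration, attention and calculation, and recall are kept in one block. This is because, in the pencil-and-paper MMSE, the recall item asks for the words registered a few minutes earlier, after the attention and calculation item. Setting `every_days` to 0 reproduces the consecutive administration used in the usability test.

```python
from datetime import datetime, timedelta

# Hypothetical split of the MMSE into blocks that may run on different days.
# Registration, attention/calculation and recall stay together because the
# recall item asks for the words registered earlier in the same sitting.
MMSE_BLOCKS = [
    ["Orientation"],
    ["Registration", "Attention and calculation", "Recall"],
    ["Language"],
]


def plan_blocks(blocks: list[list[str]], first: datetime,
                every_days: int = 0) -> list[tuple[datetime, list[str]]]:
    """Assign a trigger time to the sections of a test.

    every_days == 0: one session in which all sections run consecutively.
    every_days  > 0: block i is triggered i * every_days days after `first`,
                     at the same time of day.
    """
    if every_days < 0:
        raise ValueError("every_days must be non-negative")
    if every_days == 0:
        return [(first, [section for block in blocks for section in block])]
    return [(first + timedelta(days=i * every_days), list(block))
            for i, block in enumerate(blocks)]


if __name__ == "__main__":
    for when, sections in plan_blocks(MMSE_BLOCKS, datetime(2024, 1, 1, 11, 10), 2):
        print(when, sections)
```

For instance, planning from Monday 1 January 2024 at 11:10 with `every_days=2` places the three blocks on 1, 3 and 5 January at 11:10. Each session is then shorter, while no block separates tasks that the clinical test requires in the same sitting.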
Participants suggested that this kind of tool could be used to improve their abilities through regular cognitive training. Although MobileCogniTracker was originally devised as a tool to observe and measure cognitive functioning, it could also be used to administer specific tasks intended to exercise certain cognitive abilities. For example, users could be asked to perform more complex memory or attention training tasks, which could at the same time be logged to track the user's cognitive functioning. Gamification techniques could also help increase the appeal of the tasks, thus motivating users' participation and engagement.
The feedback obtained from the clinical expert is also highly relevant for understanding the advantages and limitations of the current version of MobileCogniTracker. While tasks answered by typing do not seem to present important differences with respect to the pencil-and-paper version, some built-in smartphone functions could influence the users' answers. Most phones use an autocomplete function to speed up and facilitate typing, an ability that is not really assessed in questionnaires like the MMSE. However, the fact that words are suggested can certainly bias the user's response. A simple way to deal with this issue is to disable this functionality for MobileCogniTracker, as was done here for the experiments with participants. The expert also raised another very important point concerning the persuasiveness of the tool when asking users to execute the tasks. Along this line, the role of context and of not being in the presence of the clinical expert is highlighted. This aspect has not been studied in this work, but it is certainly worth investigating. MobileCogniTracker can of course be used in the clinic, as a sort of replacement for the pencil-and-paper version of cognitive tests; however, this tool is envisaged as particularly powerful when brought to out-of-clinic settings. Our hypothesis is that users will feel more relaxed and calm while performing this type of test at home. This could in turn lead to more reliable measurements of the patient's daily cognitive functioning. On the other hand, patients may also get distracted, and misuse of the tool may be harder to detect. In that sense, as for any other remote monitoring and/or treatment, the cooperation of the user is particularly important.
The results of the usability and validity study show that the tool is indeed usable by healthy subjects, even if they are not very familiar with smartphones. The evaluation thus suggests the tool's potential for tracking at least very early cognitive impairment, especially when monitoring starts with non-impaired subjects. It is unclear, though, how much cognitive impairment will influence the perceived usability of the application, and to what extent the application is capable of revealing performance differences between cognitively impaired and non-impaired subjects. Future work should perform a more thorough evaluation including screened cognitively impaired and non-impaired users, and should also compare MobileCogniTracker with the pencil-and-paper version of cognitive assessment tests such as the MMSE. In this regard, the effect of external daily stimuli, the difficulty of translating in-person assessments into smartphone-based ones, and the difficulty for patients to use the technology should be thoroughly explored.

Conclusions
This paper describes MobileCogniTracker, a mobile experience sampling tool that allows for the creation, administration and remote execution of digitised cognitive assessment tests. The tool provides multiple means to carry out, on a user's regular mobile device, typical questions and tasks used in clinical practice to assess cognitive functioning. Several input and output modalities are supported for the tests, including plain text, text-to-speech, speech-to-text and free drawing. As with standard experience sampling methods, MobileCogniTracker allows the specialist to schedule the time when a test should be administered. The specialist can also configure whether the test sections should be executed consecutively or at different times.
MobileCogniTracker builds on top of an existing mobile instrumentation framework so as to facilitate the tracking of not only cognitive experience sampling data but also other types of passive sensor data. This combination is expected to facilitate future research on the interplay of cognitive and physical, social and emotional behaviours at both individual and population levels. In that view, MobileCogniTracker could help support not only the timely assessment of cognitive impairment but also lay the ground for future semi-obtrusive detection of cognitive decline, as well as the recognition of accelerated decline, which could be linked to complex mental disorders such as dementia.
A preliminary usability evaluation has been performed in order to determine how users perceive the proposed tool. To that end, a digital implementation of the popular MMSE clinical test was considered. Results show that users are generally satisfied with the tool, which they find simple and easy to use. Tasks involving drawing on the screen could nevertheless be improved by using more accurate input means than the fingertip, as well as adaptive user interfaces (Hussain et al. 2018). The performed evaluation is limited to healthy people with no recognised cognitive disorder. In addition, an expert-based validation of the tool has been performed against the pencil-and-paper version. According to the expert, the tool compares fairly well to the pencil-and-paper version, although the drawing tasks are more difficult to judge than when performed on paper. Additional studies on the impact of technology literacy and contextual factors on the execution of the test should also be performed. Future work thus includes a longitudinal validation with cognitively impaired and non-impaired people so as to ascertain the extent to which the tool is usable, especially in cases of severe cognitive disorders. Our long-term research will also aim to integrate the proposed tool into innovative e-coaching solutions (Banos and Nugent 2018; op den Akker et al. 2018) so as to facilitate not only the autonomous tracking of cognitive impairments but also intervention.