1 Introduction

Dementia is a cluster of diseases that are characterized by a progressive deterioration in cognitive function and the capacity to independently conduct the Activities of Daily Living (ADL) (WHO 2021). Dementia currently affects more than 55 million people worldwide and is a leading cause of dependency among older people (WHO 2021).

Maintaining independence in the lavatory is considered one of the most important tasks by healthcare personnel (Hauber et al 2014). Traditionally, caregivers help individuals with dementia in the lavatory when they require assistance, for example because they feel disorientated or unsure how to proceed. However, this assistance has been found to be intrusive and stressful for both the affected person and the caregiver (Drennan et al 2011). To address this problem, we introduce ToiletHelp, the first assistive technology to support people with dementia in using the lavatory independently. ToiletHelp employs a depth sensor and processes the resulting depth maps to provide automated guidance through verbal prompts and visual aids.

Vision-based technologies typically rely on RGB cameras to capture and interpret complex contextual information and to provide personalised assistance based on individual actions. However, RGB approaches raise significant privacy concerns, particularly in intimate environments such as the bathroom, and may hinder the widespread adoption of assistive technologies in such environments (Maidhof et al 2023a). To address these issues, our method uses only a depth sensor. This approach minimises the risk of capturing identifiable features, including facial characteristics, skin texture, subtle expressions, tattoos and other unique characteristics (Mucha and Kampel 2022a). Combined with edge computing, which performs all image processing on the local device, this design choice protects the user’s privacy.

Effective communication with the user is a crucial aspect of any assistive system, and it becomes even more critical when designing interactions for people with dementia, who may experience various impairments in their communication abilities. In ToiletHelp, user-system interaction is designed using a human-centered approach, involving individuals with dementia and their caregivers in an iterative design process. To validate the interaction design, we present the results of a user-centered evaluation conducted with 30 older adults and 14 care staff members.

Our main contribution is a first-of-its-kind assistive system tailored to individuals with dementia that aids in personal care tasks such as using the lavatory. This system has proven effective in real-world environments, increasing users’ confidence and independence while preserving their privacy. These results validate our human-centred design and demonstrate the potential benefits of ToiletHelp. Our secondary contribution is guidance for future researchers working on assistive technologies for people with dementia, which we provide by sharing our methods and analysis.

2 Related work

Privacy-aware automated assistance. Despite technological progress benefiting older adults’ independence, assistance with personal care activities remains underdeveloped, primarily due to privacy concerns in bathroom settings (Zhang et al 2021; Camp et al 2021). Existing vision-based technologies for bathroom assistance focus on detecting the presence of an individual in the room rather than monitoring the specific activities taking place (Camp et al 2021), which limits their usefulness for assisting with tasks in the lavatory.

WiFi sensing can preserve privacy while being used for presence detection or activity recognition. However, domain generalization remains an unsolved problem as variations in the environment, sensing hardware, its positioning, and the physiology of monitored individuals can severely degrade performance (Chen et al 2023). No effective solution currently exists beyond collecting extensive data across various domains (Chen et al 2023), which is impractical for our application, as the model needs to generalize well in unseen bathroom settings without retraining. Despite current limitations, the privacy-preserving capabilities of WiFi sensing make it a promising candidate for future research in assistive technologies for intimate contexts.

Wearables, alone (Zhang et al 2020) or in combination with ambient sensors (Chernbumroong et al 2013), show potential for detecting bathroom activities, but they are not appropriate for people with dementia because they require the user to remember to carry and charge the device (Lyons et al 2015). In addition, wearables require each user to carry their own device, which is impractical in semi-public washrooms with multiple users. Installing a system accessible to all users is a more viable option, especially in locations such as the day centers studied in this paper. ToiletHelp protects user privacy by using a depth sensor and edge computing, and it requires neither physical interaction with the user nor a wearable device, so that all users can benefit from it.

Depth-based Human Action Recognition (HAR). HAR methods can be based on end-to-end learning or on modules that combine deep learning with domain-specific knowledge. End-to-end depth-based approaches include 2D CNNs with dynamic images (Wang et al 2018), LSTMs (Sanchez-Caballero et al 2020) and 3D CNNs (Sanchez-Caballero et al 2022). Point cloud-based HAR methods are also proposed to handle depth maps by mitigating perspective distortion (Fan et al 2021). While promising, end-to-end learning-based approaches rely on large training datasets, a condition that is not compatible with our application where data collection and annotation are highly intrusive and should be minimised for ethical reasons.

Skeleton-based HAR reduces the need for domain-specific data by dividing the problem into depth-based Human Pose Estimation (HPE) and subsequent HAR from skeletons. However, viewpoint generalisation is still limited for depth-based HPE methods (Garau et al 2021), and the absence of large datasets hinders off-the-shelf approaches. Fine-tuning HPE models demands accurate 3D joint labels, which often require intrusive MoCap markers. Finally, no skeleton or depth-map datasets involving lavatory actions exist, so dataset creation would still be required.

To avoid extensive data collection, we adopt a modular framework for HAR, which enables the use of synthetic data for training and leverages a previously collected dataset (Pramerdorfer et al 2020) for the person tracking and posture classification modules. Our rule-based engine module enhances robustness across varying viewpoints and scenarios without requiring retraining or massive data collection to ensure generalization.

Assistive technologies for people with dementia. Pappadà et al (2021) conducted a systematic review on assistive technologies that are specifically tailored for individuals with dementia and provide general design guidelines that emphasize the importance of incorporating user-centered design. Participatory design is also recommended by Orpwood et al (2005) and Suijkerbuijk et al (2019). However, there is limited information available on effective research methods and tools to actively involve individuals with dementia throughout the development process due to the complex nature of the syndrome, a scarcity of practical research materials and methods, and the necessity for multidisciplinary collaboration (Suijkerbuijk et al 2019). The participation of caregivers is reported to be crucial in the early stages of device development. Being familiar with the needs of individuals with dementia and having experience in communication with them, caregivers provide valuable insight into specific challenges and effectively evaluate potential solutions (Orpwood et al 2005).

The design of assistive technologies for individuals with dementia ought to prioritize their autonomy by providing prompts and reminders rather than making decisions for them. Previous research demonstrates the effectiveness of prompting in assistive technologies for people with dementia (Pappadà et al 2021). For instance, Mihailidis et al (2007) evaluate a system that provides verbal prompts during hand-washing for individuals with severe dementia, and König et al (2016) qualitatively evaluate an intelligent virtual assistant that guides Alzheimer’s patients during hand-washing. In general, multiple studies (Fried-Oken et al 2015; König et al 2016) emphasise the importance of tailoring these technologies to the individual to ensure that prompts are effective for people with dementia.

Assistance in the lavatory for individuals with dementia has only been addressed in a few studies (Lumetzberger et al 2021; Ballester and Kampel 2022). To our knowledge, only Ballester and Kampel (2022) present quantitative results, and these are limited to a controlled laboratory setting with healthy participants. Setting assistance aside and focusing solely on bathroom action recognition, wearable-based approaches report a quantitative evaluation involving older adults (Zhang et al 2020), but they do not state whether the participants have any cognitive impairment, so it remains unclear whether such methods would be effective for this population. There is therefore an important gap: privacy-sensitive technology for lavatory assistance for people with dementia that has been proven in real-life conditions. ToiletHelp addresses this need by placing a high priority on privacy and adopting a human-centered approach that involves people with dementia and caregivers in the development process. We demonstrate the successful operation of ToiletHelp in three real-life environments: one hospital and two day centers.

3 ToiletHelp

ToiletHelp analyses the scene using depth maps to identify the user’s need for help. Based on the user’s current action, the system infers the next action and waits for the user to perform it. If the user does not perform it, video and verbal messages are triggered to guide the user through a safe lavatory process. The edge device, with its built-in depth sensor, is installed on the wall near the ceiling, whereas the interaction device is placed within the user’s view.

Fig. 1

ToiletHelp Schema. The depth sensor captures depth maps, which are processed together with the user-interface settings to generate scene metrics. The action recognition module interprets these metrics to identify the current action, which the interaction module then uses to trigger audio and video prompts when needed

3.1 System components

Figure 1 shows the five components of ToiletHelp:

  1. Depth Sensor. ToiletHelp uses a depth sensor to protect privacy while capturing essential 3D information. Unlike traditional RGB cameras, depth sensors produce depth maps that do not contain primary identification data such as facial features (Mucha and Kampel 2022a, b). Secondary identification features may still be present, but they are insufficient for reliable automated identification (Haque et al 2016; Karianakis et al 2018). Identification nevertheless remains possible in some cases, particularly by people familiar with the user or when the user has highly distinctive physical characteristics, such as exceptional height. To mitigate this risk, additional measures are essential, such as the on-device (edge) processing used in ToiletHelp. ToiletHelp uses the Orbbec Astra depth sensor, which captures depth up to 7 m with an accuracy of ±3 mm at 1 m. The sensor has a 60° horizontal and 49.5° vertical field of view and records at up to 30 fps, making it well suited to small rooms.

  2. Remote User Interface. The edge device connects via Wi-Fi to a remote platform on which care staff customize the sensor settings, including the positions of the toilet bowl, \(P_{toilet}\), and the washbasin, \(P_{basin}\), as well as the thresholds \(D_{toilet}\) and \(D_{basin}\) used for action recognition. Configuration is a simple drag-and-drop operation performed during installation, and the settings are stored on the device. To set \(D_{toilet}\) and \(D_{basin}\) correctly, two cylinders with radii \(D_{toilet}\) and \(D_{basin}\), respectively, are plotted over the depth map for visual inspection (see Fig. 2). The thresholds must be chosen so that each cylinder fully covers its object while the two cylinders do not overlap.

  3. Scene Recognition Module. This module generates the scene metrics in six steps and sends them via MQTT for subsequent action recognition:

    (a) Motion detection: depth maps are processed with a background subtraction algorithm (Pramerdorfer et al 2016) that compares each frame with a background model representing the static scene components.

    (b) Depth maps containing motion are transformed into virtual top views to make the analysis invariant to sensor position and orientation. This mapping aligns the depth-map pixels with a world coordinate system whose xy-plane coincides with the floor plane. The extrinsic parameters are determined at system startup using a RANSAC-based plane-fitting algorithm.

    (c) Two top views are generated: a height map encoding the maximum object height per pixel, and an occupancy map representing object density by counting the occluded 3D points along each pixel’s z-axis. In both images, the image plane coincides with the world xy-plane.

    (d) Objects are segmented by connected component analysis, treating the resulting top views as single channels. To obtain views of identical size for batch processing, each view is centred on the object’s occupancy-based centroid, and a 40×40 centre crop is extracted, zero-padded where necessary.

    (e) Person detection and posture classification are achieved with an adaptation of ShuffleNet v2 (Ma et al 2018). The modifications remove the max-pooling layers to preserve spatial information and reduce the number of feature maps generated in the convolutional layers. Training follows a two-step approach: the model is first trained on a synthetic dataset and then fine-tuned with real data. The synthetic dataset is generated by placing 3D human models (obtained via motion capture) with randomised properties and realistic body postures in an empty room with random objects, where a virtual camera renders a depth map and a label map (a segmentation mask containing the labels of the objects in the depth map). To incorporate the noise and sensor errors of the Orbbec Astra depth sensor into the simulated depth maps, we adapt the approach of Nguyen et al (2012), which models the lateral and axial noise distributions. The pre-trained model is then fine-tuned on a dataset of approximately 21,000 samples collected in a care facility as part of prior research on fall detection. Further details are available in Pramerdorfer et al (2020).

    (f) Objects classified as “person” are assigned tracks based on the position, velocity, and geometry of all tracked objects. The resulting segments and class confidences are combined with the input from the user interface to produce the scene metrics: a message containing a track identifier (id), the distance from the toilet bowl (\(d_{toilet}\)), the distance from the washbasin (\(d_{basin}\)), and a flag indicating whether the person is sitting on the toilet bowl (\(on\_toilet\)). Computing \(d_{toilet}\) and \(d_{basin}\) requires two prior steps. First, the centre of the person’s bounding box, \(CM\), is computed and projected onto the floor plane, giving \(CM'\). Second, the toilet-bowl and washbasin positions set in the user interface (\(P_{toilet}\), \(P_{basin}\)) are transformed into world coordinates and also projected onto the floor plane (\(P'_{toilet}\), \(P'_{basin}\)). As a result, the distances between the person, the toilet bowl, and the washbasin are not distorted by their size or height.

  4. Action Detection and Recognition. This module combines the scene metrics with the action history to predict the user’s next action based on a predefined model (see Fig. 3). The prediction is made by a rule-based engine in which each action is associated with specific conditions. The user’s position is determined by comparing the distance metrics with the predefined thresholds \(D_{toilet}\) and \(D_{basin}\). For example, the action “Approaching the toilet bowl and undressing” is recognised when the distance from the toilet bowl, \(d_{toilet}\), is below \(D_{toilet}\) while the distance from the washbasin, \(d_{basin}\), is not below \(D_{basin}\); the person is not sitting on the toilet (\(on\_toilet=0\)); and the previous action is “Entering the room”. Time thresholds \(T_{on\_toilet}\) and \(T_{washhands}\) make action recognition more robust: they denote the minimum time, in seconds, that a user must spend at the location for the action to be detected.

  5. Interaction Module. The Interaction Module guides the user by determining the next expected action from the detected actions, following the predefined model illustrated in Fig. 3. This model consists of seven sequential steps, defined with input from dementia experts and experienced caregivers, each of which must be completed within a specified time frame. Any deviation from the expected sequence triggers an interaction.
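Steps (c) and (d) of the Scene Recognition Module can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the ToiletHelp code: the points are assumed to be already in world coordinates (step b) with z as height above the floor, the cell size and grid shape are hypothetical parameters, occupancy is approximated as the number of points per cell, and connected-component segmentation is omitted, so the whole view is treated as one object.

```python
import numpy as np


def top_views(points: np.ndarray, cell_size: float,
              grid_shape: tuple[int, int]) -> tuple[np.ndarray, np.ndarray]:
    """Project world-coordinate points (N x 3, z = height) onto a floor grid.

    Returns a height map (maximum z per cell) and an occupancy map
    (number of points per cell), both with shape grid_shape.
    """
    rows, cols = grid_shape
    ix = np.floor(points[:, 0] / cell_size).astype(int)   # column index from x
    iy = np.floor(points[:, 1] / cell_size).astype(int)   # row index from y
    inside = (ix >= 0) & (ix < cols) & (iy >= 0) & (iy < rows)
    ix, iy, z = ix[inside], iy[inside], points[inside, 2]

    height = np.zeros(grid_shape)
    occupancy = np.zeros(grid_shape)
    np.maximum.at(height, (iy, ix), z)     # unbuffered: repeated cells keep the maximum
    np.add.at(occupancy, (iy, ix), 1.0)    # unbuffered: every point in a cell is counted
    return height, occupancy


def centred_crop(view: np.ndarray, occupancy: np.ndarray, size: int = 40) -> np.ndarray:
    """Cut a size x size window centred on the occupancy-weighted centroid, zero-padded."""
    r, c = np.nonzero(occupancy)
    w = occupancy[r, c]
    cy = int(round(np.average(r, weights=w)))
    cx = int(round(np.average(c, weights=w)))
    half = size // 2
    padded = np.pad(view, half, mode="constant", constant_values=0)
    # Pixel (cy, cx) of `view` sits at (cy + half, cx + half) in `padded`,
    # so the window [cy, cy + size) x [cx, cx + size) is centred on it.
    return padded[cy:cy + size, cx:cx + size]
```

Using `np.maximum.at` and `np.add.at` (unbuffered ufunc operations) ensures that several points falling into the same cell are all accounted for, which plain fancy-index assignment would not do. In a full pipeline, the crop would be taken per segmented component rather than over the whole view.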

Fig. 2

ToiletHelp Spatial Diagram. \(CM'\), \(P'_{toilet}\) and \(P'_{basin}\) are the floor projections of the user’s centre of mass, \(CM\), and of the positions of the toilet bowl and the washbasin, \(P_{toilet}\) and \(P_{basin}\). The distances \(d_{toilet}\) and \(d_{basin}\), calculated from \(CM'\) to \(P'_{toilet}\) and \(P'_{basin}\), respectively, are compared with the thresholds \(D_{toilet}\) and \(D_{basin}\) to infer the position of the user
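The geometry in Fig. 2 reduces to planar distances once everything is projected onto the floor. The sketch below assumes world coordinates in which the floor is the xy-plane and z is height, so projecting onto the floor simply drops z; the helper names, the `SceneMetrics` container, and the example positions are hypothetical. It also encodes the configuration rule that the two threshold cylinders must not overlap, i.e. the floor distance between \(P'_{toilet}\) and \(P'_{basin}\) must exceed \(D_{toilet} + D_{basin}\); whether each cylinder fully covers its object still requires the visual inspection performed in the Remote User Interface.

```python
import math
from dataclasses import dataclass

# World coordinates (x, y, z): floor is the xy-plane, z is height (assumption).
Point3 = tuple[float, float, float]


def floor_distance(a: Point3, b: Point3) -> float:
    """Distance between the floor projections of a and b (z is ignored)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def cylinders_do_not_overlap(p_toilet: Point3, p_basin: Point3,
                             d_toilet_max: float, d_basin_max: float) -> bool:
    """True if the two threshold circles on the floor are disjoint."""
    return floor_distance(p_toilet, p_basin) > d_toilet_max + d_basin_max


@dataclass
class SceneMetrics:
    track_id: int
    d_toilet: float
    d_basin: float
    on_toilet: bool  # supplied by the posture classifier, not computed here


def scene_metrics(track_id: int, cm: Point3, on_toilet: bool,
                  p_toilet: Point3, p_basin: Point3) -> SceneMetrics:
    """Build one scene-metrics record from the person's centre CM."""
    return SceneMetrics(track_id,
                        floor_distance(cm, p_toilet),
                        floor_distance(cm, p_basin),
                        on_toilet)


if __name__ == "__main__":
    p_toilet, p_basin = (0.0, 0.0, 0.4), (1.5, 0.0, 0.85)   # hypothetical positions (m)
    print(cylinders_do_not_overlap(p_toilet, p_basin, 0.6, 0.5))
    m = scene_metrics(1, (0.3, 0.4, 1.0), False, p_toilet, p_basin)
    print(m.d_toilet < 0.6, round(m.d_basin, 3))
```

Because the height of the person’s centre and of the fixtures is discarded, a tall user standing next to the toilet and a short user at the same floor position yield the same distances, which is the intent of the floor projection.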

Fig. 3

Flowchart of the lavatory procedure. For each action, the conditions to be met for action recognition are represented, along with a depth map showing a user performing the action
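The rule-based engine can be sketched as a small state machine in which each action has a required predecessor, a condition on the scene metrics, and an optional dwell time such as \(T_{on\_toilet}\) or \(T_{washhands}\). This is a hypothetical, simplified subset of the sequence in Fig. 3, not the actual rule set: only three transitions are modelled, intermediate steps such as standing up and going to the washbasin are skipped, the name “Sitting on the toilet bowl” and all numeric thresholds are assumptions, and “approaching the toilet” is assumed to require the user to be inside the toilet cylinder and outside the basin cylinder, since the two cylinders do not overlap.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Metrics:
    d_toilet: float   # floor distance to the toilet bowl (m)
    d_basin: float    # floor distance to the washbasin (m)
    on_toilet: bool   # posture flag from the classifier


@dataclass
class Rule:
    action: str                          # action recognised when the rule fires
    previous: str                        # action that must have been recognised just before
    condition: Callable[[Metrics], bool]
    min_time: float = 0.0                # seconds the condition must hold continuously


# Hypothetical thresholds; in ToiletHelp these are set per installation.
D_TOILET, D_BASIN = 0.6, 0.5          # metres
T_ON_TOILET, T_WASHHANDS = 5.0, 3.0   # seconds

RULES = [
    Rule("Approaching the toilet bowl and undressing", "Entering the room",
         lambda m: m.d_toilet < D_TOILET and m.d_basin >= D_BASIN and not m.on_toilet),
    Rule("Sitting on the toilet bowl", "Approaching the toilet bowl and undressing",
         lambda m: m.on_toilet, T_ON_TOILET),
    # Simplification: standing up and going to the washbasin are skipped here.
    Rule("Washing hands", "Sitting on the toilet bowl",
         lambda m: m.d_basin < D_BASIN and not m.on_toilet, T_WASHHANDS),
]


class ActionEngine:
    def __init__(self, rules: list[Rule], initial: str = "Entering the room"):
        self.rules = rules
        self.current = initial
        self._since: Optional[float] = None  # when the pending condition first held

    def update(self, m: Metrics, t: float) -> str:
        """Process one scene-metrics message received at time t (seconds)."""
        successor = next((r for r in self.rules if r.previous == self.current), None)
        if successor is None:
            return self.current
        if not successor.condition(m):
            self._since = None               # condition broken: restart the dwell timer
            return self.current
        if self._since is None:
            self._since = t
        if t - self._since >= successor.min_time:
            self.current = successor.action
            self._since = None
        return self.current
```

Because only the successor of the current action is evaluated, out-of-order behaviour simply leaves the engine in its current state; detecting that the expected action has not occurred within its time frame is left to the interaction module.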

Fig. 4

Animations for visual support for the different messages: a) Sitting on the toilet bowl (reverse sequence for standing up) b) Going to the washbasin c) Washing hands d) Leaving the room e) Acknowledgement message f) Reassuring message in case of emergency

3.2 User-Centered interaction

User interaction is crucial for effective assistive technology, especially when the users are people with dementia who have communication impairments (Banovic et al 2018). To cater to their needs, we adopt a human-centered approach, involving caregivers and people with dementia throughout an iterative development process. The design team is a multidisciplinary group, including experts in communication and dementia, which ensures that all aspects of the user experience are considered and integrated into the system.

3.2.1 End-user and caregiver involvement

In the absence of clear design guidelines for users with cognitive impairment, we develop design criteria based on the results from focus groups with healthcare professionals. Two sessions are held in a specialised dementia facility in Zaragoza, Spain. Group 1 consists of 4 healthcare professionals (a psychologist, a nurse, an occupational therapist and a physiotherapist), while Group 2 consists of 9 professional caregivers (nursing assistants). Section 3.2.2 presents the essential findings shaping ToiletHelp’s design, with detailed methodology and results in Ballester et al (2022). Next, functional tests are conducted in a laboratory setting with cognitively healthy individuals (reported in Ballester and Kampel (2022)) to validate the system and identify design flaws before testing with individuals with dementia, as per ethical considerations (Orpwood et al 2005). Throughout the process, continuous feedback is obtained from dementia experts to ensure that design choices are appropriate. Finally, a user-centered evaluation with caregivers and older adults, with and without dementia, validates the interaction design.

3.2.2 Interaction design

The interaction module of ToiletHelp is designed based on inputs from focus groups, existing literature, and specific advice from dementia experts.

One of the primary findings from the focus groups is the recommendation to initiate interaction only when assistance is needed to prevent user distress. ToiletHelp follows this recommendation by interacting only when the user’s behaviour deviates from the expected sequence. In addition, successful actions trigger positive feedback, as recommended by Orpwood et al (2004). If the instruction is not executed, the system repeats it a configurable number of times before alerting a caregiver, accompanied by a reassuring message to the user.

Another relevant result from the focus group sessions is a general preference for combining video and verbal messages over using either modality alone, which is consistent with previous research (Fried-Oken et al 2015). Additionally, both groups discourage using sounds to draw attention to the washbasin or toilet bowl, as they could frighten users. Similarly, displaying text alongside these modalities is not recommended, as it is considered a distraction rather than a support. There is no consensus on the best design for the visual prompts, so we follow the advice of dementia experts, who recommend minimising the detail of animations to reduce the risk of cognitive overload. Accordingly, simple animations (shown in Fig. 4) are displayed on a screen, which, besides improving the clarity of the message, helps users identify the audio source (König et al 2016).

Participants in both groups stress the importance of clear and concise verbal messages delivered by a familiar voice, in line with Orpwood et al (2005). They recommend using the voice of a caregiver to leverage the caregiver’s familiarity with the user and expert communication skills. Opinions on how imperative the prompts should be vary: Group 1 suggests imperative instructions for clarity, while Group 2 prefers a kinder approach. ToiletHelp’s voice messages are brief, straightforward, and polite, following communication guidelines for health professionals (Footnote 1) and recommendations from König et al (2016). Two levels of directness are implemented: the first prompt is courteous and suggestive, and subsequent prompts use a more imperative tone.

Finally, previous research (Fried-Oken et al 2015; Suijkerbuijk et al 2019) warns against treating individuals with dementia as a homogeneous group, as this approach results in ineffective one-size-fits-all solutions. ToiletHelp allows for personalization by offering recorded verbal messages in formal and informal tones, adjustable repetitions before caregiver alerts, variable instruction intervals, and customisable thresholds to suit individual user rhythms.
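The prompting policy of this section (a suggestive first prompt, more imperative repetitions, a configurable number of repetitions and interval, and a caregiver alert accompanied by a reassuring message) can be sketched as a schedule generator. The configuration fields, their defaults, and the message labels are hypothetical; in ToiletHelp the messages are recorded voice prompts paired with the animations of Fig. 4.

```python
from dataclasses import dataclass


@dataclass
class PromptConfig:
    max_repetitions: int = 2     # prompts before alerting a caregiver (hypothetical default)
    interval_s: float = 20.0     # seconds between prompts (hypothetical default)
    formal: bool = True          # formal or informal recorded messages


def prompt_schedule(expected_action: str, cfg: PromptConfig) -> list[tuple[float, str]]:
    """List (time offset in s, event) pairs for an action that is not being performed.

    Offsets are measured from the moment a deviation from the expected sequence is
    detected. The caller is expected to stop following the schedule (and give
    positive feedback) as soon as the action is recognised.
    """
    tone = "formal" if cfg.formal else "informal"
    events: list[tuple[float, str]] = []
    for i in range(cfg.max_repetitions):
        # First prompt is courteous and suggestive; repetitions are more imperative.
        level = "suggestive" if i == 0 else "imperative"
        events.append((i * cfg.interval_s, f"{level}/{tone}: {expected_action}"))
    t_alert = cfg.max_repetitions * cfg.interval_s
    events.append((t_alert, "reassuring message to user"))
    events.append((t_alert, "alert caregiver"))
    return events
```

Keeping the repetitions, interval, and tone in a per-user configuration object mirrors the personalization options listed above without changing the escalation logic itself.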

4 Evaluation of the interaction

In a thorough quantitative assessment involving people with dementia, ToiletHelp achieves 80% accuracy in recognizing user actions (Ballester et al 2024). However, the system’s overall effectiveness also depends on its capacity to communicate with users. We address this aspect with tests that validate the interaction design, evaluate the user experience, and collect feedback from end-users and care staff.

4.1 Participants

The participants are 30 older adults, as potential primary end-users, and 14 care staff members, as experts in communication with people with dementia. The tests are carried out in two day centers, one in St. Gallen, Switzerland (DC1), and the other in Coimbra, Portugal (DC2).

4.1.1 End-users

Participants are regular visitors of the day center and are able to provide written informed consent by themselves. Exclusion criteria are severe mobility impairments or dependence on a third person to dress and undress. In DC1, demographic and medical data (age, gender and diagnosis) are gathered from medical records. The dementia severity is estimated using the Mini-Mental State Examination (MMSE) for 6 participants and the Montreal Cognitive Assessment (MoCA) Test for Dementia for 11 participants. In DC2, medical records are unavailable and participants do not consent to share information about their cognitive status. Participants are asked to provide, along with signed informed consent, demographic data, including age, gender, affinity for new technologies and day center attendance duration. Since these individuals’ cognitive status is unknown, they are asked to envisage a scenario in which they need assistance in the lavatory.

4.1.2 Care staff

All care staff are invited to test ToiletHelp and, after signing informed consent, to provide feedback through a questionnaire. The questionnaire gathers their perceptions of the system and sociodemographic details, including age range, gender, work experience, and affinity for new technologies.

4.2 Procedure

During a 3-day test period, a stand-alone system is installed in a shared bathroom, with participants completing a questionnaire after each use.

4.2.1 End-users

Whenever a participant needs to use the lavatory, a member of the research team reminds them about ToiletHelp and the questionnaire. This cycle is repeated for each use of the lavatory, and the number of uses per participant is recorded to track how responses change over repeated use.

The default configuration of the system minimises user interactions, engaging only when necessary, such as when the user encounters difficulties. However, in these experiments, we intentionally reduce time thresholds to prompt interactions more frequently. This may lead to user irritation, but since the goal is to collect extensive data for improved statistical accuracy and this is the first test with real users, we adopt this worst-case scenario to gather maximum information.

Quantitative surveys are suitable for measuring participants’ experiences consistently and for comparing their responses after each test run. We tailor the questions to the participants’ needs and constraints and keep the questionnaire short to avoid cognitive overload. For this reason, we use a single item for each construct of interest, including felt safety, felt independence, felt concern, felt annoyance, acceptance, and comprehensibility. These items are derived and adapted from prior user studies in the domain of Active Assisted Living (AAL) to support consistency and reliability (Offermann et al 2023; Maidhof et al 2023b). The research team discusses the formulations and meanings of the items before the assessment to ensure the validity of the questionnaire. Finally, the care staff review the questionnaires for suitability before they are distributed to the participants.

Two questionnaires assess older adults’ perceptions of safety, independence, comfort, fear, and annoyance, as well as their desire to have ToiletHelp in their private bathroom. In DC2, the questionnaire follows general design guidelines (Krosnick 2018), using clear language, closed-ended questions, and a 1-10 Likert scale, with 1=“strongly disagree” and 10=“strongly agree”. DC1, whose participants are mostly individuals with dementia, uses a different questionnaire based on specific guidelines (Footnote 2), employing simple language, emoticons, and brevity. All design choices are verified by dementia experts. This questionnaire also includes a question on how well the participants understand the instructions provided by the system. The results of the two centers cannot be compared directly, because the questionnaires differ and the cognitive state of the DC2 participants is unknown. However, evaluating at both centers yields a more diverse and representative sample and the opportunity to observe the system’s performance in different settings.

4.2.2 Care staff

Caregivers are invited to evaluate ToiletHelp by providing feedback via a questionnaire. The system is set to operate in “demo” mode, in which the interaction thresholds are reduced so that the interaction starts immediately, without giving the participant time to perform the action independently. The questionnaire is identical for the care staff at both sites and contains 7 questions rated on a 1-10 scale (1=“Not at all”, 10=“Very much”) that assess different aspects of the system’s performance and usability.

4.3 Results

We outline the findings from the evaluation of the ToiletHelp interaction module and analyze its impact on older adults’ sense of safety, independence, comfort, overall experience, and its effect on reducing caregiver workload.

4.3.1 End-users in DC1

The participants are 17 older adults who attend the day center 1–2 days a week; 11 have a medical diagnosis of mild dementia, 5 of moderate dementia, and 1 has normal cognitive function. There are 11 women and 6 men, with an average age of 78.6 years (SD=7.2, max=86, min=60). Their mean technology affinity score is 1.5 on a 0–2 scale. All 17 participants test ToiletHelp at least once, and 7 test it multiple times, for a total of 28 runs.

Table 1 summarizes the responses, with each participant’s initial responses presented separately under “first test run”. In general, the results show a positive perception of ToiletHelp. After the first test, 69% answer yes to the questions on safety and independence (Q1 and Q2), while 13% report not feeling safer and 31% not feeling more independent. 94% of the participants indicate that they are not scared (Q3) when using the device, and 88% do not find it annoying (Q4). Despite this positive feedback, 81% of the participants do not want ToiletHelp installed in their private bathroom (Q5). Regarding understanding of the instructions, 14 of 17 participants understand all instructions in the first test run, 2 understand some, and only 1 participant, who reports a hearing impairment, understands none. Consequently, only 16 participants answer the remaining questions.
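Because the participant who did not understand the instructions answers no further questions, the DC1 percentages for Q1–Q5 refer to 16 respondents. As a consistency check, the reported figures correspond to the following counts (inferred from the rounded percentages, not read from Table 1):

```latex
\begin{aligned}
\text{Q1 and Q2, yes} &: \tfrac{11}{16} = 68.75\,\% \approx 69\,\% \\
\text{Q1, not safer} &: \tfrac{2}{16} = 12.5\,\% \approx 13\,\% \\
\text{Q2, not more independent} &: \tfrac{5}{16} = 31.25\,\% \approx 31\,\% \\
\text{Q3, not scared} &: \tfrac{15}{16} = 93.75\,\% \approx 94\,\% \\
\text{Q4, not annoyed} &: \tfrac{14}{16} = 87.5\,\% \approx 88\,\% \\
\text{Q5, no} &: \tfrac{13}{16} = 81.25\,\% \approx 81\,\%
\end{aligned}
```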

Table 1 Results of end-user questionnaires in DC1, including responses after the first test run (17 participants) and all test runs (28 test runs)

Participants’ perceptions differ between their initial use of ToiletHelp and subsequent attempts. Figure 5 shows these changes for the participants who used the system more than once (N=7). In general, responses shift with continued use from negative or neutral to neutral or positive for Q1, Q2, Q5, and Q6. The exceptions are one additional negative response to Q3 and one additional neutral response to Q4 between the first and the second attempt.

Fig. 5

Evolution of the answers of DC1 end-users. Number of positive, neutral and negative responses (y-axis) to questions Q1-Q6 of the 7 participants who tested the system more than once, sorted by test run (x-axis)

4.3.2 End-users in DC2

The sample in DC2 includes N=13 older adults (9 women and 4 men) who attend the day center daily. Three are aged 71–80, six 81–90, and four 90 or older. Five have attended the center for less than 6 months, three for 6 months to 5 years, and five for more than 5 years. Their mean technology affinity score is 4.2 (SD=0.9) on a 1–5 scale (5=high affinity). Participants retain the legal capacity to consent to the test but do not consent to sharing their cognitive status, so this information cannot be reported. All 13 participants test ToiletHelp at least once, and five test it multiple times, for a total of 21 test runs.

Table 2 shows the mean and SD of the participants’ responses on a 1-10 scale (1=“I do not agree at all”, 10=“Totally agree”), grouped by the number of times the participants used the system. Participants view ToiletHelp positively, with mean scores of 8 or higher for enhanced feelings of safety (Q1), independence (Q2), and comfort (Q3). They report minimal fear (Q4) and annoyance (Q5), with mean scores below 2.5. Responses are neutral regarding a preference for ToiletHelp over a caregiver (Q6) and, in contrast to end-users in DC1, participants generally express a desire to have ToiletHelp in their private bathroom (Q7) (mean=8.1). As in DC1, responses from participants who used the system 2–4 times are more positive across all questions than first-time responses.

Table 2 Results of the questionnaires for end-users in DC2. Mean and SD of the responses, grouped by the number of times the participants used the system

4.3.3 Care staff in DC1 and DC2

Fourteen care staff members (4 from DC1 and 10 from DC2) participate. All participants are female, and their age and work experience are detailed in Table 3. Most of them (60%) are middle-aged (41–60 years) with substantial work experience (16+ years). Their technology affinity, measured on a 1–5 scale, averages 4.1 (SD=0.9). By center, DC1 staff score an average of 3.3 (SD=0.8) and DC2 staff an average of 4.4 (SD=0.7).
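As a consistency check on these figures, the overall mean and SD can be reconstructed from the per-center summaries with the standard formula for combining groups (within-group plus between-group sums of squares). This sketch assumes the reported SDs are sample SDs (n-1 denominator); because the per-center values are rounded, the reconstruction is approximate, but it agrees with the reported overall mean of 4.1 (SD=0.9).

```python
from math import sqrt


def combine_groups(groups):
    """Combine (n, mean, sample SD) summaries into overall (n, mean, sample SD)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Total sum of squares = within-group part + between-group part.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    return n_total, grand_mean, sqrt((ss_within + ss_between) / (n_total - 1))


# Technology affinity of care staff (1-5 scale), as reported per day center.
dc1 = (4, 3.3, 0.8)
dc2 = (10, 4.4, 0.7)
n, m, sd = combine_groups([dc1, dc2])
print(f"N={n}, mean={m:.2f}, SD={sd:.2f}")  # N=14, mean=4.09, SD=0.87
```

The between-group term matters here: the two centers differ by more than one point on average, and ignoring that spread would understate the overall SD.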

Results in Table 4 reveal positive perceptions. Mean scores for questions 1–6, which assess device usefulness, workload reduction, and interaction modalities, all exceed 8 on the 1–10 scale, with low SD values indicating moderate to high agreement among respondents. DC2 staff generally rate ToiletHelp more positively and find it less intrusive than caregiver assistance (mean=4.1), whereas DC1 staff rate it as more intrusive (mean=7.7).

Table 3 Demographics of care staff participants. Participant counts by category, grouped by day center

5 Discussion

5.1 Interaction module

The evaluation results validate the design of ToiletHelp. The interaction module is well received by older adults, who report increased feelings of safety and independence, and participants with dementia confirm that they were able to understand all instructions. Care staff report that the system is useful in reducing their workload while adequately supporting the target group, and they validate the modalities used for interaction.

End-users Participants report feeling safer and more independent when using the system and do not experience discomfort or negative emotions such as fear or annoyance, although the system was configured to interact with users regardless of whether they needed help. Furthermore, most participants with dementia (82%) report having understood all the instructions given by the system. The results on end-users’ desire to have ToiletHelp in their private bathrooms are inconclusive. In DC2, most end-users express this desire, with a mean rating of 8.1 out of 10, while in DC1, 81% of end-users oppose the idea. This reluctance contrasts with the positive perceptions of ToiletHelp in DC1 (see Table 1), and its causes warrant further investigation.

The results for both end-user groups reveal a tendency for the system to be perceived more positively the more it is used. Although further research is needed to understand this trend, it may be due to participants becoming accustomed to the device. Alternatively, participants who rate the system positively on their first use may be more likely to use it again, introducing self-selection bias. In either case, these findings suggest that several test runs with the same participants may be necessary to reach a steady-state response. If confirmed, this would be a relevant lesson for future studies: test procedures should include several repetitions to obtain more representative results.
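The self-selection explanation can be illustrated with a toy simulation. It is purely hypothetical and uses no study data: each simulated participant holds a fixed rating from 1 to 10 that never changes with use, and the probability of returning for another run is assumed to be proportional to that rating. Even so, the mean rating among repeat users is higher than the first-use mean, so an upward trend across runs cannot by itself be attributed to habituation.

```python
import random
from statistics import mean


def simulate_self_selection(n_participants: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Return (mean first-use rating, mean rating among participants who return).

    Hypothetical model: opinions (1-10) are fixed and do not change with use;
    the chance of using the system again is rating / 10.
    """
    rng = random.Random(seed)
    first_run = [rng.randint(1, 10) for _ in range(n_participants)]
    # Participants who liked the system are more likely to come back.
    returners = [r for r in first_run if rng.random() < r / 10]
    # Opinions are fixed, so a returner's repeat-use rating equals the first one.
    return mean(first_run), mean(returners)


first, repeat = simulate_self_selection()
print(f"first use: {first:.2f}, repeat use: {repeat:.2f}")
```

Under these assumptions the expected first-use mean is 5.5 and the expected repeat-use mean is 7.0, an apparent improvement produced entirely by who chooses to return.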

Care staff Caregivers also provide positive feedback, stating that the system is useful for older adults and can reduce caregiver workload. They rate the interaction modalities for people with dementia positively, validating the tone, vocabulary and videos developed. DC1 and DC2 differ in the perceived intrusiveness of the system compared with a traditional caregiver. This difference may be related to the respondents’ age and their affinity for new technologies, although other cultural factors not considered in this study cannot be ruled out.

5.2 Modular approach

ToiletHelp’s modular design allows for the integration of off-the-shelf and pre-trained algorithms, providing flexibility in choosing different algorithms for object detection, segmentation, and action recognition. However, because the accuracy of these components directly impacts overall system performance, future research should examine their combined effects and determine minimum thresholds for robust operation. Additionally, as an assistive system that requires real-time performance, ToiletHelp updates action predictions every 0.1 s. Consequently, the inference times of the different components must be considered when making algorithmic design choices.
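The real-time constraint can be made concrete with a small budget check. The sketch below is hypothetical: the stage names follow the text, but the model names and latencies are placeholders rather than measured values, and it assumes the components run sequentially on each 0.1 s update (pipelined or parallel execution would relax the constraint). Because each stage is a swappable entry, replacing one algorithm only requires re-checking the total.

```python
from dataclasses import dataclass

UPDATE_PERIOD_S = 0.1  # ToiletHelp updates action predictions every 0.1 s


@dataclass(frozen=True)
class Component:
    stage: str        # e.g. "object detection", "segmentation", "action recognition"
    name: str         # hypothetical model name, for illustration only
    latency_s: float  # placeholder per-update inference time on the edge device


def fits_budget(pipeline: list[Component], period_s: float = UPDATE_PERIOD_S) -> bool:
    """Assuming sequential execution, check that total latency fits one update period."""
    return sum(c.latency_s for c in pipeline) <= period_s


def swap(pipeline: list[Component], replacement: Component) -> list[Component]:
    """Return a new pipeline with the component for replacement.stage swapped out."""
    return [replacement if c.stage == replacement.stage else c for c in pipeline]


# Placeholder latencies, not measurements from the paper.
base = [
    Component("object detection", "detector_a", 0.030),
    Component("segmentation", "segmenter_a", 0.040),
    Component("action recognition", "rule_engine", 0.005),
]
slower = swap(base, Component("segmentation", "segmenter_b", 0.080))

print(fits_budget(base))    # True  (0.075 s total)
print(fits_budget(slower))  # False (0.115 s total)
```

A faster but less accurate segmenter might pass this check while lowering overall accuracy, which is why the text calls for studying the components' combined effects rather than each one in isolation.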

Unlike end-to-end learning-based methods, which require extensive data for generalization or domain-specific fine-tuning, our modular approach, together with its rule-based HAR engine, allows ToiletHelp to be deployed in different rooms without fine-tuning. This eliminates the need for extensive and intrusive data collection, which would otherwise prevent real-world deployment. Moreover, the modular methodology can be generalised to other applications with severe data collection constraints, such as ADL monitoring, by adapting the rules to encode domain-specific knowledge. Finally, end-to-end HAR methods typically process short videos of fixed length and, because of memory requirements, rely on different sampling strategies to handle longer videos. Our transition-based approach to HAR makes our method robust to actions with different inter- and intra-class lengths, an important feature for our use case, where different subjects perform actions at different speeds (ranging from 5 s to a few minutes).
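The duration invariance of a transition-based approach can be illustrated with a minimal sketch. It assumes, hypothetically, that depth-map processing yields one coarse state per 0.1 s update ("away", "at_toilet", "seated") and that actions are emitted only when the state changes; the states and rules are illustrative and are not ToiletHelp’s actual rule set. Because runs of identical states are collapsed before the rules are applied, a 5 s sitting episode and a 3 min one produce the same action sequence.

```python
from itertools import groupby

# Hypothetical transition rules: (previous state, new state) -> recognised action.
TRANSITION_RULES = {
    ("away", "at_toilet"): "approach toilet",
    ("at_toilet", "seated"): "sit down",
    ("seated", "at_toilet"): "stand up",
    ("at_toilet", "away"): "leave toilet",
}


def recognise_actions(frame_states: list[str]) -> list[str]:
    """Map a per-frame state sequence to actions, using only state transitions."""
    # Collapse runs of identical states so that only changes matter,
    # regardless of how long each state lasts.
    segments = [state for state, _ in groupby(frame_states)]
    actions = []
    for prev, curr in zip(segments, segments[1:]):
        action = TRANSITION_RULES.get((prev, curr))
        if action is not None:  # transitions without a rule are ignored
            actions.append(action)
    return actions


# At 10 updates per second: sitting for 5 s (50 frames) vs 3 min (1800 frames).
fast = ["away"] * 5 + ["at_toilet"] * 10 + ["seated"] * 50 + ["at_toilet"] * 10 + ["away"] * 5
slow = ["away"] * 50 + ["at_toilet"] * 300 + ["seated"] * 1800 + ["at_toilet"] * 200 + ["away"] * 50

print(recognise_actions(fast))
print(recognise_actions(fast) == recognise_actions(slow))  # True
```

A fixed-length clip classifier would see very different inputs for these two sequences, whereas the transition rules depend only on the order of states.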

Table 4 Results of the questionnaires for care staff. Mean and SD of the responses, total and grouped by day center

5.3 Limitations and future work

First, participation in the test requires informed consent, which may lead to a biased sample. Second, despite efforts to adapt the questionnaire to the needs of people with dementia, data quality may be suboptimal because 5 participants have moderate dementia (Kutschar et al 2019). However, they represent less than 17% of the sample, so even if data quality were compromised in these cases, the impact on the overall results would be limited. In addition, future work should explore in more detail the suitability of ToiletHelp for different levels of dementia, controlling for this parameter. Third, although not a limitation, it is important to emphasize that ToiletHelp was intentionally configured to interact extensively with users in order to collect a substantial amount of data. If anything, this configuration would be expected to produce a more conservative assessment with more negative outcomes. The present scores therefore reflect a worst-case scenario, as the test is conducted with the most intrusive version of ToiletHelp, and standard conditions are expected to yield even more positive results. Furthermore, this paper does not specifically address long-term perceptions or the acceptance of ToiletHelp among older adults as a video-based technology. While further research is required in this area, the study by Offermann et al (2023) indicates that older adults with greater disability levels tend to be more receptive to this type of technology, including video-based systems. Finally, our work prioritised the involvement of formal caregivers, as ToiletHelp is intended for care homes or day centres. However, relatives and informal caregivers may provide complementary perspectives that should be investigated in future work.

6 Conclusions

This paper introduced ToiletHelp, the first system designed to assist people with dementia in using the lavatory independently through a modular, privacy-preserving, vision-based approach. We recruited 60 participants from 3 facilities, 33 end-users and 27 healthcare workers, to take part in the human-centered design process. The evaluations, with 33 end-users and 14 caregivers, confirmed the effectiveness of the ToiletHelp design. These results highlight ToiletHelp as a real-world solution that meets specific needs, particularly through its interaction module, which was designed specifically for people with dementia.