1 Introduction

Chatbots or conversational agents (CAs) are applications that interact with users via written or spoken natural language simulating a human-like conversation. They accept input as speech, text, or video; in addition, they may receive input from several different sensors. They process the input and provide relevant advice or feedback in the form of text, speech, or manipulation of a physical or virtual body [1].

Healthcare chatbots aim to eliminate hospital waiting times, schedule appointments, and provide user assistance such as consultations or even diagnosis and psychological support [2, 3]. In this way, these chatbots decrease the medical and organizational burden while cutting costs [4]. Although chatbots are helpful in healthcare, they are complex to build, and poor design can lead to accuracy problems in the responses or, even worse, in the diagnosis.

Another advantage of chatbots is their ability to gather information about patients and their symptoms, which is then used to answer users' queries as accurately as possible. They can also collect user feedback on the features they deliver, which can be exploited to improve the user experience of the interaction.

Another crucial aspect of chatbots is their accessibility, i.e., being perceivable, comprehensible, and easy to use by all users, regardless of their abilities. Unfortunately, inclusion is still a neglected aspect of chatbots. More active research is needed to address diverse user needs, since users can experience more barriers with chatbots than with web pages [5]. To support this goal, designers must take into account what type of input or output to accept, how the applications relate to the user (UX), in which language to develop the chatbots, and whether they can be used by people with disabilities who may rely on assistive technologies (such as a screen reader) or require a tactile response, or even a video output capable of displaying sign language.

Since the first chatbot, ELIZA, was created in the 1960s [1], technologies such as artificial intelligence (AI) and natural language processing (NLP) have progressed, increasing chatbots' "intelligence" (in terms of precision and accuracy). Chatbots have spread rapidly in all sectors, even reaching people in their homes (home assistants such as Alexa or Google Home).

In this systematic review we focus on the eHealth area. We aim to analyze the evolution of chatbots applied in the medical field, exploring their current applications as well as present and future challenges, with particular attention to inclusiveness and how it is addressed in the design process.

This paper is divided into six sections. After this introduction, the research questions leading our study are shared, then the applied methodology is described in detail. Next, results are discussed, organized by different categories of the selected papers. Lastly, conclusions and future work close the paper.

2 Research questions

As mentioned in the Introduction, this study mainly aims to investigate the evolution of chatbot technology and inclusivity over time in the eHealth sector. Thus, we formulated the following two research questions to guide our analysis:

RQ1: What is the application of chatbots in the healthcare sector and how has it evolved over the last 5 years?

RQ2: How much attention is paid to accessibility when designing an application?

The first question investigates the progress and use of chatbots in the medical field, while the second investigates whether and how accessibility is included in their design process.

The paper is organized to introduce the methodology for searching and detecting the selected articles (applying filters), presenting a section devoted to discussing the results obtained, and conclusions with suggestions and directions for future work.

3 Method

This section introduces the methodology applied to perform our study, showing the research strategy, the applied criteria, and the algorithm leading the process of selection of the articles, going specifically into the different types of applications examined in the study.

3.1 Search strategy

An extensive search was carried out in June 2023 on four bibliographic sources: the databases ScienceDirect (www.sciencedirect.com), IEEE Xplore (ieeexplore.ieee.org), ACM Digital Library (dl.acm.org), and Google Scholar (scholar.google.com). Databases such as PubMed were not considered because this study targets papers with a computer science focus: the interaction with chatbots and their development are an essential part of our investigation.

The search was carried out using these keywords: “chatbot”, “hospital”, “healthcare”, “accessibility” and “conversational agents”. The most common keyword combinations used in the search were: “chatbot hospital accessibility”, “chatbot healthcare”, “conversational agents accessibility”, and “chatbot accessibility”. The keywords were also combined with the operators AND and OR as follows:

“Chatbot” AND (“accessibility” OR “hospital” OR “healthcare”) AND “Conversational agents”.
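As an illustration only, the Boolean combination above could be assembled programmatically so that the same string is reused across sources. The following is a minimal sketch: the helper name `build_query` and its parameters are hypothetical, and each database has its own advanced-search syntax, so the resulting string may need adapting.

```python
def build_query(required, any_of, also_required):
    """Compose a Boolean search string: required AND (a OR b OR ...) AND also_required."""
    # Quote every term so that multi-word keywords are treated as phrases.
    alternatives = " OR ".join(f'"{term}"' for term in any_of)
    return f'"{required}" AND ({alternatives}) AND "{also_required}"'


query = build_query(
    required="Chatbot",
    any_of=["accessibility", "hospital", "healthcare"],
    also_required="Conversational agents",
)
print(query)
# "Chatbot" AND ("accessibility" OR "hospital" OR "healthcare") AND "Conversational agents"
```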

The search yielded a total of 1944 articles retrieved from 4 databases (IEEE Xplore 64, ACM Digital Library 954, ScienceDirect 513, Google Scholar 413).

3.2 Study selection criteria

The first inclusion criterion concerned the terms appearing in the title and their relevance to the keywords used. To avoid misinterpretation and translation effort, the search covered only articles published in English; non-English publications were excluded. Concerning the timeline, a period of 5 years was chosen (between 2018 and 2023), which is adequate to observe the evolution of research and related publications in the field.

Lastly, only research articles were included in the candidate set, thus excluding review papers and book chapters or books [6].

3.3 Article selection

According to the methodology proposed in [7], the process of selecting the articles for this study was organized into three parts (as shown in Fig. 1): Identification, Screening, and Eligibility. In the identification phase, we used the databases to find articles that matched the keywords and put them in a candidate set. In the screening phase, we read the title and abstract of each paper and filtered out the articles considered not relevant for our study. Finally, in the eligibility phase, we analyzed the selected articles one by one and decided which of them to include in the review.

Fig. 1 Flow chart representing the article selection process

After searching the four databases, a total of 1,944 articles were found, and after removing the duplicates 765 candidates remained. After the screening phase only 75 articles were selected; a total of 690 were discarded because they were not in English, not available in full text (abstract only, or full paper not accessible with university authentication credentials), not from authoritative sources, or related to other technologies. Finally, we performed the eligibility phase. After analyzing the 75 articles, only 21 were selected, all including details on the chatbot's implementation and the technologies used. This was a basic requirement for our study, which aimed to analyze complete chatbots and discard purely theoretical studies, in order to understand the technology's evolution over time.
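The counts reported for the selection process can be cross-checked with simple arithmetic. The short sketch below uses only the figures stated in Sections 3.1 and 3.3; the variable names are ours and purely illustrative.

```python
# Records retrieved per source, as reported in Section 3.1.
per_database = {
    "IEEE Xplore": 64,
    "ACM Digital Library": 954,
    "ScienceDirect": 513,
    "Google Scholar": 413,
}

identified = sum(per_database.values())  # records found across the four sources
after_deduplication = 765                # candidates left once duplicates are removed
after_screening = 75                     # kept after reading title and abstract
included = 21                            # kept after the eligibility phase

duplicates_removed = identified - after_deduplication
discarded_in_screening = after_deduplication - after_screening
discarded_in_eligibility = after_screening - included

print(identified, duplicates_removed, discarded_in_screening, discarded_in_eligibility)
# 1944 1179 690 54
```

The totals agree with the text: 1,944 records identified and 690 discarded during screening.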

The selected articles were analyzed and organized by category (see Table 1) and can be found in the source section at the end of the review. A total of 29% of the papers were related to Diagnostic Support, followed by Access to Healthcare services and Counselling or Therapy (19% each). Another 9% were related to Self-monitoring and 14% to (user) data collection. Lastly, 10% of the implemented chatbots were specific to COVID-19 support.

Table 1 Distribution of the selected papers by application type

Our study took into consideration four aspects of the analyzed papers: a brief description of the purpose of the applications, the target they address, the technology they use, and finally whether or not they are inclusive.

3.4 Papers characterization

In the following, we describe in detail the papers considering the four selected parameters (Fig. 2).

Fig. 2 A chart representing the types of applications analyzed in the study

3.4.1 Applications purposes and targets

Chatbots in healthcare can be developed for patients or for their care providers, depending on the goals of the application. The main support areas include Diagnostic support, Access to healthcare, Counselling or therapy, Self-monitoring, Data collection, and COVID-19 support.

3.4.1.1 Diagnostic support

These types of applications focus on diagnosing disease. To do this, they use different methodologies: some rely on the symptoms [9], while others are based on monitoring parameters entered into the application [8]. Most of these applications aim to provide the right diagnosis so that patients do not go to hospital unnecessarily and overcrowd emergency rooms. However, some of them target a more specific category of users, focusing on a single disease [10] or even cancer [11, 13].

3.4.1.2 Access to healthcare

These applications enable users to access health services remotely in order to schedule appointments [16], access hospital hours and contact doctors or the reception. Some apps provide information on the facilities and how to reach them [17], while others allow monitoring patients remotely by entering clinical data into the application, so that doctors can assess the condition of their patients at home [15].

3.4.1.3 Counselling or therapy

Mental health problems are a growing issue, not to be underestimated in today's society. Especially after the arrival of the pandemic, these cases have multiplied dramatically. To help people even in the comfort of their homes, bringing therapy or counselling directly to them, support applications have been created. These apps address this issue in several ways: some use chatbots as a tool, analyzing the stress-reducing effects they can have on the user [18]; others focus on emotions and how they can be managed and kept in check [19]; finally, there are applications aimed at helping users manage anxiety or depression [21] and even at alleviating the symptoms of ADHD [20].

3.4.1.4 Self-monitoring

The applications described so far help users access services that were previously available only outside their homes, whereas this type of app allows users to self-monitor. One application has patients use a Telegram chatbot that monitors blood pressure from the data they enter [22]; another is dedicated to pregnant women and aims to reduce their stress levels [23].

3.4.1.5 Data collection

The use of chatbots has become increasingly frequent. Continuous improvement in design makes chatbots more reliable and guarantees a wide range of services. Thus, it is essential to receive feedback from the people who use the app so that problems can be resolved and a better service guaranteed. These applications focus on just that [24–26].

3.4.1.6 COVID-19 support

Other kinds of applications have targeted users affected by COVID-19. As previously discussed, the pandemic contributed greatly to the development of chatbots; for months we were unable to leave the house, so the need arose to access services remotely. COVID-19-targeted apps help users book appointments for tests, analyze symptoms to prevent false positives, and even contact their doctors [27, 28].

3.4.2 Technologies

Various techniques are used to build chatbots. Rule-based chatbots use pattern-matching approaches such as Artificial Intelligence Markup Language (AIML) [27] or online chatbot-building platforms [24, 18, 9, 15, 20, 11, 16, 17]. AIML is used for response generation: its rules are organized into topics containing related categories, and each category is a rule consisting of a pattern, which represents a user query, and a template for the corresponding response.
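The pattern/template mechanism can be illustrated with a small rule matcher. The sketch below is not AIML itself and is not taken from any of the reviewed systems: it is a minimal Python approximation in which the rules, the wildcard handling, and the function names are all hypothetical.

```python
import re

# Hypothetical rules in the spirit of AIML categories: each pairs a pattern
# ("*" is a wildcard) with a response template ("{0}" is the wildcard text).
CATEGORIES = [
    ("I HAVE A *", "How long have you had a {0}?"),
    ("BOOK AN APPOINTMENT", "Which day would you like to come in?"),
    ("*", "Sorry, I did not understand. Could you rephrase?"),
]


def normalize(text):
    """Upper-case the input and drop punctuation before matching."""
    return re.sub(r"[^A-Z0-9 ]", "", text.upper()).strip()


def pattern_to_regex(pattern):
    """Turn a pattern into a regular expression; "*" matches one or more words."""
    parts = [r"(.+)" if token == "*" else re.escape(token) for token in pattern.split()]
    return re.compile("^" + r"\s+".join(parts) + "$")


def respond(user_input):
    """Return the filled-in template of the first category whose pattern matches."""
    text = normalize(user_input)
    for pattern, template in CATEGORIES:
        match = pattern_to_regex(pattern).match(text)
        if match:
            return template.format(*(group.lower() for group in match.groups()))
    return ""


print(respond("I have a headache."))
# How long have you had a headache?
```

A request phrased outside the listed patterns falls through to the catch-all rule, which shows why rule-based chatbots can struggle with questions worded differently from what the designer anticipated.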

Artificial neural networks (ANN) are used in retrieval and generative chatbots. These models receive user input, compute vector representations, feed them as features to the neural network, and generate responses. For example, some studies employed convolutional neural network (CNN) models to classify posts in online health communities and long short-term memory (LSTM) models to generate responses for posts. Additionally, others used feed-forward neural networks to recommend similar hospital facilities.
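The retrieval step (turn the input into a vector, compare it with stored items, return the best match) can be sketched without a neural network. In the example below, simple word counts and cosine similarity stand in for the learned vector representations described above; this substitution, the stored question/answer pairs, and the function names are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical question/answer pairs standing in for a response repository.
KNOWLEDGE_BASE = [
    ("what are the hospital opening hours",
     "Opening hours are listed on the hospital website."),
    ("how do i book a blood test",
     "You can book a blood test from the appointments menu."),
]


def vectorize(text):
    """Represent a text as word counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query):
    """Return the stored answer whose question is most similar to the query."""
    query_vector = vectorize(query)
    best = max(KNOWLEDGE_BASE, key=lambda pair: cosine(query_vector, vectorize(pair[0])))
    return best[1]


print(retrieve("When are the opening hours?"))
# Opening hours are listed on the hospital website.
```

A neural retrieval model would replace `vectorize` with an encoder trained on example conversations, while the comparison and selection steps keep the same shape.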

Many studies have utilized various online tools that incorporate natural language processing (NLP) and machine learning techniques. These tools typically include natural language understanding (NLU) components, which aim to comprehend text. NLU involves intent categorization and entity extraction while considering contextual information. After training, chatbots can categorize users' inputs into intents and extract entities.
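To make the two NLU tasks concrete, the sketch below categorizes an utterance into an intent and extracts entities from it. It is a deliberately simplified, keyword-based illustration: the intents, keywords, and entity patterns are hypothetical, contextual information is ignored, and real NLU tools learn these associations from training examples rather than from fixed lists.

```python
import re

# Hypothetical intents with trigger keywords.
INTENT_KEYWORDS = {
    "book_appointment": {"book", "appointment", "schedule", "visit"},
    "report_symptom": {"pain", "fever", "cough", "headache"},
}

# Hypothetical entity patterns; group 1 of each pattern is the entity value.
ENTITY_PATTERNS = {
    "date": re.compile(r"\b(today|tomorrow|monday|tuesday|wednesday|thursday|friday)\b"),
    "temperature": re.compile(r"\b(\d{2}(?:\.\d)?)\s*degrees"),
}


def parse(utterance):
    """Return the best-matching intent and any entities found in the utterance."""
    text = utterance.lower()
    tokens = set(re.findall(r"[a-z]+", text))

    # Intent categorization: pick the intent sharing the most keywords with the input.
    scores = {intent: len(tokens & keywords) for intent, keywords in INTENT_KEYWORDS.items()}
    best_intent = max(scores, key=scores.get)
    intent = best_intent if scores[best_intent] > 0 else "unknown"

    # Entity extraction: keep the first match of each pattern.
    entities = {}
    for name, pattern in ENTITY_PATTERNS.items():
        match = pattern.search(text)
        if match:
            entities[name] = match.group(1)

    return {"intent": intent, "entities": entities}


print(parse("I would like to book an appointment for tomorrow"))
# {'intent': 'book_appointment', 'entities': {'date': 'tomorrow'}}
```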

Other chatbots rely on online platforms or social networks such as Telegram or Facebook [8, 22, 13, 23, 26]. The remaining ones used a variety of different methodologies such as data gathering [25, 28, 21] or online interfaces such as Google APIs [14].

3.4.3 Inclusivity

Chatbots should be useful tools for anyone, regardless of their abilities. CAs are especially valuable for people with disabilities, guaranteeing them access to healthcare services from their homes or helping them orient themselves in order to reach hospitals. For this reason, it is important that these tools are designed with accessibility in mind, so that they can be used by everyone, guaranteeing vocal and visual answers or inputs and facilitating navigation in the best possible way. This study's search keywords suggest that this often does not happen. If we search the four digital libraries using the keywords "chatbot AND accessibility", we obtain around two thousand articles; if we remove the keyword "accessibility", the search yields at least five times more results, showing that accessibility is not at the heart of chatbot development. As a matter of fact, out of the twenty-one applications analyzed, only four are accessible [15, 20, 17, 13] and only one is designed specifically for people with disabilities [17].

Figure 3 shows the percentage of inclusive applications among the selected papers, which is only 15%. This denotes the need to further investigate the accessibility of chatbots and to enhance their efficacy while delivering a more satisfying user experience.

Fig. 3 Percentage of inclusive applications in the study

4 Results

In this section we summarize the results of the study, starting with an overview of the articles published by year (Fig. 4) and continuing with an analysis of each item (Table 2). The 21 selected articles were analyzed in depth in order to answer the aforementioned research questions.

Fig. 4 Articles selected for the study per year

Table 2 Summary of the items analyzed: purpose, target, technologies, inclusivity

Figure 4 shows the distribution of these papers per year. It can be observed that from 2019 to 2023 the topic has received more and more contributions, with strong growth in 2021. There were few contributions in 2023 because the search for this paper was conducted partway through that year (see Section 3.1).

Table 2 shows all the articles sorted by year of publication from the oldest to the most recent with a summary of the purpose, applied technology, the target of the application and whether they can be defined as inclusive or not.

Examples of design issues include studies that do not consider interface design for users with special needs, for instance by not checking that a web-based interface is accessible via screen reader or that an app is navigable via gestures. Another example concerns chatbots based on voice interaction that do not provide short, simple answers and feedback.

5 Discussion

The purpose of this study was to conduct a systematic review of the literature on chatbot applications in the healthcare sector and analyze their benefits, problems, and future potential. Most of the research papers included in the study focused on creating or developing AI chatbots to help people access healthcare services and/or treatment from home, and only a few of them aimed to collect feedback from these patients.

RQ1 What is the application of chatbots in the healthcare sector and how has it evolved over the last 5 years?

After analyzing the selected studies, we were able to confirm that chatbots applied to the healthcare sector are formidable tools, constantly evolving, both in terms of the services offered, as we have seen above, and in terms of the technologies used, ranging from the use of NLP to the implementation of increasingly precise machine learning algorithms.

As shown in Fig. 5, over the past five years the trend has been to build chatbots increasingly on frameworks and online platforms, such as Telegram and Facebook, instead of using AIML and ad-hoc NLP-based algorithms. This comes at the expense of developing accessible and inclusive interfaces, due to the limited functionality offered by frameworks and platforms that are readily available online.

Fig. 5 Chatbot technology per year

However, the study also underlined that despite their advanced state, these applications still present problems regarding accuracy and performance.

Table 2 shows the progress of the applications analyzed over time. In the last 5 years, chatbots have become increasingly specialized and targeted. As seen in the table, the first applications focused on diagnosing diseases or providing different services for all categories of users. Over time, however, the applications have begun to specialize in categories of users, helping them with therapy or with specific health problems. In the paper by Walss et al. [10], the application focused on patients suffering from hidradenitis suppurativa; in the article by Chen et al. [11], the target was pregnant women; further down the table there are many more examples showing how the target varies according to each specific health problem.

As we have seen, most CAs use machine learning algorithms to better understand user requests and provide the most appropriate response. However, in some of the articles analyzed, users encountered a problem with the standardization of the answers given by the chatbots [15], or with the fact that the applications could not respond at all when the question was asked differently from the way it was designed [10].

Thus, further studies are needed to improve the interpretation of natural spoken language and the accuracy and pertinence of the delivered answer.

RQ2 How much attention is paid to accessibility when designing an application?

The primary intent of chatbots should be to guarantee an enjoyable user experience (UX) accessible to all users, so that the chatbots can be utilized to their full potential; however, this study revealed that this often does not happen.

To begin with, most of the applications analyzed use text as their primary method of communication, and only a few accept speech input. This translates into navigation problems for more vulnerable categories of users, such as the elderly or people with visual disabilities, who can benefit more from interacting in natural language. Only four of the analyzed applications can be defined as accessible and only one is specifically designed to help people with disabilities [17]. Considering that chatbots are becoming increasingly useful tools in our society, and are becoming more targeted, it is essential for future design to be centered around UX. To this aim, co-design with people with disabilities is the main tool for achieving a satisfactory degree of accessibility and usability.

The study took into consideration research articles from the last 5 years. However, extending the search further back shows not only that adding the keyword "accessibility" reduces the results by a factor of about 5, but also that 70% of the articles are dated within the last three years (2021–2023). For example, a search on ScienceDirect with the combination "chatbot accessibility" returns 651 research articles, 530 of which were published in the last 3 years. Using the single keyword "chatbot", we obtain a total of about 3025 articles, 2283 of which date from the last 3 years. This shows that both topics are growing steadily, that the problem of making applications accessible is slowly emerging, and that only a few developers are actually attempting to resolve it (Fig. 6).

Fig. 6 Comparison by year of how many articles on ScienceDirect consider the theme of accessibility

Another issue that emerged from these papers is that some chatbots fail to address the language barrier. An example is the article by Apuzzo and Burresi [17], in which a team of Italian developers created a chatbot that allows people with visual impairments to access healthcare services and to receive support for orientation. It responded with two types of output: one based on sign language and one written. The problem reported by the developers was that the text-based output was offered only in English, which prevents users who do not rely on sign language from fully understanding the answers if they have not mastered English.

Based on the results of the study, some design suggestions can be proposed:

  1. Make the interaction as natural as possible to approach the way users who are unfamiliar with technology interact. Voice interaction, for example, is the way questions are usually asked. This is particularly useful if the interaction has to take place with smartphones, which due to the small screen can make it difficult for everyone to read.

  2. Ensure that the user interface (whether web-based, mobile application or not) is navigable via assistive technologies (e.g. screen reader, magnifier, etc.), keyboard and gestures (touch-screen).

  3. Give more flexibility to the type of questions the user can ask (and thus the chatbot can interpret).

  4. Use simple language with short, frequently used words and short sentences.

  5. Allow customization of the interface to offer a multimodal interaction as close as possible to the user's needs (including the spoken language).

6 Conclusions and future work

In this paper, we investigated the progress of CAs in the healthcare sector by considering the recent literature (last 5 years), analyzing its state and the main features of recently developed applications. Chatbots have shown great potential in revolutionizing hospital management and improving patient experiences. They have evolved to become more sophisticated, intelligent, and capable of addressing a wide range of healthcare needs. The integration of artificial intelligence and machine learning has enabled chatbots to understand and respond to user queries more accurately. However, in their current state several problems remain, the most important being that they are not developed with accessibility in mind and pay little attention to the user experience. As a result, difficulties including miscommunication between chatbots and users can occur. Moreover, healthcare is a sensitive field that necessitates careful attention to the safety, security, and privacy of data and systems. To address these concerns and assure reliability and security, it is crucial to plan the use of chatbots in healthcare carefully, with a major focus on the user experience.

In conclusion, the paradigm of accessibility-by-design has to be incorporated into the practice of developing chatbots not only in the healthcare sector, but in every sector. In this way it is possible to effectively empower all users, regardless of their abilities and technical skills, and to increase the value of chatbots as effective support systems.

Concerning the future of research in this area, in recent months considerable attention has been focused on ChatGPT. When performing a search in Google Scholar by adding the word ‘chatGPT’ to our five selected keywords, we retrieved 244 papers dating from 2022 to the present that discuss this topic (245 from 2021). This indicates that considerable attention has been concentrated in this direction in the last year, discussing the potential of this technology. However, as pointed out by Chow et al. [29], there are some relevant obstacles to using ChatGPT as a programming layer when building an accurate medical chatbot. These include accuracy and reliability, since it would be necessary to train ChatGPT only on certified medical information, the transparency of the training model, and ethical concerns regarding the treatment of user data.