
1 Introduction

Artificial Intelligence (AI) is revolutionizing many fields of science and technology. Advances in technology have changed and evolved the definition of AI, bringing new discussions and, in some cases, confusion to the scientific community [1, 2]. In simple terms, AI refers to machines that mimic human reasoning for problem-solving [1, 3], although the term has also been used as an umbrella covering other techniques, such as Machine Learning (ML) and Deep Learning (DL) [2, 3, 4].

AI has experienced massive growth in recent years. The global AI market grew by 37% in 2020, and although this growth might slow slightly by the end of 2025, the European market is expected to grow at a compound annual growth rate of 29.6% [5]. This massive growth reflects the enthusiasm within institutions and businesses to embrace AI. The advances this technology has brought to society have changed and enhanced a wide range of industries worldwide, such as education, manufacturing, and healthcare. However, the adoption of this technology has also brought many challenges, and in this paper we discuss both the importance of AI in the healthcare domain and the challenges it has brought.

The paper is structured as follows: Sect. 2 discusses the importance of AI in Healthcare from a data and application perspective; Sect. 3 presents challenges in healthcare in relation to (1) SDLC frameworks, (2) adaptability of algorithms, (3) explainability and traceability, and (4) terminologies; Sect. 4 presents some of the efforts conducted by regulatory bodies; finally, Sect. 5 introduces a summary and directions for future work.

2 The Importance of Artificial Intelligence in Healthcare

AI promises to improve and innovate different areas of healthcare, such as medical practice, research, and management [6]. Examples of high-value AI applications for medical practice include easier detection of disease, faster response to urgent events, improved confidence in diagnosis, personalized treatments, drug discovery, and the management of critical conditions [6, 7]. It must be noted that none of these applications would be possible without data.

Although healthcare was found to have one of the smallest global dataspheres in 2018, this industry is expected to experience rapid growth, reaching a compound annual growth rate of 36% by the end of 2025 [8]. Possible reasons for this increase are advances in digital care, healthcare analytics, and imaging technology [6, 8]. From this perspective, AI will become increasingly important in healthcare to exploit the vast amount of medical data generated daily and, therefore, empower the sector to provide better assistance to patients and physicians.

Within a Medical Device Software (MDS) context, AI has been adopted to improve medical products and handle large volumes of data for interpretation [9, 10]. MDS can be categorized into two major types: Software as a Medical Device (SaMD) and Software in a Medical Device (SiMD) [11]. According to the definition of the International Medical Device Regulators Forum (IMDRF), SaMD is software used on its own for one or more medical purposes, and it does not need to be part of hardware to achieve its intended use. SiMD, on the other hand, is part of a Medical Device (MD), meaning the software is used to assist an MD in performing its intended use [12]. In Europe, the Medical Device Regulation (MDR) covers both SiMD and SaMD under the term MD. AI could fall into either SiMD or SaMD, generally referred to as an AI-enabled MD (AI-MD) [11]. A limited number of AI-MDs have already been approved for the market by regulatory bodies. The Food and Drug Administration (FDA) approved 222 AI-MDs from 2015 to 2020 [13], and the agency has indicated that most of these devices were categorized as AI-enabled SaMD [7, 9]. Meanwhile, in Europe, Notified Bodies approved 240 MDs containing AI between 2015 and 2020 [13].

In general terms, the MD industry, like the aircraft, autonomous vehicle, and nuclear industries, is classified as safety-critical because the consequences of failure can be severe, i.e., serious injuries or even loss of life for patients [14]. Historically, a lack of regulation and control has resulted in devastating consequences. One particular event decades ago, involving SiMD, was among the starting points for explicitly strengthening software regulation procedures and requirements [15]. At the centre of these unfortunate accidents, in the late 1980s, was the Therac-25, a software-controlled radiation machine for tumour treatment. The device harmed patients through excessive high-energy radiation delivery, causing severe injuries and deaths [16]. Consequently, this event triggered actions from policymakers on how to regulate and ensure that software in MDs is safe [15], which eventually came to include SaMD in response to technological advances. Many lessons have been learned from this event. However, there is now concern that events similar to the Therac-25 might occur again with AI-MDs, given the current uncertainty of regulatory guidance on adopting this technology in MDS. Moreover, for AI to be adequately incorporated into the MDS industry, challenges such as the complexity and non-deterministic behaviour of AI technology must be addressed.

3 Challenges Introduced Through Artificial Intelligence in Healthcare

The adoption of AI in MDS has challenged the traditional regulatory framework, and manufacturers face new struggles related to the integration of AI in MDs. Within this paper, the challenges explored concern the differences between traditional MDS and AI, transparency, and terminology. First, we discuss the Software Development Life Cycle (SDLC) in general terms to illustrate the differences between AI and traditional MDS. Subsequently, we introduce adaptive AI algorithms as a unique feature and discuss the challenges they pose. Lastly, transparency and terminology are presented as challenges whose proper handling may enhance the safety and trustworthiness of AI-MDs. The incorporation of AI in healthcare has also magnified ethical and social issues such as fairness and bias; despite their great importance, these are beyond the scope of this paper.

3.1 Different Software Development Life Cycle Frameworks in Traditional Medical Device Software and Artificial Intelligence

Traditional MDS and AI-MD have different characteristics [17, 18]. One possible reason for the struggle to regulate AI-MD is the difference in the structure of the Software Development Life Cycle (SDLC) process for AI in comparison to traditional MDS. Regulatory agencies have validated, cleared, and approved numerous MDS, although barriers have been encountered when it comes to AI due to its complexity [7].

In simple terms, traditional MDS consists of a defined and deterministic set of instructions that generates a specific output from specific inputs (see Fig. 1, Traditional MDS diagram) [3]. Different SDLC frameworks have been created and adopted to design, develop, and test traditional MDS, from plan-driven approaches, such as waterfall and the V-model, to more adaptable ones, such as Agile frameworks [19]. However, AI has changed the rules of this game. AI models are fed with data containing features, i.e., inputs, and a target, i.e., the output, in order to be trained and tested (see Fig. 1, AI diagram) [3]. In AI, the inputs and output make up the dataset used to train and test a model, and the AI technique could be any ML technique operating under a supervised learning paradigm. These elements, input + output + AI technique, are used to build a model that represents the patterns in the training dataset [3].

Fig. 1. Differences between traditional MDS and AI [3].
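To make this contrast concrete, the following minimal Python sketch (using scikit-learn) illustrates the supervised-learning flow of Fig. 1; the dataset, feature names, and target are hypothetical and serve only to show how inputs and an output are used to build a model, rather than behaviour being encoded as fixed rules.

```python
# Minimal sketch of the supervised-learning flow in Fig. 1 (hypothetical data).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

data = pd.read_csv("patient_records.csv")           # hypothetical dataset
X = data[["age", "blood_pressure", "glucose"]]      # features (inputs)
y = data["disease_present"]                         # target (output)

# Split the dataset, then let the AI technique learn the patterns in the
# training data instead of following a fixed, hand-written set of rules.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
```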

The functional differences identified between the SDLCs for traditional MDS and AI relate to (1) data, (2) the set of skills required from practitioners, and (3) a lack of modular programming [18]. Focusing on (1), a comparison of the general SDLC tasks for traditional MDS and AI reveals significant differences related to data: the need for data to learn from and the different SDLC frameworks. Data engineering processes must be performed before training and testing a model in order to achieve a high accuracy of the outcome, and this data engineering stage is pre-conditioned by the data itself: with no or insufficient data, it would not be possible to build an AI application. The functional differences (2) and (3) are not discussed, as they are out of the scope of this paper.
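The kind of data-engineering work referred to here can be illustrated with a short, hedged sketch; the cleaning rules, column names, and thresholds below are purely illustrative assumptions and are not prescribed by any regulation or standard.

```python
# Illustrative data-engineering steps performed before training (assumed data).
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

raw = pd.read_csv("patient_records.csv")            # hypothetical dataset

# Remove implausible records and impute remaining missing values.
raw = raw[raw["age"].between(0, 120)]
imputer = SimpleImputer(strategy="median")
features = imputer.fit_transform(raw[["age", "blood_pressure", "glucose"]])

# Standardize features so no single unit of measurement dominates training.
features = StandardScaler().fit_transform(features)
```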

In addition, the implementation of AI in the MD industry may require a new SDLC framework structure due to the stochastic behaviour of AI algorithms. Although several SDLCs for AI projects exist, these life cycle frameworks may not be suitable for the MDS industry and should be revisited, given their possible lack of regulatory requirements such as quality control, design documentation, and monitoring procedures [17].

3.2 Risks Associated with the Adaptability of Artificial Intelligence

Another feature that differs between traditional MDS and AI-MD is the capability of learning. In high-level terms, the FDA has classified algorithms into two types: locked and unlocked. Algorithms labelled as locked are not retrained over time once the MDS has been deployed and approved; hence, they always provide the same outputs when fed the same inputs. Unlocked algorithms, in contrast, are designed to continuously learn under post-market conditions, e.g., from real-world data, in an automated process [7]. In other words, the essential difference between locked and unlocked algorithms is that unlocked ones are upgraded by the software itself, whereas locked algorithms are upgraded through human intervention via new software versions [12]. In the IMDRF document Machine Learning-enabled Medical Devices: Key Terms and Definitions, the learning process of unlocked algorithms is called continuous learning, while that of locked algorithms is batch learning [11]. It is important to clarify that some AI-MDs are also categorized as locked devices because manufacturers do not intend to retrain the model during operation [7, 11].
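The distinction can be sketched in code. The example below is an illustration only, using synthetic data: it contrasts a batch-trained ("locked") model with one that keeps updating from incoming post-market data ("unlocked"); in practice the retraining pipeline of an AI-MD would be far more controlled than this.

```python
# Illustrative contrast between a locked (batch-trained) and an unlocked
# (continuously updated) model, using synthetic data.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
y_train = rng.integers(0, 2, size=500)

# Locked: trained once before deployment; the same input always yields the
# same output until the manufacturer releases a new software version.
locked_model = SGDClassifier(random_state=0).fit(X_train, y_train)

# Unlocked (continuous learning): the deployed model is updated incrementally
# as new real-world data arrive, so its behaviour can change over time.
unlocked_model = SGDClassifier(random_state=0)
unlocked_model.partial_fit(X_train, y_train, classes=np.array([0, 1]))

X_new = rng.normal(size=(50, 3))                    # simulated post-market data
y_new = rng.integers(0, 2, size=50)
unlocked_model.partial_fit(X_new, y_new)
```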

Regulatory entities and policymakers have drawn attention to this ability of AI, given its unique position relative to traditional MDS and the benefit of optimizing performance through continuous learning [7], thereby preserving the prediction accuracy of the AI model [20]. However, regulatory agencies also recognize the potential risk arising from this stochastic behaviour. Autonomous and continuous learning from real-time data may lead the AI to perform differently, with unpredictable consequences that could harm and endanger patients; this in turn raises the question of whether manufacturers should be asked for another premarket submission [7].

As part of a proposed new framework for modifications, the FDA outlined potential future changes in performance to support the development of unlocked AI-MDs [7]. The IMDRF additionally discussed future changes in the structure of AI algorithms, and the group also mentioned future changes related to external factors that may alter the performance of AI-MDs, such as alterations of the data (e.g., degraded quality of inputs) and of the environment setting (e.g., system operation upgrades) [11]. It is important to note that these external factors may also affect locked AI-MDs. The adoption of unlocked AI-MDs, employed in a regulated and safe manner, might also be beneficial in changing environments, to which locked AI-MDs cannot respond.

The challenges that AI has brought to healthcare, including the differences mentioned in Subsect. 3.1, generate the need for a new and adaptable SDLC framework for AI-MDs. Moreover, this new framework must fit the regulatory requirements for MD purposes to ensure trustworthy and safe AI devices from the beginning of their development [9, 21]. Additionally, the adaptability of AI algorithms may challenge other regulatory requirements, such as the management process, risk and quality management, clinical evaluation, manufacturing facilities, design control, and post-market surveillance [22, 23]. In particular, transparency has taken on a critical role in implementing unlocked algorithms in a safety-critical environment such as healthcare [9, 11, 24].

3.3 Achieving Explainability and Traceability in AI – Essential to Satisfy Regulators

Transparency is defined in various ways depending on the scenario and discipline. Generally speaking, it refers to the possibility of accessing information [25]. In an Information Technology (IT) environment in particular, the word transparency has been used to refer to the degree to which the information and functionality of a system are invisible to users [25]. In the medical domain, the "condition of being transparent" is an essential element of end-to-end traceability that establishes a better relationship with patients, enhances services, reduces risk, and increases trust in physicians and the healthcare system [26]. Although challenges to transparency remain in medical care practices [26], the implementation of AI in healthcare has magnified the existing ones and raised new disputes in the area. In the context of trust, policymakers have agreed that transparency is one of the essential requirements for achieving trustworthy AI applications [9, 27, 28]. Moreover, the word transparency has been used to encapsulate the qualities that make AI more visible, which are generally related to explainability and traceability [24, 29, 30].

Explainability.

This quality of transparency relates to the structure of AI algorithms and their visibility to users. Typically, AI models receive specific inputs, e.g., patient data or clinical images, and generate a prediction or classification based on internal procedures [31]. These internal procedures are often hidden from physicians, providing no explanation of the decision-making process of the AI model [24], which may compromise trust in the predictions of AI algorithms. Moreover, this gives physicians an insufficient level of understanding of these algorithms and is referred to as the black box problem [30]. Some ML models are easier to explain: in regression analysis, for example, it is possible to refer to the weights given to the variables to understand their relationships, whereas the visual structure of Decision Trees provides an understanding of their decisions [30, 32]. However, in the case of more sophisticated AI techniques, such as DL and Natural Language Processing, explaining AI decisions becomes increasingly complex [9, 30]. There is a recognized trade-off between the models with the best performance, which are often the least explainable, and models with inferior performance that are the most explainable [9]. Due to the complexity of AI algorithms, the challenge for explainability is to select the best approach to describe the AI-MD [24].
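As a simple illustration of the explainable end of this trade-off, the hedged sketch below (synthetic data, hypothetical feature names) shows how regression weights and the printed rules of a decision tree can be inspected directly, something that is not possible to the same degree for a deep neural network of comparable performance.

```python
# Inspecting two inherently explainable models (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y_reg = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
y_cls = (X[:, 0] > 0).astype(int)

# Regression: the learned weights show how each input influences the output.
reg = LinearRegression().fit(X, y_reg)
print("feature weights:", reg.coef_)

# Decision tree: the fitted decision rules can be printed (or plotted) and read.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y_cls)
print(export_text(tree, feature_names=["age", "blood_pressure", "glucose"]))
```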

Traceability.

Regulatory agencies have recognized the crucial role of manufacturers in achieving transparency, in this case by designing proper traceability for the AI-MD [7]. In IT, traceability is the ability to follow the requirements of a system forwards and backwards throughout its life cycle [33, 34]. In the MDS industry, traceability refers to the proper documentation of the system's design, and it is critical because it is used as a risk control mechanism [33]. In short, design control aims to ensure that a plan for the development process is in place, increasing the probability that user needs are correctly translated into an MDS, improving the system's quality, and assuring safety before it is placed on the market [35]. It has been suggested that the entire AI SDLC and implementation process be documented [24]. However, manufacturers are still struggling to document AI models due to the lack of mechanisms and guidance on how to do so [10]. Moreover, another challenge is that some AI applications are rarely delivered with complete traceability documentation because manufacturers prefer to keep the functionality, data, and algorithms private and confidential for Intellectual Property purposes [23].
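One possible, purely illustrative way to capture part of this documentation is to store machine-readable traceability metadata alongside each trained model artefact; the field names and values below are assumptions made for the example and do not correspond to any regulatory template.

```python
# Hypothetical traceability record stored alongside a trained model artefact.
import hashlib
import json
from datetime import datetime, timezone

def dataset_fingerprint(path: str) -> str:
    """Hash the training data file so the exact dataset version is traceable."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

trace_record = {
    "model_version": "1.2.0",
    "intended_use": "decision support for a hypothetical screening task",
    "algorithm": "LogisticRegression(max_iter=1000)",
    "training_data_sha256": dataset_fingerprint("patient_records.csv"),
    "validation_metrics": {"test_accuracy": 0.91},
    "trained_at": datetime.now(timezone.utc).isoformat(),
    "requirements_covered": ["REQ-012", "REQ-045"],  # links back to user needs
}

with open("trace_record.json", "w") as f:
    json.dump(trace_record, f, indent=2)
```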

Explainability and traceability in AI-MD are important not only to increase physicians' and patients' trust [23] but also for troubleshooting (e.g., diagnosing and tracing incorrect outcomes) and liability purposes (e.g., who is responsible for mistakes?), in order to minimize risk and assist the adoption of AI [9, 23, 24]. Additionally, transparency plays a significant role in clarifying functionality, the learning approach (i.e., batch or continuous), and changes over time [7, 36]. However, challenges remain in selecting approaches to explain AI algorithms and in the lack of guidance for documenting the life cycle of AI-MDs, which may require adjustments, including the introduction of best practices for documenting AI projects in the MDS industry. The lack of transparency also adds challenges in other areas, such as cybersecurity and validation and verification procedures [37]. Furthermore, the erroneous use of terminology in documentation may limit the explainability of the AI-MD.

3.4 Conflicting Use of Terminologies

Another challenge exists in terms of the terminology and taxonomy of AI [21]. This complication arises because different fields work together in the MDS industry, such as Artificial Intelligence, Data Science, Computer Science, Healthcare, and regulatory agencies. Most of these disciplines have adopted different terminologies, with similar words but different meanings, leading to conflict and confusion. A simple example is the word validation, which in AI and Data Science is a technique to evaluate the performance of a model, whereas, from a regulatory perspective, it refers to evaluating whether the user needs have been met [11].
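The AI/Data Science meaning of the word can be illustrated with the short, synthetic-data sketch below: here the "validation" split is used only to compare candidate models, which is quite different from regulatory validation of user needs.

```python
# "Validation" in the AI/Data Science sense: a held-out split for model selection.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = rng.integers(0, 2, size=300)

# Train / validation (model selection) / test (final evaluation) splits.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Pick the regularization strength that performs best on the validation split.
candidates = [LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
              for c in (0.01, 0.1, 1.0)]
best = max(candidates, key=lambda m: m.score(X_val, y_val))
print("Selected model test accuracy:", best.score(X_test, y_test))
```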

Additionally, the use of terminology from one discipline in another has been identified as a further challenge. Researchers conducting a study to identify the number of devices approved in the US and Europe reported issues when exploring the documentation of the devices: they claimed that there were discrepancies in the use of the terms associated with AI and ML. This issue, together with the lack of transparency in the documentation, made the identification of AI-MDs cumbersome [13]. Moreover, the possible misuse of terminology may create further barriers in the development process of AI-MDs [21].

Overcoming terminology challenges requires a commitment from standards organizations and stakeholders. The intention and purpose of standards bodies is to harmonize terminologies and taxonomies [21], while stakeholders should adhere to the standards developed by these bodies when researching and developing MDs in line with regulations, ensuring proper and consistent implementation of such terminology across the industry [21].

4 State-of-the-Art from Regulators

In Europe, the Medical Device Directive (MDD) was replaced by the Medical Device Regulation (MDR). This new regulation came into effect in May 2021 as a response to technological advances in the medical device industry [38]. A study [38] revisited the MDR to verify whether the new changes would improve the performance and safety of AI-MDs. Although AI is not mentioned in the document, the MDR would likely improve the performance and safety of most AI-MDs due to the new risk classification rules for software [38]. Based on this, it seems that AI-MDs would probably fall into a higher risk classification, and therefore such devices must be developed in a manner that is deemed safe before entering the European market. However, it was also claimed that the evaluation process and external validation are lacking, which may affect the performance of AI-MDs [38]. Besides the MDR, in April 2021 the European Commission released a draft of the AI Act to regulate and harmonize AI technologies across the Union [39]. The AI Act takes a risk-based approach and describes a set of rules to classify AI systems as minimal, limited, high, or unacceptable risk. The AI Act proposes a list of requirements for high-risk AI systems, relating to risk and data management, technical documentation, record-keeping and traceability, transparency, human oversight, and adequate levels of accuracy, robustness, and cybersecurity. In terms of adaptability, the AI Act proposes that providers must establish how the AI system and its performance would change over time. Moreover, post-market monitoring is established as a key requirement for adaptive AI systems in order to perform corrective actions more efficiently.

In April 2019, the FDA released a discussion paper in which it proposed a new Regulatory Framework for Modifications to AI/ML-enabled SaMD [7]. This framework includes a Predetermined Change Control Plan (PCCP) to assist manufacturers in the development of unlocked AI-MDs. The PCCP contains two sections: the pre-specifications (PS) and the Algorithm Change Protocol (ACP). The PS contains a list of future modifications related to the structure of the AI model, as it is expected that most of these will occur after the retraining process [7, 10]. The agency identified three changes in an AI-MD after retraining: (1) performance; (2) the inputs used by the model; and (3) the intended use of the device [7]. The ACP, in turn, is associated with the step-by-step implementation of methodologies for future changes, i.e., procedures describing how the algorithm will be retrained and how it will change under post-market data conditions [7]. With the implementation of the PCCP, it is expected that AI-MDs will remain safe after retraining in post-market conditions [10]. Subsequently, the FDA held an open discussion with stakeholders on this newly proposed framework. The feedback was analysed, and in 2021 the FDA released an Action Plan based on the comments and suggestions from the open discussion [10]. In relation to the PCCP, the FDA reported that stakeholders considered the list of future modifications "relevant and appropriate" but limited [10]. In response, the FDA is currently working on expanding the list of modifications, which will be included in a new draft guidance on the ACP [10]. Another point on this list concerned transparency to users, which the FDA plans to promote through public workshops and labelling training for manufacturers [10].

In May 2022, the IMDRF published the final document Machine Learning-enabled Medical Devices: Key Terms and Definitions. This document is the result of the regulators' efforts to harmonize relevant terms around ML technologies in the MDS industry. Its baseline is the standards ISO/IEC DIS 22989 and ISO/IEC TR 24027, related to IT and AI terminology and to bias, respectively. In a nutshell, the document covers definitions of bias, continuous learning, types of learning approaches, and terms related to testing and training processes [11]. The IMDRF also included two types of changes in unlocked AI-MDs: changes to the AI-MD and changes to the AI-MD environment for data [11]. Changes to the AI-MD refer to modifications of the model, including retraining with new data, additional tuning of hyper-parameters, and training of the model with different AI methods and algorithms [11]. Changes to the AI-MD environment for data, on the other hand, relate to external factors that affect the learning process and the AI model. Examples of this type of change are alterations in the quality of inputs provided by third-party sources, changes in clinical practice, and changes in the population on which the AI model was initially trained and tested during development [11].

5 Conclusions

Medical device software standards and regulations have evolved over many years to provide manufacturers with helpful guidance for developing safe medical device software. However, the increasing use of AI in MDS presents challenges in terms of the traceability and explainability of such algorithms, and there is a need for greater guidance to manufacturers on developing safe MDs containing AI. The adoption of AI in MDs has challenged the traditional regulatory framework and set barriers for manufacturers. Moreover, it is sometimes not possible to adequately document AI systems because of the lack of guidance, standards, best practices, and harmonized terminology, which may also impact the transparency of AI applications.

We identify several future contributions to MDS and AI. A potential contribution in AI-MD is the adjustment of existing guidance and standards already applied to MDS to an AI context [21]. It is fundamental to start with the development and standardization of the structure of AI-MD projects by designing a regulatory-friendly framework, revisiting and comparing the SDLC frameworks commonly implemented for traditional MDS and AI [17] and subsequently tailoring them to an AI-MD context. It is assumed that most AI life cycle frameworks have mainly been employed in non-safety-critical environments; hence, these frameworks should be inspected to verify whether they would satisfy the regulatory requirements for MD purposes. In addition, best practices, standards, and guidelines will be considered in the development of the framework in order to improve the explainability and traceability of AI-MDs. Human oversight and post-market monitoring will also be considered in the design of this framework for risk mitigation purposes, as will systems engineering and socio-technical systems. This work will provide a baseline for future work on unlocked AI algorithms.

We realize there are difficult challenges to overcome in order to establish universal rules and procedures for AI, particularly in healthcare, due to the diverse contexts, different pathologies, use cases, and the constant evolution of the technology [40]. It will be challenging but not impossible.