Background

Recent years have seen the development and introduction of a number of artificial intelligence (AI) enabled technologies for healthcare. AI is a term which encompasses diverse computational technologies, making it challenging to define: prominent definitions include that AI is ‘a collection of interrelated technologies used to solve problems that would otherwise require human cognition’ [1], or that AIs are technologies with the ability to ‘perform tasks to achieve defined objectives without explicit guidance from a human being’ [2]. Broad in application, AI technologies arrive with optimistic promises of transforming the patient experience. Many of these modern developments are the result of innovations in machine learning (ML), a branch of AI focussed on developing algorithms which learn from examples [3]. So far, advocates of healthcare AI (HCAI) have promised the technology will improve the accuracy of screening and diagnosis [4], increase the availability of care in remote regions [5], and free up physicians’ time so that they can engage more with patients [6].

Alongside innovations in HCAI, there is also a growing field of AI ethics which cautions against uncritical implementation of HCAI and raises questions about its regulation [7]. ML technologies pose new risks and challenges to healthcare: some ML algorithms have been shown to produce biased outcomes [8], many ML technologies are ‘black boxes’ where the reasons behind an algorithm’s output cannot be interpreted [9], and questions remain about how existing liability structures in medicine will effectively manage errors made by deployed ML technologies [10]. AI development also continues to be dominated by large private companies that have been criticised for failing to engage in meaningful conversations about the ethics of their products and research [11].

Publics may be both beneficiaries of new HCAI technologies and the greatest sufferers of AI-related harms [12]. Patients and publics are important voices in developing effective and ethical AI governance, but engaging patients and publics meaningfully in research about ethical HCAI is challenging. Most people have no firsthand experience with HCAI, and some are unfamiliar with the concept of AI in general [13]. Publics may have limited understanding of how HCAI might be implemented, and limited knowledge of the potential wrongs and harms that could arise from its implementation. To ensure that HCAI has a positive impact on patients, it is crucial that AI ethics reflects the values that are important to people [12, 14], but it remains unclear how this should be achieved.

The aim of this review is to determine how common and emerging themes in HCAI ethics are addressed by the existing literature on publics’ and patients’ views on machine learning in healthcare.

Methods/design

Scoping reviews are an effective method for exploring the range and extent of literature on a given topic [15]. Our work will follow the framework proposed by Arksey and O’Malley [16], with modifications from Levac and colleagues [15]. The six recommended steps are (a) identifying the research question; (b) identifying relevant studies; (c) study selection; (d) charting the data; (e) collating, summarising, and reporting the results; and (f) consultation. The following sections address each of these steps in greater detail. In preparing this protocol, we completed a PRISMA-P checklist to ensure all necessary details have been reported (Additional file 1).

Stage 1: Identifying the research questions

Our review will address the question: to what extent, and how, are HCAI ethics issues addressed in the existing literature on publics’ views on machine learning applications in healthcare? Our objectives are (1) to explore whether and how research on public views regarding HCAI has included investigation of public views on HCAI ethics, and (2) to describe study participants’ perspectives on HCAI ethics issues.

Stage 2: Identifying relevant literature

We developed a search query using the Population-Intervention-Context-Outcome (PICO) format. An initial search on Google Scholar helped to identify related terms, which were then used to develop a comprehensive search query for the published literature (Table 1).

Table 1 Grid of terms describing search strategy

We will use a systematic search strategy to find relevant articles for inclusion in the study. The databases to be searched are PubMed, Scopus, Web of Science, CINAHL, and Academic Search Complete. To find relevant grey literature, we will screen the first ten pages of results from a Google Advanced Search. We will examine the reference lists of included studies to find any publications that were missed in the initial searches. All studies collected through the search process will be imported into a Zotero library.
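
Purely as an illustration of how a grid of terms translates into a database query, the sketch below assembles a boolean search string in Python. The term groups shown are hypothetical placeholders, not the strategy defined in Table 1.

```python
# Illustrative only: these term groups are hypothetical placeholders for
# the grid of terms in Table 1, not the final search strategy.
population = ["public*", "patient*", "consumer*", "lay people"]
intervention = ["artificial intelligence", "machine learning", "deep learning"]
outcome = ["view*", "attitude*", "perspective*", "opinion*"]

def or_group(terms):
    """Join synonyms with OR, quoting phrases, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"

# Groups are combined with AND, as in a typical database search interface.
query = " AND ".join(or_group(group) for group in (population, intervention, outcome))
print(query)
```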

Stage 3: Study selection

After the search is completed, all studies will be screened for eligibility. EF will be responsible for conducting the search and managing the data. First, duplicates will be removed using the deduplication module from the Systematic Review Assistant [17], and the remaining records will then be exported to MS Excel for the screening process. MS Excel will allow reviewers to easily indicate a study’s inclusion or exclusion, as well as keep notes about any uncertainties for discussion.
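
Deduplication itself will be performed by the Systematic Review Assistant; the minimal sketch below only illustrates the kind of DOI and normalised-title matching such tools perform. The file name and column headings are hypothetical.

```python
import csv
import re

def normalise(title: str) -> str:
    """Lowercase a title and strip punctuation and whitespace, so that
    trivially different records (case, spacing) compare equal."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

def deduplicate(records):
    """Keep the first record seen for each DOI or normalised title."""
    seen, unique = set(), []
    for rec in records:
        key = rec.get("doi") or normalise(rec.get("title", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical export: a CSV file with 'title' and 'doi' columns.
with open("search_results.csv", newline="", encoding="utf-8") as f:
    unique_records = deduplicate(csv.DictReader(f))
```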

The first stage of screening will exclude irrelevant articles based on their title and abstract, against the criteria defined below. The first reviewer (EF) will screen the first 10% of articles, including all articles that are potentially relevant based on their title and abstract and excluding articles that are clearly irrelevant. From this 10%, EF will construct a sample of approximately 40 articles marked for inclusion and 60 articles marked for exclusion. A second reviewer (RB) will screen this sample of 100 articles and compare results with EF. Results will be discussed with the team, and the inclusion criteria will be modified if necessary. Once any issues have been resolved, EF will conduct the initial screening on the remainder of the studies.
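
As a sketch of how the 100-article validation sample could be drawn (file and column names are hypothetical), with the decision column removed so that the second reviewer screens blind:

```python
import pandas as pd

# Hypothetical spreadsheet of EF's decisions on the first 10% of records,
# with a 'decision' column holding 'include' or 'exclude'.
screened = pd.read_excel("initial_screening.xlsx")

included = screened[screened["decision"] == "include"]
excluded = screened[screened["decision"] == "exclude"]

# Draw ~40 included and ~60 excluded records for the second reviewer;
# a fixed seed keeps the sample reproducible.
sample = pd.concat([
    included.sample(n=min(40, len(included)), random_state=1),
    excluded.sample(n=min(60, len(excluded)), random_state=1),
]).sample(frac=1, random_state=1)  # shuffle so decisions are not grouped

# Drop EF's decisions so RB screens the sample blind.
sample.drop(columns="decision").to_excel("validation_sample.xlsx", index=False)
```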

After initial screening is completed, excluded articles will be removed and full texts will be collected for the remaining studies. The two-reviewer screening process will be repeated on a random sample of 10% of the full texts. Differences will be discussed and resolved, and modifications will be made to the inclusion criteria accordingly. Inter-rater scores will be generated to quantify agreement between reviewers. Once the inclusion criteria are finalised, EF will conduct the remainder of the full-text screening.
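
The protocol does not prescribe a particular agreement statistic; Cohen’s kappa is one common choice, sketched below with hypothetical paired decisions.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical paired decisions from the two reviewers on the same
# full-text sample (1 = include, 0 = exclude).
ef_decisions = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
rb_decisions = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

# Cohen's kappa adjusts raw percentage agreement for the agreement
# expected by chance alone.
kappa = cohen_kappa_score(ef_decisions, rb_decisions)
print(f"Cohen's kappa: {kappa:.2f}")
```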

Inclusion criteria

Articles will be screened against a set of inclusion criteria developed by the team. These criteria may be modified throughout the screening process. Initial design of the criteria was guided by the JBI guidelines for scoping reviews [18]. This guide states that inclusion criteria should address (i) the types of participants, (ii) concept, (iii) context, and (iv) types of evidence sources.

Types of participants

Studies will be included if research participants are recruited as publics, patients (or their unpaid/familial carers), or healthcare consumers. If studies recruit professionals (e.g. physicians, nurses, policymakers, or professional carers) along with publics, they will be included so long as the data related to patients/publics can be extracted.

Concept

Studies must address publics’ or patients’ views on HCAI. We use the term “views” to refer to the various ways participants contribute to social research. Included studies may, for example, quantitatively measure participants’ attitudes toward HCAI, or qualitatively examine participants’ perspectives on HCAI (or an application thereof). Studies will be excluded if the participants’ contribution to the research does not involve sharing views (e.g. studies that only measure whether a particular HCAI tool has improved participant outcomes in a certain area).

Studies will be included if the research addresses machine learning in patient-facing or public-facing healthcare or health services. An included study may address machine learning in a specific field of healthcare (e.g. patients’ perspectives on AI for breast screening, or publics’ attitudes toward AI-enabled mobile phone apps for skin cancer detection).

Studies will be excluded if they only address AI technologies that are not within the machine learning branch of AI. For example, studies only examining participants’ views on care robots or expert systems would be excluded. Studies will be excluded if they only address AI in non-patient/public-facing health applications. For example, studies addressing AI used only to manage bills and claims processing in hospitals would be excluded. Studies will be excluded if they only address non-health applications of AI.

Context

Studies from any geographical location will be included, so long as the manuscript can be assessed in English. Only studies published between 1 January 2010 and 15 September 2021 will be included. This time period has seen the introduction of modern approaches such as deep learning, convolutional neural networks, and natural language processing into HCAI research [6]. These new approaches are the source of much of the current interest and investment in HCAI, and introduce a number of new potential challenges and harms [7].

Types of evidence sources

Only primary research studies will be included in this review. Studies will not be excluded based on method, and there will be no restrictions on study design.

Studies will be excluded if they are only available in a language other than English, if they do not address AI in a patient-facing healthcare context, or if the study participant profile does not include patients or publics.

Stage 4: Charting the data

We have designed a coding framework to capture information on whether and how studies address a series of AI ethics concerns. Whilst a number of different frameworks were reviewed [19, 20], the coding framework is primarily based on Fjeld and colleagues’ [21] analysis of a series of AI ethics guidelines. Fjeld et al. identified seven domains that were frequently addressed in AI ethics frameworks: (1) privacy, (2) accountability, (3) safety and security, (4) transparency and explainability, (5) fairness and non-discrimination, (6) human control over technology, and (7) professional responsibility. To capture more detailed data on where each of these ethical issues was addressed, we separated the concepts of ‘safety’ and ‘security’, and ‘transparency’ and ‘explainability’, into individual code categories (Table 2).

Table 2 Adaptation of AI ethics frameworks for data extraction

We added four additional concepts to the framework. The first, power, has recently become a more common point of discussion in AI ethics frameworks; it assesses how AI development and governance structures reinforce existing power dynamics and fail to redistribute power to marginalised groups [14, 22]. The second, environmental wellbeing, addresses the environmental impacts of AI development, including energy usage, materials, and e-waste [22, 23]. The third, societal wellbeing, addresses whether technological development is being implemented for social good [23]. Finally, ethical governance addresses whether existing governance structures are suitable to manage the ethical issues associated with HCAI.
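
To make the resulting structure concrete, the full set of code categories could be represented as a simple code list, as below. The category names follow Table 2; the one-line glosses are informal summaries only, not the definitions used for extraction.

```python
# The thirteen code categories used for data extraction (Table 2):
# Fjeld et al.'s seven domains, with 'safety and security' and
# 'transparency and explainability' each split in two, plus the four
# added concepts. The glosses are informal summaries only.
CODING_FRAMEWORK = {
    "privacy": "control over personal and health data",
    "accountability": "who answers for AI-driven decisions",
    "safety": "avoiding harm from system failures",
    "security": "protection against misuse and attack",
    "transparency": "openness about when and how AI is used",
    "explainability": "whether outputs can be interpreted",
    "fairness and non-discrimination": "avoiding biased outcomes",
    "human control over technology": "human oversight of AI decisions",
    "professional responsibility": "duties of developers and clinicians",
    "power": "effects on existing power dynamics",
    "environmental wellbeing": "energy, materials, and e-waste",
    "societal wellbeing": "development directed at social good",
    "ethical governance": "adequacy of governance structures",
}
```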

Additional information about study design and methods will also be collected. We will use MS Excel to chart the data from the studies. The initial data extraction tool (Additional file 2) covers the key areas recommended by Arksey and O’Malley [16], the ethics framework, and the additional information about study design and methods. Following recommendations from Levac et al. [15], we will modify this tool progressively throughout the data collection process. Initially, a random 10% of the included studies will be selected and coded by two coders (EF and RB), and any differences will be resolved in consultation with the research team. We will make changes to the data extraction tool if necessary. The remainder of the charting will be conducted by EF.
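
As an illustration of the kind of record the charting tool captures per study, the sketch below uses hypothetical field names; the definitive field list is given in the data extraction tool (Additional file 2).

```python
from dataclasses import dataclass, field

@dataclass
class StudyRecord:
    """Illustrative charting record; the definitive field list is in
    the data extraction tool (Additional file 2)."""
    citation: str
    country: str
    study_design: str          # e.g. survey, interviews, focus groups
    participant_type: str      # publics, patients, carers, mixed
    hcai_application: str      # e.g. breast screening, symptom checkers
    ethics_codes: dict = field(default_factory=dict)  # framework code -> notes

# Hypothetical example entry.
record = StudyRecord(
    citation="Author et al. (2020)",
    country="Australia",
    study_design="survey",
    participant_type="publics",
    hcai_application="diagnostic imaging",
    ethics_codes={"privacy": "concerns about data sharing"},
)
```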

Stage 5: Collating, summarising, and reporting the results

We will collate results into tabular format for analysis. Guided by Arksey and O’Malley’s [16] recommendations, analysis will begin with descriptive quantitative reporting where appropriate (e.g. the number of studies which address each HCAI ethics issue in the framework).
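
For instance, assuming the charted data were exported in a long format with one row per study-code pair (file and column names are hypothetical), the per-issue counts could be produced as follows.

```python
import pandas as pd

# Hypothetical long-format chart: one row per (study, ethics code) pair.
charted = pd.read_excel("charted_data.xlsx")

# Number of distinct studies addressing each HCAI ethics issue.
counts = (
    charted.groupby("ethics_code")["study_id"]
    .nunique()
    .sort_values(ascending=False)
)
print(counts)
```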

Our reporting will synthesise publics’ and patients’ views on each of the HCAI ethics issues in Table 2. Given the inclusion criteria for this review, we are likely to collect studies with diverse designs. In some cases, direct quantitative comparison between studies may be possible; in other cases, comparisons may still be drawn between studies with different methodological designs. Where studies do not allow for direct comparisons, our results will report narrative descriptions and comparisons, noting how a study’s framing, aims, and contexts might influence the information collected. This synthesis methodology will be refined based on the types of studies collected.

Discussion

This review may have some limitations. Firstly, scoping reviews are designed to map the literature on a topic and are not designed to assess the quality of included studies [15]; the quality of the studies included in this review will therefore not be systematically assessed. Secondly, it is possible that relevant studies will not be captured by the search strategy defined in this protocol. We will conduct a systematic pearling process on the reference lists of relevant identified studies to ensure that as many relevant articles as possible are identified. Finally, findings will be limited to studies published in English, which may exclude relevant articles published in other languages. We will reflect on the impact of these limitations, as well as discuss any other arising limitations, in the reporting of our results.

The more widespread use of HCAI technologies is often described as inevitable [6]. However, implementation of HCAI may exacerbate certain harms in healthcare [7]. Although patients and publics are likely to be the greatest sufferers of HCAI-related harms, involving patients and publics in meaningful research about AI ethics remains challenging [13].

To date, the extent to which patients and publics are involved in research about HCAI ethics is unclear. This review will examine where existing research has involved patients and publics in research about a series of HCAI ethics issues. In doing so, we will describe patients’ and publics’ views on each HCAI ethics issue and highlight potential gaps: areas of HCAI ethics where research with patients and publics is limited. The results from this review will be important for understanding where further effort is required to involve patients and publics in research about HCAI ethics. Such an effort is crucial to ensuring that HCAI is implemented safely and effectively.