1 Introduction

Artificial Intelligence (AI) can strengthen different economic and social sectors, thus improving our quality of life (Cath 2018). The health sector (Morley et al. 2020; Yu et al. 2018), the financial and business sector (Buchanan 2019; Loureiro et al. 2021), the educational sector (Chen et al. 2020), and security and justice (Hoadley and Lucas 2018; Rigano 2018), among others, are already incorporating AI-based systems, specifically those employing Machine Learning (ML) or Deep Learning (DL) algorithms. Government agencies and private companies see significant advantages and benefits in these systems for decision-making or decision support due to their high precision, automation capacity, and data analytics (Wirtz et al. 2018). Furthermore, these systems can improve compliance with human rights and social welfare and contribute to policy formulation, public service provision, and internal management within the public sector (Cath 2018; Henman 2020; van Noordt and Misuraca 2022).

In the context of our research, AI is defined as computational systems capable of executing tasks that would usually require human intervention (Wang 2019). These systems have become integral to our society due to their remarkable ability to identify patterns in large datasets, appearing in streaming services, machine translation, voice assistants, and chatbots. However, implementing this emerging technology in these fields brings social, economic, and primarily ethical implications (Benefo et al. 2022; Floridi 2019; Jia and Zhang 2021; Martin 2019). Some implications of AI include possible discrimination against vulnerable populations, privacy violations, loss of autonomy or sense of agency, and even impacts on sustainability (Devillers et al. 2021; Galaz et al. 2021). The misuse and incorrect development of AI have been the focus of debate among academics, policymakers, private companies, and civil society (Cath et al. 2018; Jia and Zhang 2021; Mittelstadt et al. 2016). Consequently, numerous documents have been produced suggesting specific principles or guidelines for the ethical advancement of AI. Notably, Jobin et al. (2019), Fjeld et al. (2020), Floridi and Cowls (2019), and Hagendorff (2020) scrutinize the most relevant documents proposed by organizations and governments. Table 1 shows examples of the main ethical principles identified and organized by the community, together with their associated low-level principles and descriptions.

Table 1 Ethical principles and their interpretation

The concern among the mentioned actors is well-founded. Raso et al. (2018) noted that a single AI application (among the thousands already in use) can affect numerous civil, political, economic, social, and cultural rights. A poor implementation or development of AI can undermine citizens’ trust, generate damage (tangible and intangible), and, ultimately, endanger human rights (Latonero 2018). Consequently, in recent years, ‘AI ethics’ has been taking shape to answer, with clear guidelines and actions, the question of how to develop AI that benefits society. The vast majority of methods for achieving ethical development of AI (Floridi 2015; Morley et al. 2020) are based on following ethical frameworks with widely accepted ethical principles (see Table 1), which provide minimum requirements to build and use an ethically sound AI system (Ashok et al. 2022). Other approaches focus on government laws and technical or practical tools (Cath 2018; Stahl 2021a).

To clarify the concept of a tool in the context of ethical AI development: from now on, a tool is understood as a versatile resource capable of supporting this process. The term encompasses many resources, such as open-source libraries, procedures, good practices, and government frameworks. These tools contribute in different ways and play specific roles in developing responsible or ethical AI. For example, technical tools like libraries and web applications generally focus on specific aspects, such as bias assessment in AI systems. Non-technical tools, such as guidelines and abstract definitions, provide ethical guidance at a more conceptual level. Although these tools differ in nature, they share the fundamental purpose of guiding, informing, teaching, or facilitating the development and evaluation of ethical AI. In this sense, the term ‘tool’ is used inclusively, reflecting the diversity of resources contributing to the process, just as a carpenter recognizes a hammer, chisel, and saw as essential instruments for creating a product. This holistic approach seeks to underline the importance of any resource capable of promoting ethics in AI development.

Existing technical tools are linked to certain ethical principles, e.g., technical solutions aimed at creating explainable systems, protecting the privacy of sensitive or personal data, and making data collection aware of biases and discrimination (Lepri et al. 2017). However, regarding transparency, non-maleficence, and beneficence, the ethical criteria and technical mechanisms to safeguard them are left to the judgment of the person or organization that implements the AI. As a result, an ethical standard is often not met (Floridi 2019; Floridi et al. 2020). The debate on whether ethical principles are sufficient to develop ethical AI leans towards the conclusion that they are not. The documents’ definitions remain abstract and unhelpful for developers (Mittelstadt 2019; Morley et al. 2021). In addition, this abstraction “tends to encourage a public opinion that there are good and bad algorithms, rather than well designed or poorly designed algorithms” (Morley et al. 2021, p. 2). As high-level principles continue to emerge, there is also a rise in resources addressing AI ethics. However, it remains problematic to determine which tool could aid adherence to these principles (Corrêa 2021; Morley et al. 2020).

Understanding the Collingridge dilemma is crucial in AI ethics, especially when exploring how technology ethics and values interact. This concept highlights the intricate link between technologies and the frameworks we use to judge them. The ethical side of the Collingridge dilemma poses an ongoing challenge: navigating a constantly changing landscape in which technologies reshape societal values and ethical standards (Kudina and Verbeek 2019). The dilemma creates a paradoxical situation where ethical considerations swing between being “too early” (assessing technologies without knowing how they might alter ethical frameworks) and “too late” (understanding ethical implications only after the technology is firmly established, when change is difficult). Fundamentally, evaluating technology from an ethical standpoint requires a nuanced equilibrium between foreseeing societal impacts with limited knowledge and comprehending those effects only after the technology has already influenced the frameworks for assessment (Kudina and Verbeek 2019; Strümke et al. 2022).

With the rise of tools such as ChatGPT, Dialogflow, DALL-E, and others, it is even more relevant to consider an ethical framework for AI systems. These tools are having a great impact on how we work and perform everyday tasks. Because they directly affect people’s lives, the ethical context in which they operate becomes more critical (Christoforaki and Beyan 2022). Topics such as the intellectual property of the data used to train algorithms, the false identities that can be created, and how these systems affect work and industry have been discussed (Frank et al. 2019). As these technologies become more ubiquitous, addressing the ethical concerns associated with their use is important. Regulation and the establishment of solid ethical standards are essential to ensure that AI is used responsibly and with respect for people’s rights and well-being (Hickok 2021). Taking steps to ensure ethical AI makes it possible to maximize its potential while protecting human rights and values.

AI ethics still struggles to ensure compliance with the stipulated ethical principles; it lacks control and reinforcement mechanisms, and infractions carry no significant consequences (Hagendorff 2020). Additionally, the technical personnel in charge of the development and deployment of the system (data scientists, data engineers, and developers, among others) still do not receive adequate education on the ethical implications or on how to exercise judgment when developing ML or DL algorithms (Burton et al. 2017). These personnel generally follow the life cycle of the AI-based system (see Fig. 1) or the recent MLOps cycle without explicitly addressing ethical aspects. This may arise from inadequate awareness or perception of ethical considerations, which renders ethics seemingly non-essential, and some firms may even consider it a hindrance (Morley et al. 2021). The AI life cycle stages consider technical aspects of both the model and the data sets. However, it is imperative to incorporate responsible practices, not only definitions or suggestions, into every stage in order to comply with the principles outlined in Table 1, which serve as the core framework. It is crucial to prevent undesired situations such as discrimination resulting from biases in either the model or the dataset (Bogina et al. 2021; Hermosilla et al. 2021) or a lack of transparency caused by the opaqueness of systems and processes (Kroll 2018). All the parties involved, including AI developers and other stakeholders, must acknowledge the importance of ethical principles, their implications, and the risks that emerge when they are neglected. Furthermore, they should possess adequate practical resources to implement AI ethics.

Fig. 1 Stages in the life cycle of an AI-based system.

Note: Created based on Haakman et al. (2021) and Hermosilla et al. (2021)

This review examines the application of AI ethics, current shortcomings and challenges, and theoretical and practical tools to aid in developing AI-based systems. Specifically, the section ‘Towards a more ethical design of AI’ provides an overview of the current scenario for responsible and ethical AI development and the limitations of existing approaches and, finally, proposes a conceptual framework to compensate for these limitations (focused on operationalization). The ‘Methodology’ section outlines the systematic review conducted to identify the practical resources and tools that multidisciplinary teams can use when developing systems, together with a detailed explanation of the creation of a graph incorporating multiple typologies to classify the tools discovered. The ‘Results’ section presents the main findings of incorporating the tools into the typologies. The ‘Limitations and Future Work’ section highlights ideas that may help close the gap in implementing AI ethics, as well as the limitations of this review. Lastly, the study concludes by presenting the challenges that need to be addressed to create AI that benefits society.

2 Theoretical background

Most frameworks, guides, and guidelines (even the less abstract resources) proposed in recent years by the community to achieve ethical (also called responsible) AI development are based on ethical principles (Hagendorff 2022b). There is a wide variety of these documents, with a plethora of principles (Jobin et al. 2019) that can be “overwhelming and confusing” for AI developers (a small number of the principles can be seen in Table 1). However, some researchers have identified and unified them into more actionable fundamental principles (Becker et al. 2022; Fjeld et al. 2020; Floridi and Cowls 2019). These attempts to clarify the picture, while valuable, remain limited. The level of abstraction of the principles and, even more worryingly, the fact that the more technical personnel are unaware of them mean that this approach is still immature (Morley et al. 2021). We are not discouraging or discrediting these valuable efforts; on the contrary, we believe that they are the basis for developing AI governance and should be addressed with methodologies that encourage multidisciplinary development, participation of all stakeholders, and initiatives that support a more practical approach, such as the typologies proposed by Morley et al. (2020) or Ayling and Chapman (2021). Their research presents several tools classified along two main dimensions: fundamental ethical principles and AI lifecycle stages. Morley et al. (2020) identify 106 tools, while Ayling and Chapman (2021) list 39. As in our research, these tools vary in nature, some technical and others more abstract. However, they share the goal of providing developers and stakeholders with resources to apply ethics in AI at specific points in development.

Governments are still reserved or cautious about exercising mandatory regulation of these emerging technologies (Marchant and Gutierrez 2022; Maslej et al. 2023; The Law Library of Congress 2023), probably for two reasons. The first is that “laws and norms cannot keep pace with code”, which may also explain the great variety of existing soft laws (Fjeld et al. 2020, p. 57; Gutierrez and Marchant 2021). The second possible reason is that “policymakers and legislators [sometimes fall, but] need to push against the false logic of the Collingridge dilemma”; that is, they do not intervene until technologies are fully developed and their use is widespread (Morley et al. 2021, p. 8). In any case, the forms of governance and initiatives to achieve the ethical development of an AI system follow some basic principles. These serve as “normative constraints on the do’s and don’ts of [developing and using AI-based systems] in society”, noting that normative constraints should apply to all parties involved rather than just technical personnel (Morley et al. 2020, p. 4).

2.1 Operationalization of AI ethics

As mentioned, the fundamental or high-level principles proposed by governmental institutions (China’s 2019; European 2019; Government 2019; Español et al. 2021), private companies (de Laat (2021) gives a broader view of the principles of private companies), international organizations (OECD 2019b; UNESCO 2021), and academia (Brundage et al. 2018; Diakopoulos et al. xxx; Floridi et al. 2018; Future 2017), among others, have dominated the field of AI ethics, and it is hoped that these principles can prevent AI practitioners from “crossing societal red lines” (Morley et al. 2021, 2020). Ideally, abstract principles would suffice to achieve ethical AI development, but this is not the case. There are several reasons why high-level principles per se are insufficient; we note that more may exist than those discussed here. The first is that most of these documents (guides, guidelines, directives, principles) are limited to defining the “what” of ethics, i.e., what must be followed for an AI system to be developed ethically, and focus little on “how” to achieve it (Morley et al. 2020). A second possible cause is that these principles are seen as a replacement for regulation but are not enforceable, so compliance cannot be guaranteed (Rességuier and Rodrigues 2020) or, put another way, ethics “lacks mechanisms to reinforce its normative claims” (Hagendorff 2020). The last cause is that, in real-world environments, the interests of companies may take precedence over compliance with or adoption of the principles. The principles are therefore considered a constraint when developing and implementing AI systems, even creating conflicts between companies and their employees (Hagendorff 2020; Ryan et al. 2022).

With the above in mind, our study gives importance to the first two causes, and we attempt to further close the gap in the operationalization of AI ethics, i.e., how to move from principles to practice (Morley et al. 2020). To address the described gap and achieve our goal, we propose to answer the following questions:

  • What tools exist to apply the ethical principles in AI systems?

  • How can the above tools be classified according to their real-world application?

  • At what level of technological development are these tools?

Achieving ethically aligned results in AI development requires using both technical and abstract (non-technical) principle-based tools. We also argue that differentiating between stages of system development (see Fig. 1) will help the more technical personnel (Morley et al. 2021). So far, few toolkit resources can be consulted by the personnel involved to comply with AI ethics, regardless of whether the project is starting or the system is in production. For example, a developer or any person engaged in the system’s pre-development or post-development stage may find a helpful tool to measure the impact of the proposed solution (according to the principle of Non-maleficence). Even so, we know this principle-centered approach to deontological ethics (normative rules to which we must adhere, sometimes seen as obligations) has shortcomings. We believe that it should be accompanied by a virtue-based approach, i.e., “ideals that AI practitioners can aspire to”, which at the same time decreases the active responsibility gap (Hagendorff 2022b, p. 4; Santoni de Sio and Mecacci 2021).

Fig. 2 Pillars for the ethical development of AI

The conceptual framework we suggest as a fundamental basis for the responsible development of AI-based systems can be seen in Fig. 2, and its main features are shown in Table 2. To address AI ethics, we propose a holistic combination of two ethical theories: i) virtue ethics, which “focuses on an individual’s character development” (Hagendorff 2022b, p. 19); these virtues create character dispositions that serve as the basis for ethical decision-making and further relate to the concept of active responsibility, which encourages AI professionals to promote and comply with legal standards, goals, and values, as well as to foresee, prevent, and avoid unintended consequences (Santoni de Sio and Mecacci 2021); and ii) deontological ethics, based on normative rules to which people must adhere; this approach focuses on the action rather than the actor (Hagendorff 2022b). These two theories, together with the tools, pave the way for applying ethics to AI. Integrating virtue ethics augments ethical growth by relying on the inherent moral character of the individual, which deontology and its external principles fail to tackle entirely (Hagendorff 2022b; Mittelstadt 2019). While the principles of deontology provide external guidelines, virtue ethics goes further, cultivating the internal dispositions that drive correct behavior. Deontology may present a set of rules to follow, but without ingrained virtue, there is a risk that adherence to those principles will lack depth and commitment (Stahl 2021b; Vallor 2016). For example, a developer might adhere to a list of principles, but without solid personal virtues, he or she might apply them in a lax or superficial manner. The balanced combination of both perspectives, virtues and principles, provides a more holistic framework that guides external actions and drives authenticity and accountability from within. While we focus primarily on the third pillar in our research, we recognize that the other two ethical approaches will also benefit all those involved in AI development and the target population (Morley et al. 2020).

Table 2 Ethical theories and their characteristics

This study aims to provide a resource for people involved in AI development (not just technical staff) so that, when operationalizing AI ethics, they are not overwhelmed by unfamiliar notions. To this end, and taking as a reference two studies on the operationalization of AI ethics (Ayling and Chapman 2021; Morley et al. 2020), we propose to further close the gap between ethical principles and the actions or methods to enforce them. On the one hand, we update the list of tools since, to the best of our knowledge, there have been no further studies on tools available to enforce AI ethics; on the other hand, we extend the scope with two new ideas discussed in the next section.

3 Methodology

This research was divided into four phases: phases one and two follow a systematic review methodology, and phases three and four focus on creating a typology and categorizing the tools. Five researchers participated in all stages of the process, and reviewer roles were exchanged randomly among them. This practice aimed to gather diverse and equitable feedback, thus promoting a comprehensive and objective evaluation. Consensus was reached by identifying common ground and clarifying disagreements. Each step is described in detail below.

The first phase was to review the growing literature on AI ethics to provide a theoretical foundation for the study. This phase also aimed to identify research on tools that promote pro-ethical AI development, taking into account ethical principles and the various stages of the AI lifecycle. To accomplish this, we created a search string and explored six databases (see Phase 1 in Table 3), with modifications as necessary. The second phase entailed searching for tools using an additional search string in six databases (see Phase 2 in Table 3). Additionally, a non-exhaustive exploration was conducted using the Google search engine, without the search string previously mentioned, due to its capacity to offer a broader range of information than traditional academic search engines.

Table 3 Databases and search strings for Phases 1 and 2

The PRISMA methodology (Page et al. 2021) was used for the systematic searches described in phases one and two. For the respective selection of literature and tools, the eligibility criteria shown in Table 4 were applied. The literature and tool selection flows are shown in Figs. 3 and 4, respectively.

Table 4 Eligibility criteria for literature and tool selection
Fig. 3 Flowchart illustrating literature selection.

Note: Flowchart illustrating our literature selection process, based on the PRISMA structure developed by Moher et al. (2010) for systematic reviews. The articles incorporated at this stage were used as theoretical foundations for our study, although not all of them are cited

Fig. 4 Flowchart illustrating the tool and resource selection.

Note: Flowchart illustrating our tool and resource selection process, based on the PRISMA structure developed by Moher et al. (2010) for systematic reviews

Keeping in mind that the tools identified in the second phase needed to be aligned with the ethical principles mentioned in Table 2, the stages of the AI life cycle (see Fig. 1), the type of algorithm task, and two additional categories described below, the third phase aimed to generate a diagram with various typologies that summarizes all the gathered tools and their relevant information. This diagram can help AI developers identify these resources more efficiently and determine their applicability in specific cases. Inspired by the research of Morley et al. (2020), Fjeld et al. (2020), and Ayling and Chapman (2021), we created a diagram available at Tools4ResponsibleAI, specifically in the ‘Typology’ section of the website.

Each level of the circle is related to an ethical principle (5 levels corresponding to 5 principles, except for the center of the diagram, which is explained in the next paragraph), and each section of the circle represents a stage in the life cycle of an AI (6 sections corresponding to 6 stages). Within each section there are dots, each containing a tool identifier number. Pressing a dot displays relevant information, such as the type of task, the sector that created the tool, and the level of development, described below.

The typology includes a sixth principle called Governance, which relates to “establishing and implementing policies, procedures and standards for the proper development, use and management of the infosphere” (Floridi 2018, p. 3). This concept ranges from regulations to social moral values (Ashok et al. 2022). Governance encompasses guidelines and recommendations (Floridi 2018), such as the General Data Protection Regulation (GDPR) or the proposal to implement an AI Act in the European Union (European 2021). This principle is placed at the center because it unifies the resources that adhere to the other five ethical principles while addressing a range of social, ethical, and normative considerations.

Finally, the fourth phase involved synthesizing the information from the tools identified in Phase 2 and framing this collated data into the diagram created earlier. To ensure the alignment of the tools with ethical principles and lifecycle stages, we used the description provided by the respective tool. Many of these tools clearly indicate which ethical principle they address and in which phases they are best applied. In addition to the ethical principle definitions by Floridi et al. (2018), Becker et al. (2022), Fjeld et al. (2020), and the Ethics Guidelines for Trustworthy Artificial Intelligence (European 2019), we drew on our knowledge of AI system development stages and the relevant literature. We also outline other relevant typologies, such as:

  • Task type, i.e., the task(s) the tool or resource addresses (classification, regression, clustering, NLP, among others). In some cases, as with theoretical resources, the tool is described as directed toward AI-based applications in general, without specifying the type of task.

  • Sector that developed the tool (private, public, NGO, or academic).

  • Level of development of the tool, determined according to its functionality, documentation and examples, updates, and requirements. Some of these variables do not apply to theoretical tools. The level of development is classified into ranks: level 1 Insufficient, level 2 Basic, level 3 Intermediate, level 4 High, and level 5 Advanced. Scores are continuous, and beyond level three (the intermediate threshold) they are graded in 0.5-point increments. An illustrative sketch of how these typologies can be combined into a single record per tool is given below.
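As a concrete illustration, the following minimal sketch represents one entry of the typology as a data record. The field names, example values, and the rule mapping a continuous score to a named level are our own illustrative assumptions, not the actual schema of the dataset in Appendix A.

```python
# Illustrative sketch only: one possible record structure for a tool in the typology.
from dataclasses import dataclass
from typing import List

LEVELS = {1: "Insufficient", 2: "Basic", 3: "Intermediate", 4: "High", 5: "Advanced"}

@dataclass
class ToolRecord:
    name: str
    principles: List[str]        # high-level principles addressed (e.g., "Justice")
    lifecycle_stages: List[str]  # AI life cycle stages covered (see Fig. 1)
    task_types: List[str]        # e.g., "classification", "NLP", or "unspecified"
    sector: str                  # "academic", "private", "public", or "NGO"
    development_score: float     # continuous score derived from functionality,
                                 # documentation, updates, and requirements

    @property
    def development_level(self) -> str:
        # Map the continuous score to one of the five named levels.
        return LEVELS[min(5, max(1, round(self.development_score)))]

# Hypothetical example entry (values chosen for illustration only):
example = ToolRecord(
    name="Fairlearn",
    principles=["Justice"],
    lifecycle_stages=["Model Configuration and Training", "Performance Evaluation"],
    task_types=["classification", "regression"],
    sector="private",
    development_score=4.2,
)
print(example.development_level)  # -> "High"
```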

The tools identified were classified into three distinct categories, each playing a specific role in promoting ethical AI. Firstly, hybrid tools blend technical and non-technical approaches, including books, checklists, courses, examples, guides, resource compendiums, tool compendiums, tutorials, and websites. These tools integrate theoretical comprehension with practical applicability, furnishing developers and stakeholders with a comprehensive outlook.

Secondly, the non-technical tools cover a broad spectrum, including articles, case studies, certifications, codes of ethics, codes of practice, contractual terms, guidelines, laws, licenses, playbooks, practical frameworks, principles (documents with their respective definitions and sometimes with theoretical guidance), reports, standards, theoretical frameworks, white papers, and theory documents (documents that did not fully fit the other types and were grouped separately to avoid mixing too many different types). These tools focus on providing conceptual, ethical, and legal guidance, establishing a solid framework for the ethical development of AI.

Thirdly, technical tools encompass codes, apps, datasets, programs (desktop and web-based), and voice assistants. This group is oriented towards the practical and technical aspects of development, providing concrete and applicable solutions. Codes in this group, for example, may include open-source libraries that facilitate the implementation of ethical practices in AI systems, while datasets improve the quality and fairness of models. The following section presents graphs that illustrate the three macro categories and each type of tool.

4 Results

Our research identified 352 tools (from now on referred to interchangeably as resources), which made it possible to complete the typology described in the previous section. Each resource was examined and categorized based on the life cycle stage, high-level principle, resource type, task type, and level of development. Other data were also collected, such as the sector that created the tool, year of creation, links of interest, and programming language (applicable to technical tools). The complete data set is shown in Appendix A, and an interactive version of the diagram with the different typologies (still under development) can be seen at https://ricardo-ob.github.io/tools4responsibleai. The main points of the search and the framing of the resources found are highlighted below.

Fig. 5 Tools by AI lifecycle stage and high-level principle.

Note: The governance principle is not included due to its holistic scope, described above. The graph was made according to the occurrence of each principle. As an example of how to read the graph, 92 tools can facilitate compliance with the Explicability principle in the Performance Evaluation stage

Figure 5 illustrates the six stages of the AI life cycle and the five high-level principles (hereafter, stages and principles, respectively). The figure reflects more entries than the number of resources because each tool may cover several principles and stages. This is a distinguishing feature of the resources: they are not monothematic, at least not for the most part, but span numerous principles and stages. In the first stage, where the AI-based solution’s objectives, scope, and approach are defined, many resources focus on Beneficence and Non-maleficence. The principles of Non-maleficence and Justice remain notably relevant in the other five stages; about 57% of the tools focus on helping to meet these two principles throughout the lifecycle. Table 5 shows the number of principles and stages identified in each tool, highlighting the most frequent principles (Justice, Non-maleficence, Explicability) and the most frequent stages per tool (Performance Assessment, Planning and Design, Understanding the Business and the Problem).

Figure 6 shows a significant increase in production from 2018 onwards, doubling the output of the previous year. This upward trend has persisted despite a slight decrease in 2021. In contrast, the creation of such tools was considerably more limited during 2015 and 2016 than in 2018.

Fig. 6 Distribution by year of the resources found in the study

Table 5 AI lifecycle stages and high-level principles identified in the tools

The high-level principles also have associated, or low-level, principles (see Fig. 7); 26 associated principles were compiled. The main one is Equity, with 128 tools addressing issues related to bias in models and data sets. The principles of Interpretability (present in 105 tools) and Explicability (present in 92 tools) are the next most frequent. The latter two focus on the Performance Evaluation and Model Configuration and Training stages (see the Explicability principle in Fig. 5). Many low-level principles are thus subsumed under the high-level principles of Justice, Explicability, and Non-maleficence.

Fig. 7 Count of low-level or associated principles.

Note: The associated principles were counted by their identifier. A tool may have more than one associated principle depending on the stage at which it is used

Figure 8 shows a count of tools by type (the three macro categories described in the methodology section) for each stage of the life cycle addressed by the tool. For the first two stages, there is a significant representation of hybrid and non-technical tools (frameworks, guidelines, reports, and articles, among others), while for the following three stages, there is a notable presence of technical tools (mostly code libraries). Across the complete cycle, non-technical tools predominate, mostly resources that introduce theoretical concepts, principles, requirements, or methodologies for developing responsible AI.

Fig. 8 Type of tool according to AI life cycle stage

Fig. 9 Tool type according to high-level principle

Fig. 10 Tool type count by quantity found

The types of tools and their presence in the high-level principles can be seen in Fig. 9. The Justice principle contains the highest share of non-technical tools, about 79% of the total number of tools under this principle (see the principles in Table 5), followed by Non-maleficence with ~76% and Explicability with ~75%. It is essential to mention that many technical tools, such as code libraries, are accompanied by research articles (classified as non-technical tools). Therefore, the percentages of technical tools in the three predominant principles (~44% in Justice, ~64% in Explicability, and ~36% in Non-maleficence) may in practice be closer to those of non-technical tools and not correspond to the proportions shown in Fig. 9. Figure 10 shows that the two most common resource types are scientific articles and code libraries: 217 resources contain research articles and 162 are or include code libraries, although these are not mutually exclusive.

Finally, Fig. 11 shows the number of tools by the sector that created them and their level of development; the academic sector is the primary producer of tools, with 106 tools at a high level of development. The private sector has the largest number of tools at a high-advanced level (n = 16) compared to the public sector (n = 12), non-profit organizations (n = 14), and the public sector (n = 1). The graph also shows that, although the academic sector is the leading producer of tools, about 39% of its tools are at insufficient-to-high levels, whereas the private sector has ~22% of its tools at those levels. The public and non-profit sectors are among the lowest producers of tools, with most of their tools at intermediate-to-high levels.

It is necessary to clarify that the sectors are presented separately for graphical representation purposes, even though about 12% of the tools were created through alliances between sectors. This collaboration between academic, private, and public research is mainly due to companies funding research at universities, private companies taking an interest in the perspective of non-profit organizations, or academic researchers being hired by government institutions.

Fig. 11 Number of tools according to the level of development and the sector that created them.

Note: Some tools do not have a development level because the criteria defined for their classification did not apply; their content is too abstract or theoretical, and stricter measures would be needed to classify them, which is beyond the scope of our research

The research articles taken as a basis for the theoretical development, the tools obtained from databases, and the tools obtained through citation searching and the search engine (Google Search) can be found in Appendix B, Appendix C, and Appendix D, respectively.

5 Discussion

During our research and analysis of results, we observed a rising trend in producing ethical tools. Each resource attempts to solve one or more social and ethical issues at different stages of the cycle, depending on the context of each project. This section discusses the main findings, which are addressed in detail in the following four sections. It concludes with a short assessment of how the compendium of resources should be interpreted and used.

5.1 The academic sector as main tool contributor

Many tools at a ‘high-advanced’ development level are produced by academia, corresponding primarily to research papers, frameworks, and open-source libraries. This sector produces the most tools, followed by the private sector, accounting for about 66% of the resources found (not including collaborations). Among the tools developed exclusively by the academic sector, we highlight the assessment model centered on human rights (Human Rights, Ethical and Social Impact Assessment, HRESIA) proposed by Mantelero (2018), the Medical Algorithmic Audit scheme (Liu et al. 2022), the Dataset Nutrition Label (Holland et al. 2018), the SHAP open-source library (Lundberg and Lee 2017), the DaRe4TAI framework (Thiebes et al. 2021), the Aequitas bias audit toolkit (Saleiro et al. 2018), and the FairSight visual analytics system for fairness (Ahn and Lin 2020). These tools have a multidisciplinary and cross-sectoral target audience, so stakeholders are taken into account. Private sector tools have a higher level of development, albeit in smaller quantities. The latter may respond to a concern about strict regulation and ethics washing (de Laat 2021; Floridi 2019) or, on the other hand, to the aim of generating value for their companies by obtaining commercial benefits (Mills et al. 2021).
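To give a sense of what these academic technical tools look like in use, the sketch below shows a typical post-hoc explanation workflow with the SHAP library mentioned above. The dataset, model, and specific calls are illustrative assumptions based on the library's public interface, not an example taken from the reviewed study.

```python
# Illustrative sketch only (assumes the shap and scikit-learn Python packages):
# post-hoc feature-attribution explanations with SHAP (Lundberg and Lee 2017).
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

data = fetch_california_housing(as_frame=True)   # illustrative tabular dataset
X, y = data.data, data.target
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.Explainer(model)                # dispatches to a tree explainer here
shap_values = explainer(X.iloc[:200])            # local attributions per prediction
shap.plots.beeswarm(shap_values)                 # global summary of feature effects
```

SHAP is only one of several explainability resources in the compendium; the same post-hoc pattern (fit a model, compute attributions, inspect them locally and globally) applies to most of them.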

Although the public sector is the primary beneficiary of these resources, it produces few ethical tools, and most of them have a below-average level of development. Despite this low production, it is notable that governments have increased their initiatives and results in AI regulation over the last six years (Maslej et al. 2023). Governments must ensure the social good by focusing on equity, accountability, sustainability, privacy, and security. They therefore see the need to use and produce tools, including responsibly developed AI-based systems, to improve the quality and reliability of bureaucratic processes.

Finally, the sectors see a significant advantage in collaboration. We found that the tools created through partnerships are mostly at ‘high-advanced’ levels, in contrast to the ‘intermediate-high’ levels of tools developed by a single sector. The most notable resources resulting from cross-sector work are the DEDA dialogic framework (Franzke et al. 2021), the open-source library InterpretML (Nori et al. 2019), the AI-RFX contracting framework (Institute yyy), the interdisciplinary framework proposed by Fetic et al. (2020), the HAI dashboard for addressing agile ethics in AI created by Butnaru et al. (2018), the PySyft open-source library (OpenMined 2018), and the Assessment List for Trustworthy Artificial Intelligence (Ala-Pietilä et al. 2020).

5.2 Gaps in tool types and stages

There is a large gap between the types of tools across the life cycle (see Fig. 8): in the first two stages non-technical resources stand out, while in the following four stages technical tools become more relevant. Hybrid tools, on the other hand, maintain a constant presence across the different stages. The absence of technological resources in the first two stages points to a need to develop tools that help in understanding the problem to be solved and the scope of the project, without neglecting the deployment and monitoring stage. Although the resources in these stages are usually theoretical, implementing correctly planned automated processes to assist and evaluate these stages would streamline work that can be laborious for development teams unfamiliar with social and ethical domains.

The number of tools is also not equally distributed across stages. According to Table 5, the third, fourth, and sixth stages together account for approximately a quarter of the total tools, meaning that data processing, model training, and model implementation have fewer resources and methodologies available to developers and decision-makers. However, the issues associated with data processing were among the first to be addressed in the development of ethical AI (Butterworth 2018), and the maturity of that area is expected to compensate for the imbalance in the number of resources.

5.3 Justice, non-maleficence, and explicability as generalized principles

To the principle of Explicability, described as the only “all-encompassing principle” in the review by Morley et al. (2020), our findings add the principles of Justice and Non-maleficence (see Figs. 5 and 9). The distribution of high-level principles is therefore also uneven. An extensive literature on AI project requirements (Fjeld et al. 2020; Hagendorff 2020; Jobin et al. 2019) gives evidence of the need for such principles. Figure 7 shows that the associated principles of fairness, interpretability, explicability, reliability, security, privacy, and accountability, among others, are the most recurrent in the tools found.

Various technical resources have recently been developed to audit biases in models and datasets (Hagendorff 2020; Lee and Singh 2021; Mehrabi et al. 2021). These tools allow biases to be identified at the individual and group levels through statistical metrics and even allow mitigation during the training stage. Under the Justice principle, the best-represented stages in resource allocation are performance evaluation, data processing, and planning and design, with most of these resources consisting of post-hoc analysis tools. The tools that stand out the most are Amazon’s SageMaker Clarify (Hardt et al. 2021), IBM’s AI Fairness 360 (Bellamy et al. 2018), the community-driven Fairlearn started by Microsoft (Agarwal et al. 2018), Fairness Indicators created by the TensorFlow community and Google (TensorFlow 2019), the AEKit toolkit (Krafft et al. 2021), the UnBias toolkit (Lane et al. 2018), and FairML (Adebayo 2016).
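As a concrete illustration of what such group-level auditing and in-training mitigation look like in practice, the sketch below uses Fairlearn on a synthetic example. The data, column names, and parameter choices are hypothetical and for illustration only; the same general pattern (compute per-group metrics, then retrain under a fairness constraint) applies to the other toolkits listed above.

```python
# Illustrative sketch only (assumes scikit-learn and the Fairlearn Python package):
# group-level bias metrics plus in-training mitigation under a fairness constraint.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({"income": rng.normal(size=n), "tenure": rng.normal(size=n)})
sensitive = rng.choice(["group_a", "group_b"], size=n)   # hypothetical protected attribute
y = (X["income"] + 0.5 * (sensitive == "group_a")
     + rng.normal(scale=0.5, size=n) > 0).astype(int)    # synthetic, intentionally biased labels

# 1) Audit: per-group metrics for an unconstrained model
clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)
audit = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y, y_pred=pred, sensitive_features=sensitive,
)
print(audit.by_group)                                    # metric values per group
print("DP difference:", demographic_parity_difference(y, pred, sensitive_features=sensitive))

# 2) Mitigate during training: constrain the model toward demographic parity
mitigator = ExponentiatedGradient(LogisticRegression(), constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=sensitive)
mitigated_pred = mitigator.predict(X)
print("DP difference after mitigation:",
      demographic_parity_difference(y, mitigated_pred, sensitive_features=sensitive))
```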

Fairness, Explainability, and Interpretability remain the principles with the most technical resources (Guidotti et al. 2018; Namatēvs et al. 2022; Ras et al. 2022). Hagendorff (2020) suggests that one possible reason for this is the ease with which these principles can be mathematically operationalized. On the other hand, there is a shortage of tools that aid in upholding principles such as data quality, prevention, agency, or inclusion. These principles are highly significant in fostering confidence when dealing with sensitive data and responding to public sector initiatives. Autonomy, present in only about 10% of all tools, is often neglected, even though ensuring that individuals are well-informed and free to make their own decisions is crucial. An AI-based system that “limits people’s autonomy will discriminate against them, and if these discriminations are not addressed, they will not be detected nor made visible” (Subías-Beltrán et al. 2022, p. 15).

Although not one of the five high-level principles, governance is found in about 7% of the tools. It is holistic in that it unifies resources adhering to the five core principles, it is essential to ensure that AI is developed and used positively for society, and it becomes the foundation that underpins integrity and trust in the development and use of AI (Taeihagh 2021). Its inclusion highlights the importance of establishing sound structures to guide the ethical evolution of AI for social welfare and respect for human rights (Mäntymäki et al. 2022).

Ethical aspects such as transparency, accountability, and traceability are present in most stages under Non-maleficence (see Figs. 5 and 7), which shows that it is necessary and desirable to provide information about the decisions taken (at least the most important ones), who is responsible for them, and what measures are taken to counteract or mitigate unwanted effects.

5.4 Growth in the collection and classification of resources

At the time of our initial search, no research distinguished between tools, stages, and principles, but we have since found significant progress on the part of the community. Tools that facilitate the search for and selection of ethical tools for projects are becoming increasingly common. We highlight the following compendiums that distinguish between tools, stages, and principles: the AI Ethics Tool Landscape (Wenink 2021), the comprehensive Catalogue of Tools & Metrics for Trustworthy AI (OECD 2018), the PLOT4AI threat modeling library and methodology (Barberá 2022), the interactive guide to ethical mitigation strategies proposed by Boyd (2022), and the review of methods and tools by Kaur et al. (2022). In addition, resources that recommend tools for a particular stage (without mentioning principles) or, conversely, focus on a single principle are becoming more common (Baxter 2019; BSA 2021; CNIL 2019; Corrêa 2021; NIST 2021; Thoughtworks 2021).

5.5 Final considerations

The proposed conceptual framework could serve as a helpful foundation for fostering ethical AI development, as virtue ethics could complement deontology and address its limitations. As previously stated, deontological ethics prioritizes adherence to rules or principles, but these principles lack mechanisms to support their normative claims. Furthermore, their content is highly abstract and provides no guidance on how to achieve them (Hagendorff 2020, 2022b). The principles of deontology serve as external guidelines, while virtue ethics develops the internal dispositions that encourage appropriate behavior (Shafer-Landau 2012; Besser-Jones and Slote 2015). With this integration, acting rightly becomes unproblematic, as a person’s moral virtues drive authenticity and responsibility from within.

The connection between the two theories has been addressed before. According to Camps (2015), virtues are necessary for principled ethics or great values to function effectively. Although that research pertains to deontological conduct in healthcare practitioners, it applies to AI ethics because the five elementary AI principles derive from bioethical principles. Furthermore, Camps (2015, p. 6) affirms that virtues, being more related to personal development than to abstract principles, values, or norms, serve two crucial objectives. Firstly, they broaden and extend the fundamental principles; merely recognizing these principles, however, is insufficient. Secondly, proper implementation of the principles is crucial, and this is where virtues prove essential in ensuring that they are reflected in daily conduct, particularly in challenging situations. Being a competent professional requires acknowledging these ethical principles and having the disposition to behave correctly (Camps 2015; Vallor 2016).

Sganzerla et al. (2022) assert that virtues do not hold inherent value but possess instrumental value, as individuals who exhibit greater virtue are more likely to adhere to rules. The researchers also discuss the recognition by Beauchamp and Childress (who proposed the four bioethics principles) that virtue ethics is necessary to lay the groundwork for principlism. Without considering virtue ethics, attaining objectives aimed at the betterment of society becomes more challenging. Furthermore, virtues not only strengthen principled practice but also often constitute the condition for its correct application: given the variety of circumstances that may arise, principles cannot provide a clear guideline to follow, and it is up to the agent to judge what should be done (Camps 2015, p. 7). In ethical AI development, integrating virtue ethics with deontological ethics can aid stakeholders in creating systems that are ethical in their design and application (MacIntyre 2007).

On the other hand, our main contribution focuses on the diagram, the resources found, and their correspondence with the various ethical principles and stages of development. The diagram is intended as a compendium that facilitates the application of abstract ethical principles, and we hope it will remain under constant evaluation, with contributions from the community. Its usefulness extends to technical professionals and high-level decision-makers, providing an overview of resources that supports informed decision-making. However, it is crucial to note that simply using one or more tools does not guarantee responsible AI development. It is essential to avoid technical or methodological solutionism, the mistaken belief that complex socio-technical and political problems can be solved or avoided entirely by applying new techniques (Hagendorff 2022a; Santoni de Sio and Mecacci 2021). Although the gap between ethical tools and ethical AI development persists (Morley et al. 2021), our research aims to bridge it by collecting and categorizing resources, some of which are still unknown to the community. We acknowledge, however, that a comprehensive solution to this challenge requires a broader and more holistic approach. Our conceptual framework (Fig. 2) aims to address the challenge from such a perspective, aligning the selection of ethical tools with a sound conceptual basis to drive more effective and sustainable ethical development.

5.6 Limitations and future work

The limitations of our research center on i) the breadth of the search for resources and the corresponding search strings, ii) the qualitative mapping of ethical principles to the resources found, and iii) the exclusion criteria defined and the search method used. Although we intended to compile a considerable number of resources, our work was based on overly broad search strings, which made the filtering and selection process very lengthy, with most results coming from medicine, economics, or the social sciences (outside ethics and AI). Narrowing and improving the search strings may therefore be a starting point for future systematic reviews.

The resources found are diverse, ranging from theoretical papers to code repositories. In some cases, framing the resource or tool within the principles and stages was straightforward (primarily with technical tools). However, because our background is in engineering and computer science, biases could have been introduced during the qualitative analysis used to determine which ethical principle applied to each resource. Nevertheless, we believe that the framing and analysis of each resource in the typology was done as rigorously as possible and surfaces valuable resources for developers and decision-makers. Finally, the omission of valuable resources cannot be ruled out: exclusion criteria such as language, the databases used, and reliance on the Google search engine may have caused us to overlook relevant material. Other factors with significant impact are the time window, since we did not include 2023 (a year marked by substantial advances in generative AI), and the exclusion of outcomes other than ML, DL, and data.

In future work, we propose more rigorous filtering of the resources found, to remove those that do not contribute to a particular principle, a particular stage, or the general purpose of responsible AI development. We also propose creating methodologies that complement the typology, such as the tool based on filters, objectives, and stages presented by Boyd (2022), to help interested parties select tools without being overwhelmed by the plethora of resources. Another helpful line of future work is an in-depth analysis of each resource, an essential step in facilitating the choice of tools; a possible outcome would be a new classification of resources by difficulty of use or implementation, since this would require thoroughly inspecting how each resource works.

6 Conclusion

This review began by gathering information on the current state of AI ethics, continued with the creation of a base framework for the responsible development of AI-based solutions, and then focused on the search for tools, identifying 352 ethical tools that address different areas, sit at different levels of technological development, and facilitate putting into practice the myriad abstract ethical principles. The resources found were classified into different typologies, such as AI lifecycle stage, ethical principles, applicable tasks, and development sector, among others, to facilitate the search for and selection of resources by development groups.

The tools are not evenly distributed across either stages or principles. Many tools focus on the first, second, and fifth stages of the life cycle, and the most frequent principles are Justice, Non-maleficence, and Explicability, with Equity and Explicability being the low-level principles with the most tools (technical and non-technical). The principles of governance and autonomy, which are paramount in creating legal frameworks and generating trust among the target audience, show a deficit of resources (Kaur et al. 2022). The academic and private sectors are the leading creators of tools, with academia producing the largest number of technologically advanced tools and the resources of private companies being, proportionally, the most developed.

Finally, the compendium and the classification of resources resulting from this review do not seek to create or provide a single solution for the ethical development of AI-based systems, as this is a technology of a socio-technical nature. Each stage needs to be reflected upon to determine which ethical principles are most relevant (depending on the context). We hope that the ML community, which is less familiar with ethical issues, will find the tools useful, and we emphasize the need for comprehensive academic training to shape basic AI virtues and for broader dissemination of such resources (Morley et al. 2021).