1 Introduction

Artificial intelligence (AI) has become a critical part of our society, presenting unique advantages and challenges. The ethical implications of AI, including fairness, trust, bias, and transparency, are pressing issues that must be addressed. Research has indicated that AI systems can entrench and even exacerbate existing biases in areas such as criminal justice and recruitment [1, 2]. Maintaining trust in AI is crucial for its widespread adoption, but the black-box nature of these systems can undermine that trust [3, 4]. In response to these challenges, calls have been made for the deployment of “fairness-aware” algorithms that take demographic diversity into account and increase transparency in decision-making processes [5].

The integration of diversity and inclusion (D&I) principles in AI has the potential to mitigate the challenges posed by bias and the lack of fairness [6]. Research suggests that diverse teams are more likely to recognize and address biases in AI systems [1]. From a design perspective, diverse teams bring different perspectives on fairness and can identify additional sources of bias in data or algorithms [5]. From a user’s standpoint, involving marginalized communities in AI development can make the technology more likely to be fair and trustworthy for those groups and increase its acceptance among them [7]. Furthermore, ethical concerns about AI technology should extend beyond privacy and transparency to include diversity and inclusion [8]. AI systems should not only benefit from embedding D&I principles in their design, development, and deployment; once developed in this manner, they should also be treated as agents of change that could, in turn, improve and accelerate the understanding and practice of diversity and inclusion in all aspects of life.

The topics of bias and fairness in AI have received significant attention in recent years. Mehrabi et al. [9] reviewed the literature on the sources of data and algorithmic biases in AI applications and the different fairness definitions used to reduce bias in AI. Bertrand et al. [10] conducted an SLR of 37 papers on cognitive biases in Explainable AI (XAI) systems and identified four ways in which cognitive biases affect XAI-assisted decisions. Another study [11] reviewed 47 articles on fairness in AI algorithms and found a lack of consensus on definitions of algorithmic fairness. Obermeyer et al. [12] examined racial bias in algorithms by comparing health and cost predictions for Black and white patients. Benthall et al. [13] proposed a method for group fairness interventions that uses unsupervised learning to mitigate racialized social inequality, social segregation, and stratification in machine learning.

In contrast, limited research has explored the principles of diversity and inclusion (D&I) in AI, and, to the best of our knowledge, no systematic literature review has been conducted on this topic. In this paper, we address this gap with a systematic literature review that presents the state of the art on AI and diversity and inclusion. Our aim is to explore the challenges and solutions (guidelines/strategies/approaches/practices) reported in the research literature on diversity and inclusion in AI (D&I in AI), as well as on the applications of AI for enhancing and improving diversity and inclusion practices (AI for D&I). To distinguish “D&I in AI” from “AI for D&I” when extracting challenges and solutions, we adopted a separate definition for each term. For “D&I in AI”, we followed the definition provided by Zowghi and da Rimini [6]: “inclusion of humans with diverse attributes and perspectives in the data, process, system, and governance of the AI ecosystem”. We defined “AI for D&I” as “the applications of AI systems to enhance diversity and inclusion practices in the environment”.

The main contributions of this SLR are a rigorous search, selection, and analysis of 48 articles published over six years (2017–2022) on D&I in AI and AI for D&I. We believe that the results presented in this paper contribute to a deeper understanding of diversity and inclusion considerations in AI system development and deployment. The main findings of this SLR are as follows:

  • 55 unique challenges and 33 unique solutions about D&I in AI as well as 24 unique challenges and 23 unique solutions about the applications of AI for D&I practices.

  • There are considerably fewer studies on AI for D&I than on D&I in AI. Moreover, not all papers that state challenges also propose a solution for each challenge.

  • ‘Gender’ is the most prominent diversity attribute in AI, whereas other attributes (e.g., race, ethnicity, language) receive less attention.

  • ‘Health’ is the most discussed domain, whereas other domains such as law, banking, and transportation are largely overlooked.

  • ‘Facial analysis’ and ‘natural language processing’ are the most discussed types of AI systems with respect to D&I, whereas other types, such as voice recognition and large language models, are overlooked.

  • ‘Governance’-related challenges and solutions are discussed less often, both for D&I in AI and for AI for D&I.

Paper organization: Sect. 2 describes the background of this research and related work. Section 3 briefly explains our research method, and Sect. 4 reports the findings of this study. We discuss the findings in Sect. 5. Section 6 discusses possible threats to the validity of this research. Finally, Sect. 7 concludes the paper and outlines possible future research directions.

2 Background and related work

AI has emerged as a technological force that is continuously evolving and reshaping various societal structures [14]. In recent years, there has been heightened focus on the importance of D&I in AI [15], but the literature reveals that D&I concerns are not consistently addressed in AI projects because of a lack of operational tools and ambiguity around responsibilities in the AI development process [16]. Neglecting D&I can have serious repercussions, including harm to users and slower AI adoption. It is therefore crucial for project teams and stakeholders to understand the importance of D&I in AI. Awareness of D&I in AI will enable them to identify, monitor, and mitigate potential risks and challenges, thereby fostering an AI-literate society that can make informed decisions about the use of, and participation in, AI systems across various contexts.

As the body of AI literature continues to expand, a growing number of traditional and systematic reviews reflect an increased focus on issues related to bias [17, 18], fairness [11, 19], transparency [20], and explainability [21]. This focus stems from the acknowledgment that AI systems can reproduce and even exacerbate existing societal biases, leading to unfairly discriminatory practices [22, 23]. Bias in AI systems has numerous roots, most notably the use of datasets that do not comprehensively represent society as a whole, which leads to skewed outcomes [22]. Additionally, the homogeneity of AI’s development community, which is predominantly Western and male, can unintentionally inject biases into the design and programming of AI systems [23]. To address this imbalance, diversity and inclusion are increasingly recognized as critical elements of AI development that can significantly contribute to mitigating these biases [6].

Despite the acknowledged importance of diversity and inclusion, there is a gap in the literature on how these principles can be implemented in practice in AI systems. Fosch-Villaronga and Poulsen [22] define D&I in AI as a multi-faceted concept that addresses both the technical and socio-cultural aspects of AI. They describe diversity as the representation of individuals with respect to socio-political power differentials such as gender and race. Inclusion, they suggest, is the representation of an individual user within a set of instances, where better alignment between a user and the options relevant to them indicates greater inclusion. This concept is further analyzed at three levels: technical, community, and user. The technical level considers whether algorithms account for all necessary variables and whether they classify users in a discriminatory manner. The community level examines diversity and inclusivity in AI development teams, looking at gender representation and diversity of backgrounds. Finally, the user level focuses on the intended users of the system and on how the research and implementation process takes stakeholders and their feedback into account, emphasizing the principles of Responsible Research and Innovation.

Zowghi and da Rimini [6] provide a more detailed and normative definition of D&I in the context of AI and present a set of guidelines for ensuring that these principles are incorporated into the AI development process. The authors take a socio-technical perspective, recognizing that addressing bias and unfairness requires a holistic approach that considers cultural dynamics and norms and involves end users and other stakeholders. They define D&I in AI as the ‘inclusion’ of humans with ‘diverse’ attributes and perspectives in the data, process, system, and governance of the AI ecosystem. Diversity refers to the representation of differences in human attributes within a group or society. Attributes are known facets of diversity, including (but not limited to) the protected attributes in Article 26 of the International Covenant on Civil and Political Rights (ICCPR) (e.g., race, color, sex, language, religion, national or social origin, property, birth or other status) and intersections of these attributes. Inclusion is the process of proactively involving and representing the most relevant humans with diverse attributes, namely those who are impacted by, and have an impact on, the AI ecosystem context.

According to Zowghi and da Rimini [6], diversity and inclusion in AI can be conceptualized around five pillars: humans, data, process, system, and governance. The humans pillar focuses on the importance of including individuals with diverse attributes in all stages of AI development. The data pillar highlights the need to be mindful of potential biases in data collection and use. The process pillar emphasizes the need for diversity and inclusion considerations during the development, deployment, and evolution of AI systems. The system pillar recognizes that the AI system must be tested and monitored to ensure that it does not promote non-inclusive behaviors. The governance pillar underlines the importance of structures and processes that ensure AI development complies with ethical principles, laws, and regulations.

There is limited literature on how AI can help enhance D&I [24,25,26,27], and no comprehensive definition of this concept exists in the literature. D&I in AI and AI for D&I create a synergistic cycle of progress that enriches both fields and their potential to effect meaningful change. Functioning as a mirror, AI reflects the patterns and prejudices ingrained in our societies, revealing biases that often go unnoticed. This heightened visibility helps improve D&I by identifying gaps, promoting awareness, and guiding mitigation strategies. Conversely, integrating D&I into AI’s development process is equally critical. A diverse team of creators and evaluators can identify, understand, and correct underlying biases, resulting in more equitable and inclusive AI systems. Thus, D&I and AI form a continuous, self-reinforcing cycle: the use of AI advances D&I, while fostering D&I within AI development yields more holistic, fair, and representative AI systems.

Even with these insights, many existing AI ethics guidelines remain narrowly focused on fairness, justice, and non-discrimination, with a heavy lean toward compliance-based procedures [28]. Furthermore, there is an evident gap in initiatives that aim to directly influence AI actors’ behaviors and foster diversity, equity, and inclusion (DEI) awareness [29]. Regarding inclusivity, the global discourse on AI often lacks voices and perspectives from the Global South and other underrepresented groups and is dominated by Western perspectives [30]. This imbalance affects the development of ethical AI standards and calls for more inclusive practices and deeper consideration of power structures in AI policy formulation [31,32,33].

Despite increased awareness of these concerns, current research still lacks a comprehensive understanding of these critical areas. Hence, there is an urgent need for a systematic literature review that investigates diversity and inclusion in AI. Such a review provides a comprehensive evaluation and synthesis of the existing research on this topic, which traditional literature reviews may fail to capture in its entirety. Consequently, it will help identify the current state of the art, define challenges and solutions, and shape future research directions, thereby addressing this critical gap in the literature.

3 Methodology

This study aims to explore and gain a comprehensive understanding of diversity and inclusion in the context of artificial intelligence and the use of artificial intelligence for diversity and inclusion from the published research literature. Our research was guided by the following two research questions.

RQ1. What challenges and solutions are found in the literature about diversity and inclusion in AI (D&I in AI)?

RQ2. What challenges and solutions are found in the literature about the applications of AI for diversity and inclusion practices (AI for D&I)?

Fig. 1 An overview of the research method

We conducted a systematic literature review (SLR) following the guidelines established by Kitchenham et al. [34] to address the research questions. This approach was chosen to comprehensively identify, evaluate, and interpret existing research in this under-explored area [34]. These guidelines have undergone numerous reviews and revisions by the software engineering community, which has strengthened their robustness. The SLR protocol for this paper was also assessed by two SLR experts, who made revisions to meet the reliability and replicability requirements of systematic reviews. In addition, the second and third authors of this paper have extensive experience in conducting SLRs and have published highly cited systematic reviews.

The methodology of the SLR is outlined in Fig. 1. The preparation stage involved developing a background understanding of diversity and inclusion (D&I) in AI, formulating the research questions, creating an SLR protocol, and validating the protocol through a pilot study. The paper selection summary for the pilot study and the main study (primary search and secondary search [35]) is shown in Fig. 2. Through a rigorous search and selection process, we identified 48 papers that satisfied the inclusion/exclusion criteria and are relevant to D&I in AI or AI for D&I.

Fig. 2 SLR paper selection summary

To ensure the validity of the data extraction process and the relevance of the search keywords, a pilot study was conducted at the outset. The search string for the five digital libraries (ACM Digital Library, IEEE Xplore, Science Direct, Scopus, and Google Scholar) was formulated using three primary keywords relevant to the research questions: “artificial intelligence”, “machine learning”, and “diversity and inclusion” (see Appendix B). To guide the selection of studies, clear inclusion and exclusion criteria were established. The inclusion criteria were: “papers on diversity and inclusion in AI or AI for diversity and inclusion”, “papers in the form of peer reviewed published scientific papers (journal/conference)”, and “papers published in 2017–2022”. The exclusion criteria were: “papers not related to diversity and inclusion in AI or AI for diversity and inclusion”, “literature review paper”, “Tutorial/workshop paper/ArXiv paper/magazine article/book/book chapter”, “Master/Ph.D. dissertations”, “conference version of a study that has an extended journal version”, “papers not written in English”, “full papers unavailable online”, and “papers already covered in the pilot study”. We restricted the search to 2017–2022 because we found few relevant papers published before 2017; indeed, we identified only one relevant paper from 2017, whereas the largest share of the studies was published in 2022.
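Taken together, the inclusion and exclusion criteria amount to a conjunction of checks applied to each candidate paper. The review applied them manually; the Python sketch below only illustrates that logic, and the `Candidate` record, its field names, and the venue-type labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical record describing one search result during screening."""
    year: int
    venue_type: str             # e.g. "journal", "conference", "arxiv", "book chapter", "dissertation"
    peer_reviewed: bool
    is_literature_review: bool
    language: str
    full_text_online: bool
    on_topic: bool              # reviewer judgement: D&I in AI or AI for D&I
    superseded_by_journal: bool = False  # conference version with an extended journal version
    covered_in_pilot: bool = False       # only relevant for the main study

def passes_screening(c: Candidate) -> bool:
    """Return True only if every inclusion criterion holds and no exclusion criterion applies."""
    included = (
        c.on_topic
        and c.venue_type in {"journal", "conference"}  # tutorials, workshops, arXiv, books, theses fail here
        and c.peer_reviewed
        and 2017 <= c.year <= 2022
    )
    excluded = (
        c.is_literature_review
        or c.superseded_by_journal
        or c.language.lower() != "english"
        or not c.full_text_online
        or c.covered_in_pilot
    )
    return included and not excluded

# Usage: an on-topic, peer-reviewed 2021 conference paper passes screening.
paper = Candidate(year=2021, venue_type="conference", peer_reviewed=True,
                  is_literature_review=False, language="English",
                  full_text_online=True, on_topic=True)
print(passes_screening(paper))  # True
```

Separating the inclusion and exclusion checks mirrors the way the two lists of criteria are stated; in practice, judgements such as `on_topic` remain reviewer decisions rather than computed values.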

Studies that treat D&I in AI or AI for D&I as a stand-alone research topic are scarce in the literature. We experimented with including the terms “bias” and/or “fairness” in our search string, which returned a very large number of papers. For example, the ACM Digital Library returned 92 research articles on diversity and inclusion in AI (2017–2022), but when we added “bias” or “fairness” to the search string, it returned 669,305 articles (see Footnote 1). To keep the SLR feasible, we narrowed the scope and removed “bias” and “fairness” from the search string. Likewise, the keywords “dataset”, “training”, and “developer” were not included in our search string, although they could potentially yield results providing greater insight into AI system development mechanisms. However, these keywords often lead to papers that do not primarily focus on D&I, so, given the study’s scope and feasibility, they were deliberately excluded to avoid returning an overwhelming number of irrelevant articles.

3.1 Primary search

In the pilot study, we used the keyword “diversity and inclusion”, which restricted the results to papers that addressed both “diversity” and “inclusion”. After several rounds of discussion among the authors, we decided to include all papers on “diversity” OR “inclusion”, so that no paper addressing either diversity or inclusion in AI would be missed. We therefore developed the final search string for our main study using three main keywords (“diversity”, “inclusion”, and “artificial intelligence”) and their corresponding alternatives. For example, we used “machine learning” as an alternative to “artificial intelligence”. Similarly, we used two alternatives for the keyword “inclusion”: “inclusive” and “inclusiveness”. The primary search was carried out with this search string in four digital libraries: ACM Digital Library, IEEE Xplore, Science Direct, and Scopus. We also applied the same search string in Google Scholar, but it returned only papers already covered by these four digital libraries. The search string was customized for the interface of each digital library. The details of the primary search protocol and the search results are shown in Appendix B.
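The change from the pilot keyword to the main-study keywords is essentially a change in the structure of a Boolean query: alternatives within a concept are joined with OR, and concepts are joined with AND. A minimal Python sketch, assuming this two-group structure; the exact strings and their per-library variants are those in Appendix B, not the ones generated here.

```python
def or_group(terms: list[str]) -> str:
    """Quote each term and join the alternatives with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

def build_query(groups: list[list[str]]) -> str:
    """Join the OR-groups with AND: a record must match at least one term from every group."""
    return " AND ".join(or_group(g) for g in groups)

# Keyword groups reconstructed from the description above (an assumption);
# the exact string and its per-library variants are given in Appendix B.
DI_TERMS = ["diversity", "inclusion", "inclusive", "inclusiveness"]
AI_TERMS = ["artificial intelligence", "machine learning"]

main_query = build_query([DI_TERMS, AI_TERMS])
pilot_query = build_query([["diversity and inclusion"], AI_TERMS])
print(main_query)
print(pilot_query)
```

In the pilot form, the quoted phrase matches only records containing “diversity and inclusion” verbatim, whereas the main form accepts any one of the four D&I terms, which is why the revised string retrieves more papers. Each library’s interface differs in its quoting, field restrictions, and wildcard syntax, so a generated string like this would still need adapting per library.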

After duplicates were removed from the primary search results, the study selection criteria were applied to the abstracts of the remaining 184 papers, yielding 34 relevant papers (Fig. 2). The next stage of the selection process was guided by the principle of investigator triangulation [36]: all the authors read the 34 abstracts independently and made inclusion/exclusion decisions. They then discussed their opinions and agreed on 19 papers to be assessed on their full text. The first author carefully evaluated the full text of each of these studies and excluded 5 papers that were found to be irrelevant to the research questions (e.g., papers on diverse literature, diverse algorithms, or diverse technology). Finally, 14 papers were selected from the primary search for data extraction.

3.2 Secondary search

The secondary search involved a manual examination of the titles of the references listed in the selected pilot and primary studies. In addition, we manually scanned the proceedings of the two conferences in which the pilot and primary studies were most frequently published: the ACM Conference on Fairness, Accountability, and Transparency and the AAAI/ACM Conference on AI, Ethics, and Society. After removing duplicates and papers already covered in the pilot study and primary search, we obtained a total of 237 papers (110 from the reference lists and 127 from the conference proceedings). Applying the study selection criteria to the abstracts then yielded 95 papers, and investigator triangulation was again applied to validate this selection. The first author then evaluated the full text of each of these studies and excluded 69 papers that, despite appearing promising from their abstracts, were irrelevant to our research objectives. Finally, we selected 26 papers from the secondary search (10 from the reference lists and 16 from the conference proceedings). This provided a total of 48 papers for data extraction in this SLR (see Fig. 2 and the full list of included papers in Appendix A).
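The selection counts reported for the primary and secondary searches can be tallied as a simple funnel, which makes the arithmetic easy to check. The sketch below is illustrative only: the stage labels are ours, and the number of pilot-study papers it prints is derived from the totals rather than stated in this section (Fig. 2 gives the authoritative figures).

```python
# Counts reported in Sects. 3.1 and 3.2; Fig. 2 remains the authoritative source.
PRIMARY = [("after de-duplication", 184), ("abstract screening", 34),
           ("investigator triangulation", 19), ("full-text reading", 14)]
SECONDARY_SOURCES = {"reference lists": 110, "conference proceedings": 127}
SECONDARY = [("after de-duplication", sum(SECONDARY_SOURCES.values())),
             ("abstract screening", 95), ("full-text reading", 95 - 69)]
TOTAL_INCLUDED = 48

def print_funnel(name: str, stages: list[tuple[str, int]]) -> None:
    """Print each stage with the number of papers dropped since the previous stage."""
    print(name)
    previous = None
    for stage, count in stages:
        dropped = "" if previous is None else f" (-{previous - count})"
        print(f"  {stage:<28}{count:>4}{dropped}")
        previous = count

print_funnel("Primary search", PRIMARY)
print_funnel("Secondary search", SECONDARY)

primary_final = PRIMARY[-1][1]        # 14 papers from the primary search
secondary_final = SECONDARY[-1][1]    # 26 papers (10 from reference lists + 16 from proceedings)
# The pilot-study count is not stated in this section; it is implied by the totals.
pilot_implied = TOTAL_INCLUDED - primary_final - secondary_final
print("Implied pilot-study papers:", pilot_implied)
```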

3.3 Quality assessment

To assess the quality of the selected papers, we used the five-question assessment criteria proposed by Liu et al. [37]. These questions assess the clarity of the research aims, the appropriateness of the research design, the clarity of findings and contributions, the description of limitations and future work, and the empirical nature of the study. Each question was scored from 0 to 1, with 0 indicating “no”, 0.5 indicating “partly”, and 1 indicating “yes”. The overall quality score was calculated by summing the scores of the five questions, and papers were classified as Good (score between 3 and 4), Fair (score between 2 and 3), or Poor (score between 0 and 2). Of the 48 selected papers, 32 were rated “Good”, 11 “Fair”, and 5 “Poor”, indicating that most of the included studies were of good quality.
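The scoring rule above is a small piece of arithmetic: map each answer to a score, sum the five scores, and assign a band. A Python sketch under one stated assumption: because the bands given in the text share their endpoints and stop at 4 while the maximum score is 5, the sketch treats any score of 3 or more as Good, 2 up to (but not including) 3 as Fair, and anything below 2 as Poor. This boundary handling is ours, not necessarily the authors’.

```python
ANSWER_SCORES = {"no": 0.0, "partly": 0.5, "yes": 1.0}

def quality_score(answers: list[str]) -> float:
    """Sum the scores of the five assessment questions (maximum 5.0)."""
    if len(answers) != 5:
        raise ValueError("expected answers to exactly five questions")
    return sum(ANSWER_SCORES[a] for a in answers)

def quality_band(score: float) -> str:
    """Map a total score to a band.

    Assumption: the stated bands (Good 3-4, Fair 2-3, Poor 0-2) share their
    endpoints and stop at 4, so this sketch treats any score >= 3 as Good,
    2 <= score < 3 as Fair, and score < 2 as Poor.
    """
    if score >= 3:
        return "Good"
    if score >= 2:
        return "Fair"
    return "Poor"

# Usage with a hypothetical set of answers to the five questions.
answers = ["yes", "yes", "partly", "no", "yes"]
score = quality_score(answers)
print(score, quality_band(score))  # 3.5 Good
```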

3.4 Data extraction

An Excel spreadsheet and NVivo were used to extract demographic and content-related data from the 48 selected papers on D&I in AI and AI for D&I. The demographic data included the source of the paper, title, abstract, authors, authors’ affiliated countries, year of publication, venue, and citation count. The content-related data included the challenges of addressing D&I in AI and AI for D&I, and the proposed or applied solutions (guidelines/strategies/approaches/practices) to those challenges. The data were extracted through manual coding by the first author and cross-checked in weekly meetings with the other authors.
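Because every selected paper yields the same demographic and content-related fields, the extraction sheet can be thought of as one record per paper. The Python dataclass below is a hypothetical rendering of such a record; the authors used an Excel spreadsheet and NVivo, and the field names, the interpretation of “source”, and the grouping of codes by focus are our assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class ExtractionRecord:
    """Hypothetical row of the extraction sheet for one selected paper (field names are ours)."""
    paper_id: str                         # study ID, e.g. "S10"
    # Demographic data
    source: str                           # assumed: digital library, reference list, or proceedings
    title: str
    abstract: str
    authors: list[str]
    author_countries: list[str]           # affiliated countries of the authors
    year: int
    venue: str
    citations: int
    # Content-related data: challenge and solution codes grouped by focus
    # ("D&I in AI" or "AI for D&I")
    challenges: dict[str, list[str]] = field(default_factory=dict)
    solutions: dict[str, list[str]] = field(default_factory=dict)

    def add_challenge(self, focus: str, code: str) -> None:
        """Record a coded challenge under the given focus, ignoring duplicates."""
        codes = self.challenges.setdefault(focus, [])
        if code not in codes:
            codes.append(code)

    def add_solution(self, focus: str, code: str) -> None:
        """Record a coded solution under the given focus, ignoring duplicates."""
        codes = self.solutions.setdefault(focus, [])
        if code not in codes:
            codes.append(code)
```

Using `field(default_factory=dict)` gives each record its own empty containers, so codes added to one paper never leak into another.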

3.5 Data synthesis and analysis

The data synthesis and analysis for RQ1 and RQ2 are outlined in Fig. 1. To answer RQ1 and RQ2, the first author used open coding to identify the challenges concerning D&I in AI and AI for D&I, as well as the proposed guidelines/strategies/approaches/practices for addressing them. The first author established intra-rater reliability by revisiting and cross-checking all the papers and coded data multiple times. All the authors checked the challenges and solutions to ensure inter-rater reliability and held several rounds of discussion to finalize them. Throughout this process, the first author returned to the papers several times to validate the findings. The solutions were then mapped to the challenges for each paper to obtain a comprehensive understanding of which guidelines/strategies/approaches/practices are used for a specific challenge. The initial mapping was carried out independently by the first and second authors, and all contributing authors were involved in reviewing it. The results were finalized after numerous revisions through iterative discussions among all the authors until consensus was reached on the final mapping.

4 Results

This section presents the results of the systematic literature review, starting with the demographics of the 48 selected studies. We then present the extracted challenges of addressing diversity and inclusion in AI (D&I in AI) and of enhancing diversity and inclusion practices in the environment through AI (AI for D&I), as well as the solutions proposed to address these challenges.

4.1 Demographics

The demographics cover a range of elements, including the publication year, citation count, whether the studies were empirical or non-empirical, the attributes of diversity addressed in each paper, and the affiliated countries of the first authors.

Fig. 3 Publication years and citations of the 48 selected papers

The publication years and citations of the 48 selected papers are depicted in Fig. 3. The largest number of papers (18) were published in 2022, followed by 11 papers in 2021, and only one paper was published in 2017. This trend suggests that D&I in AI is a relatively new field and that further research in this area is needed. With regard to citations, Fig. 3 shows that, although the review covers only six years, five papers received more than 100 citations and two papers received 51–100 citations. We also classified the 48 selected studies as empirical or non-empirical: 30 are empirical and 18 are non-empirical.

The attributes of diversity analyzed in the selected studies, such as gender, age, and race, are listed in Table 1. We distinguished the terms “gender” and “sex” in this table based on the terms used in the selected studies. According to Walker et al., “Sex refers to the anatomical or chromosomal categories of male and female. Gender refers to socially constructed roles that are related to sex distinctions” [38]. Gender is the most frequently addressed attribute (23 papers), followed by race (15 papers), leaving room for further research on other attributes of diversity, such as age, sex, disability, neurodiversity, geographic location, skin tone, language, and ethnicity.

Table 1 Attributes of diversity and their corresponding paper IDs
Fig. 4 Affiliated countries of first authors of the selected papers

We also examined the distribution of the affiliated countries of the first authors of the 48 selected studies, as presented in Fig. 4. The United States of America (USA) accounts for the most first authors (29 of 48), which shows that the majority of work on D&I in AI or AI for D&I has been conducted in the USA. Three first authors are affiliated with the United Kingdom (UK), and two each with China, Canada, and Belgium. The remaining ten countries (Thailand, the Netherlands, Turkey, Qatar, India, Ireland, Japan, Sweden, Australia, and Germany) have one occurrence each. These findings indicate that diversity and inclusion in AI remains a little-explored research area outside the USA and should receive more attention in future research worldwide.

4.2 RQ1: challenges and solutions about diversity and inclusion in AI (D&I in AI)

Table 2 lists the challenges about D&I in AI with their corresponding challenge IDs and paper IDs. We identified 55 unique challenges, drawn from 36 of the 48 selected papers. We also identified 33 unique solutions that address some of these challenges, as shown in Table 3; 23 of the 48 papers proposed solutions to specific challenges they mentioned. We also mapped the challenges to their corresponding solutions for each paper, as presented in Appendix C. Some illustrative quotations on challenges and solutions for RQ1, as well as on their mapping, are presented below.

Table 2 Results of RQ1: challenges about diversity and inclusion in AI
Table 3 Results of RQ1: solutions to address the challenges of diversity and inclusion in AI

Illustrative quotations on challenges.


Challenge C14: (Lack of diverse race, ethnicity, sex and gender inclusion and representation in the design, development, and implementation of AI system). “Lack of consideration for race, ethnicity, sex and gender in the design, development, and implementation of AI system in healthcare can lead to marginalization of underrepresented groups from benefiting from such technologies.”- S10


Challenge C15: (The lack of Equity, Diversity, and Inclusion (EDI) principles and indicators). “The lack of EDI principles and indicators, for example, the presence of sex/gender, and racial/ethnicity bias in healthcare can be defined as differential medical and healthcare delivery and treatment of men, women, non-binary people and one race (dominant) over the others, the impact of which may be positive, negative, or neutral.”- S10


Illustrative quotations on solutions.


Solution L10: (Consider purpose and definition of gender before gender classification in facial analysis or image labeling). “Before embedding gender classification into a facial analysis service or incorporating gender into image labeling, it is important to consider what purpose gender is serving. Furthermore, it is important to consider how gender will be defined, and whether that perspective is unnecessarily exclusionary (e.g., binary).”- S11


Solution L9: (Develop policies to prevent discriminatory and nonconsensual gender representations in FA (Facial Analysis) systems). “Establishing policies for how biometric data and face and body images are collected and used may be the most effective way of mitigating harm to trans people-and also people of marginalized races, ethnicities, and sexualities. Polices that prevent discriminatory and non-consensual gender representations could prevent gender misrepresentation from being incorporated into FA systems in both the data and infrastructure by regulating the use of gender as a category in algorithmic systems. For example, by banning the use of gender from FA-powered advertising and marketing.”- S11


Illustrative quotations on mapping of challenges and solutions.


Challenge C52: (Racial categories are ill-defined in computer vision systems). “Racial categories are ill-defined, arbitrary and implicitly tied loosely to geographic origin. Second, given that racial categories are implicitly references to geographic origin, their extremely broad, continent-spanning construction would result in individuals with drastically different appearances and ethnic identities being grouped incongruously into the same racial category if the racial categories were interpreted literally. Thus, racial categories must be understood both as references to geographic origin as well as physical characteristics.”- S41


Solution L32 to address C52: (Adopt fair computer vision datasets with different racial categories). “We empirically study the representation of race through racial categories in fair computer vision datasets, and analyze the crossdataset generalization of these racial categories, as well as their cross-dataset consistency, stereotyping, and self-consistency.”- S41


We further analyzed the mapping of challenges to solutions. According to Fig. 5(a), nearly half of the challenges (26) have no proposed solution (e.g., C1, C3, C6, C23), 21 challenges have one solution each, and the remaining 8 challenges have more than one solution. Figure 5(b) shows the number of papers that present challenges with and without solutions. Of the 36 papers that identified challenges about D&I in AI (see Appendix C), 13 papers discussed challenges without solutions (e.g., S1, S21, S42), and the remaining 23 papers discussed challenges together with possible solutions (e.g., S2, S8, S23). Figure 5(c) presents the number of papers by number of challenges. More than half of the papers (21) mention only one challenge each (e.g., S1, S7, S47), whereas the remaining 15 papers mention more than one challenge (e.g., S2, S10, S29). The last pie chart (Fig. 5(d)) shows the proportions of papers with no solution, one solution, and multiple solutions. The largest group of papers (14) propose one solution each (e.g., S7, S16, S40). However, a considerable number of papers (13) did not propose any solution at all (e.g., S29, S36, S46), while 9 papers proposed more than one solution (e.g., S2, S15, S22).
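The proportions in Fig. 5 follow from a per-paper mapping of challenge IDs to solution IDs, such as the one in Appendix C. A minimal Python sketch of that tally, assuming the mapping is stored as nested dictionaries (paper ID, then challenge ID, then the set of solution IDs proposed for it). The toy IDs below are hypothetical and are not drawn from the review’s data.

```python
from collections import Counter

def bucket(n: int) -> str:
    """Classify a count as 'none', 'one', or 'multiple'."""
    return "none" if n == 0 else "one" if n == 1 else "multiple"

def solutions_per_challenge(mapping: dict[str, dict[str, set[str]]]) -> Counter:
    """Pool each challenge's solutions across papers, then bucket the challenges (cf. Fig. 5(a))."""
    pooled: dict[str, set[str]] = {}
    for paper_map in mapping.values():
        for challenge, sols in paper_map.items():
            pooled.setdefault(challenge, set()).update(sols)
    return Counter(bucket(len(sols)) for sols in pooled.values())

def solutions_per_paper(mapping: dict[str, dict[str, set[str]]]) -> Counter:
    """Bucket papers by how many distinct solutions they propose (cf. Fig. 5(d))."""
    return Counter(bucket(len(set().union(*pm.values()))) for pm in mapping.values())

def challenges_per_paper(mapping: dict[str, dict[str, set[str]]]) -> Counter:
    """Bucket papers by how many challenges they mention (cf. Fig. 5(c))."""
    return Counter(bucket(len(pm)) for pm in mapping.values())

# Hypothetical toy mapping: paper -> {challenge -> solutions proposed for it}.
toy = {
    "P1": {"CA": set()},
    "P2": {"CB": {"LX"}, "CC": set()},
    "P3": {"CB": {"LY"}},
}
print(solutions_per_challenge(toy))
print(solutions_per_paper(toy))
print(challenges_per_paper(toy))
```

Pooling solutions across papers before counting matters: in the toy data, challenge CB has one solution in each of two papers, so it is counted as having multiple solutions overall, which is the per-challenge view used in Fig. 5(a).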

Fig. 5 Analysis of the findings of RQ1

According to Appendix C, paper S10 discussed the largest number of challenges (8) about D&I in AI, such as “Lack of Equity, Diversity, and Inclusion (EDI) principles and indicators” and “Under-representation of minority groups in sampling during model training and testing”. Paper S17 also discussed a large number of challenges (7), such as “Bias in the training data sets” and “Lack of comprehensive and accurate collection and generation of demographic data”. Paper S22 provided the largest number of solutions (4) to address the challenges about D&I in AI, for example, “Enhance diversity-oriented design capacity to increase inclusiveness of diversity requirements” and “Establish a user-centered machine learning system based on user and context features”. Three papers (S2, S15, S27) provided three solutions each. Some papers provided multiple solutions for a single challenge: S22 provided three solutions for challenge C35, and papers S15, S23, S25, and S39 each provided two solutions for a single challenge (C24, C36, C22, and C50, respectively).

We also identified challenges that were mentioned by more than one paper (see Table 2). For example, C11 and C18 were each mentioned by four papers, and C7, C12, and C17 were each discussed by three papers. Likewise, three solutions (L3, L8, L25) were discussed by multiple papers (see Table 3).

4.3 RQ2: challenges and solutions about the applications of AI for diversity and inclusion practices (AI for D&I)

Twenty of the 48 papers focused on the applications of AI for enhancing D&I practices (AI for D&I). Table 4 presents the list of 24 unique challenges about AI for D&I. The 23 solutions we identified to address these challenges, with their corresponding paper IDs, are presented in Table 5. The mapping of challenges to their corresponding solutions for each paper is shown in Appendix D. Some illustrative quotations on challenges and solutions for RQ2, as well as on their mapping, are presented below.

Table 4 Results of RQ2: challenges of the applications of AI for diversity and inclusion practices
Table 5 Results of RQ2: solutions of the challenges of AI applications for diversity and inclusion practices

Illustrative quotation on challenge.

Challenge H6: (Underrepresented genders are not acknowledged by gender classification systems). “When classifying gender, designers of the systems we studied chose to use only two predefined demographic gender categories: male and female. As a result, these presentations are recorded, measured, classified, labeled, and databased for future iterations of binary gender classification.”- S11


Illustrative quotation on solution.

Solution N1: (Use Betaface (Betaface.com) facial analysis software to determine the diversity attributes). “To determine the rates of diversity within departments, Betaface facial analysis software was used to analyze photos taken from the hospitals’ websites. This software was able to determine the race, ethnicity, and gender of the care providers.”- S3


Illustrative quotations on mapping of challenges and solutions.

Challenge H16: (Less accuracy of facial recognition technology to identify non-binary gender). “This work which positions transgender faces as problematic to facial recognition accuracy, also raised ethical issues related to user privacy as the data for the database was scraped from transgender individuals’ videos without their consent or knowledge.”- S27


Solution N17 to address H16: (Train automatic gender recognition (AGR) with social and ethical implications). “While AGR technology is still in its infancy, the recent integration of facial recognition into already pervasive technologies suggest it could impact large numbers of people in the near future. As technologists continue to develop AGR applications, it is important to understand the social and ethical implications of widespread adoption.”- S27

Fig. 6 Analysis of the findings of RQ2

We also explored additional findings from the mapping of challenges and solutions for RQ2, presented in Fig. 6. As shown in Fig. 6(a), the majority of the challenges (16) have one solution each (e.g., H1, H10, H24). Four challenges have no solution at all (H3, H5, H13, H19), and another four have more than one solution (H2, H7, H8, H12). Of the 20 papers on AI for D&I, the majority (17) provided both challenges and solutions on the applications of AI for enhancing D&I practices (see Fig. 6(b)), while three discussed challenges without any solutions. According to Fig. 6(c), two papers (S11, S38) discussed more than one challenge, whereas the remaining 18 papers reported only one challenge each; S11 discussed the largest number of challenges (4). As shown in Fig. 6(d), the majority of the papers (14 of 20) provided one solution each to address the challenges related to AI for D&I, three papers did not propose any solution, and another three provided multiple solutions.

4.4 Diversity attributes

Figure 7 illustrates the ratio of diversity attributes (e.g., age, gender, race, ethnicity) discussed in the challenges and solutions about D&I in AI (RQ1). The majority of the challenges (56%) and solutions (54%) did not mention any attribute at all. Gender has the most occurrences (25% for challenges and 23% for solutions). Race is the second most frequent attribute, discussed in 7% of the challenges and 14% of the solutions.

Fig. 7 Ratio of diversity attributes in challenges and solutions for RQ1

Fig. 8 Ratio of diversity attributes in challenges and solutions for RQ2

Figure 8 illustrates the ratio of diversity attributes (e.g., age, gender, race, ethnicity) mentioned in the challenges and solutions about the applications of AI for D&I practices (RQ2). According to Fig. 8(a), gender and disability are the two attributes with the most occurrences in challenges (21% for gender and 17% for disability), whereas ethnicity, skin tone, neurodiversity, and geographic location have the fewest. Figure 8(b) shows that the majority of the solutions (65%) do not indicate any attribute explicitly. However, gender has the most occurrences (18%), whereas skin tone and neurodiversity have the fewest.

5 Discussion and implications

5.1 Highlights of the results

‘Gender’ as the most discussed attribute of diversity. Our analysis reveals that gender is the most explored diversity attribute, appearing in 23 of the 48 papers (see Table 1). Among the other dimensions of diversity, 15 papers examined race, 6 age, 4 disability, 3 ethnicity, and 2 geographic location, while sex, neurodiversity, skin tone, family income and insurance status, and language were each addressed in only a single paper. Moreover, gender has been the predominant topic in the challenges and solutions pertaining to both D&I in AI and AI for D&I (see Figs. 7 and 8). For D&I in AI (RQ1), gender was the focus of 15 of 55 challenges and 8 of 33 solutions. Similarly, for enhancing D&I practices through AI (RQ2), 6 of 24 challenges and 4 of 23 solutions emphasized gender. The other diversity dimensions are largely overlooked. Recent studies [39] have shed light on the challenges women face in AI, including bias, discrimination, a lack of self-confidence, inadequate resources and support, and limited exposure to AI in early education. Another study [40] highlighted the difficulties gender classifiers face in recognizing non-binary genders. Despite these studies, there is a dearth of research addressing other facets of diversity, such as age, disability, race, ethnicity, and language. None of the 48 included studies addressed four of the diversity attributes listed in Article 26 of the International Covenant on Civil and Political Rights (ICCPR): religion, birth or other status, property, and national or social origin. Recent national anti-discrimination laws, such as Australian federal discrimination law [41], also enumerate diversity attributes, and none of our selected papers focused on many of them, such as religion, political opinion, and marital status.
This underscores the necessity for more extensive investigations into these areas and the necessity to consider a broader spectrum of diversity, especially the notion of intersectionality, in both AI research and practice.

‘Health’ as the most discussed domain. Some of the included studies examined D&I in AI and AI for D&I within specific domains such as health, workplace, and education. Figure 9(a) shows the ratio of different domains mentioned in the selected papers. More than half of the papers (51%) do not focus on any specific domain; rather, they discuss diversity and inclusion in AI in general. ‘Health’ is the most discussed domain, with 23% of the papers focusing specifically on it, followed by ‘workplace’ (16%). Only a small number of papers mention other domains such as ‘education’, ‘research’, ‘museum’, and ‘art’. Because many important domains, such as law, banking, and transportation, were not the focus of any of the papers, more research is needed in these areas.

Fig. 9 Ratio of different domains and types of AI systems

‘Facial analysis’ and ‘natural language processing’ as the most discussed types of AI system. Figure 9(b) illustrates the ratio of different types of AI systems discussed in the selected studies of this SLR. The majority of the papers (68%) did not mention any specific AI system. Facial analysis systems and natural language processing systems were each discussed in 10% of the papers, while 6% of the papers focused on computer vision systems, 4% on image processing, and 2% on automated gender recognition systems. Other types of AI systems, such as voice recognition and large language models, also need to be studied through the lens of D&I.

Global North as the predominant region for D&I in AI research. As societal constructs, the notions of diversity and inclusion do not play a significant role in many countries. Many equity, diversity, and inclusion (EDI) policies, initiated in the Global North, address these concepts by promoting greater representation of Black, Asian, and Minority Ethnic groups within the workforce [42]. The Global South, however, does not show a similarly predominant focus on diversity and inclusion. Consistent with this, D&I in AI research exhibits a notable deficiency in geographic diversity, particularly in contributions from the Global South. This deficiency results in insufficient appreciation of diversity attributes such as language, ethnicity, race, and nationality within the AI ecosystem. Furthermore, the challenges and solutions we identified do not adequately represent the unique conditions prevalent in the Global South. Consequently, the specific challenges and solutions pertaining to D&I in AI and AI for D&I in the context of the Global South have yet to be distinctly recognized and documented. This represents a substantial gap in this research area and one in urgent need of attention.

The correlation between authors’ geographic locations and the progression of AI development. The USA is a pioneer of AI development [43], and the majority of the authors are also affiliated with institutions in the USA. It could therefore be argued that the geographic location of researchers in this field correlates with the locations leading AI development. However, this assertion lacks empirical evidence; for example, China also holds a dominant position in AI development [43] but is noticeably behind in D&I in AI research. This relationship should therefore receive more attention in future research to develop a comprehensive understanding of the issues related to D&I in AI.

Lack of solutions to address D&I in AI. The number of solutions is smaller than the number of challenges for D&I in AI (33 solutions versus 55 challenges). Figure 5(a) also shows that 26 of the 55 challenges have no solution at all. Similarly, Fig. 5(b) shows that 36% of the papers (13 of 36) offer no solution, whereas all of them discuss challenges. Moreover, a large number of the selected studies (18 of 48) are non-empirical, which implies that many of the proposed solutions have not been implemented or validated in real settings.

The area of diversity and inclusion in artificial intelligence is a relatively new and less-explored field, with a limited number of studies undertaken. Consequently, fewer solutions have been identified for the challenges linked to D&I in AI or AI for D&I. This is compounded by minimal awareness of D&I-related issues within AI, which leads to a scarcity of solutions to tackle such challenges. Furthermore, D&I principles have not been widely implemented within AI systems, contributing to the limited understanding among researchers and practitioners of how to mitigate the associated challenges. The lack of related research from the Global South also contributes to the deficit of solutions for the D&I challenges arising in that region. Collectively, these issues underscore the urgent need for further intensive, evidence-based research in this area to recommend more solutions. While the existing literature offers solutions for some D&I issues in AI, not every challenge has been addressed. AI researchers and developers can use these identified gaps to focus on proposing solutions to the challenges in addressing D&I in AI and AI for D&I. To support collective problem-solving, we plan to share publicly the challenges identified and their corresponding solutions for the benefit of wider audiences facing similar issues.

Insufficient collaborations between developers and researchers. The diversity of AI system developers is a critical factor: if the people who develop AI systems lack diversity, the resulting AI systems are likely to mirror this homogeneity. On the other hand, while researchers studying these systems may identify issues related to D&I in AI or AI for D&I, they may find it challenging to propose solutions because they are not directly engaged in the AI development process. This difficulty could contribute to the relative scarcity of solutions compared to the identified challenges. However, researchers from diverse backgrounds could leverage their varying perspectives to interpret challenges and propose potential solutions. By fostering collaborative relationships between diverse AI developers and researchers, their combined skill sets can be used to uncover more D&I issues in AI and propose tailored solutions. Such collaboration could ultimately lead to the development of AI systems that are both diverse and inclusive. However, our SLR does not provide clear evidence of whether this issue contributes to the lack of solutions for the challenges of D&I in AI or AI for D&I. We hope that our paper will serve as a bridge, connecting individuals and prompting more widespread exploration of challenges and potential solutions in this field.

Limited research on AI for D&I. Our literature review shows that the majority of the selected papers (36 of 48) discussed the challenges, and some corresponding solutions, of addressing D&I in AI. By contrast, only 20 papers discussed the challenges and solutions of enhancing D&I practices through AI (AI for D&I). Similarly, the number of solutions for D&I in AI (33) is higher than the number of solutions for AI for D&I (23). These findings indicate that AI researchers are aware of the need to address D&I in AI, whereas AI for D&I has received limited attention. Although some recent studies have worked on enhancing D&I practices in the workplace through AI [14, 25] and in automated gender recognition systems [40, 44], further research is needed for a more comprehensive understanding of AI for D&I.

Low-hanging fruit. Our results reveal that several challenges regarding diversity and inclusion in AI could be tackled immediately. For instance, including the perspectives of marginalized communities, such as individuals with disabilities and the elderly, in the development process can support greater representation in the training data [31, 45]. This can address various challenges, including the “Under-representation of minority groups in sampling during model training and testing” (S10, S25, S42), “Certain communities’ voices are disregarded and not uplifted in AI practice” (S9, S36, S37, S44), “Lack of comprehensive and accurate collection and generation of demographic data” (S17), and “Overlooking disability considerations in ethical or legal levels of AI algorithms” (S38). Additionally, promoting diversity in the recruitment of AI development teams and among researchers can help combat unconscious biases [39, 45,46,47]. Raising awareness and promoting education about diversity, equity, and disparities in AI can help mitigate the knowledge gap about the people, places, and factors that make up the data [39, 47].

5.2 Five pillars of diversity and inclusion in AI

According to Zowghi and da Rimini, the definition of D&I in AI consists of five pillars: Humans, Data, Process, System, and Governance [6]. We categorized our findings under these five pillars to explore how the challenges and solutions from this SLR cover the AI ecosystem. We used the pillars for cross-analysis and applied thematic coding to the findings for RQ1 and RQ2 to structure the challenges and solutions for D&I in AI and AI for D&I under the five pillars. Note that the challenges and solutions for RQ1 and RQ2 are not necessarily mutually exclusive with respect to the pillars; therefore, many of them are listed under more than one pillar. This categorization was conducted independently by all of the authors and one external expert. An iterative series of discussions was then held between all authors and the external annotator to ensure that the findings for RQ1 and RQ2 were accurately represented under their corresponding pillars. As all of the annotators have previous experience and expertise in this area and analyzed the challenges and solutions through different disciplinary lenses, we did not disregard any of their opinions. Instead, we took the union of the pillars assigned by all the annotators. The findings of our analysis are shown in Appendix E. Some examples of challenges and solutions for RQ1 and RQ2 under the five pillars are given below.
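
The union rule described above, under which an item receives every pillar assigned to it by any annotator, can be sketched in Python as follows. This is an illustrative sketch, not the authors' procedure; the function names and the toy labels (items X1 and X2, two unnamed annotators) are hypothetical. Because the merge is a union, a single challenge or solution can be counted under several pillars, which matches the note that the pillars are not mutually exclusive.

```python
from collections import Counter

# The five pillars of D&I in AI as named in the text.
PILLARS = {"Humans", "Data", "Process", "System", "Governance"}


def merge_annotations(annotations: list[dict[str, set[str]]]) -> dict[str, set[str]]:
    """Union the pillar labels that all annotators assigned to each item."""
    merged: dict[str, set[str]] = {}
    for labels in annotations:
        for item, pillars in labels.items():
            unknown = pillars - PILLARS
            if unknown:
                raise ValueError(f"{item}: unknown pillar(s) {sorted(unknown)}")
            # setdefault creates a fresh set, so the annotators' inputs are not mutated.
            merged.setdefault(item, set()).update(pillars)
    return merged


def pillar_frequencies(merged: dict[str, set[str]]) -> Counter:
    """Count items per pillar; an item with several pillars counts once under each."""
    return Counter(p for pillars in merged.values() for p in pillars)


# Hypothetical labels from two annotators, for illustration only.
annotator_a = {"X1": {"Humans"}, "X2": {"Data"}}
annotator_b = {"X1": {"Humans", "Governance"}, "X2": {"Data", "Process"}}

merged = merge_annotations([annotator_a, annotator_b])
print(merged)
print(pillar_frequencies(merged))
```

Taking the union rather than, say, a majority vote keeps every annotator's disciplinary perspective, at the cost of inflating per-pillar counts such as those shown in Fig. 10.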

Humans: (C11) Certain communities’ voices are disregarded and not uplifted in AI practice. - S9, S36, S37, S44

Data: (L6) Adopt data disaggregation by demographic groups. - S8

Process: (H15) Difficulties in identifying corresponding design patterns by machine learning technology after changing design requirements and problems. - S22

System: (N1) Use Betaface (Betaface.com) facial analysis software to determine the diversity attributes. - S3

Governance: (C15) Lack of Equity, Diversity, and Inclusion (EDI) principles and indicators. - S10

The frequencies of the different pillars in the challenges and solutions for RQ1 (D&I in AI) and RQ2 (AI for D&I) are illustrated in Fig. 10. Not surprisingly, Humans is the most frequent pillar in the challenges about D&I in AI (RQ1) (see Fig. 10(a)). However, Process is the most frequently addressed pillar in the solutions for D&I in AI (see Fig. 10(b)). System is the most frequent pillar in both the challenges and the solutions for AI for D&I (RQ2) (see Fig. 10(c) and (d)), whereas System was mentioned less often for RQ1.

On the other hand, Governance has the fewest related challenges for both RQ1 and RQ2. Data-related challenges are also among the least frequent for RQ2, whereas many challenges concern Data for RQ1. Surprisingly, Humans was the least mentioned pillar in the solutions for RQ2.

Fig. 10 Frequencies of five pillars for the challenges and solutions for D&I in AI (RQ1) and AI for D&I (RQ2)

As presented in Appendix E, the literature offers limited coverage of diversity and inclusion in relation to the governance of AI systems. Only a small number of studies mention governance-related challenges and solutions associated with D&I in AI and AI for D&I, such as “lack of Equity, Diversity, and Inclusion (EDI) principles and indicators (C15)”, “Integrate EDI (Equity, Diversity and Inclusion) and racial justice principles and practice in AI health (L22)”, “Disability is not widely studied in mitigation of bias in AI algorithms on ethical, legal or technical levels (H21)”, and “Use AI to adopt fairness standards (N3)”. This is likely because establishing D&I principles and standards for AI systems often requires long-term planning, whereas challenges associated with humans, data, process, and systems can be addressed in less time. It is therefore crucial that policymakers be made aware of the importance of D&I in AI so that they establish adequate plans for AI governance (such as standards, regulations, and policies) and principles to address these issues.

5.3 Implications for inclusive AI systems development

In recent years, researchers have increasingly acknowledged the importance of diversity and inclusion in AI. Many challenges in addressing D&I in AI and AI for D&I have been discussed in the literature, along with various proposed solutions. One key solution is to raise awareness and provide training on cultural competency and algorithmic vigilance [39, 48]. This could help address socio-cultural norms, human biases, and stereotypes that may be embedded within AI systems [39, 47]. Another solution involves mitigating bias from job descriptions and resumes by training AI systems to disregard certain demographic information, such as age, gender, and race, when assessing profiles [46].

Inclusive design practices have also been suggested as a way to address D&I in AI. This could involve adopting participatory design processes that involve diverse communities in data collection and design [31]. Another approach combines inclusive design tools and methods with machine learning technology to handle changing design requirements and identify corresponding design patterns [49]. Additionally, policymakers have a crucial role to play in addressing D&I in AI. One suggestion is to establish more explicit policy documentation to ensure policy transparency [45].

Although we extracted and presented solutions paper by paper, solutions from one paper could also address challenges identified in other papers. For example, the challenge “Underrepresented genders are not acknowledged by gender classification systems (H6)”, identified in papers S11 and S13, could be addressed by solutions from other papers, such as “Train automatic gender recognition (AGR) with a variety of gender identities early in the design process, by working with diverse team members and adopting participatory design approaches to identify non-binary gender (N17)” (S27). Such cross-paper solutions, along with others, can help ensure that AI systems are designed and developed in a manner that is inclusive and equitable for all.

6 Threats to validity

Limitations. Although we rigorously adhered to the search strategy dictated by the evidence-based SLR guidelines to ensure a comprehensive selection of our sample, some papers might still not have been included in our data collection. This may result from their inaccessibility or absence from electronic platforms of which we might be unaware.

In creating our search strings, we deliberately omitted the key terms “fairness”, “bias”, “dataset”, “training”, and “developer”, based on insights from our pilot study and testing, to avoid a large number of unrelated results. While we recognize that this could have excluded some relevant papers from our sample, we employed a meticulous secondary search strategy to counterbalance this limitation, which we believe largely made up for not using these terms initially. Nonetheless, we accept that some potentially relevant research might have been missed due to this decision, although we remain confident in the overall effectiveness of our research approach.

Another shortcoming of this paper is the absence of a detailed analysis of the diversity attributes of each author from all the selected papers. Thoroughly examining all the diversity attributes of the authors of the selected papers would undoubtedly provide us with more comprehensive insights. However, accurately identifying every diversity characteristic of all authors is impossible. Additionally, the process also carries a substantial risk of misidentification. For instance, gender identifiers do not always identify gender correctly.

Internal validity. A potential threat could arise from the small number of selected papers and the restricted time span. As D&I in AI and AI for D&I are relatively new fields of research, we did not find many relevant papers prior to 2017. The majority of the papers were published recently (2022), and only one paper was published in 2017. In future studies, we will expand our time frame to check whether there are more studies in this area. Another significant threat could arise from bias in study selection and data extraction; we mitigated this threat by adopting the investigator triangulation technique.

Construct validity. A potential construct threat could arise from selected papers being irrelevant to our research objectives. We initially selected many papers based on their abstracts whenever they were likely to contain information about D&I in AI or AI for D&I; however, many of them were removed after full-text reading because they were not relevant to our objectives. Another potential threat arises from subjective interpretation of the extracted data. Both threats were mitigated by adopting the investigator triangulation technique. In addition, the preliminary mapping of challenges to their associated solutions was conducted solely by the first author, which could present a construct threat. This threat was mitigated by involving all the authors in the revision process through several iterations of discussion.

External validity. An external threat could arise from the generalizability of our findings. Although the results of this SLR may not generalize to all types of AI technology, they can be considered representative within the specific domain of AI system development.

7 Conclusions and future work

We conducted a systematic literature review with the goal to develop a comprehensive understanding of the challenges and corresponding solutions in addressing diversity and inclusion in artificial intelligence (D&I in AI) and enhancing diversity and inclusion practices by artificial intelligence (AI for D&I). After a rigorous process, we selected 48 academic papers published from 2017 to 2022, from which we extracted data and applied open coding on the data to explore information relevant to the challenges and solutions. Finally, we identified 55 unique challenges and 33 unique solutions in addressing D&I in AI, and 24 unique challenges and 23 unique solutions in addressing AI for D&I.

The analysis of the findings revealed that the integration of AI with diversity and inclusion is a less-explored area of research, as we found only a limited number of papers. The majority of these studies discussed the challenges of addressing D&I in AI but paid limited attention to solutions to those challenges. Moreover, a large number of solutions were proposed by non-empirical studies without any implementation or validation in real-life settings. Our study reveals a lack of guidance for operationalizing the proposed solutions. We identified fewer challenges and solutions for AI for D&I, from fewer papers, than for D&I in AI. Hence, further research is required on AI for D&I in particular, and on solutions to the challenges of D&I in AI.

Our results suggest that ‘gender’ is the most discussed attribute of diversity in AI, which points to the need for further research on other attributes such as race, ethnicity, language, age, and religion. Similarly, ‘health’ is the most discussed domain, and ‘facial analysis’ and ‘natural language processing’ are the most discussed AI systems in the analyzed literature on D&I in AI and AI for D&I, whereas other domains and types of AI systems are largely overlooked. We also found that Governance-related issues are less discussed in the challenges and solutions to address D&I in AI and AI for D&I.

The results of our SLR have provided much-needed evidence for the advocacy of embedding and integrating D&I practices and principles in the AI ecosystem. The gaps in the literature identified are the starting point for our holistic and comprehensive approach to tackling the D&I related issues in the overall AI ethics and Responsible AI body of knowledge. We have recognized the need for D&I in AI guidelines and as a result, parallel to the conduct of this SLR, we have also performed a multi-vocal analysis of academic and gray literature to develop a comprehensive set of guidelines [6]. Our next step is to design and develop a risk-based framework for practitioners from the findings of this SLR that would incorporate a risk assessment checklist and context-specific recommendations for tackling the related issues at different stages of the AI development lifecycle. Our plan will include co-designing this framework by applying human-centered design and evidence-based approaches involving AI practitioners and relevant stakeholders.