1 Introduction

Human resource management (HRM) activities comprise several routine and time-consuming tasks, while also being subject to human perception, subjectivity, and bias. For these reasons, HRM is viewed as a fertile ground for the use of artificial intelligence [133, 143]. The use of artificial intelligence (AI) in HRM is being developed, tested, analyzed, and investigated empirically in various research domains [102, 125, 143]. Empirical investigations refer to studies based on data related to a phenomenon observed, measured, and/or tested by the researchers [156]. Because there is no consensus on the definition of AI across and within domains, owing to the historical debate on what exactly “intelligence” is [44, 155], and because AI is an umbrella term for different subsets of technologies that mimic human intelligence (i.e., computer vision, natural language processing, machine learning, deep learning) [87, 144], this article uses a relatively broad yet clear definition of the technology that can be applied across uses of AI in HR. Specifically, in this paper AI is defined as “[…] the ability of a machine to learn from experience, adjust to new inputs and perform human-like tasks” [45, p. 63]. The rapid growth in the use of AI in HRM is reflected in the publication, over the last few years, of several literature and conceptual reviews on AI in HRM (e.g., [13, 20, 23, 34, 56, 67, 121, 123, 128, 145]).

Despite the important merits of these reviews, some limitations remain in our understanding of the affordances and risks of intelligent technologies in HRM, necessitating a thorough review of the literature through a different lens than the previous ones. Specifically, by focusing mostly on the literature of their respective domains, such as computer science or HRM, the previous reviews do not fully take into account the multi-domain nature of AI in HRM and the combination of both technical and social aspects of this phenomenon. Our study overcomes these limitations by looking across domains at both (1) how AI is used in HRM (i.e., a technical aspect) and (2) the responsible AI principles applied in our sample of studies (i.e., a social aspect).

Regarding the technical aspect, there is a certain lack of specificity on the technology studied (AI-enabled HRM). Specifically, because reviews often fail to explicitly state and define the technology under examination, recent reviews have described studies about various, not necessarily AI-related, technologies used in HRM (e.g., big data analysis, which involves massive amounts of fine-grained and exhaustive data but does not ipso facto rely on AI software to leverage these data [11, 74]). Our review overcomes this limitation by including only studies that explicitly examine the use of AI, following the aforementioned definition, thus clarifying the technical aspect of AI use in different HRM functions.

As for the social aspect, there is no current review with a focus on the responsible AI principles applied to HRM. More precisely, current reviews taking this social aspect into account mainly discuss or propose conceptual frameworks providing solutions on how AI should be studied, implemented, or used, but none of them empirically observe the actual application of such frameworks. Our study contributes to knowledge by taking an inside look at how responsible principles are applied when developing, studying, and deploying AI in HRM. Moreover, there is a need to look at the application of responsible research practices, as many studies emphasize that responsibility is a key element when studying the use of AI in HRM (e.g., [6, 16, 61, 93, 147, 152, 153]). To our knowledge, this is the first systematic literature review looking precisely at which principles constitute responsible AI in HRM and how they are applied in empirical studies across domains.

However, as the notion of responsible use of technology is constantly evolving in the literature, there is no consensus around the definition and applications of responsible AI in the HRM domain. In this study we adapt the broad definition of responsible AI from Barredo Arrieta et al. [19], which describes it as “[…] a series of AI principles to be necessarily met when deploying AI in real applications” [19, p. 83]. We adapt this definition by including the responsible way of studying AI, defining responsible AI as a set of ethics principles to be necessarily followed when developing, studying, and deploying AI [133]. This definition guides our review, but also provides researchers, organizations, and policy-makers with a necessary common understanding of what responsible AI refers to.

In sum, the aim of this article is to examine the scope of the existing empirical literature on responsible AI in HRM while attempting to overcome the limitations of previous work by conducting a systematic literature review that includes only empirical studies, covers all types of journals (not just HRM journals), and applies no a priori conceptual framework. The objectives of this review are to: (1) identify empirical studies of current uses of AI in HRM, (2) review empirical knowledge of responsible AI principles in HRM and their application, and (3) evaluate the extent to which these research practices promote the combination of AI use with ethical, dignifying, and quality work.

2 Methodology

2.1 Retrieval

To guide our review, we followed the PRISMA 2020 statement, which allows for transparent reporting of our search strategy and our findings [118]. To be included, articles had to: (1) be an empirical study, (2) be peer-reviewed, (3) explicitly relate to a human resource management function, and (4) explicitly include an AI-driven technology based on the definition of AI presented in the introduction. To identify studies, we searched the following databases: Academic Search Complete, Business Source Complete, PsycArticles, Web of Science, and ABI/INFORM Collection. The broad scope and variety of these databases allowed our literature review to cover multiple research domains.

Appendix 1 presents the search query, which looks at the intersection of three areas. The first area includes domain-related terms (e.g., human resource), the second includes responsible-practice-related terms (e.g., responsible or business ethics), and the third includes AI-related terms (e.g., machine learning). The keywords included in our query were found using a two-step method commonly used in reviews [4, 122, 146]. The first step was to use the following search structure: domain-related terms “AND” responsible-practice-related terms. This search was conducted in each database. Fifty random studies per database were screened (abstract and title) to deduce any additional search terms that may have been missed. The second step was to use the following search structure: domain-related terms “AND” AI-related terms. This was again searched in each database, with a maximum of 50 random studies reviewed [146].
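The following is a minimal, hypothetical sketch of the two-step Boolean query structure described above; the keyword lists and query syntax are illustrative examples only, not the full query reported in Appendix 1.

```python
# Illustrative construction of the two Boolean query structures used during
# keyword identification. Keyword lists are hypothetical examples, not the
# actual search strings from Appendix 1.
domain_terms = ['"human resource"', '"HRM"', '"personnel management"']
responsible_terms = ['"responsible"', '"business ethics"', '"fairness"']
ai_terms = ['"artificial intelligence"', '"machine learning"', '"deep learning"']

def or_block(terms):
    """Join keywords into a single parenthesized OR block."""
    return "(" + " OR ".join(terms) + ")"

# Step 1: domain-related terms AND responsible-practice-related terms.
step_1_query = or_block(domain_terms) + " AND " + or_block(responsible_terms)
# Step 2: domain-related terms AND AI-related terms.
step_2_query = or_block(domain_terms) + " AND " + or_block(ai_terms)

print(step_1_query)
print(step_2_query)
```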

At this point the number of records returned was 2561. The references were organized with the bibliography manager Zotero (Corporation for Digital Scholarship) and the data was managed with Covidence (Covidence Inc., Australia), an online platform for managing systematic reviews, and multiple spreadsheets. Duplicates were automatically detected by Zotero and deleted manually. Off-topic records (e.g., in the veterinary field) were also deleted manually, for a total of 1796 removed records, leaving 765 records. We then used the “snowball” approach to add records that appeared to have relevant titles (n = 259). The snowballing technique is used to enrich systematic reviews by using the references of articles in their existing samples to identify other potentially relevant articles [159]. This technique was particularly important for our review because the literature on AI in HRM is rapidly evolving, and freshly published work or conference proceedings may have been slow to enter the databases we searched. The 1024 identified records were then transferred to Covidence, which automatically deleted the remaining duplicates that had bypassed the first process (n = 15).

Hence, 1009 records were identified for the title and abstract screening phase in Covidence. To ensure concordance before proceeding with record screening, an inter-coder reliability score was calculated using percentage agreement (we agreed in advance that we would move on only if it exceeded 75%) [118, 146]. Specifically, in a pilot test, two researchers independently reviewed the titles and abstracts of a random sample of 50 records based on the four selection criteria and specified which criteria were not met if the study was excluded [146]. Their work was then compared. Only one round of pilot testing was required, with both researchers screening 42 of the 50 records in exactly the same way (i.e., a score of 84%).
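As a concrete illustration of the agreement check, the sketch below computes percentage agreement between two coders; the screening decisions shown are purely illustrative and are not the actual pilot data (where 42 of 50 records matched, i.e., 84%).

```python
# Minimal sketch of the percentage-agreement calculation used as the
# screening threshold (> 75%). Decisions below are illustrative only.
coder_1 = ["include", "exclude", "exclude", "include", "exclude"]
coder_2 = ["include", "exclude", "include", "include", "exclude"]

agreements = sum(a == b for a, b in zip(coder_1, coder_2))
percent_agreement = 100 * agreements / len(coder_1)
print(f"Percentage agreement: {percent_agreement:.0f}%")  # 80% for this toy example
```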

The title and abstract of the 1009 records were then screened. Based on the selection criteria, 786 studies were deemed not relevant for the literature review. We then performed a full-text review of the remaining 223 studies and excluded 116 which did not conform to the selection criteria (e.g., not HRM-related or AI-focused). In the end, a total of 107 studies were included in this systematic review. Figure 1 shows our PRISMA flow diagram.

Fig. 1 PRISMA flow diagram

2.2 Data extraction

First, after a thorough reading, two members of the research team summarized each text in detail in a summary spreadsheet, recording various characteristics related to the manuscript and the reported results. These syntheses were manually compared and found to be highly similar. The rare dissimilarities that occurred were resolved through discussion within the research team. Table 1 shows the data extraction categories used.

Table 1 Data extraction categories

Regarding the meta-category about the use of AI in HRM, the researchers followed a recent conceptualization of algorithmic HRM by Meijerink and Bondarouk [101] as a guide to identify HRM functions from the data in the analyzed summary table. Meijerink and Bondarouk [101] describe the affordances of AI algorithms as talent acquisition, performance evaluation, talent management, workforce planning, and compensation and benefits. Moreover, to bring greater clarity and detail to our analysis of the technical aspect of AI in HRM, we further granularized this meta-category according to whether the associated AI algorithms were descriptive, predictive, and/or prescriptive, as per the work of Leicht-Deobald et al. [90]. Thus, these meta-categories and granularized sub-categories were used to classify the HRM algorithm types in our data extraction.

The types of AI algorithms mentioned in the previous paragraph are used to make sense of or find patterns in small or large data sets, such as internal company data (e.g., resumes, employee portfolios, job descriptions, workloads, turnover, or key performance indicators) and/or massive and diverse datasets from external data sources (e.g., social media or job search websites) [75, 90]. Precisely, descriptive AI systems are used to analyze, explain, and understand what happened in the past and how it affects the present, such as those used to rank resumes or assess candidate characteristics in the recruitment process [90, 101]. Then, based on past observations, predictive algorithmic systems are used to determine the probability that a situation or behavior will occur in the future, such as those used to predict the future performance of job candidates [90, 101]. Finally, prescriptive systems consider relevant factors and select actions or decisions to be put in place, such as those used to automate candidate screening or suggest candidates to invite to interviews in the recruitment process [90, 101]. Beyond predicting future outcomes, prescriptive algorithms suggest, decide, or implement actions and decisions in order to support or automate decisions or processes [101]. Overall, extracting both the HRM functions and the algorithm type supported our examination of how distinct types of AI algorithms are used in each HRM function. This will be elaborated in the next section, and detailed examples of how each type of algorithm is used in each HRM function will be provided in the results section.
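To make the three-way distinction concrete, the following is a minimal, hypothetical sketch on toy HR data; the column names, model choice, and decision threshold are illustrative assumptions and do not come from the reviewed studies.

```python
# Illustrative contrast between descriptive, predictive, and prescriptive
# algorithm types (per Leicht-Deobald et al. [90]) on hypothetical HR data.
import pandas as pd
from sklearn.linear_model import LogisticRegression

employees = pd.DataFrame({
    "tenure_years":      [1, 3, 7, 2, 10, 4],
    "performance_score": [3.1, 4.2, 4.8, 2.9, 4.5, 3.6],
    "left_company":      [1, 0, 0, 1, 0, 0],   # historical outcome
})

# Descriptive: explain what happened in the past
# (e.g., average performance of leavers vs. stayers).
print(employees.groupby("left_company")["performance_score"].mean())

# Predictive: estimate the probability of a future outcome (e.g., turnover risk).
features = employees[["tenure_years", "performance_score"]]
model = LogisticRegression().fit(features, employees["left_company"])
employees["turnover_risk"] = model.predict_proba(features)[:, 1]

# Prescriptive: turn predictions into a suggested action or decision
# (the 0.5 threshold and the action are arbitrary illustrations).
employees["suggested_action"] = employees["turnover_risk"].apply(
    lambda p: "schedule retention interview" if p > 0.5 else "no action"
)
print(employees[["turnover_risk", "suggested_action"]])
```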

Regarding the meta-category Responsible AI, we focused on the notion of responsibility related to the use of the system rather than to the goal of the system. For instance, regardless of whether the system’s end goal may be deemed responsible, such as promoting employee wellbeing or sustainable behaviors, our focus was on the responsible use of the system. This is based on the argument that a system with a well-intended goal could still be irresponsible in its use (e.g., a wellbeing system could discriminate against a certain population).

As for the categorization of responsible (or ethical) AI principles, it is interesting to note that there are over 80 normative frameworks on responsible AI to date around the world [5]. These frameworks present several overlaps and commonalities in principles (e.g., transparency of AI), but also important discrepancies in terminology (e.g., transparency; explainability; black box; opacity). This fuzziness led us to categorize the responsible AI principles emerging from our 107 selected empirical studies according to the most common principles in the responsible AI literature (i.e., autonomy and agentivity, bias and discrimination, explainability and transparency, human role, perceived justice and trust, privacy, system accountability, and working conditions) (primarily based on [5, 6, 13, 16, 19, 58, 100, 128, 147, 152]). We first analyzed whether the studies included responsible practices (Category: Inclusion of responsible practices) and then detailed the practice (Category: Type of responsible practices).

Second, once this step was completed, three members of the research team individually analyzed the summary table, with the goal of identifying points of commonality within categories. The selected empirical peer-reviewed studies varied substantially in terms of vocabulary used, theoretical approaches, aims, disciplines, angles of analysis, and methodologies. This highly diversified sample of studies complicated the analysis of the findings and led us to adopt an inductive approach [4, 122]. This approach aims to generate knowledge about concepts in the literature, rather than validating a pre-existing theory, and the end result is derived from generalizing across all observations [38, 59].

Guided by the study objectives, we paid particular attention to emerging themes of how AI is currently used in HRM functions (Meta-category: Human Resources Management) and how responsible AI concepts are applied in these empirical studies (Meta-category: Responsible AI). The three researchers then met to compare their analyses. Again, the similarities were strong, and the few dissimilarities were discussed within the whole research team and agreed upon. Notably, regarding the category Human Resource Function, we enhanced Meijerink and Bondarouk’s [101] conceptualization by adding a Health and well-being function because we found several studies falling under this topic. Moreover, regarding the meta-category Responsible AI, only five principles emerged from the studies. These will be elaborated in the next section.

3 Results

3.1 Descriptive results

Our 107 selected empirical and peer-reviewed studies were all published between 2004 and 2022. The median year of publication was 2019. Figure 2 shows the distribution of our 107 empirical studies according to the year they were published. It is important to keep in mind that the year 2022 covers only the period from January to June, as the data extraction was performed in June 2022.

Fig. 2 Number of studies per year (N = 107)

In addition, our sample contains 86 different journals or conference proceedings in various fields (e.g., Engineering, Ethics, HRM, Information systems, Management, Mathematics, and Psychology). Table 2 shows the journals or conference proceedings with three or more studies in our sample.

Table 2 Journals or conference proceedings with three or more studies

In terms of study design, the selected empirical articles include 63 experimental studies, 15 field studies, 24 studies combining both methods, three case studies, and two ethnographies. Moreover, 89 studies used quantitative data, 13 used qualitative data, and five used both. In addition, 69 studies examined the development of a new AI system or model. In the vast majority of these studies, the affordances and design of the new systems as compared to the old ones were not discussed; rather, they focused on how the new system offered better validity or performance than past systems or human professionals. In addition, almost all of these development studies were conducted as laboratory experiments, and the resulting systems were thus not implemented and applied in practice.

Regarding the context of all 107 studies, they included data collection from 23 different countries: Australia, Bangladesh, Belgium, Canada, China, Colombia, France, Germany, India, Indonesia, Iran, Jordan, Korea, New Zealand, Nigeria, Norway, Palestine, Portugal, Russia, Switzerland, Taiwan, Turkey, and the USA. That said, 29 of the studies in our sample do not specify the country of data collection. The country that recurs the most is the USA (12), and no study includes a cross-country analysis. With respect to sector of activity, 49 studies did not specify a sector under study or the sector was not applicable. The most studied sector was government or public services (e.g., teachers) with 13 studies, followed by the information technology (IT) sector with 10 studies. The other sectors in our sample are services (7 studies), manufacturing (4), academia (4), power supply (3), military (2), telecommunication (2), construction (1), sales (1), retail (1), and non-profit organization (1). In addition, nine studies were not specific about the sector studied or reported on a population of workers from various occupations and therefore could not be classified according to sector.

Moreover, many organizations under study were large or multinational organizations (e.g., [10, 26, 98, 119, 149]). This is consistent with the sample sizes of the 69 studies that developed a new AI system or model, which often required massive datasets. For example, Avrahami et al. [15] used a longitudinal archival data set comprising more than 700,000 employees in a large public organization to develop a tool that predicts turnover rates.

3.2 How AI is used in HRM (a technical aspect)

The goal of this section is to provide an overview of the affordances of AI in the HRM field based on the empirical studies included in our review. Affordances refer to the uses or purposes that a thing can have, which people notice as part of the way they see or experience it. Our 107 selected empirical and peer-reviewed studies include 79 studies that describe how AI is used in specific HRM functions and the types of AI algorithms involved (30 descriptive algorithms, 31 predictive algorithms, and 27 prescriptive algorithms), and 28 that were not specific enough for us to categorize and are therefore not elaborated in this section (e.g., studies on general perceptions of AI or general use of AI in HRM) (e.g., [7, 21, 43, 64, 69, 81, 82, 120, 135, 150, 151]). Notably, some included studies contain more than one AI algorithm type and/or more than one HRM function.

Table 3 shows the breakdown of HRM algorithm types according to the HRM function. Here, we have augmented the categorization schema proposed by Meijerink and Bondarouk [101] by adding empirically supported uses of AI in each category and by adding the Health and well-being category. Moreover, Fig. 3 shows the distribution of HRM function categories identified in the studies.

Table 3 Identified HRM functions and types of systems used
Fig. 3 Distribution of HRM functions in papers with specific AI algorithm type (n = 79)

3.3 Responsible AI in HRM

Responsible use of AI in HRM encompassed several principles according to our sample of 107 peer-reviewed empirical studies. Six categories emerged from analysis: (1) no responsible principle applied, (2) bias and discrimination, (3) perceived justice and trust, (4) privacy, (5) explainability and transparency, and (6) human role. Some studies applied more than one principle. Appendix 2 shows the classification of studies that clearly applied or investigated responsible AI principles. Figure 4 shows the distribution of studies across the categories.

Fig. 4 Distribution of responsible principles across studies (N = 107). Some articles contain more than one principle

3.3.1 No responsible principles reported

Of our sample of 107 empirical studies, 63 did not clearly apply a responsible AI principle. Within these 63 studies, 27 assumed that the AI system would reduce bias and discrimination because it would decrease or eliminate human subjectivity. While this assumption is consistent with some conceptual developments (e.g., [93]), it was not empirically tested in the 27 identified studies.

3.3.2 AI fairness in HRM

The concept of AI fairness in HRM does not seem to have a universally accepted definition across the empirical literature. Instead, we found that it is more of an umbrella term that covers three of our identified principles, namely bias and discrimination in HR-focused AI, perceived justice and trust of decisions and outcomes, and privacy concerns (or intrusiveness) related to AI use.

Twenty studies focused on detecting or mitigating bias and discrimination in an AI system for HRM. Indeed, AI-driven decisions can actually be biased and discriminatory because they reflect the data on which they are based [25, 53, 113, 143]. Some studies have looked at how HRM AI tools can be audited and how this auditability may contribute to the detection and mitigation of bias [22, 32, 36, 76, 131, 132, 140, 158]. For example, regarding AI in talent acquisition, Köchling et al. [76] show that AI reproduces (and may even amplify) existing inequalities in the dataset and that underrepresentation of certain groups leads to an unpredictable probability of inviting candidates from those groups to job interviews. Other studies include the principle of bias by adding a validation step or test in the development of their AI system to demonstrate that the system developed in the study does not discriminate [26, 119, 124, 127]. Finally, the principle of bias and discrimination is applied in empirical research by studies that have developed AI systems whose sole purpose is to detect and mitigate bias and discrimination [12, 60, 124]. For example, Hangartner et al. [60] developed an AI-powered tool to continuously monitor hiring discrimination on online recruitment platforms.
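As an illustration of what a simple bias-audit or validation check can look like, the sketch below compares selection rates across two groups and computes an adverse impact ratio; the data, group labels, and the 0.8 (“four-fifths”) threshold are illustrative assumptions and are not drawn from the reviewed studies.

```python
# Hypothetical audit check: compare interview-invitation rates across groups
# and flag potential adverse impact using the four-fifths rule of thumb.
import pandas as pd

decisions = pd.DataFrame({
    "group":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "invited": [1, 1, 0, 1, 1, 0, 0, 0],  # 1 = invited to interview
})

selection_rates = decisions.groupby("group")["invited"].mean()
adverse_impact_ratio = selection_rates.min() / selection_rates.max()

print(selection_rates)
print(f"Adverse impact ratio: {adverse_impact_ratio:.2f}")
if adverse_impact_ratio < 0.8:
    print("Potential adverse impact: flag the system for review or mitigation.")
```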

Eleven studies in our sample empirically examined the perception of justice or trust in decisions and outputs among employees or job seekers, a principle that is also associated with acceptance (e.g., [84]). Most of these studies used an experimental research design in a talent acquisition context, and none of them involved the development of a new AI system or model (e.g., [3, 17, 77, 83, 84, 85, 89, 111, 141]).

Regarding privacy, only four studies investigated privacy concerns related to AI in HRM [27, 46, 83, 84, 85]. Eckhaus [46] shows that scanning emails for data to feed an AI system raises privacy concerns; Cayrat and Boxall [27] investigated how organizations implement mechanisms to ensure data privacy and comply with legal obligations (particularly the European General Data Protection Regulation (GDPR)); and Langer et al. [83, 84, 85] show that the degree of automation of the job application process was slightly but positively related to applicants’ privacy concerns.

3.3.3 Explainability and transparency in HRM

Explainability (or XAI) is an objective concept as it refers to “[…] an active characteristic of a model, denoting any action or procedure taken by a model with the intent of clarifying or detailing its internal functions” [19, p. 84], while the concept of transparency is more subjective as it can be defined as “[…] the level of awareness and understanding of how [a] system is used” [24, p. 2].

Six studies in our sample applied or investigated this principle [12, 47, 48, 111, 116, 124, 158]. Some studies that developed AI models deliberately chose features that are easier to interpret or provided an explicit explanation of how decisions or outcomes are obtained in order to increase explainability (e.g., [12, 47, 48, 116, 124]). Moreover, Newman et al. [111] directly assessed the effect of this on perceptions of justice in an experimental study by manipulating the level of detail provided about the system process. They found no significant effect.
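By way of illustration, the sketch below shows one common route to the kind of explainability described above: using an inherently interpretable model and reporting how each feature contributes to the output. The data, feature names, and model choice are hypothetical assumptions, not taken from the reviewed systems.

```python
# Hypothetical interpretable screening model: with a linear model, the learned
# coefficients provide a direct, global explanation of each feature's influence.
import pandas as pd
from sklearn.linear_model import LogisticRegression

candidates = pd.DataFrame({
    "years_experience": [1, 4, 7, 2, 9, 5],
    "skills_match":     [0.3, 0.8, 0.9, 0.4, 0.7, 0.6],
    "shortlisted":      [0, 1, 1, 0, 1, 1],
})

features = candidates[["years_experience", "skills_match"]]
model = LogisticRegression().fit(features, candidates["shortlisted"])

for name, coef in zip(features.columns, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")  # sign and magnitude explain the model's logic
```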

3.3.4 Human-centered HRM-AI

Finally, 17 studies either applied or investigated the importance of involving humans (e.g., developers, managers, HR practitioners, or employees) in the development, implementation, and use of an AI system in HRM. The nature of the human role under study primarily concerned the level of stakeholders’ control over the system (e.g., change or make the final decision, ask questions, appeal, or provide input to the algorithm). Users’ control over or involvement with AI systems seems essential to promote responsible use and even acceptance (e.g., [10, 12, 51, 91, 96, 103, 124]), as “[…] humans must ultimately retain the role of decision makers” [10, p. 66]. For example, Anoaica et al. [12] put mechanisms in place (mainly in terms of explainability) to give the HR department the freedom to make its own judgments, and Faliagka et al. [51] warn against blind confidence in an automated system. From a human–computer interaction perspective, as in other domains, providing some degree of control seemed beneficial (e.g., [57]). These findings echo the principle of accountability, according to which humans should remain responsible and accountable for their decisions even when supported by AI systems.

In addition, some studies showed the importance of involving multiple stakeholders in the development, implementation, and use of AI. In particular, they emphasized that the team should be multidisciplinary, continually seek input from a diverse set of stakeholders, and adapt the AI system along the way [97, 148, 149]. The role of HRM in supporting AI systems was also documented. It was mainly addressed through the importance of developing the skills of various stakeholders (e.g., HR practitioners, developers, managers, and employees), as multidisciplinary skills are required for the success of AI in HRM [10, 27, 97, 108, 149]. For example, articles highlighted that stakeholders identified as intended users of the systems need to be skilled in statistics and legislation (e.g., GDPR) and understand the responsibility principles surrounding AI, while developers need to be able to go beyond the data and become familiar with HRM [148, 149].

4 Discussion

This paper presents a literature review of empirical and peer-reviewed research on responsible AI in HRM across domains, taking into account the complexity of this phenomenon by looking at both a technical aspect (i.e., how AI is used in HRM) and a social aspect (i.e., responsible AI principles). We contribute to the literature by showing how AI is used in HRM, examining how responsible principles are applied in empirical research on AI in HRM, and evaluating the extent to which these research practices promote responsible AI.

First, our results show that AI in HRM is a multi-domain research topic studied worldwide and across diverse sectors, given that our sample of 107 empirical and peer-reviewed studies spans 86 different journals or conference proceedings across diverse domains, 23 different countries, and 12 different sectors. Moreover, our descriptive results show that this research topic has greatly increased in popularity over the past decade. Our results also show that the use of three types of AI algorithms (i.e., descriptive, predictive, and prescriptive) has been reported across six different HRM functions (i.e., 1—talent acquisition, 2—performance evaluation, 3—talent management, 4—workforce planning, 5—health and wellbeing, and 6—compensation), with talent acquisition AI systems being the most empirically studied HRM function and appearing to be the most well implemented. Several explanations may coexist for the significant imbalance in interest in the use of AI across HR functions. For instance, talent acquisition may be more amenable to AI systems because it is a task known to be time-consuming, redundant, and subject to human bias, and because the data available to train systems includes both actual and potential candidates, so the quantity of data is typically much larger [49].

Our review also highlights that a large number of studies rely on experimental designs or analytical frameworks that have not been tested in real-life settings. Therefore, an important gap with regard to the use of AI in HRM consists in measuring the extent of its effective impact on organizations, the nature of these impacts, as well as the type and size of the organizations concerned. Consequently, the way AI is used in HRM, according to the studies in our sample, and the way it could actually be implemented in organizations may differ. Finally, an important issue that emerges from our analysis concerns the lack of precision regarding the characteristics of the AI tool studied by the researchers, as well as the AI tool’s potential context of implementation (organizational and human dimensions), which led to the exclusion of many studies from our review under the selection criteria of clearly being related to human resource management and including an AI-based technology. This gave the impression of a lack of depth in the literature, which can perhaps be explained by the lack of multi-domain studies on AI in HRM. Indeed, as the studies are mostly carried out in disciplinary silos, they do not allow researchers to develop a substantially deep and global reflection on the phenomenon. That said, the multi-domain approach of this study provides researchers with perspective, depth, and clarity on which to build, taking into account both the technical and social aspects of AI in HRM.

Second, this paper includes findings on the responsible use of AI in HRM by identifying six categories about how responsible AI is empirically applied and investigated in HRM (i.e., 1—no responsible principle applied, 2—bias and discrimination, 3—perceived justice and trust, 4—privacy, 5—explainability and transparency, and 6—human role). That said, the majority of the studies in our sample did not empirically and clearly examine or incorporate the most common notions of responsibility found in the literature. Therefore, our results show a significant gap between the breadth of conceptual frameworks on responsible AI (e.g., [5, 6, 13, 16, 19, 58, 100, 128, 147, 152]) and the empirical studies that investigate or apply responsible AI principles in HRM. This gap is also observed within the field of HRM, considering the discrepancy between the number of conceptual pieces on the principles surrounding the use of AI in the discipline and the empirical pieces actually examining them. It thus appears that, despite the social and organizational importance of considering the dimensions of responsibility and ethics in the development and use of AI, it is not yet understood and conceptualized as a central dimension in empirical research pertaining to AI systems in HR. We suggest that this state of affairs can be explained by the difficulties involved in integrating the principles of responsibility into empirical research designs. Notably, when empirical research focuses on the effects of tools only once they have been implemented and thus excludes determinants related to the design of the tool itself, it is difficult to identify the full range of possible explanations for gaps in the application of responsibility principles in AI [73, 114].

Considering the elements previously underlined, it is all the more disturbing that some studies have presented AI systems in HRM as more ethical than traditional HRM practices, based on the theoretical argument that AI systems alleviate human subjectivity and therefore reduce bias. This argument is often based on the notion that systems can achieve “fairness through unawareness”, which refers to the ability of systems to not explicitly use protected attributes or to omit sensitive features in the prediction process [31, 35, 79]. However, we found no empirical studies testing whether AI systems are indeed less biased than traditional HRM practices. We would thus discourage any claims that AI systems are less biased than practitioners unless further studies empirically investigate and demonstrate the validity of this statement. Indeed, we consider that supporting such premises without scientific testing would be irresponsible, given that they could wrongly encourage practitioners to adopt AI technologies in order to reduce bias and discrimination.
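To illustrate why “fairness through unawareness” alone does not guarantee unbiased outcomes, the following is a minimal, hypothetical sketch in which the protected attribute is dropped but a correlated proxy still reproduces the historical pattern; all data and column names are illustrative assumptions.

```python
# Hypothetical demonstration that omitting a protected attribute ("unawareness")
# does not prevent disparate predictions when a proxy feature remains.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

applicants = pd.DataFrame({
    "gender":     ["F", "F", "M", "M", "F", "M"],  # protected attribute
    "zip_code":   [10, 10, 20, 20, 10, 20],         # proxy correlated with gender
    "experience": [5, 3, 4, 6, 2, 5],
    "hired":      [0, 0, 1, 1, 0, 1],                # historical (biased) outcome
})

# The "unaware" model never sees the protected attribute...
features = applicants[["zip_code", "experience"]]
model = DecisionTreeClassifier(random_state=0).fit(features, applicants["hired"])

# ...yet predictions still differ systematically across the protected groups,
# because the proxy carries the same information.
applicants["predicted"] = model.predict(features)
print(applicants.groupby("gender")["predicted"].mean())
```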

4.1 Call for future research

Our review clearly shows that both the use of AI in HRM and the application of principles of responsibility are in need of further investigation. Our first and foremost encouragement for future research is the development of more diversified research protocols that rely on extensive fieldwork and real-life settings. Indeed, as most of the studies in our sample are based on experimental designs, it appears difficult to generalize their contributions to the reality of organizational contexts; their contributions for practitioners therefore remain somewhat limited. Moreover, as our results show a major gap between conceptual and empirical research on responsible AI in HRM, we strongly call on future research to either apply responsibility principles as a conceptual framework when conducting empirical work or investigate the effects of responsible AI principles in HRM. We found little empirical research on topics such as explainability and transparency (in fact, there is no research in our sample on subjective transparency) or privacy. Even the most empirically studied principle in our sample (i.e., bias and discrimination) has received little empirical study relative to the public and academic discussion surrounding it (i.e., [5, 37, 113, 153]). Moreover, perceived justice and/or trust have been studied primarily through experiments and hypothetical scenarios, and we call for methodological diversification, such as more research in real-life contexts. Regarding the role of humans in responsible AI in HRM, we believe that this principle could be among the most complicated to investigate because the degree and nature of the human role can vary greatly from one situation to another, and we thus call for more research on this principle. More specifically, although the role of HRM practitioners has been documented, the number of studies was small, and knowledge about the outcomes of a high or low role of HRM practitioners on responsible AI remains scarce. In the same vein, although largely discussed in theoretical or conceptual pieces, we know very little about the skills that should be developed among HRM professionals to fully enable them to play this role. We also found that some responsible principles present in the literature were absent from our empirical and peer-reviewed sample. Specifically, our studies did not include empirical research on the impact of AI on stakeholder autonomy and agentivity, or on system accountability [100]. We thus call on future research on AI in HRM to diversify the approaches used to further investigate these responsibility principles.

In addition, we call on future researchers to be explicit and provide as much detail as possible about the AI algorithms being studied, including their characteristics and affordances, as this would allow for a better understanding of how different AI types, features, or responsibility principles affect different outcomes. This could be facilitated through multidisciplinary research teams. In relation to this, we also call on future researchers to take into account the multi-domain nature of responsible AI in HRM by composing multidisciplinary research teams and breaking down silos between research areas, that is, teams combining researchers with advanced technical knowledge of AI and researchers with advanced knowledge of HRM. Such combinations would allow a better understanding of the complex phenomenon of responsible AI in HRM.

As our results show conceptual confusion about responsible principles, we also call for future research to use the knowledge from the conceptual literature and explicitly define the responsible principle being studied. We found that some empirical studies use terms from the responsible AI literature, such as transparency or discrimination, without defining the term or using it in a way consistent with the literature. For example, some studies on AI transparency were actually studying the concept of explainability, which led to conceptual confusion.

Also, we found that responsible AI in HRM is studied in many different countries, but we did not find any cross-country analysis. We call for future research to conduct such analyses to further our understanding of responsible AI in HRM and its differences across countries.

Finally, our results show that the field of AI in HRM is evolving rapidly, with the number of studies increasing significantly over the past decade. More empirical work on responsible AI in HRM has already been published since our June 2022 data extraction (e.g., [72]) and we call for future research to continue to update existing reviews.

4.2 Practical implications

For practitioners, our review calls for vigilance in the use of AI within the HRM domain. We have highlighted the lack of research and knowledge about its effects on the workforce. Decision-makers, managers, and HR professionals should be aware of this situation and keep in mind that the benefits of AI for firms also come with risks. Moreover, in order for AI to produce its benefits, it must be carefully crafted and contextualized. Therefore, AI is not a panacea, and over-reliance on this technology could come at great cost. This is especially true in the current social context, where a high emphasis is placed on issues of equity, diversity, and inclusion (EDI). Among the principles to be considered, the most discussed ones so far are fairness, explainability and transparency, and the human role. We also encourage policymakers to stay abreast of future research developments concerning these principles when elaborating robust frameworks to regulate the use of AI in HRM.