1 Introduction

Modern artificial intelligence (AI) systems routinely outperform humans in a variety of cognitive tasks.Footnote 1 AI is commonly defined as a system’s ability to think like a human by gathering information from its environment, processing it, and then adaptively applying what it has learned to achieve specific objectives [25]. AI systems outperform humans in terms of accuracy, speed, cost, and availability. Every year, AI reaches or surpasses parity with humans in increasingly complex tasks. As the amount of available data, the computing power, and the capacity of machine learningFootnote 2 models increase, the abilities of AI will continue to improve.

Since AI has lower cost, higher availability, higher consistency, and often higher accuracy, it augments humans and has the potential to replace them in some tasks or jobs [3, 8]. Every year, more decisions are made using AI, including decisions that affect human lives and well-being. Even though AI systems create economic value, their decision-making processes are often opaque and uninterpretable by humans. As AI systems become more complex, they turn into black boxes, and it becomes challenging to explain to the user the reasoning and logic behind the systems’ recommendations and predictions [34]. This in turn affects the user’s ability to make informed decisions based on the systems’ outputs, which can be arbitrary or biased. Scholars and practitioners have become increasingly interested over the past decade in transparent and explainable AI, where the output from machines leading to decisions can be traced back, explained, and communicated to the stakeholders impacted by the systems, such as end-users, developers, society, and regulators [1, 38]. The general public has also added pressure for transparent and explainable AI, especially after the emergence of extreme examples of catastrophic decision-making failures by AI systems, such as Amazon Alexa recommending that a child undertake a lethal challenge,Footnote 3 a Tesla vehicle crashing due to autopilot failure,Footnote 4 and a Microsoft chatbot making sexist and racist remarks.Footnote 5

While some decisions are entirely automated with AI, such as deciding which emails are considered spam on an email client or what additional products to recommend for a user on an e-commerce website, AI systems are often employed as decision support systems rather than as decision-makers. For example, AI systems are now being used to screen resumes to shortlist candidates for interviews while the ultimate decision is still in the hands of a human using the system. Thus, examining transparency in AI systems from the perspective of AI augmentation is helpful. In this perspective, humans use AI to enhance their cognitive and analytical decision-making capacities; however, humans retain control over and responsibility for the decisions. In a survey conducted by Gartner in 2021,Footnote 6 over two-thirds of employees in the US indicated that they would like AI to assist with their decisions and accomplish tasks. Examples of these tasks include reducing mistakes, solving problems, discovering information, and simplifying processes. While the employees expressed their desire for AI to assist them with their jobs, they also wanted to retain control over the decisions made.

Whether the control over the final decision lies with a machine or an employee, this does not negate the significance of unlocking the black box and having transparency about decisions made by, or assisted by, AI. There is always the risk that unexplainable wrong decisions can be made. This raises potential liability and ethical concerns related to deploying AI tools in high-risk decision-making contexts, such as healthcare. Despite increased efforts to achieve transparent and explainable AI, current studies have yet to examine whether transparency as a concept is required in all contexts or whether AI transparency requirements differ across different jobs and tasks. This paper addresses the following question: Are jobs and tasks at various workplaces subject to different AI transparency requirements?

To systematically study AI transparency in different roles, we examined AI transparency at the level of “job tasks” that AI systems can augment. To do this, we developed a model to determine the jobs and tasks that are highly susceptible to AI augmentation. Then we analyzed the required level of transparency for the adoption of AI systems. We found that the required level of transparency, in most cases, is not different from the transparency required when adopting other traditional technologies. Moreover, we observed that the required level of transparency differed based on the perceived risk of the performed task. Hence, we argue in this paper that AI transparency should be pragmatic and that each job and task might require different levels of transparency.

This paper is organized as follows. In Sect. 2, we present an overview of the burgeoning literature on AI transparency and note the need for a perspective that considers transparency through the lens of AI augmentation. Moreover, the tradeoffs between making AI systems either fully or less transparent are examined, and an overview of the relationship between typical AI errors and transparency is provided. In Sect. 3, we detail the methodology used to determine the jobs and tasks that are susceptible to AI augmentation. In Sect. 4, we describe how the data were analyzed and we present the results. In Sect. 5, we discuss the implications of our analysis. We conclude in Sect. 6 with recommendations for future work.

2 Background and literature review

The notion of transparency is multifaceted, and its theoretical conceptualization has been expanding, especially in the last ten years. This idea has been discussed in many disciplines, including computer science, social sciences, law, public policy, and medicine [26]. In fields related to AI, machine learning, and data science, transparency has become an important issue, as AI systems are becoming more complex, and their characteristics, processes, and outcomes have become more difficult to unpack and understand [14]. Scholars refer to this as the issue of the black box, in which algorithmic models become opaque either by intention or due to the increasing complexity of the models, causing the process that occurs before an input becomes an output to be opaque and difficult to understand [7, 17, 24]. The main objective of efforts to promote transparency in the field of AI has become to resolve the issue of the black box and to enable an understanding of how and why an AI system derives a decision or an output [19]. By resolving this issue, organizations and individuals using AI systems can be held accountable for their decisions, and users affected by AI-assisted decisions can contest the outcomes created by these systems [1, 28]. In this section, we review the different meanings of transparency, articulate its benefits and limitations, and provide an overview of the notion of AI errors and their impact on transparency.

2.1 Definition of transparency

A survey of the literature indicated that ambiguity exists regarding what transparency means. It is frequently used interchangeably with other terms, such as explainability, interpretability, visibility, accessibility, and openness [18]. Moreover, scholarly works have assigned different forms and typologies to AI transparency [2]. For example, Walmsley [35] distinguished between functional and outward transparency. Functional transparency is associated with the inner elements of the AI system, and outward transparency is related to external elements that are not part of the system, such as developers and users. Similarly, Preece et al. [29] tied transparency to explainability and distinguished between the transparency-based explanation, which is concerned with understanding the inner workings of the model, and the post hoc explanation, which is concerned with explaining a machine’s decision without unpacking the model’s inner workings. Similar to the work by Preece et al. [29], Zhang et al. [39] also tied transparency to explainability and distinguished between the local explanation, which refers to the explanation of the logic behind a single outcome, and the global explanation, which refers to the explanation of how the entire algorithmic model works. Felzmann et al. [19] classified transparency as prospective and retrospective. Prospective transparency deals with unveiling information about the working of the system before the user starts interacting with it, whereas retrospective transparency refers to the ability to backtrace a machine’s decision or outcome and provide post hoc explanations of how and why a decision or an outcome was derived.

In this paper, we adopt the definition of transparency provided by the High-Level Expert Group on Artificial Intelligence (AI HLEG) and view transparency as achieving three elements: traceability, explainability, and communication [1]. Traceability refers to the enabling of the retrospective examination of a system by keeping a log of the system’s development and implementation, including information about the data and processes implemented by the system to produce an output (e.g., a decision). Explainability refers to the ability to explain the technical process and the rationale for the AI system’s output. Communication refers to the communication of information about the AI system to the user, including information about the system’s accuracy and limitations so that the user is aware of what they are interacting with [1].

To achieve transparency, system owners might disclose information about data training and analysis, release source code, and provide output explanations [5]. Scholars have argued that achieving transparency requires viewing AI systems as sociotechnical artifacts, meaning that they cannot be separated from the context in which they are developed and deployed, and they cannot be isolated from cultures, values, and norms [16]. Moreover, AI systems are governed by different stakeholders, each requiring different levels of transparency to satisfy their needs. For example, Weller [38] identified developers, users, society, experts/regulators, and deployers as distinct stakeholders with different transparency requirements. A developer might require transparency to verify whether the system is working as intended, eliminate errors, and enhance the system, while users require transparency to ensure that the outcome of the AI system is not flawed or biased and to increase trust in future outcomes [19]. Since different stakeholders have different needs, releasing the source code, for example, might meet the transparency requirements of developers but not users, as they might not understand what the code does.

2.2 The benefits and limitations of transparency

Enhancing the transparency of AI systems could lead to several benefits. The first, and one of the most commonly assumed, is an increase in users’ trust [12]. Schmidt et al. [32] indicated that the general perception in the literature is that transparency increases trust in AI systems and that system owners can enhance such trust by providing users with simple and easy-to-understand explanations of the system’s output [27, 40]. In the context of AI systems, trust is more important than in traditional engineering systems because AI systems are based on induction, meaning that they make generalizations by learning from specific instances rather than applying general concepts or laws to specific applications. The second benefit is fairness: transparency helps ensure that AI systems that directly impact people do not engage in discrimination [36]. The third benefit is that transparency enables accountability by reducing information asymmetry, thus allowing organizations and individuals to be held accountable for their decisions [18, 20, 28].

Despite the purported benefits of transparency, several studies have indicated that these benefits might be limited, and in some cases, transparency might have negative consequences [2, 13, 38]. De Laat [13] listed four different areas where transparency might lead to limited benefits and negative consequences. The first is the tension between transparency and privacy. Releasing datasets publicly might violate the privacy of the individuals included in the dataset. Existing research suggests that individuals can be reidentified in many publicly available anonymized datasets (see [31]). The second area is the possible manipulation of an AI system. If information about the inner workings of a system, such as its source code, is released, the system can be manipulated either to prevent it from working as intended or to produce a favorable outcome for the manipulator. For example, knowing that an autonomous vehicle will force a stop if a moving object appears less than 1.5 m from the car, a person can use this information to make an autonomous car permanently idle and prevent it from moving [16]. The third area is related to protecting the property rights of firms that own AI systems. Requiring firms to publish the source code of their AI systems might infringe upon their property rights, affect their competitive stances, and disincentivize them from innovating [12, 33]. The fourth area deals with “inherent opacity,” in which the information disclosed about an AI system might not necessarily be interpretable and understandable, thus failing to achieve the objective of transparency [13].

Ananny and Crawford [2] identified additional issues and limitations related to transparency. First, transparency can be intentionally used to mislead or conceal. Firms might purposefully disclose huge amounts of information and data when adhering to regulations, making it costly and time-consuming to understand and process these data, thus limiting the usefulness of transparency. Second, the correlation between transparency and trust has yielded mixed outcomes in different studies. For example, in their study on recommender systems in the field of cultural heritage, Cramer et al. [9] found no positive correlation between transparency and trust. In the field of public policy, De Fine Licht [11] and Grimmelikhuijsen [22] also found no strong evidence that increasing transparency increases trust. Relatedly, whether users consider predictions and recommendations from machines or from humans to be more trustworthy remains inconclusive. In their study, Dietvorst, Simmons, and Massey [15] found that people are averse to algorithmic predictions and recommendations and prefer recommendations from humans, even when they could observe that the algorithms outperformed humans. They also found that people lose confidence in algorithms more quickly than they lose confidence in humans after witnessing mistakes. Contrary to these findings, Logg, Minson, and Moore [27] found that people actually appreciate predictions and recommendations coming from algorithms more than from humans, even when they do not understand how the algorithms make the recommendations [35]. Given the aforementioned issues and limitations of transparency, many scholars argue that achieving full transparency is undesirable, if not impossible [12, 13, 24, 30].

In summary, there is no agreed-upon definition of transparency, as it takes on different meanings in different disciplines. The literature also indicates that transparency is beneficial, although absolute transparency could have negative consequences. Despite several attempts to formalize the concept and definition of transparency in AI, to our knowledge, no work has examined transparency from the lens of AI augmentation in the workplace.

2.3 AI errors and transparency

The nature of AI errors has implications for the type of transparency required of AI systems. Current AI algorithms, or more precisely, machine learning algorithms, are based on statistical generalization; in other words, they learn from data samples to make out-of-sample decisions. Thus, AI algorithms are inductive. This is in contrast to systems that are deductive, as they are based on applications of generalized laws. For example, the automatic take-off and landing system in aircraft is an application of physics laws that are universally true for all practical purposes.

Consider an AI system that screens resumes to shortlist candidates for a job. Suppose an HR department decides to test such a system. They prepare a job description, manually curate a list of 100 resumes, and select 10 that match the position. When the AI system is fed the job description and the 100 resumes, it returns 15 resumes as candidates for screening. What level of transparency should accompany the resume screening system? In this particular case, the HR department can calculate both the precision and recall of the system, as it has curated a ground truth set. In practice, this will not always be possible. It is important to know that most AI systems will be based on some form of scoring and that there will usually be a score threshold that a user can tune based on their needs. A high threshold usually results in high precision and low recall, and a low threshold leads to low precision and high recall. In practice, if the advertised job is for a routine, low-skilled position, then high precision is sufficient to select candidates for further evaluation. However, if the position is for a high-skilled job, then high recall is important to avoid overlooking a suitable candidate whose skill set matches the job but is not presented in a standard resume template.
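To make the precision/recall trade-off concrete, the short sketch below computes both metrics for the hypothetical screening scenario at two different score thresholds. The resume scores, labels, and thresholds are invented for illustration and are not taken from any real system.

```python
# Hypothetical resume-screening scores (0-1) produced by an AI system and
# the HR department's manually curated ground truth (True = suitable).
# All numbers are invented for illustration.
import random

random.seed(0)
ground_truth = [True] * 10 + [False] * 90          # 10 of 100 resumes match the job
scores = [random.uniform(0.5, 1.0) if label else random.uniform(0.0, 0.8)
          for label in ground_truth]               # suitable resumes tend to score higher

def precision_recall(scores, labels, threshold):
    """Precision and recall of the shortlist produced at a given score threshold."""
    shortlisted = [s >= threshold for s in scores]
    tp = sum(1 for s, l in zip(shortlisted, labels) if s and l)
    fp = sum(1 for s, l in zip(shortlisted, labels) if s and not l)
    fn = sum(1 for s, l in zip(shortlisted, labels) if not s and l)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for threshold in (0.9, 0.6):                       # high vs. low threshold
    p, r = precision_recall(scores, ground_truth, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

A high threshold yields a small, confident shortlist (favoring precision), while a low threshold casts a wider net (favoring recall), matching the low-skill versus high-skill hiring scenarios described above.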

Another important aspect of using AI systems is understanding the bias–variance trade-off of the underlying algorithms. For practical purposes, this is a specialized skill that somebody in the organization should possess. Linear or shallow models generally have high precision and low recall, while the opposite is true for non-linear and deep models [23]. From a transparency perspective, providing coarse information on the complexity of the model should be sufficient for users to appreciate the behavior of the AI system over time. With more experience, an organization may ask a vendor to customize shallow or deep models for different groups of jobs. For example, shallow models can be created for low-skill jobs and deep models for high-skill jobs, with the caveat that the vendor of the AI system has sufficient data to calibrate across the job spectrum.
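As a rough illustration of how an organization might obtain the kind of coarse model-complexity information mentioned above, the following sketch cross-validates a shallow (linear) classifier against a small neural network on the same synthetic screening data using scikit-learn. The dataset and model choices are our own assumptions, not the systems analyzed in this paper.

```python
# Compare a shallow (linear) model with a deeper (non-linear) model on the same
# classification task. Synthetic, imbalanced data; model choices are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)

models = {
    "shallow (logistic regression)": LogisticRegression(max_iter=1000),
    "deeper (2-layer MLP)": MLPClassifier(hidden_layer_sizes=(64, 32),
                                          max_iter=500, random_state=0),
}

for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring=("precision", "recall"))
    print(f"{name}: precision={cv['test_precision'].mean():.2f} "
          f"recall={cv['test_recall'].mean():.2f}")
```

Reporting such cross-validated figures alongside a one-line description of model complexity is one way a vendor could give users enough information to anticipate how the system will behave over time.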

3 Methodology

We have developed a machine learning model for computing AI augmentation scores (AIAScores) for all jobs and job tasks that appear in the Occupational Information Network (O*NET)Footnote 7 database. Our model extends Michael Webb’s method of creating AI exposure scores, which is based on counting the number of common verb–noun pairs that appear in O*NET and a database of AI patents [37]. Instead of counting pairs, we used word embeddings for both job descriptions and patents and designed a supervised learning model (see Sect. 3.1) to create an AIAScore. The O*NET database is maintained by the US Department of Labor and has been used worldwide for studies on the impact of AI on the future of employment [21, 37]. We used Google’s patent database to extract more than 1.5 million patents related to AI. Once we had obtained augmentation scores for the different jobs in the O*NET database, we carried out another analysis to understand the transparency requirements for the tasks of the different jobs. We took all the job tasks’ embedding vectors and applied a k-medoid algorithm to obtain 100 clusters (i.e., each job task, independent of the job description to which it belonged, was assigned to 1 of 100 clusters). In k-medoid clustering, the representative member of each cluster is an actual task, not a hypothetical “average job task.”Footnote 8
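In outline, the clustering step can be reproduced with an off-the-shelf k-medoids implementation. The sketch below assumes the optional scikit-learn-extra package and uses random vectors as stand-ins for the 300-dimensional task embeddings; it illustrates the procedure described above rather than reproducing the exact pipeline.

```python
# Sketch of the task-clustering step: group task embedding vectors into 100
# clusters with k-medoids and keep the medoid (an actual task) of each cluster.
# Assumes the optional scikit-learn-extra package; random embeddings stand in
# for the real 300-dimensional FastText task vectors (O*NET has >20,000 tasks).
import numpy as np
from sklearn_extra.cluster import KMedoids

rng = np.random.default_rng(0)
task_embeddings = rng.normal(size=(2000, 300))    # placeholder for O*NET task vectors

kmedoids = KMedoids(n_clusters=100, metric="cosine", random_state=0)
labels = kmedoids.fit_predict(task_embeddings)

# Each medoid index points at a real task, which serves as the cluster's
# representative for the subsequent transparency analysis.
representative_tasks = kmedoids.medoid_indices_
print(representative_tasks[:10])
```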

3.1 AIAScore model

As mentioned in Sect. 3, we extend Michael Webb’s method [37] for computing an “exposure score” for jobs to AI technology. For each task in a job, Webb computes a propensity score based on the frequency of co-occurrence between verb–noun pairs that appear in a job task and AI patents. The higher the propensity score between a job task and an AI task, the greater the exposure score. The propensity score of a job is the aggregate of the propensity scores for all tasks within the job description. For example, a typical task for an accountant is analyzing accounting records. The verb is analyze, and the noun is accounting. The percentage of AI patents that contain verb–noun pairs similar to analyze–accounting is a measure of the exposure of the task to AI technology. Another common verb–noun pair for an accountant is maintain–records, which might be associated with software (database) technology but not AI technology. Our extension of Webb’s work is to use modern language models and supervised learning to compute an AIAScore. Our model was developed as follows:

1. We first used FastText embeddings to create a vector embedding of all patents [6]. We denote the vector embedding of patent $P$ as $w_P$. The dimensionality of each vector was 300.

2. We used supervised learning to build a neural network-based softmax classifier that takes a patent embedding as input and outputs the Cooperative Patent Classification (CPC) code of the patent. All patents are assigned one or multiple CPC codes (by the patent office), and a subset of known CPC codes is used for AI patents. For example, suppose there are three CPC codes [C1, C2, and C3], and C1 is an AI CPC code. Furthermore, suppose the classifier assigns scores of 0.6, 0.3, and 0.1 to a patent vector $w_P$. Then, the AIAScore of the patent is 0.6.

3. To compute the AIAScore of a job task $t$, we first compute the FastText embedding $w_t$ and pass it through the trained classifier. Let $F_\theta$ be the trained classifier, $C$ be the set of CPC codes, and $C_A$ be the subset of CPC codes related to AI. Then

   $$\mathrm{AIAScore}(w_t) = \sum_{i \in C_A} F_{\theta,i}(w_t)$$

4. Let $T_J$ represent the set of tasks in a job $J$. Then, the AIAScore of the job is

   $$\mathrm{AIAScore}(J) = \frac{1}{|T_J|} \sum_{t \in T_J} \mathrm{AIAScore}(w_t)$$

   A schematic implementation of these four steps is sketched below.
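The sketch below uses hypothetical stand-ins for the trained FastText embedder and the softmax CPC classifier (random outputs in place of the trained components) and mirrors the two formulas: the task score sums the classifier’s probability mass over the AI-related CPC codes, and the job score averages over the job’s tasks.

```python
# Schematic AIAScore computation following steps 1-4 above.
# `embed` and `classifier_probs` are hypothetical stand-ins for the trained
# FastText embedder and the softmax CPC classifier described in the text.
import numpy as np

CPC_CODES = ["C1", "C2", "C3"]          # toy CPC label set
AI_CPC_CODES = {"C1"}                   # subset of CPC codes treated as AI-related

def embed(text: str) -> np.ndarray:
    """Stand-in for the 300-dimensional FastText sentence embedding."""
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.normal(size=300)

def classifier_probs(vector: np.ndarray) -> np.ndarray:
    """Stand-in for the trained softmax classifier F_theta over CPC codes."""
    rng = np.random.default_rng(int(abs(vector[0]) * 1e6) % (2 ** 32))
    logits = rng.normal(size=len(CPC_CODES))
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def aia_score_task(task: str) -> float:
    """AIAScore(w_t): summed probability mass on AI-related CPC codes."""
    probs = classifier_probs(embed(task))
    return float(sum(p for code, p in zip(CPC_CODES, probs) if code in AI_CPC_CODES))

def aia_score_job(tasks: list[str]) -> float:
    """AIAScore(J): mean task score over the job's task set T_J."""
    return float(np.mean([aia_score_task(t) for t in tasks]))

accountant_tasks = ["Analyze accounting records", "Maintain financial records"]
print(aia_score_job(accountant_tasks))
```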

3.2 Data

To train our AIAScore model, we used two datasets. The first is the O*NET database, which extensively covers most aspects of jobs and occupations in the US market. The information of interest to us in this database included data on more than 20,000 tasks from over 1000 job descriptions and the importance rating of each task within a job. The second is the Google patent database, with over three million full-text patents and their associated CPC codes.Footnote 9 Our AIAScore model is available online.Footnote 10

4 Results

Using the O*NET database, we extracted a list of 100 tasks belonging to different jobs and representing the whole population of tasks in the database; see Appendix A for the full list of tasks and the distribution of AI augmentation scores. This was achieved by clustering the tasks into 100 groups and selecting the medoid (representative) task of each one. The distribution intervals of the AI augmentation scores for the tasks are illustrated in Table 1.

Table 1 AI augmentation score distribution intervals for the 100 extracted tasks

To analyze the different types of tasks, we extracted the verbs describing each one and calculated the average AI augmentation score associated with each verb. Based on this analysis, as presented in Appendix B, most tasks with low AI augmentation scores appear to be physical or mechanical in nature (e.g., position, set, grind, clean, install, remove). As the AI augmentation score increases, the verbs become strongly associated with cognitive tasks (e.g., design, act, determine, analyze, review, advise, evaluate, instruct, supervise). These findings are consistent with the existing literature indicating that AI is likely to impact cognitive jobs and tasks more significantly than mechanical ones [21].

Following the extraction of the 100 tasks, we selected two from each of the eight distribution intervals for further analysis. The main objective was to determine the risk category of the task if it was performed by an AI system and ultimately the required degree of transparency associated with it. Table 2 presents the 16 tasks and their assigned risk categories. To determine the risk categories, we implemented the categorization of risk for AI systems developed by the European CommissionFootnote 11 as follows:

Table 2 Analysis of tasks and their associated risk category

Minimal risk: AI systems that pose minimal or no risk to the safety and rights of people. Such systems typically do not cause physical or psychological harm and have no or limited active interaction with humans, that is, they tend to work in the background. Examples of such AI systems are video games and spam filters.

Limited risk: Systems that are generally believed to be safe but have a very low likelihood of causing harm to people on rare occasions. These systems typically have direct and active interaction with humans, for example, chatbots.

High risk: AI systems that have a high likelihood of causing damage to people if misused. Such systems can be found in multiple fields, such as education, law enforcement, transport, and healthcare, with examples including surgical robots and facial recognition systems.

Unacceptable risk: Systems that are evidently causing harm or damage to humans and should not be implemented and/or deployed. Examples of this kind of AI are social scoring systems and real-time remote biometric identification.

To ensure that tasks were assigned the appropriate risk category, each was reviewed to determine if it involved direct human interaction and/or if it could lead to physical or psychological harm. First, the task was assessed to determine if it involved direct communication with a person (e.g., chatting) or had direct implications on a human subject (e.g., whether someone should get a bank loan or not). Subsequently, we analyzed whether the task could cause physical (i.e., injury or fatality) or psychological (i.e., depression or anxiety) harm to another human being.

From the analysis outlined in Table 2, several observations emerged. First, tasks with low AI augmentation scores (i.e., tasks 1 to 6) involve no direct interaction with humans and are not associated with physical or psychological harm. As the augmentation score increases, tasks involve higher levels of human interaction and greater potential for physical and/or psychological harm. For example, tasks 7 and 10 both involve direct human interaction and could cause harm. Second, most of the tasks in which AI could lead to physical or psychological harm are associated with the healthcare or medical professions. Third, harm was possible even without direct human interaction, as in task 14. Fourth, none of the extracted tasks are associated with unacceptable risk.

5 Discussion

The main objective of this paper was to examine the level of AI transparency required by different tasks that are performed by employees in different occupations in an AI-augmented setting. Based on our analysis, the first finding indicated that tasks with low augmentation scores (i.e., tasks less susceptible to AI augmentation) did not involve direct human interaction and were not associated with physical or psychological harm; therefore, they were categorized as minimal-risk tasks. This indicates that tasks that require physical or mechanical labor are less likely to be fully augmented by AI and would always involve a human leading the activities associated with them. Since a human will always be leading the activities associated with these tasks, and they pose no risk of physical or psychological harm, we expect that such tasks would have low or no transparency requirements. This expectation stems from the fact that responsibility for such tasks is clearly held by the person performing the task, and the transparency requirement would not differ from situations in which an AI system is not used at all.

On the other hand, the findings indicated that, as the augmentation score increased beyond 30%, the tasks involved more human interaction and posed a higher risk of physical and psychological harm; hence, most of the tasks with a 30% augmentation score or higher were categorized as limited- or high-risk tasks. In this kind of task, the role of humans is diminished, while the role of the AI system becomes more prominent. As a result, these types of tasks are associated with higher transparency requirements. However, what does higher transparency mean here? Higher transparency refers to the need to communicate the process required to complete the activities associated with the task, to provide a confidence interval or accuracy rate for the AI system(s) used to complete those activities, and to enable contestability if the task results in any physical or psychological harm.

Let’s consider task no. 8 from Table 2, involving a teacher supervising students, as an example. Assume that to monitor and evaluate the work of the students, the teacher uses an AI system to auto-grade the students’ assignments. The task of supervising students is considered to pose a limited risk, as it involves direct human interaction and might involve psychological harm. In this case, transparency does not refer to explaining how the grading system works and publishing its source code. Rather, transparency refers to communicating how the teacher evaluates the students’ work, how and when the AI system is involved in the evaluation, and what the accuracy rate of the AI system is. Moreover, students should be allowed to contest an evaluation in which an AI system was involved, requiring the intervention of a human (i.e., the teacher) to verify the outcome of the AI system and correct any errors. In this example, even though the role of the AI system might be more prominent than that of the human, it is always the human (i.e., the teacher) who bears responsibility for the outcome of the task.
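One way to operationalize the communication and contestability requirements in the auto-grading example is to attach the system’s confidence to every grade and route low-confidence or contested grades back to the teacher. The following sketch is our own illustration of such a workflow; the threshold, field names, and data structures are assumptions rather than features of any particular grading system.

```python
# Illustrative workflow for AI-assisted grading with human contestability.
# The confidence threshold and record fields are assumptions for illustration.
from dataclasses import dataclass
from typing import Optional

REVIEW_THRESHOLD = 0.8  # assumed confidence cut-off for provisional acceptance

@dataclass
class GradeRecord:
    student_id: str
    ai_grade: float
    ai_confidence: float
    needs_teacher_review: bool = False
    contested: bool = False
    final_grade: Optional[float] = None

def grade_submission(student_id: str, ai_grade: float, ai_confidence: float) -> GradeRecord:
    """Provisionally accept confident AI grades; flag the rest for the teacher."""
    record = GradeRecord(student_id, ai_grade, ai_confidence)
    record.needs_teacher_review = ai_confidence < REVIEW_THRESHOLD
    if not record.needs_teacher_review:
        record.final_grade = ai_grade
    return record

def contest(record: GradeRecord) -> GradeRecord:
    """A student contest always routes the grade back to the teacher."""
    record.contested = True
    record.needs_teacher_review = True
    record.final_grade = None  # the final grade now awaits human verification
    return record

record = contest(grade_submission("s-001", ai_grade=78.0, ai_confidence=0.93))
print(record)
```

The design choice here is that the AI grade is never final on its own: either the confidence falls below the threshold and the teacher reviews it, or the student contests it and the teacher reviews it, preserving the human responsibility described above.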

The findings indicated that tasks categorized as high risk were typically found in the medical profession, and they usually involved direct human interaction and had a high likelihood of causing physical and/or psychological harm. Let’s consider task no. 9 from Table 2 as an example of one that involves a medical practitioner (i.e., a pharmacy assistant). Let’s assume that the pharmacy assistant uses an AI system that can process hand-written prescriptions, convert them to digital format, and prescribe the medicine, or an alternative if the prescribed medicine is not available, to the patient. In the event of an error, such as failing to read a doctor’s prescription correctly or prescribing the wrong medicine to a patient, the consequences might be severe, leading to physical and/or psychological harm to the patient. In this example, the transparency requirement is certainly higher than in the previous examples. First, traceability is essential, so there must be a record of what was input into the system and what the output was. Moreover, the data as well as the source code of the system must be verified and made available to regulators. Second, the output of the system should be explainable. For example, the system must justify why an alternative medicine was prescribed. Third, the process of prescribing the medicine should be communicated to the patient, informing him/her that an AI system was involved in completing the prescription task.
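The traceability requirement in this example amounts to keeping a durable record of what was fed into the system and what it produced. A minimal sketch of such an audit log is shown below; the field names and JSON-lines format are our illustrative assumptions, not a prescribed standard.

```python
# Minimal audit-log sketch for a prescription-processing AI system.
# Field names and the JSON-lines format are illustrative assumptions.
import json
import hashlib
from datetime import datetime, timezone

def log_decision(log_path, input_image_bytes, ocr_text, dispensed_item,
                 model_version, confidence):
    """Append one traceability record: what went in, what came out, and when."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "input_sha256": hashlib.sha256(input_image_bytes).hexdigest(),
        "ocr_text": ocr_text,
        "dispensed_item": dispensed_item,
        "confidence": confidence,
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record

log_decision("audit_log.jsonl",
             b"<scanned prescription bytes>",
             ocr_text="Amoxicillin 500 mg, 3x daily",
             dispensed_item="Amoxicillin 500 mg",
             model_version="ocr-v1.2",
             confidence=0.91)
```

Hashing the raw input rather than storing it directly is one way to keep the record verifiable while limiting the amount of sensitive patient data held in the log.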

The above examples demonstrate that the requirement for transparency differs based on the risk associated with the task. In tasks with low augmentation scores, the transparency requirements are minimal or nonexistent because the tasks are performed mostly by a human, who clearly holds responsibility. On the other hand, transparency requirements increase as the tasks become more augmented by AI systems and the role of the human is reduced. Nevertheless, it remains the responsibility of humans to verify the outcomes generated by AI systems.

We believe that adapting the performance of AI systems to the context in which they are deployed will reduce the burden of responsibility on humans and ease the effort of verifying the outcomes of such systems. To achieve performance adaptation, we suggest paying more attention to the different types of AI errors (namely, precision vs. recall) and to AI model design (e.g., the bias vs. variance trade-off). In contexts in which an inaccurate outcome of an AI system (e.g., failing to recognize a stock item) would not lead to any physical or psychological harm, the classification threshold of the AI system can be set high, leading to higher precision but lower recall. With higher precision but lower recall, the outcome generated by the AI system is expected to be precise, but with a likelihood of omitting relevant outcomes; in settings where such omissions pose no physical or psychological risk to other humans, the burden of responsibility on the human is reduced. On the other hand, in contexts in which an inaccurate outcome of a task (e.g., failing to identify the prescribed medicine for a patient) might lead to physical or psychological harm, the classification threshold of the AI system can be lowered, leading to higher recall but lower precision. With higher recall but lower precision, the AI system is expected to identify more of the relevant outcomes, but at the expense of some noise (e.g., more false positives). In the example of the AI system aiding in prescribing medicine to patients, a lower threshold will increase the likelihood that the pharmacy assistant will not miss medicines prescribed by the doctor, yet they will still need to verify with the patient that all the prescriptions identified by the AI system were indeed prescribed by the doctor. In this example, the burden of responsibility on the pharmacy assistant is reduced, as they can be more confident that they did not miss any prescribed medicine. At the same time, the verification process is eased, as the pharmacy assistant must verify readily available information (i.e., the list of medicines prescribed to a patient) rather than search for omitted information (e.g., a missed drug prescription).

As the use of AI-augmented systems becomes more widespread, employees will become more familiar with notions like precision and recall. However, appreciating more advanced concepts like the bias-variance trade-off will require expertise that only a few people in any organization are likely to possess. Nevertheless, mapping transparency requirements to AI augmentation scores, and those in turn to model complexity, will advance the debate on this topic in a quantifiable manner.
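The threshold-setting logic just described can be made explicit: for a low-risk task, choose a threshold that meets a precision target; for a high-risk task, choose one that meets a recall target. The sketch below applies scikit-learn’s precision_recall_curve to held-out scores; the 0.95 targets and the synthetic validation data are illustrative assumptions.

```python
# Pick a classification threshold according to the perceived risk of the task:
# low-risk tasks favor precision, high-risk tasks favor recall.
# The 0.95 targets and the synthetic validation data are illustrative assumptions.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                 # held-out labels
y_score = np.where(y_true == 1,
                   rng.uniform(0.3, 1.0, size=1000),   # positives tend to score higher
                   rng.uniform(0.0, 0.7, size=1000))   # negatives tend to score lower

def pick_threshold(y_true, y_score, risk, target=0.95):
    """Lowest threshold meeting a precision target (low risk) or highest
    threshold meeting a recall target (high risk)."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    if risk == "low":
        candidates = thresholds[precision[:-1] >= target]
        return float(candidates.min()) if candidates.size else None
    candidates = thresholds[recall[:-1] >= target]
    return float(candidates.max()) if candidates.size else None

print("low-risk threshold :", pick_threshold(y_true, y_score, "low"))
print("high-risk threshold:", pick_threshold(y_true, y_score, "high"))
```

In practice the held-out scores would come from a validation set curated for the task at hand, and the chosen threshold, together with the resulting precision and recall, is exactly the kind of information that should be communicated to users as part of the transparency requirement.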

Although the focus of our work is on AI transparency in workplaces, our approach and findings could be applied in different settings. First, our machine learning model for computing AIAScores could be used to inform other research on the impact of AI on jobs. For example, our model could be used to identify jobs that are highly susceptible to AI augmentation and to examine the implications of this for gender and diversity in future jobs. Similarly, our model could be used to identify the jobs that are most affected by AI, which would enable other researchers to investigate the skills that should be developed by people whose jobs are impacted by AI. Second, our findings on AI transparency in the workplace could be applicable in other settings in which AI systems are implemented. Whether or not a human is involved in verifying the output of an AI system, classification thresholds and risk remain important factors to consider in achieving transparency. For example, when businesses deploy online AI systems, such as chatbots (e.g., a bot that provides automated customer support) or recommendation systems (e.g., a system that recommends content for users), the perceived risk of the content generated by the AI system should guide the adjustment of the system’s classification threshold.

6 Conclusion

In this study, we analyzed job data from the O*NET database to examine the level of transparency required for tasks in AI-augmented settings. Our analysis shows that different tasks require different levels of transparency, depending on the AI augmentation score and perceived risk category. Our findings also indicate that tasks with low AI augmentation scores are likely to be physical or mechanical and require no algorithmic transparency, as such tasks are mostly performed by humans. We also found that as the perceived risk and AI augmentation score increase, the tasks become more cognitive and the required level of transparency increases. However, our analysis indicates that the transparency requirements for AI-augmented tasks are not much different from those of other traditional technologies.

This study offers several opportunities for future work. First, our data sample did not include instances of tasks that can be categorized under the unacceptable risk category. Although the European Commission advises against the deployment of AI systems falling under such a category, future studies should investigate the implications of these systems as well as the level of transparency required to mitigate their harms. Second, our study results indicate that having a good understanding of AI errors can help adapt the performance of AI systems and reduce the burden of responsibility for humans. Hence, future studies should focus on examining the impact of adaptive performance (e.g., tweaking the classification threshold of AI systems) on trust and the adoption of AI systems.