Analysing and Organising Human Communications for AI Fairness-Related Decisions: Use Cases from the Public Sector

AI algorithms used in the public sector, e.g., for allocating social benefits or predicting fraud, often involve multiple public and private stakeholders at various phases of the algorithm's life-cycle. Communication issues between these diverse stakeholders can lead to misinterpretation and misuse of algorithms. We investigate the communication processes for AI fairness-related decisions by conducting interviews with practitioners working on algorithmic systems in the public sector. By applying qualitative coding analysis, we identify key elements of communication processes that underlie fairness-related human decisions. We analyze the division of roles, tasks, skills, and challenges perceived by stakeholders. We formalize the underlying communication issues within a conceptual framework that (i) represents the communication patterns and (ii) outlines missing elements, such as actors who lack the skills required for their tasks. The framework is used for describing and analyzing key organizational issues for fairness-related decisions. Three general patterns emerge from the analysis: 1. Policy-makers, civil servants, and domain experts are less involved than developers throughout a system's life-cycle. This leads to developers taking on extra roles such as advisor, while they potentially lack the required skills and guidance from domain experts. 2. End-users and policy-makers often lack the technical skills to interpret a system's limitations, and rely on developer roles for making decisions concerning fairness issues. 3. Citizens are structurally absent throughout a system's life-cycle, which may lead to decisions that do not include relevant considerations from impacted stakeholders.


Introduction
Algorithms are increasingly being used for various forms of public sector services, such as allocating social benefits in the domains of education and health, and detecting fraud in allowances and taxes [1][2][3][4][5]. These applications can be beneficial, but can also have detrimental consequences for citizens in high-stakes scenarios. Notorious examples where incorrect predictions led to wrongful accusations of citizen minorities are the COMPAS case in the US, the SyRI case, and the Childcare Benefit Scandal in the Netherlands. The latter eventually led to the resignation of the Dutch government in 2021.
These examples highlight the problem of fairness in AI.Fairness in this context refers to fair outcomes for decision-making, a principle that prescribes that algorithmic decision-making must have an absence of prejudice or favoritism toward an individual or group based on their inherent or acquired characteristics [6].Nowadays, fairness -and related issues in AI -are widely recognized in well-established legal and ethical guidelines [8][9][10].According to the European Commission's Ethics guidelines on trustworthy AI [9], an important step in supporting trustworthy AI includes involving and educating all stakeholders about their roles and needs throughout the AI system's life-cycle.Indeed, algorithms are always part of a process driven by many stakeholders' design choices and socio-cultural norms [11].All the (design) decisions that are made throughout a system's life-cycle codify the underlying socio-cultural norms of the stakeholders [12][13][14].For instance, when allocating benefits in the public sector, it has to be decided which data features are relevant for the 'eligibility' for social benefits [15].Furthermore, the punitive (e.g., detecting fraudsters) or assistive (e.g., allocating social benefits) nature of policy interventions might require balancing false positive and false negative rates [2,16].Therefore, a solely technical approach to fairness is insufficient, and involving diverse actors and stakeholders is important for ensuring that public interests are prioritized and that potential harms are minimized [17,18].
To address such issues, we investigate the communication and collaborations between stakeholders throughout an algorithm's life-cycle. In this paper, we use a working definition of fairness-related decisions that covers all design decisions and practices applied by stakeholders that can potentially lead to bias, discrimination, and other forms of prejudice against different groups, individuals, or communities [11,15,19]. We do not consider a predefined scope of fairness-related decisions and focus on the non-exhaustive scope of decisions that emerge along our investigations. We focus on internal communications between the direct stakeholders who use or build a system, rather than external communications with the general public [19,20]. We do this by identifying the roles, the divisions of tasks, the required skills, and the potential communication challenges between diverse actors occurring throughout the algorithm's life-cycle. The research questions we address are the following:
• RQ1: Which actors, roles, and tasks can be identified in multi-stakeholder interactions throughout the phases of an algorithm's life-cycle when making fairness-related decisions?
• RQ2: Which communication patterns and challenges can be identified when stakeholders make fairness-related decisions?
To answer these questions, we conducted 11 semi-structured in-depth interviews with public practitioners working on algorithmic systems.For reasons of better accessibility, we concentrated on experts from organizations in the Netherlands, but the methodology applied in the study can be easily replicated in other contexts to further extend our results.From the interviews, we identified who makes decisions about what, and at which phase of the algorithm's life-cycle.We analyzed the interview transcripts to identify the elements that constitute communication patterns and challenges, and we labeled them through in-vivo, descriptive, and process coding [21].
We further structured our findings by building a conceptual framework that draws the key relationships between the constitutive elements of communication patterns that underlie fairness-related human decisions.First, we found that it is crucial to differentiate stakeholders by the individual actors, the roles that actors assume when contributing to a task that involves fairness-related decisions, and the skills that a task requires or an actor has.For example, simply describing a stakeholder as a developer can omit to indicate that the same actor (i.e., the same person) also assumes the role of advisor with domain expertise when they decide which features are to be used as predictors for detecting fraud.Not only do such actors endorse more than a developer role, but they can also miss the skills required for their extra role.Second, we found that it is crucial to identify the elements that stakeholders are missing, to describe the communication challenges that stakeholders experience when making fairness-related decisions.Thus, the conceptual framework we derived for analyzing the communication patterns in fairness-related decisions has 3 main characteristics: (1) it differentiates actors from their roles or skills; (2) it considers 6 key elements of communication patterns: Actors, Roles, Skills, Tasks, Information exchange, and Phases in the algorithm's life-cycle; and (3) it can specify the elements that are deemed missing in the communication patterns.After analyzing the interview transcripts using this conceptual framework, we formalized 3 general patterns that emerged from the participants: (1) Developers play the most prominent role in most tasks and phases of the algorithm's life-cycle even though they miss guidance from stakeholders with advisor and policy-maker roles and domain expertise skills; (2) end-users and policy-makers often lack the technical skills to interpret a system's limitations and uncertainty, and the related fairness implications; and (3) 
inputs from citizens are structurally absent in fairness-related decisions throughout an algorithm's life-cycle.
These communication challenges indicate inadequate model governance, and the potential inability to recognize and address fairness issues throughout an algorithm's life-cycle. This can lead to misinterpretation and misuse of algorithms, with critical implications for the impacted populations. The communication challenges we identified and the conceptual framework we derived may help identify such issues before they arise in practice once algorithms are deployed.

Related Work
Several frameworks and theories from various domains have been proposed to characterize the dynamics of interactions amongst a network of actors [22,23].Actor-Network Theory (ANT) and mediation theory, for example, describe the relations and interactions within a network of (artificial and natural) actors [23][24][25].Following ANT, interaction with technology is never neutral as it influences or mediates the way tasks and decisions are carried out.On the other hand, technology is continuously mediated by human social aspects, e.g. in formulating design goals.To describe the context of reciprocal interactions between human actors and technology, we can broadly refer to socio-technical systems (STS) approaches [22].A view centered on STS does not consider technology alone, it rather stresses the interactive nature of social and technical structures within an organization or society as a whole.This approach is increasingly used in the field of AI, to assess fairness and ethics from a broader normative context in which actors interact and operate, as opposed to focusing on individual actors alone [26][27][28].
Other frameworks have been proposed to investigate the power structures within a network of actors.Following the tripartite model for ethics in technology, three main roles are often identified through their responsibilities: (1) the developer, who handles the technical aspects; (2) the user, who handles the practical usage of the system, and (3) the regulator's role, who is responsible for making the value decisions [29].Prior research on automated systems for public decision-making has shown a shift of discretionary power from the regulator roles to developer roles, often making the latter the main decision-makers [30].When developers become the main decision-makers for design decisions, this can exclude stakeholders without technical knowledge from important decisions about the system [31,32].These imbalanced power dynamics can lead to a form of technocracy, where governance and (moral) decision-making are based on technological insights and may only yield technological solutions [18,29,33].
Beyond these theoretic considerations, empirical field research has been conducted to investigate data practices at local governments [20,34,35].For instance, Siffels et al. (2022) argue that with the process of decentralization in the Netherlands, many tasks from the central government were delegated to municipalities without giving them more resources and capacities.Municipalities invested in data practices to deal with additional tasks and to distribute limited (social) resources.Due to a lack of data literacy, however, public servants were unable to recognize ethical issues and thus sought collaboration with external partners.Other research showed that depending on their roles and tasks, stakeholders can be involved at different phases in the algorithm's life-cycle [5,[36][37][38].Decision-makers from public organizations are often involved in the procurement and deployment phases.Developers, sometimes from third parties, tend to be more involved in the development phase [36,38].This can sometimes lead to "The problem of many hands", which indicates a decreased ability to be transparent and responsible, because parts of the management of the algorithm's life-cycle are outsourced to different stakeholders [34,39].Jonk and Iren (2021) performed semistructured interviews with practitioners at 8 municipalities, to investigate the actual and intended use of algorithms [35].They found a lack of common terminology and algorithmic expertise, at a technical level and at a governance and operational level.The authors argue that municipalities would benefit from a governance framework to guide them in the use of tools, methods, and good practices to handle potential risks.Lastly, Fest, Wieringa, and Wagner (2022) investigated how higher-level ethical and legal frameworks influence daily practices for data and algorithms used in the Dutch public sector [20].They found that applying existing frameworks remains challenging for practitioners because they do not feel competent 
or lack the required skills to make decisions that keep their practices responsible and accountable. Data professionals, as a result, get too much autonomy and discretionary power for handling decisions that belong to the core of public sector operations and mandates.
What is still missing in previous work is a framework to characterize the communication processes that underlie fairness-related human decisions throughout an algorithm's life-cycle.The frameworks and theories in related works indicate that such communication and decision processes arise within a socio-technical interactive network, where algorithms are part of a governance structure comprising actors with different roles and tasks.The literature also shows that our research must consider the interactions between stakeholders who have direct or indirect interactions with an algorithm, and with the populations impacted by the algorithm.Thus, we aim at identifying how fairness-related decisions are mediated by stakeholders who may or may not have direct access to socio-technical information that is relevant for addressing fairness issues.

Semi-structured interviews
We conducted 11 semi-structured interviews. Each interview lasted for approximately one hour. We formulated the interview questions in an open-ended manner, so that participants were able to share their information in their own words whilst following a general structure of topics [40,41]. Before conducting the interviews, participants received some example questions and a short description of the research. At the start of the interview, participants gave their consent for their interview to be used in this research. They were also asked to discuss one use case they were involved in. The questions used for the interviews can be found in Table 1 in the appendix and are divided into three main sections: (1) General: investigation of the project and use case to which the participant contributed, the other actors involved, and the participant's team, roles, and envisioned (end) users. (2) Development process: investigation of the type of datasets, resources, tasks, phases, and roles needed throughout the algorithm's life-cycle to make fairness-related decisions. (3) Considerations: investigation of the perceived challenges for role and task division, the potential improvements or failures of the system, and the communication gaps. The questions also concerned the assessment of error and bias, and the potential negative impacts of the algorithm.
In the first two sections, participants were asked to describe the general procedures and practices used in the AI system's life-cycle.Participants had the opportunity to mention internal communication and key elements of the communication processes that underlie fairness-related human decisions.We specifically asked about communication issues in the third section of the interview.This division was made to provide the opportunity for spontaneous answers beyond our specific questions.
We tested all interview questions beforehand in a pilot with 5 researchers from different disciplines in our research lab. The suitability of the questions was checked in terms of comprehensibility and relevance to our research questions. The questions were deemed suitable for letting participants describe their communication process and related issues, and no questions were altered afterwards.

Case Studies
We recruited participants who have been collaborating on multi-stakeholder projects in the public sector. Participants working in the social domain, e.g., social benefit allocation or fraud detection, were of particular interest because the impacts on citizens can be critical. We used a repository of use cases that was made available to us by the Dutch Ministry of Interior Affairs. In addition, we used the snowball sampling technique to recruit participants. Table 1 describes the participants, their roles at the time of involvement, and whether they have a technical background. We consider participants who have no education or experience in technical science as not having a technical background. Ten participants were involved in the social security domain, and one participant was in the education domain.

Qualitative Coding analysis
We performed a qualitative coding analysis by labeling key codes from the interview output. We used in vivo, descriptive, and process coding to identify the process of communication exchange between diverse actors, as well as the practices and choices made at each stage of the algorithm's life-cycle.
The coding analysis was performed in multiple cycles. At each round of coding, pieces of text were annotated with codes that represent the concepts mentioned by participants. The codes were refined, merged, or split into categories after each round. This was repeated until no further refinement of the codes was needed. Two of the authors performed separate coding analyses to reduce the impact of personal bias. We performed the coding analysis both by hand and using a coding analysis tool. We compared both coding analyses to identify discrepancies or alignments. Beforehand, both analysts agreed that particular attention should be paid to identifying the roles, tasks, phases, and challenges from the interview transcripts. For example, if a participant were to mention that "[person X] is a developer and performs bias analysis in the development phase", the actor, the role, the task, and the phase would be labeled.
In Figure 1, an example is given for the interview output (left) and the corresponding codes (right).The Figure shows the colors corresponding to the groups of codes for challenge, roles, task, and phase.On the right, an example of the corresponding descriptive codes can be found.For example, "it's hard to get a focused answer" was summarized as an information exchange challenge of the type where "more input is needed".We added the corresponding role(s) to the codes in brackets "[]".If the code concerned multiple roles, we added "-" to indicate a relation for information exchange.In this example, more input is needed between the end-user and the developer role.
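As an illustration of the bracket notation described above, the following minimal Python sketch splits an annotated code into its descriptive label and the roles it concerns. The exact string layout, the `DescriptiveCode` type, and the `parse_code` function are assumptions made for this example; they are not part of the coding tool used in the study.

```python
import re
from typing import NamedTuple, Tuple


class DescriptiveCode(NamedTuple):
    """A descriptive code and the role(s) attached to it."""
    label: str
    roles: Tuple[str, ...]


# Assumed layout: "<label> [role]" or "<label> [role A - role B]".
_PATTERN = re.compile(r"^(?P<label>.*?)\s*\[(?P<roles>[^\]]*)\]\s*$")


def parse_code(text: str) -> DescriptiveCode:
    """Split an annotated code into its label and roles.

    Roles are separated by " - " (a hyphen surrounded by whitespace), so
    hyphenated role names such as "end-user" are kept intact.
    """
    match = _PATTERN.match(text)
    if match is None:
        # No brackets: the code is not tied to a specific role.
        return DescriptiveCode(text.strip(), ())
    roles = tuple(
        part.strip()
        for part in re.split(r"\s+-\s+", match.group("roles"))
        if part.strip()
    )
    return DescriptiveCode(match.group("label").strip(), roles)


example = parse_code("more input is needed [end-user - developer]")
print(example.label, example.roles)
```

A code with two roles is read here as an information-exchange relation between those roles, which mirrors the end-user and developer example given in the text.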
After all interview transcripts were annotated with codes, we analyzed which codes co-occurred within the answers to each question, e.g.we counted which roles occurred together with a specific phase, task, or challenge.
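The co-occurrence count can be sketched as follows. This is a minimal Python illustration under stated assumptions: each answer is represented as a set of `(group, code)` pairs, the sample answers are invented for the example and are not interview data, and `count_cooccurrences` is a hypothetical helper rather than the procedure implemented in the coding tool.

```python
from collections import Counter
from itertools import combinations

# Invented sample data: one set of (group, code) pairs per interview answer.
answers = [
    {("role", "developer"), ("phase", "development"), ("task", "bias analysis")},
    {("role", "developer"), ("phase", "development")},
    {("role", "end-user"), ("phase", "deployment"), ("challenge", "interpretation")},
]


def count_cooccurrences(coded_answers):
    """Count how often two codes from different groups appear in the same answer."""
    counts = Counter()
    for codes in coded_answers:
        # Sorting gives each pair a canonical order, so (a, b) and (b, a)
        # are counted under the same key.
        for first, second in combinations(sorted(codes), 2):
            if first[0] != second[0]:  # skip pairs within the same group
                counts[(first, second)] += 1
    return counts


counts = count_cooccurrences(answers)
print(counts[(("phase", "development"), ("role", "developer"))])
```

Counting per answer, rather than per transcript, follows the description above that codes were compared within the answers to each question.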

Constructing a conceptual framework
The co-occurrence analysis alone did not capture the relations between codes. For example, "end-user is missing in the development phase" would still count as a co-occurrence of the codes end-user (role) and development (phase), although the role of end-user was actually missing. Therefore, we constructed a framework that further analyzes the codes we identified by describing their relations and characteristics. The conceptual framework aims to describe the key codes, the relationships between high-level groups of codes (e.g., Actor, Role, Skill, Task, Phase), and the key characteristics that underlie the challenges mentioned in the interviews. We can then represent challenges such as the following: citizens are actors with the role of data subject (an Actor-Role relationship), and actors with such roles are "missing" (a characteristic of Actors).
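The limitation of plain co-occurrence counts can be made concrete with a small Python sketch. It assumes a hypothetical `RoleMention` record with a `missing` flag; the three sample mentions are invented for illustration and do not come from the interviews.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleMention:
    """A role mentioned together with a phase; `missing` marks reported absence."""
    role: str
    phase: str
    missing: bool = False


# Invented sample mentions.
mentions = [
    RoleMention("developer", "development"),
    RoleMention("end-user", "development", missing=True),
    RoleMention("end-user", "deployment"),
]

# A naive count treats "end-user is missing in development" as involvement.
naive = Counter((m.role, m.phase) for m in mentions)

# Tracking the characteristic separately keeps presence and absence apart.
present = Counter((m.role, m.phase) for m in mentions if not m.missing)
absent = Counter((m.role, m.phase) for m in mentions if m.missing)

print(naive[("end-user", "development")])    # counted as a co-occurrence
print(present[("end-user", "development")])  # not counted as involvement
print(absent[("end-user", "development")])   # counted as a missing role
```

Storing "missing" as a property of the mention, rather than as a separate code, is one possible design; it keeps the role and phase codes unchanged while still allowing absence to be queried.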
We constructed the conceptual framework iteratively, following a method similar to those used for constructing ontologies [43][44][45][46].This means that we continuously adjusted the framework until it would represent every code we identified from the qualitative coding analysis.We added definitions, characteristics, and properties to the identified concepts.We added descriptions to each concept to agree on common definitions.The relationships and characteristics we used to build the conceptual framework are based on the interviews and were in accordance with some of the definitions we found from documents provided by the European Commission on Trustworthy AI, and from other sources in the literature [8,9,29,44,45,47].For example, by describing the type of private or public affiliation (e.g., national institute, ministry, or municipality) we can contextualize how tasks and roles are divided within multi-stakeholder collaborations.Relations are added between codes.For example, an actor always "has" a certain role whereas a task "involves" a role "during" a phase.
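The relations named above can be sketched as a small data structure. This Python example is only an illustration under assumptions: the `Framework` class, its method names, and the sample entries are hypothetical and are not the authors' formalization. It encodes the "has" relation between actors and roles and the "involves ... during" relation between tasks, roles, and phases, and then lists roles that a task involves but that no actor has.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Framework:
    """Hypothetical container for two relations of the conceptual framework."""
    has_role: Set[Tuple[str, str]] = field(default_factory=set)       # (actor, role)
    involves: Set[Tuple[str, str, str]] = field(default_factory=set)  # (task, role, phase)

    def add_actor_role(self, actor: str, role: str) -> None:
        """Record that an actor "has" a role."""
        self.has_role.add((actor, role))

    def add_task(self, task: str, role: str, phase: str) -> None:
        """Record that a task "involves" a role "during" a phase."""
        self.involves.add((task, role, phase))

    def unfilled_roles(self) -> Set[str]:
        """Return roles that some task involves but that no actor has."""
        filled = {role for _, role in self.has_role}
        required = {role for _, role, _ in self.involves}
        return required - filled


framework = Framework()
framework.add_actor_role("person X", "developer")
framework.add_task("bias analysis", "developer", "development")
framework.add_task("consulting", "advisor", "evaluation")

print(framework.unfilled_roles())
```

In this sketch, the advisor role is reported as unfilled because no actor has been assigned to it, which corresponds to the kind of missing element the framework is meant to surface.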

Results
In this section, we first describe the use cases discussed by the participants in the interviews (Section 4.1). Then the results of our qualitative coding analysis are given in Section 4.2. Finally, in Section 4.3, we further analyze the communication challenges and apply the conceptual framework to document the communication patterns and challenges we identified.

Use cases
In all use cases, multiple stakeholders were involved with varying expertise, from social workers to developers, researchers, program managers, and advisors from third parties. For most use cases (10 out of 11), the procurement for the algorithm came from government organizations and municipalities. Furthermore, the envisioned end-users of the systems were in 10 out of 11 cases policy-makers or social workers at municipalities with minimal or no technical expertise. End-users and policy-makers were mentioned to be the same in most of our use cases. For the remaining use case in the educational domain, teachers were the envisioned end-users.

Identified Codes and Concepts
Our qualitative coding analysis first focused on identifying the main types of Roles, Tasks, and Challenges. It resulted in identifying 7 codes for describing the main roles (Table 2), 10 codes for the tasks (Table 3), and 7 codes for the challenges (Table 4). In this section, we explain in more detail the concepts that these codes represent, and our decisions for eliciting a consistent set of codes.
For coding the roles, we observed that the terminology for the technical roles is rather diverse. For example, participants mentioned terms such as engineers, coders, developers, and data scientists for the role of developer. Some participants identified themselves or their collaborators as researchers. We considered including a separate code for the role of researcher. However, such a code can be ambiguous, as the research topics could concern either the technical development of algorithms or other domains such as governance or social security. Thus, we decided to group under the code "developer" the researchers who focus on the technical development of algorithms. Researchers who contributed from other domains sometimes assumed roles other than developer, such as advisor or manager.
For the role of manager, participants mentioned terms such as innovation managers, product owners, program managers, project managers, or CTO (Chief Technology Officer).These terms were often used interchangeably.We decided to group all management-related roles under the same code "manager", without using specific codes for each job title or hierarchical level.
In Table 2 the descriptions of the main roles can be found. For example, managers are those who "supervise the projects for the development of the system and oversee documentation checks and balances". In our use cases, the managers often worked in the same team as developers and were hired either externally or internally by a (public) requester.
The request for the model, associated with the "requester" role, often came from ministries, which were only mentioned for funding or initiating a project.
The "data subject" role as well as the "requester" role were never described as the end-users.In Table 2 it is also stated that data subjects are "an organization or entity that is impacted by the system, service or product" [45].The data subjects were, in almost all of our use cases (10 out of 11), citizens.The advisor role was often presented as advising on 1) domain knowledge, 2) technical knowledge, or 3) ethical knowledge.Overall, as illustrated in Fig. 2, the developer role was mentioned the most (N=189), followed by End-users (N=107) and Policy Makers (N=92).
In Section 4.2.2, we describe which roles occurred the most for which phase of the algorithm's life-cycle.

Table 2: Descriptions of the main roles.
Developer: Research, design, and/or develop algorithms
Policy-maker: Responsible for designing and overseeing the carrying out of policy and social decisions
Manager: Supervise the projects for the development of the system and oversee documentation checks and balances
End-user: (In)directly engage with the system and use algorithms within their business processes to offer products and services to others
Data subject: Organization or entity that is impacted by the system, service, or product
Advisor: Give constructive feedback on the system throughout the life-cycle
Requester: The main client and investor for the use case

In Table 3, we describe the main tasks identified from the qualitative coding analysis. For example, the task "Consulting" refers to advising on domain, technical, or ethical knowledge aspects of the AI model. In Section 4.2.2, we describe which roles occurred the most for which task.
In Table 4, we describe the main challenges identified from the qualitative coding analysis.For example, "Interpretation" issues refer to the misunderstanding or misevaluation of information regarding the AI model.In Section 4.2.2, we describe which roles occurred the most for which challenge.

Roles and Phases
Figure 3 shows that developers are most prominent in the development, evaluation, and formulation phases, but less prominent in the deployment and monitoring phases. A developer (P1) mentioned that "we don't monitor what the municipalities are doing with the results" and that "feedback is needed on how the results will be used in deployment". Conversely, stakeholders other than developers could be more involved in the development phase. Another developer (P9) mentioned: "For the future, we could incorporate stakeholders at earlier stages in the development to see what the potential sources of bias are." End-users and policy-makers had the second-highest number of co-occurrences with phases. Moreover, Figure 3 demonstrates that the monitoring phase (N = 10) was mentioned the least throughout the interviews, whereas the evaluation phase was mentioned the most (N = 90).
Data subjects were seldom mentioned as being involved. Data subjects could be more involved throughout the phases of an algorithm's life-cycle. For example, P5 mentioned that "it depends on the type of AI. If it has an impact on citizens or uses a lot of data from citizens, it would be relevant to include a focus group of citizens from the beginning but it is less relevant for road repairs". The role of the requester was mentioned mostly in the formulation phase and rarely as being involved in other phases. Advisor roles were often mentioned as being involved in the evaluation phase before deployment, or when the project is halted.

Roles and Tasks
Figure 4 shows that the developer role was mentioned the most for all tasks (e.g., technical decision-making, consulting, dealing with fairness and risks).This indicates that actors taking on developer roles were the most prominent in making decisions throughout the algorithm's life-cycle.About the typical tasks developers handle, developer (P9) mentioned that they "decided on how to improve accuracy and handling issues.For instance, gathering more diverse data to handle bias".About their collaboration with other roles, another developer (P1) mentioned that they "define and chose metrics for the models" and that these "are defined in collaboration with the municipality but choosing metrics and trimming down after input was decided by the two of their team".The developer role was not mentioned for tasks related to model usage.
Regarding the task of stakeholder involvement, managers are the main decision-makers. Within teams, managers are sometimes the only ones in direct contact with roles other than developers. Managers were often mentioned to supervise developers in technical decision-making, and they often rely on the developers' judgment for bias and risk oversight. A manager we interviewed (P2) mentioned that for handling error rates and biases they "rely on the technical teams' judgment" and that "the technical colleagues give advice when the model is good enough, but it's a bit of a grey area. We also rely on literature". Another manager (P4) mentioned that "it is time intensive to explain [bias analysis] to stakeholder users. Bias analysis is sometimes so complex, even as an expert I sometimes don't understand it, and it takes a lot of time".
Actors with a developer role also sometimes assume advisor roles.When technical advisors are missing, managers can hire a third-party developer to analyze the code, give technical advice, or even build the model.An advisor we interviewed (P5) mentioned that "an external company was hired to develop the model for the municipality", which made the "data ecosystem quite complex".Another manager we interviewed (P3) added that they "hired an external bureau for auditing and investigating the algorithm", e.g., as they "could not get reliable predictions because the social domain changes all the time, and it's hard to keep track of these changes-for example in social support-and how that impacts the system".
An advisor (P10) mentioned that they "were involved to give feedback as an involved bystander. But it was hard for someone like me to understand what the difference between implementation and design is and what that means for real-life implications". This illustrates that actors in roles other than developer can lack the technical skills needed to participate in decisions.

Roles and Challenges
In Figure 5 the co-occurrences of roles and challenges are shown.Most communication challenges were associated with the roles of developers, end-users, and policy-makers.
The role of the end-user was frequently mentioned for challenges related to interpretation and role. This means that most challenges were related to either the (mis)understanding or (mis)evaluation of information, or an unclear division of functions or duties amongst actors. Several participants mentioned that more input is needed from end-users on the interpretation and use of the results envisioned in the deployment phase. For instance, a manager (P3) mentioned that it is challenging that "we don't know if governments and municipalities can understand the model". A developer (P1) also mentioned that "it's hard to get a focused answer on how they are going to use the model and what the results will be", and that "the municipality is too loosely involved in the project".
Regarding the challenges with bias, risk oversight, and the interpretation of model output, the role of policy-maker was frequently mentioned for the challenges related to risk oversight. This concerns problems in governance, legal, ethical, and procedural aspects. The role of developer was frequently mentioned for challenges related to resources, input, and bias. It was mentioned that more input is needed on the analysis of feature selection and bias in the development phase from developers to end-users and policy-makers. Both end-users and policy-makers were often mentioned as missing the technical skills to understand the uncertainty of predictions and limitations of the model in real-world settings. A developer (P6) mentioned about end-users that "people could trust the model blindly and mistake it for a decision-making tool". An advisor (P8), in a use case where inspectors were the end-users, mentioned that they were "not sure if the inspectors fully understood why certain cases were flagged as misuses or put on the list [of potential frauds]".
With regard to challenges for bias and risk oversight, an advisor (P8) mentioned that "there should be more focus on asking users what policy-makers perceive as risks and biases" and that it is "difficult for them to understand that there are many different interpretations. What it really means to be a 'true positive', is this person really a fraud, or was this person not able to fill in the forms properly?". Interpretation challenges by end-users and policy-makers were mentioned most in the monitoring and deployment phases. A developer (P6) mentioned that "training for users is needed, to remind users not to rely on the tool but that the decision is up to them".
Data subjects (e.g., citizens) were also mentioned for involvement and risk oversight challenges. Participants mentioned a need for more citizen involvement and for being more transparent to citizens throughout the phases of an algorithm's life-cycle. Managers are looking for appropriate frameworks for (fruitful) collaborations with citizens. A manager (P4) mentioned that "there is a long history with the citizen council for consultation and it is usually conflict-based. It's hard to make fruitful collaboration, getting them to understand the issues and getting them out of anger mode". An advisor (P7) mentioned about previous involvement with citizens that "they [citizens] said no on the feasibility of the model from the municipality. They did not get it. It was more of a general no to technology instead of asking a targeted question".

Key Insights from Qualitative Coding Analysis
From the qualitative coding analysis, we conclude that: (1) Actors with developer roles are predominant in most phases and tasks while potentially lacking the required guidance from domain experts. (2) End-users and policy-makers often lack the technical skills to interpret a system's output or estimate potential fairness issues. (3) Citizens filling the role of data subjects are seldom mentioned as being involved throughout the phases of the algorithm's life-cycle. In the next section, we analyze these challenges further by characterizing the relations between the main elements of communication patterns in a conceptual framework.

Modeling Communication Patterns
The communication patterns emerging from the interviews are not easily described with qualitative analysis in written form only. The codes may identify the key elements of communication patterns, but not their relationships. Counting the (co-)occurrences of codes could not fully capture these relationships. We thus elicited a conceptual framework that models the relationships between the elements of the communication patterns (e.g., between actors, roles, and skills). This also allowed us to explore the perspective of socio-technical systems (Section 2), in which AI models and fairness-related decisions arise through the interactions between actors. The conceptual framework we elicited is shown in Figure 6 and detailed in Table 5.
Fig. 6 The basic concepts selected from our qualitative coding analysis, and used to characterize the communication patterns underlying fairness-related decisions, and the challenges we identified.
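To illustrate why co-occurrence counts alone can mislead, the following minimal Python sketch counts (role, phase) pairs over a few hypothetical coded excerpts. The excerpts, the `absent` flag, and the function name `count_co_occurrences` are assumptions made for illustration; they are not the study's data or tooling.

```python
from collections import Counter
from itertools import product

# Hypothetical coded excerpts (not taken from the study's data).
# "absent" marks excerpts of the kind "role X was missing in phase Y".
excerpts = [
    {"roles": {"developer"}, "phases": {"development"}, "absent": False},
    {"roles": {"developer", "end-user"}, "phases": {"deployment"}, "absent": False},
    {"roles": {"data subject"}, "phases": {"formulation"}, "absent": True},
]


def count_co_occurrences(excerpts, only_absent=None):
    """Count (role, phase) pairs; optionally keep only present or only absent mentions."""
    counts = Counter()
    for excerpt in excerpts:
        if only_absent is not None and excerpt["absent"] != only_absent:
            continue
        for pair in product(sorted(excerpt["roles"]), sorted(excerpt["phases"])):
            counts[pair] += 1
    return counts


raw = count_co_occurrences(excerpts)                        # plain co-occurrence counts
involved = count_co_occurrences(excerpts, only_absent=False)  # actual involvement only

pair = ("data subject", "formulation")
print(raw[pair], involved[pair])
```

In this toy example the raw count registers one co-occurrence of data subjects with the formulation phase, although the excerpt reports their absence. Distinguishing the two cases requires the relation between the codes, which is what a conceptual framework can record.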
We elicited six elements to describe the communication patterns: Phase, Role, Task, Skill, Actor, and Information Exchange. At least two concepts were needed to characterize the communication process: the stakeholders who exchange information (represented by the concept Actor), and the act of communicating (represented by the concept Information Exchange). Describing the context of the communications that underlie fairness-related human decision-making, and their challenges, requires at least four additional concepts (Phase, Task, Role, Skill). For example, Tasks may be missing at certain Phases of a system's life-cycle, or Actors may not have the right Skill or Role when making a fairness-related decision. Skill was added as a key element of the communication pattern because the interviews showed that the challenges that stakeholders face often arise from a mismatch between their role and skill.
To provide a temporal overview of the communication processes, we link the Tasks to the Phases of the system's life-cycle in which they take place (e.g., to reflect on the fairness-related Tasks that are executed at specific Phases). We link the Tasks to the Actors and Information Exchange they involve, to represent the stakeholder collaborations for each Task. Finally, we relate the Actors to their Roles and Skills, and also link the Roles to the Skills they require.
Adding the property "is missing" to any of the six elements in the communication model is of great interest for documenting the challenges mentioned in the interviews. We chose to represent the communication patterns using these six elements precisely because challenges arise if any of them are missing. For example, an Actor may miss specific Skills, or a fairness-related Task may be entirely missing. Adding information on the affiliation of Actors is also of interest to better describe the stakeholders involved in fairness-related decisions, and to identify potential issues with conflicts of interest, privacy, accountability, or legal frameworks.
The elements and properties of this conceptual framework were sufficient to represent the communication patterns we observed in the interviews. Adding more elements or properties would come at the risk of making it harder to generalize to new contexts.
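As an illustration only, the elements and relations described above could be encoded as a small data model. The sketch below is one possible Python encoding under stated assumptions: Phase, Role, and Skill are simplified to plain strings, and the class fields, the `role_requires` mapping, the `find_gaps` helper, and all example values are hypothetical names introduced here, not part of the framework's specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Actor:
    name: str
    affiliation: str                       # illustrative values, e.g. "municipality"
    roles: Set[str] = field(default_factory=set)
    skills: Set[str] = field(default_factory=set)
    is_missing: bool = False               # the "is missing" property


@dataclass
class InformationExchange:
    sender: str
    receiver: str
    topic: str
    is_missing: bool = False


@dataclass
class Task:
    name: str
    phase: str                             # life-cycle Phase in which the Task takes place
    actors: List[Actor] = field(default_factory=list)
    exchanges: List[InformationExchange] = field(default_factory=list)
    is_missing: bool = False


def find_gaps(task: Task, role_requires: Dict[str, Set[str]]) -> List[str]:
    """List the missing elements of one task as readable descriptions."""
    gaps = []
    if task.is_missing:
        gaps.append(f"task '{task.name}' is missing in phase '{task.phase}'")
    for actor in task.actors:
        if actor.is_missing:
            gaps.append(f"actor '{actor.name}' is missing from task '{task.name}'")
            continue
        for role in sorted(actor.roles):
            # Skills the role requires but the actor does not have.
            for skill in sorted(role_requires.get(role, set()) - actor.skills):
                gaps.append(f"actor '{actor.name}' misses skill '{skill}' for role '{role}'")
    for ex in task.exchanges:
        if ex.is_missing:
            gaps.append(f"missing exchange {ex.sender} -> {ex.receiver} on '{ex.topic}'")
    return gaps


# Hypothetical example: a developer also acting as advisor, without domain expertise.
role_requires = {"developer": {"technical"}, "advisor": {"domain expertise"}}
developer = Actor("developer team", "external company",
                  roles={"developer", "advisor"}, skills={"technical"})
expert = Actor("domain expert", "municipality",
               roles={"advisor"}, skills={"domain expertise"}, is_missing=True)
task = Task("feature selection", "development", actors=[developer, expert],
            exchanges=[InformationExchange("domain expert", "developer team",
                                           "representativeness of data features",
                                           is_missing=True)])
gaps = find_gaps(task, role_requires)
for gap in gaps:
    print(gap)
```

Keeping the "is missing" flag on each element, rather than in a separate list of problems, mirrors the idea that a challenge is documented by marking which element of a communication pattern is absent.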
In the next section, we apply this conceptual framework to illustrate three relevant patterns observed in the challenges.

Pattern 1: Actors with a developer role are predominant and miss guidance from domain experts
Several participants mentioned challenges with the involvement and role of stakeholders with domain expertise (Table 4). Actors with a developer role are predominant, especially at the beginning of an algorithm's life-cycle, i.e., the formulation, development, and evaluation phases (Figure 3). Developers make decisions that seem technical but have crucial implications for fairness and public policy. Yet, actors with domain expertise may not be involved in guiding such seemingly technical decisions. Actors with technical skills become the main decision-makers, while they potentially miss domain expertise skills and guidance from stakeholders with advisor and policy-maker roles. Such technical decisions with fairness implications include, e.g., balancing a model's False Positive and False Negative rates, or choosing fairness metrics based on these error rates. Domain expertise is needed to assess the practical implications of each type of error 12.

Our conceptual framework (Figure 6) can be used to represent such communication issues. For example, Figure 7 illustrates the quotes from Table 6. Developers must decide which data features are suitable for a use case, and how to use them within AI systems. Domain experts could inform developers about the context in which the data features are representative of specific social groups. Without such information, developers may decide to use data features in ways that produce biased results for specific social groups.
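The balancing of False Positive and False Negative rates mentioned in this pattern can be made concrete with a small sketch. The labels, scores, and thresholds below are invented for illustration, and the function name `error_rates` is an assumption; the sketch only shows that moving a decision threshold trades one type of error for the other.

```python
def error_rates(labels, scores, threshold):
    """Return (false positive rate, false negative rate) at a decision threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    negatives = labels.count(0)
    positives = labels.count(1)
    fpr = fp / negatives if negatives else float("nan")
    fnr = fn / positives if positives else float("nan")
    return fpr, fnr


# Hypothetical ground-truth labels (1 = positive case) and model scores.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]

low = error_rates(labels, scores, threshold=0.3)   # flags more cases
high = error_rates(labels, scores, threshold=0.7)  # flags fewer cases
print("threshold 0.3:", low)
print("threshold 0.7:", high)
```

With these toy numbers, the lower threshold removes false negatives at the cost of more false positives, and the higher threshold does the opposite. Which of the two settings is preferable is not a technical question: it depends on the practical cost of each error type in the use case, which is where domain expertise is needed.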

Pattern 2: End-users and policy-makers may lack the technical skills to interpret the system's limitations and uncertainty
Several participants mentioned challenges with the interpretation of a system's limitations and uncertainty (Table 4). At the deployment phase, actors with end-user and policy-maker roles may question whether the system performs as it is supposed to, and how to interpret the validity of its results. Yet, they may not have the technical skills to understand the limitations of a system. They may miss guidance from actors with a developer role, who have the technical skills to understand the uncertainty and the practical limitations they entail. Our conceptual framework (Figure 6) can be used to represent this communication challenge. For example, Figure 8 illustrates the quotes from Table 7.

to help individuals in need) can be more problematic than False Positives (e.g., helping less vulnerable individuals).

Participant: Quote
P2: [For handling bias and error rates] "the technical colleagues give advice when the model is good enough, but it's a bit of a grey area. We also rely on literature and on the technical teams' judgment".
P3: "Hired an external bureau for auditing and investigating the algorithm". [Also because they] "couldn't get reliable predictions because the social domain changes all the time, and it's hard to keep track of these changes".
P5: "An external company was hired to develop the model for the municipality, which made the data ecosystem quite complex"
P8: "There should be more focus on asking users what policy-makers perceive as risks and biases"
P9: "Involvement and direct information of the operators who work with AI system is needed, which particular change or improvement would be most useful for them"
P6: "Training for users is needed, to remind users not to rely on the tool but that the decision is up to them."
P8: "[It is] difficult for them to understand that there are many different interpretations. What it really means to be a 'true positive', is this person really a fraud, or was this person not able to fill in the forms properly?"
Table 7 Quotes illustrating the communication pattern in Figure 8.
This finding highlights a need for increased input between end-users, policy-makers, and developers at the right phase. It echoes Pattern 1, where actors with a developer role lack guidance on the implications of their technical choices. These choices directly impact a system's limitations and uncertainty.

Some participants mentioned challenges with the lack of involvement from citizens, e.g., in their role of data subjects. For instance, in the formulation phase, citizen participation may be missing to give feedback on the design choices of the model. It is interesting to note that most participants did not mention citizens, and may thus overlook issues with their participation in a system's design or evaluation. Our conceptual framework (Figure 6) can be used to represent this communication challenge. For example, Figure 9 illustrates the quotes from Table 8. A lack of citizen involvement may lead to unbalanced fairness-related decisions that do not include key practical considerations.
Fig. 9 Example of challenges with the involvement of citizens: they are structurally absent throughout the algorithm's life-cycle, although they are the data subjects whose data is collected and processed, and who are impacted by the deployment of algorithmic systems. Information exchange is missing for them to understand and comment on the many design choices that impact fairness.

"There is no direct citizen participation."
P5: "it depends on the type of AI. If it has an impact on citizens or uses a lot of data from citizens, it would be relevant to include a focus group of citizens from the beginning but it is less relevant for e.g. road repairs."
P7: [On previous involvement with citizens] "they said no on the feasibility of the model from the municipality. They did not get it. It was more of a general no to technology instead of asking a targeted question".
Table 8 Quotes illustrating the communication pattern in Figure 9.

Discussion
By characterizing the relations between the concepts in a conceptual framework, we demonstrated that unclear or undefined governance structures for roles, tasks, and skills can lead to misinterpretation of the system's limitations and uncertainty, and even to misuse of the algorithmic system. From our use cases, we also saw that it was not always clear who makes the final (mostly policy) decisions on the further development or use of algorithms, or what the (legal, procedural, information) basis for such decisions is. When there is a lack of actors filling the right roles at the right phase, actors can take on multiple roles at once for which they may not be fully equipped, which can lead to a discretionary imbalance. It is possible that participants forgot to mention stakeholders involved in the development process, or did not see some social participants as influential for choices, practices, and protocols. Forgetting a particular role or actor does not necessarily reflect the actual governance structure or the communication challenges experienced. Participants may not have had oversight, may have been unwilling to provide specific details, or were perhaps steered by how the interview questions were formulated.
We also recognize that, in practice, the formulations of roles and groups can vary and can be diffuse. For instance, in some cases, actors identified as developers would primarily identify themselves as researchers who also carry out "developer tasks". We stress again that counting (co-)occurrences alone is not enough to assess the structure of the communication process that underlies fairness-related decisions. As mentioned in the results, sometimes an occurrence would be counted for a role and a phase when actually "role X was missing in phase Y", and thus the relations between concepts needed to be characterized to provide context to our findings. The conceptual framework we constructed comprehensively covers the relations that appear in the interviews, yet it may not be sufficient for other scenarios. Fortunately, the incremental method applied for its construction allows for easy extension.
The number of interviews was limited (N=11) and based on participants who collaborated on Dutch social domain use cases. All use cases reside in the Netherlands solely because these cases were more readily available to us. It is important to emphasize that every (public sector) use case will have its own (normative) context, specific governance, and communication structure. With our use cases, we tried to appreciate these local conditions and resist "the portability trap" of stating that every AI use case will function the same from one context to another [48,49].
Regardless of the stated limitations, our findings confirm and complement earlier work emphasizing the growing autonomy and discretion of developers in public sector operations, as well as the unclear role divisions for the usage of automated decision tools [20, 29, 30, 31]. In terms of conceptual reorganization and synthesis, the number of interviews was sufficient to demonstrate the value of studying the communication processes underlying the choices and criteria for fairness-related decisions. Besides, the methodology we applied, consisting of (a) qualitative research, (b) qualitative coding analysis, (c) incremental construction of a conceptual framework, and (d) the application of the conceptual framework on acknowledged challenges, is rather generic, and we do not foresee constraints to its reuse in different, wider contexts. Yet, in terms of more general factual knowledge, more research is needed to investigate other local governance structures and communication processes around fairness-related decisions.

Conclusion
In this research, we investigate fairness-related decisions through communication processes between diverse stakeholders that work on AI algorithms in the public sector. We conducted semi-structured interviews to analyze the divisions of roles and tasks, the required skills, and the perceived challenges throughout the algorithm's life-cycle. We applied qualitative coding analysis to identify key elements of the communication processes that underlie fairness-related decisions. The results are formulated in a conceptual framework that represents these key elements as well as missing elements, such as actors who miss skills or collaborators for certain tasks. To evaluate the adequacy and value of this methodology for the study of communication processes concerning fairness-related decisions, we applied it to social domain use cases in public organizations based in the Netherlands. The results we found are potentially relevant for policy interventions, as they generally indicate a lack of involvement and feedback between developer, end-user, and policy-maker roles. More precisely, we have captured the following key observations: (i) Developers play the most prominent role in most tasks and phases of the system's life-cycle. They may miss guidance from stakeholders with advisor and policy-maker roles, and domain expertise skills. (ii) End-users and policy-makers often lack the technical skills to interpret the system's limitations and uncertainty and to estimate potential fairness issues. They rely on the technical skills of developers for making apparently technical decisions, such as feature selection and balancing error rates, which potentially influence policy outcomes. (iii) Lastly, we observed that citizens are structurally absent throughout the system's life-cycle, even though it is mentioned that their involvement is needed in the future for balanced fairness-related decision-making. These findings indicate that model governance is currently inadequate, and that there is a potential inability to recognize and address fairness issues throughout an algorithm's life-cycle. This can lead to misinterpretation and misuse of algorithms, with critical implications for the impacted populations. The conceptual framework we derived can help to address such issues before deployment, highlighting where to intervene (e.g., with adequate communications, gathering necessary skills currently missing, or introducing new roles) before the algorithm actually goes into production.

Fig. 1 Example of interview Q&A (left) and the corresponding coding labels (right) from the qualitative coding analysis

Fig. 2 Main Roles Identified. The number of times a Role is mentioned. Note that Developers are mentioned most (N = 189) and Data Subjects least (N = 22).

Fig. 3 Co-occurrences of Roles and Phases. The number of times a Phase is mentioned is shown on the y-axis (in decreasing order). Note that the developer role (blue) is mentioned the most in all phases and that the data subject role (orange) is mentioned the least.

Fig. 4 Co-occurrences of Roles and Tasks. The number of times a Task is mentioned is shown on the y-axis (in increasing order). Note that technical decision-making is mentioned the most. Developer roles (blue) are mentioned the most for all tasks except for model usage.

Fig. 5 Co-occurrences of Roles and Challenges. The number of times a Challenge is mentioned is shown on the y-axis (in increasing order). The roles of developer (blue), end-users (green), and policy-makers (red) are mentioned most.

4.3.1 Pattern 1: Actors with a developer role are predominant and miss guidance from domain experts.

Fig. 7 Example of challenges with stakeholder involvement and role that constitute Pattern 1. Apparent technical decisions, such as defining which AI method to use and with which data features, have domain implications in practice but are made by developers alone. Other stakeholders with domain expertise are not involved in guiding the technical decisions. The missing information exchange is about the representativeness of the data features and their applicability to the use case.

Fig. 8 Example of challenges with model interpretation: the actors that use AI models, or make policies involving AI models, may miss the skills to understand model limitations and error metrics. They may also miss information exchange with developers who can explain the limitations and uncertainty.

Table 1 Description of participants

Table 2 Descriptions of Main Roles

Table 3 Descriptions of Main Tasks

Table 4 Descriptions of Main Challenges
information regarding the AI model, such as evaluation metrics
Involvement: Lack of participation and collaboration between actors throughout the algorithm's life-cycle
Risk Oversight: Problems concerning governance, legal, ethical, and procedural aspects
Resources: Insufficient time, planning, infrastructure, money, and documentation
Feedback: Lack of substantial input, information exchange between actors
Bias: Problems with the analysis of prejudice towards individuals or groups
Role: Unclear function or duty division among actors

Table 5 Description of concepts used to characterize the communication patterns underlying fairness-related decisions

Table 6 Quotes illustrating the communication pattern in Figure 7, where information on data quality was missing.