1 Introduction

With a surge of new technologies leading to a growth in volume, type, and complexity of data-making efforts and their associated data outputs, steps are underway to discern the concurrent processes of data genesis, maintenance, and use. This chapter examines the algorithm and the system surrounding it as a form of information object in a way that adds to the critical inquiry into the data-making efforts inherent in their construction and use. Drawing on scholarship from information science and the broader research community adopting a critical stance on the study of algorithms and algorithmic culture, the chapter examines how the doings of the algorithm (instantiated through its operations, actions, and steps) and its accompanying algorithmic system are revealed and explored through an engagement with the auxiliary data—or paradata—created as a part of this data-making effort.

Before examining paradata in this context, the central part of this introduction establishes a broader narrative on the meaning of algorithms and the reasons for their critical study as part of today’s complex data-making landscape. Indeed, this complexity is corroborated by the fact that data-making in governmental, organizational, community, and personal contexts now includes data traces generated through mobile apps, wearables, the Internet of Things (IoT), social media platforms, robots, and the blockchain (Wolf & Blomberg, 2020; Trace & Zhang, 2021; Desjardins & Biggs, 2021; Nasir et al., 2019; Mohammad et al., 2022). As a complex digital object comprising more than just content, software also exists as a ubiquitous form and output of data-making, of which algorithms (the artifact of interest in this chapter) are among the most studied kinds. As objects, methods, and tools for getting things done, algorithms exist in written and performed forms, defining the steps that computer programs follow to solve problems, manipulating and organizing data in some manner in the process. In algorithmic decision-making, data is used to model aspects of the world. Using datasets as inputs to a model involves processing source data into select features or variables that allow for and are relevant to the predictions to be made. As a dominant type of AI technology, machine learning algorithms “operate over data inputs and learn from them in that they refine and develop their representations of the world (their models) in such a way that they can predict outputs based on new inputs, classify inputs, and infer hidden variables” (Mooradian, 2019).
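
To make this pattern concrete, the minimal sketch below (written in Python using the scikit-learn library; the lending scenario, features, and values are hypothetical and are not drawn from any system discussed in this chapter) shows source data reduced to a few engineered features, a model fit to past outcomes, and a prediction produced for a new input.

```python
# Hypothetical sketch: source data is processed into a small set of features,
# a model is fit to past outcomes, and the trained model then predicts an
# output for a new, unseen input.
from sklearn.linear_model import LogisticRegression

# Engineered features for past cases: [income in $1000s, years employed, prior defaults]
X_train = [[42, 3, 0], [88, 10, 0], [15, 1, 2], [63, 7, 1]]
y_train = [1, 1, 0, 0]  # recorded outcomes: 1 = approved, 0 = denied

model = LogisticRegression().fit(X_train, y_train)  # the model "learns" from the data

new_case = [[51, 4, 0]]
print(model.predict(new_case))        # predicted class for the new input
print(model.predict_proba(new_case))  # the model's estimated probabilities
```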

As agents in the world, algorithms are obviously consequential, existing to extract, search, filter, classify, recommend, prioritize, and make predictions about people’s identities, preferences, and behaviors, feeding into decisions about hiring, criminal sentencing, credit scoring, financial lending, and the like. Given the import and impact of their reach, efforts are underway to demarcate requirements for creating and consuming AI systems to reduce the individual or collective risks posed by these technologies (Castelluccia & Le Métayer, 2019; Piorkowski et al., 2020). The accompanying drive to understand AI systems is such that increased importance and visibility are given to the various types and forms of information (including paradata) that document this data-making activity across the management, design, development, testing, implementation, and deployment phases. These documentation efforts are aimed at audiences whose decision-making is supported or affected by AI systems, as well as at system auditors and reviewers (Information Commissioner’s Office & Alan Turing Institute, 2020).

Of relevance to this chapter is the fact that the reach, power, and complexity of AI models have solidified calls for organizational accountability around the data-making activity that forms the algorithm, accountability being “an overarching principle characterised by the obligation to justify one’s actions and the risk of sanctions if justifications are inadequate” (Castelluccia & Le Métayer, 2019, p. III). More precisely, accountability overlays several essential requirements for AI models and systems. Understandability is acknowledged as a critical extrinsic requirement (property and method) for accountability, highlighting the need for comprehensible information to be provided to interested parties regarding the link between the inputs and outputs of AI systems (Castelluccia & Le Métayer, 2019). In turn, the foundational components for understandability exist in the form of transparency and explainability (Castelluccia & Le Métayer, 2019).

The call for transparent AI is bound to the need for organizations to comply with legal and regulatory frameworks for AI systems, and, in this instance, it is policy documents along with operational records such as code, design documentation, model parameters, and learning datasets that need to be available for scrutiny for transparency to be present. As a “ways of working” approach to documenting AI systems, this method provides oversight and insight into internal operations, “demonstrating that you have followed good governance processes and best practices” throughout your design and use of algorithms (Information Commissioner’s Office & Alan Turing Institute, 2020). Explanations fall into operational, logical, or causal types and are generally created and applied either to the algorithm as a whole or to some local and specific result (Castelluccia & Le Métayer, 2019). The call for explainable AI is tied to the benefits that accrue to organizations in cementing external trust with stakeholders, communities, and individuals by way of the increased knowledge and awareness these parties gain as subjects and consumers of AI systems and services (Information Commissioner’s Office & Alan Turing Institute, 2020). In this instance, ex-ante and in medias res analyses and post-hoc reflections need to be available for explainability to be present. Such an outcome-based approach to documenting AI systems involves, for example, “clarifying the results of a specific decision”—that is, “explaining the reasoning behind a particular algorithmically-generated outcome in plain, easily understandable, and everyday language” (Information Commissioner’s Office & Alan Turing Institute, 2020, p. 22).

In the remainder of this chapter, we explore how paradata, as a type of information object, helps give further substance to the notion of algorithmic accountability and its associated concepts of understandability, transparency, and explainability. In particular, through a review of the extant literature, the chapter examines how information professionals and domain stakeholders conceptualize accountable algorithmic entities and how this influences how they emerge as documented and describable entities. Two complementary frameworks for capturing and preserving paradata for accountability purposes are examined in the process. The first approach is related to diplomatic theory, which is an investigative tool used to understand the universal characteristics of archival documents. It focuses on the role of paradata for algorithmic transparency and incorporates archival notions of context. The second is related to knowledge management and to efforts in the AI community to use paradata to create unified reporting models that enhance the explainability of algorithms and algorithmic systems. The chapter concludes by demarcating examples and different use cases for paradata for accountability purposes and the mechanisms by which these agents of transparency and explainability can connect with interested and vested audiences.

2 Professional Considerations and the Concept of Paradata

Paradata is a core construct in information studies research that seeks to capture (literally and figuratively) the means and the mechanisms by which a body of information comes to be. Huvila et al. (2021) clarify that what paradata documents and describes are practices and processes. In a work context, practices encompass resources that manifest as part of pursuing an ongoing and overarching goal or interest, while processes are put in place to get things done. Processes consist of circumscribed activities or steps carried out using sequential or chained actions (both physical and mental) coupled with appropriate methods, technologies, etc. The result of practices and processes is a defined outcome and an accompanying documentary trail, with paradata functioning as a mechanism to ascertain, model, investigate, contextualize, and reconstruct these past occurrences.

In studying the concept of paradata, the question arises as to why it is necessary to document the practice and process by which information is created and used. In one scenario, paradata arguably serves as a tool for organizations to optimize the business practices and processes from which paradata emerges, providing the knowledge necessary to serve as a feedback loop to create improvements, such as system efficiency and effectiveness. This scenario aligns with the understanding and goals of the knowledge management profession.

As a discipline, knowledge management (KM) focuses on the value that information, whether held personally or in formats that allow it to be shared and exchanged, provides in organizational settings (Williams, 2006). Work by Williams (2006) draws from applied linguistics and semiotics to help delineate the practical and theoretical understandings that frame KM, including how the nature of information is understood within organizational contexts. In this articulation, ante-formal information is “flexible, dynamic, and variable,” while formal information is understood as created and explicitly constructed for means of exchange. As Williams notes, formal information is “the outcome of the strategic choice to forgo some of the play and slippage of everyday language, in order to transcribe and transform particular aspects of everyday conversation into formal information” (Williams, 2006, p. 96, 85). The formalization of information in the context of doing business comes about as part of the ways of “doing things” and “making things” (resulting in information about practices and processes), and of describing the type of context in which they may be used (Williams, 2006, p. 85). As Williams articulates, “these artifacts are, at the most obvious level, physical artifacts, but they can range from simple physical artifacts through the range of natural language, right up to complex computer programs for running, supporting and managing all sorts of processes—both physical and social” (Williams, 2006, p. 85).

The role of the knowledge manager is to help organizations adopt an integrated approach to acquiring, communicating, and utilizing information so that it can be put to optimal use for dynamic learning, situational awareness, problem-solving, decision-making, strategic planning, cost savings, and the like. The KM emphasis is thus on helping people comprehend and gain valuable insights from what is considered an essential resource and asset, no matter its level of formalization or stasis. From the perspective of studying algorithmic systems, the role of paradata within a KM lens is scoped such that tangible and intangible work products (the results of work that include design documentation such as flowcharts, training and learning datasets, internal technical code documentation, etc.) can be utilized by the creator and the user of the algorithm as part of improving and optimizing the data curation and computational processes that feed algorithmic systems. Given the push for algorithmic explainability through “post-hoc interpretability” (Castelluccia & Le Métayer, 2019), paradata (such as documentation about the features/variables and other assumptions used in the design of the algorithm) could also be used in a KM framework to illuminate and impart information about the robustness and logic of the algorithmic process, including helping to explain its associated inputs and outputs (results). In this manner, paradata helps ensure that deployed systems are comprehensible to businesses and other users.

This job of facilitating the subsequent scrutiny of and judgment about practices and processes, including their associated inputs and outputs, is an equally important role for paradata. In this scenario, the problem that paradata solves is tied to the need for AI system accountability via transparency. The information professions of records and information management and archival science are best positioned to support paradata’s role in this scenario. Like KM, these professions are attuned to notions of value, but information’s nature and significance are understood differently. These recordkeeping professions work from the assumption that what we, as a society, do now and have done in the past can be recorded in a manner that can serve as ongoing evidence of, and thus render an account of, what has happened and why. As Hurley notes, records are “especially relevant in documenting the event that triggers the accountability process, and the action or situation under review” (2005, p. 228). The focus here is squarely on information (paradata) that has been formalized, with a record defined as “a document made or received in the course of a practical activity as an instrument or a by-product of such activity, and set aside for action or reference.”

Yeo describes records as persistent or enduring representations of occurrents, occurrents being “phenomena that have, or are perceived to have, an ending in time” (Yeo, 2018, p. 130). The entities that records represent include events, activities, transactions, and “states of affairs,” defined as how things existed at specific points in time (Yeo, 2018). Marrying diplomatic theory (which we will get to in a moment) with Searle’s theory of speech acts, Yeo notes that records also represent “assertive, directive, commissive or declarative acts, which are performed by virtue of a record at the moment of its issuance” (2018, p. 152). This frame provides a view of records in which they are understood as stating propositions and how things are in the world, making inquiries or creating future obligations, undertaking to do or carry out something, and bringing about change by declaring it to be so (Yeo, 2018). Data that are contextualized and configured to provide appropriate levels of persistence are also considered records. In this instance, contextualized data is a form of what Yeo calls “assertive records”: “representations of statements or assertions that have been made about people, organizations, places, events, the results of investigations or the state of the world” (Yeo, 2018, p. 145). In this telling, records denote and attest to personal, organizational, and governmental action and are thus evidence of what people and systems engage in as part of the ongoing conduct of work. From the perspective of the study of algorithms, recorded information enables, instantiates, documents, describes, and serves as evidence of the practices and processes that come into play as part of the decision to deploy advanced computers and applications to specific problems. Paradata, thus, is married to the notion that packaged data in the form of descriptions and documentation are contextualized understandings of work practices and processes.

Algorithmic paradata and the broader world in which this data-making effort takes place are more fully conceptualized by applying insights drawn from diplomatic and archival theory. Information doings have long been pertinent elements of concern for archival science. In particular, monitoring and capturing conceptualizations of practices and processes find a home in archival notions of context. Context comes into play as archivists assume the role of information broker. As an infrastructure, the archive serves as a conduit between creators and subsequent users of historical records, allowing these information objects to settle permanently in place with a guarantee of continued authenticity and usability (Trace, 2022a, b). As part of the work of transporting information across time and place, archivists seek to transcribe the context of its original production and use. In doing so, archivists document the “biography of the records, their creator and creation, the serial processes and activities that brought them into being, and the acts of sedimentation that settle them in systems, all the while seeking to reconstruct this life history within an archival fonds” (Trace, 2020, p. 92). To unpack the constituent parts that contextualize organizational records, archivists also rely on diplomatic theory to better understand the phenomena at play. Diplomatics offers theories to understand and critique the record and its associated practices and processes.

If, in Library and Information Science (LIS), descriptive bibliography entails the close examination and cataloging of a text as a physical object, diplomatics emerged as an analytical technique dating from the seventeenth century to study the authenticity and provenance of recorded information (Duranti, 1998). Now updated to study digital records and recordkeeping systems, diplomatic theory reveals how records emerge from administration by unpacking their foundational and necessary elements. In effect, diplomatics allows us to explore paradata retrospectively while pulling us into the circumstances in which it was created in the first instance. Doing so, diplomatics instructs us, entails grappling with a broad recordkeeping system composed of a juridical system, an act, a will (to manifest the act), persons, procedures, and a documentary form.

A juridical system is any circumscribed entity, such as an organization or industry, with rules that bind its members’ behavior (Iacovino, 2005). Tied to notions of governance and regulation, juridical systems establish the boundaries wherein records have authority and from which legal and moral obligations can be ascertained (Iacovino, 2005). Within a juridical system, an act constitutes the reason records are brought into being, with records associated with the moment of action in which they partake. A will to manifest the action (what is done for a purpose) is effected through a procedure that, according to diplomatics, consists of the body of written or unwritten rules created to carry out an activity. The procedure brings acts or actions out in the world into the record. Processes are the series of motions by which a person prepares to carry out the acts involved in a procedure. Diplomatics tells us that pointers or clues to procedural contexts may be evident in the substance of the document’s text or may leave a documentary residue in the form of annotations (or additions to the record’s content) added as elements of intellectual form during various procedural moments.

Different procedure phases are also associated with different types of records and determine aspects of their documentary residue. As modern diplomatics has established (Duranti & Thibodeau, 2006) and the policies of the US National Archives (2020) attest, the algorithm itself constitutes a record, albeit one with no traditional (paper-based) counterpart. In this instance, an algorithm is considered an enabling record that uses its digital form to guide the execution of processes. As Duranti and Thibodeau note, software can be viewed as a record in contexts in which it is “generated and used as a means for carrying out the specific activity in which it participates and stands as the instrument, byproduct, and residue of that one activity” (2006, p. 60). What also ensures that this documentary form rises to the level of a record is that it is “properly maintained and managed as intellectually interrelated parts of records aggregations” (2006, pp. 60, 67).

Overall, the practices, procedures, and processes from which the record is created are noted as a describable context of genesis that results in values and actions out in the world being brought into the record, whether in sequence or in parallel to an action. In effect, what diplomatic analysis allows us to abstract or make visible are critical aspects of administrative activities and action—highlighting the practices and routines that typically govern records creation and their flow throughout an organization. In addition, diplomatics allows us to demarcate those records, and aspects thereof, that are a direct residue and thus evidence of procedural and processual action. In our example, diplomatics highlights that form follows function, with algorithmic code, for instance, being a manifestation of organizational priorities, including those of its creators and the practical activities and professional roles they inhabit. The documentary form of the resultant algorithmic code reflects a truism: different parts of the world end up in distinct parts of the record.

3 Further Unpacking Algorithmic Practices and Processes

In revisiting the notion that paradata (in whatever recorded form) documents and facilitates subsequent interpretation of and judgment about algorithmic practices and processes, this section delves more deeply into what paradata is necessary to adequately convey the essential parameters that have gone into the production of algorithmic systems for the purpose of accountability. Here we show that extant evidentiary records of AI systems and processes (see paradata for transparency in Table 1) provide the documentary fodder to augment existing explainable AI reporting frameworks (see paradata for explainability in Table 1), together forming a viable basis for emerging documentation standards in the accountable AI sphere (see Enqvist, 2023).

Table 1 Unified Framework of Paradata for Accountability

Beginning with the recordkeeping realm, the literature allows us to frame what paradata should be captured and preserved if the target is algorithmic transparency. To do so involves grappling with the difficulties in defining an AI record and what it means to permanently capture and preserve it from an evidentiary perspective (Mooradian, 2019). As Andresen adroitly notes, “There is no universal method for referencing algorithms, or for telling exactly where a specific algorithm, that is supposed to be the unit to be explained, starts or ends within a system in operation” (2020, p. 135). As Andresen also explains, “records that are generated from automated or algorithmic processes do not necessarily differ much from manually created, captured and organized records in matters of evidential value and trustworthiness,” yet “explaining the content of such records may be more difficult” (2020, p. 129). In claiming that “sufficient explanation cannot be obtained from studying process flow or computer program code alone,” Andresen draws attention to the fact that complex systems (particularly those that draw from dynamic and often volatile data sets using ML or probabilistic outcomes) often generate records that can be “difficult to explain, trace, or recalculate after the fact” (Andresen, 2020, p. 130).

In extrapolating what might constitute a sufficient AI record, Mooradian covers familiar ground in defining it in terms of the “actions, transactions, and events that are carried out (fully or in part) by AI algorithms” (Mooradian, 2019). In providing examples, Mooradian (2019) makes a case for both practice and process documentation, noting that such materials will likely include policy documents, technical documentation on the algorithm and data used as inputs, base systems design, and testing records, along with the forms of compliance documentation being called for within the AI policy sphere. Andresen (2020) also makes a case for practice and process documentation, suggesting that explanations that shed light on the algorithm and its output must be drawn from both. Thus, one source is discrete external business practices and associated policy documents that provide the context necessary to shed further light on procedural matters. The other source is specific internal transactional processes, operations, and activities from which additional records emerge. If adequate control is available over the data input, Andresen (2020) notes that operational and policy records should be able to capture explanations of algorithmic outputs that are certain, while only policy records are likely to be able to render explanations of scenarios in which different algorithmic outcomes or explanations were possible. Moving forward, Andresen tasks his readers with further ferreting out “what kinds of records, from what kinds of processes, explanations and predictions may reside in” (2020, p. 140). Drawing from the literature and examining AI workflows and online tools for writing AI documentation, Table 1 provides the starting point from which such work can build.

In writing a scoping document on algorithmic decision-making for the European Parliament, Castelluccia and Le Métayer explain that requirements for AI systems can be either “established a priori” (by design) or “checked a posteriori” (using verification) (2019, p. 25). Considering insights from KM and domain experts adopting a critical stance on algorithms, the second framework unpacked here looks at what paradata should be captured and preserved if algorithmic explainability is the target. Paradata, in this context, consists of ex-ante and post-hoc documentation scoped for exchange purposes, illuminating and imparting information about the robustness and logic of the algorithmic process, including helping to explain its associated inputs and outputs. Six of the most prominent efforts to generate information about algorithms and algorithmic systems for explainability are considered here, details of which are incorporated into a unified framework below (see Table 1).

Some explainable documentary frames speak to the algorithmic system more broadly. In contrast, others relate to individual components, including the datasets used to train, build, and evaluate models for AI systems (artifacts already standard in some areas of the computer industry). In a nod to the former and drawing from the literature on explainable Artificial Intelligence, Sokol and Flach (2020) delineate the parameters of explainability fact sheets, a self-reported list of requirements that offers information to parties (including developers) interested in understanding and comparing new and extant explainability approaches (software tools and techniques) for predictive systems, alongside the method itself. Dimensions, reflecting desired properties of explainable approaches, are operationalized as information “desiderata.” Desiderata encompass information on functional requirements, including the learning task and problem type to which the explanation is tailored, the component (data, models, predictions) targeted by the explanation, applicable feature types and classes of models for the explanation, and its relation to the predictive system (ante-hoc and post-hoc); information on operational requirements that characterize how users interact with explainable approaches and under what conditions (e.g., provenance, type, and delivery mechanism of the explanation; how the system and explanation interact; intended function, application, and audience for the explanation, etc.); information on the properties (usability criteria) of explanations that make comprehension possible (including soundness, completeness, contextfullness, complexity, parsimony, interactiveness, personalization, novelty, actionability); the effect of explainability on the robustness, security, and privacy of the system; and information on any validation measures (user studies or synthetic experiments in settings comparable to deployment scenarios) taken on explainability approaches.
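
As an illustration only, the abbreviated sketch below renders such a fact sheet as a structured Python record for an imagined post-hoc explanation tool; the field names paraphrase the groups of desiderata just described and are not Sokol and Flach’s published template.

```python
# Hypothetical, abbreviated explainability fact sheet for an imagined
# counterfactual-explanation tool. Field names and values are illustrative
# paraphrases of the desiderata groups, not the authors' official template.
fact_sheet = {
    "functional": {
        "problem_type": "binary classification",
        "explanation_target": "individual predictions",  # data / model / predictions
        "applicable_models": "any model exposing a prediction function (post-hoc)",
        "feature_types": ["numerical", "categorical"],
    },
    "operational": {
        "provenance": "computed from the trained model and one query instance",
        "delivery": "textual counterfactual statement",
        "intended_audience": "loan applicants and case workers",
    },
    "usability": {
        "soundness": "each counterfactual is verified against the model",
        "parsimony": "at most three features changed per explanation",
    },
    "safety_security_privacy": "explanations may leak decision-boundary information",
    "validation": "synthetic experiments only so far; user study planned",
}
```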

Also proposed in terms of looking at these issues from a functional level are documentation efforts in the form of service-level declarations that emanate from suppliers of AI services to increase confidence in their finished products (in contrast to other efforts described below that focus on datasets or machine learning models). In this scenario, the producers are understood as data scientists, while the consumers of the AI service are pegged as other developers (Arnold et al., 2019). Modeled on industry documents called supplier’s declarations of conformity (SDoCs), FactSheets are proposed as self-reported information about the supplier, their services, and the characteristics of the development team; the intended domains, purpose, usage, procedures, implemented algorithms, and outputs of the AI service; the methodology and results of associated supplier and third-party safety and performance testing (including which datasets the service was tested on); any potential harms that could result from using the AI service and associated mitigation efforts (including features that relate to fairness, explainability, and accuracy of predictions); security concerns and sensitive use cases; and maintenance of the lineage of the AI service (which speaks to issues surrounding the auditability of data sets and trained models).

With an intended audience of AI and ML practitioners, developers, adopters, regulators, policymakers, and impacted individuals, a frame dubbed “model cards” extends the notion of what it means to evaluate how well human-centric AI and ML-trained models perform through the inclusion of metrics that “capture bias, fairness and inclusion criteria” (Mitchell et al., 2019, p. 220). “Model cards” provide a means of disclosing the nature of the model (its who, what, when, and how) and the contexts and domains of use to which it is suited or not suited; model performance across relevant factors, including population groups (cultural, demographic, phenotypic, and intersectional), input instrumentation, and deployment environment; model performance metrics; the datasets used to train and evaluate the model (including how they were chosen and any pre-processing activities carried out on the data); results of the model performance (qualitative analysis) disaggregated by selected factors; and any ethical considerations, challenges, and recommendations noted as part of model development. Providing insights useful for interrogating models’ performance, design, adoption, and effects, this documentation is considered a form of ethical reporting intended to be used alongside reporting methods for datasets (Datasheets, Nutrition labels, Data Statements, Factsheets, etc.).
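
By way of a hedged illustration, an abridged model card might be recorded as something like the following sketch; the model, groups, and figures are invented, and the field names only approximate the published model card sections.

```python
# Hypothetical, abridged model card for an imagined sentiment classifier.
# All values are invented; field names approximate the model card sections above.
model_card = {
    "model_details": {"name": "review-sentiment-v2", "type": "fine-tuned transformer",
                      "developed_by": "example team", "date": "2023-04"},
    "intended_use": {"in_scope": "English-language product reviews",
                     "out_of_scope": "medical or legal text, non-English text"},
    "factors": ["reviewer age group", "dialect", "product category"],
    "metrics": ["accuracy", "false positive rate per group"],
    "training_data": "public review corpus; lowercased and deduplicated before training",
    "evaluation_data": "held-out reviews sampled to balance product categories",
    "quantitative_analysis": {"overall_accuracy": 0.91,
                              "accuracy_by_dialect": {"dialect_A": 0.93, "dialect_B": 0.86}},
    "ethical_considerations": "dialect B is under-represented; monitor for disparate errors",
}
```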

The datasheets for datasets frame is built to address a need for a standardized way to document ML datasets to “increase transparency and accountability within the machine learning community, mitigate unwanted societal biases in machine learning models, facilitate greater reproducibility of machine learning results, and help researchers and practitioners to select more appropriate datasets for their chosen tasks” (Gebru et al., 2021, p. 86). The aim is to convey information from creators (e.g., product teams) to consumers and other interested stakeholders, including policymakers and academics (Gebru et al., 2021). The datasheet template promotes ex-ante reflection and post-hoc recording of information attuned to the dataset lifecycle or workflow: motivation, composition, collection process, pre-processing/cleaning/labeling, uses, distribution, and maintenance. Scoped for natural language processing systems, data statements are similarly envisioned as a new feature of professional practice, in this case, one that allows for linguistic datasets (collections of speech, writing, and annotations) to be characterized in ways that “provides context to allow developers and users to better understand how experimental results might generalize, how software might be appropriately deployed, and what biases might be reflected in systems built on the software” (Bender & Friedman, 2018, p. 587). Viewed as a necessary extension of professional practice in the NLP field, the goal is to have long- or short-form data statements accompany publications on new datasets and experimental results and be included in NLP system documentation. Ideally created contemporaneously with the dataset itself, data statements follow an information schema consisting of the curation rationale for texts; language variety; speaker, annotator, and curator demographics; speech situations; text characteristics; recording quality; and dataset provenance.
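
Because a datasheet is, at heart, a fixed set of questions posed at each lifecycle stage, a much-shortened and purely hypothetical version of such a prompt list (paraphrasing the stages named above rather than reproducing the published template) might be sketched as follows, together with a trivial check for stages left unanswered.

```python
# Hypothetical, abbreviated datasheet prompts keyed by dataset lifecycle stage.
datasheet_prompts = {
    "motivation": ["For what purpose was the dataset created?", "Who funded its creation?"],
    "composition": ["What do the instances represent?", "Is any information missing?"],
    "collection": ["How was the data acquired?", "Were the people involved aware of it?"],
    "preprocessing": ["Was any cleaning or labeling done?", "Is the raw data still available?"],
    "uses": ["What tasks has the dataset been used for?", "Are there unsuitable uses?"],
    "distribution": ["How will the dataset be distributed?", "Under what license?"],
    "maintenance": ["Who maintains the dataset?", "Will it be updated, and how often?"],
}

def unanswered_sections(answers: dict) -> list:
    """Return the lifecycle stages for which no answer has yet been recorded."""
    return [stage for stage in datasheet_prompts if not answers.get(stage)]

# Example: only the motivation section has been filled in so far.
print(unanswered_sections({"motivation": "Created to study product review sentiment."}))
```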

A final exemplar is the dataset nutrition label, a prototype diagnostic tool (consisting of a method, an associated documentary process, and an interactive web-based application) aimed at improving the fairness, accuracy, and transparency of AI systems by allowing training datasets, or proxies thereof, to be interrogated in terms of their quality, viability, fitness for purpose, etc. before and during AI model development (Holland et al., 2018; Chmielinski et al., 2022). The documentation that makes up a dataset nutrition label aggregates and distills essential information for use by data specialists to inform conversations about dataset quality, specifically their fitness for statistical use cases. Established in a modular fashion, the various prototypes of the tool have incorporated data that are technical and non-technical in nature, with modules generated manually from as-is information (e.g., meta-information about the dataset, information regarding data provenance, and textual descriptions of variables in the dataset) and, unlike the datasheets example, by automated statistical processes to find patterns, relationships, or anomalies (e.g., information about dataset attributes via summary statistics, visualized pair plots, and heatmaps of ground truth correlations).
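
The automatically generated portion of such a label can be approximated in a few lines of pandas; in the sketch below, the file name, the presence of a "label" column, and the choice of statistics are assumptions made for illustration rather than features of the actual prototype.

```python
# Hypothetical sketch of the automatically generated modules of a dataset
# nutrition label: summary statistics, missingness, and correlations against
# a ground-truth column. The file name and column names are invented.
import pandas as pd

df = pd.read_csv("training_data.csv")        # hypothetical training dataset

summary_stats = df.describe(include="all")   # per-column counts, means, ranges, etc.
missing_values = df.isna().mean()            # share of missing values per column

# Correlation of each numeric feature with a hypothetical ground-truth "label" column
ground_truth_corr = df.select_dtypes("number").corr()["label"].sort_values(ascending=False)

print(summary_stats)
print(missing_values)
print(ground_truth_corr)
```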

4 Discussion and Conclusion

Now that the nature and form of paradata for accountability have been examined in the AI sphere, the question turns to the mechanisms through which these agents of transparency and explainability can connect with interested and vested audiences, whether experts or otherwise. One idea floated for AI services is to have suppliers post and distribute explainability documentation like FactSheets via the blockchain (Arnold et al., 2019). In the local government sphere, public AI registries are touted as a mechanism for people to have understandable and up-to-date information about AI systems (Ada Lovelace Institute, 2021). These online database registries—adopted by city governments in Canada (Ontario), Finland (Helsinki), and France (Antibes, Lyon, Nantes)—are created to capture and make available information from suppliers on the purpose, responsible parties, datasets, data processing, impacts, oversight, and mitigation measures for individual AI systems (Meeri et al., 2020).
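
In practice, a registry entry is a structured public record; the sketch below shows a hypothetical, simplified entry serialized as JSON, with every field value invented for illustration rather than drawn from any actual city register.

```python
# Hypothetical, simplified public AI register entry covering the kinds of
# information fields named above. All values are invented for illustration.
import json

register_entry = {
    "system_name": "Parking permit triage assistant",
    "purpose": "Prioritize incoming permit applications for manual review",
    "responsible_party": "City parking services department",
    "datasets": ["historical permit applications (2018-2022)"],
    "data_processing": "Applications scored weekly; no fully automated decisions",
    "impacts": "Affects applicant waiting times; eligibility decisions remain human-made",
    "human_oversight": "All scores reviewed by case workers before any action is taken",
    "mitigation_measures": ["annual bias audit", "public feedback channel"],
    "last_updated": "2023-06-01",
}

print(json.dumps(register_entry, indent=2))
```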

In this and other instances of explainable AI, the capture and subsequent use of paradata are situated within an accountability framework in which the populace (those in civil society) is provided with a contemporary window into how and why AI is being used as part of governing structures and activities, with the ability to understand and thus question its benefits and limitations. In moving forward with such registries, additional work will be needed to determine how to integrate appropriate documentation into organizational AI development practices and processes (including ascertaining the responsible parties for explainable paradata creation), as well as how to provision the paradata in terms of scope and detail such that it is fit for purpose (Meeri et al., 2020). Indeed, the pressure to ensure that the use of AI is human-centered (comprising trusted systems and services) has led to the call for at least one associated field (computer vision) to have “dedicated dataset professionals”; professionals undertaking data curation activities in association with external stakeholders and in a manner that aligns trust with “purposefully constructive reporting” (Famularo et al., 2021, p. 2; also see Jo & Gebru, 2020). This chapter contributes to the question of who is qualified to perform this and other human data labor by demarcating how information professionals are already scoped to take on such a role.

As noted in this chapter, the “right to an explanation” approach to accountability is not the only recourse for issues of information asymmetry in the AI environment. As the Ada Lovelace Institute notes, an “under-considered” form of accountability “concerns the preservation and archiving of algorithmic systems for historical research, oversight or audits” (2021, p. 48). As this chapter demonstrates, a complementary form of accountability is possible when records management and archival mandates are put to work to control, manage, and subsequently preserve the paradata necessary to provide transparency about the practices and processes of creators and users of algorithmic systems. Beyond an immediate “need to know” from an internal governance perspective, paradata that can hold people to account provides retrospective and internal transparency (Heald, 2006). As such, it can be utilized by those with a vested interest in auditing and studying the inner workings of the development and impact of AI systems over the long term. Overall, the combination of in situ and post hoc paradata and the requisite skills of information professionals should allow digital registries and archives to function as critical intermediaries between those who create and develop AI systems and those who require or engage in their critical study. In moving forward with AI archives, further work will be needed in the records and information management spheres to review the mandates and regulatory environments surrounding AI practices and procedures and to undertake work process analysis as a prerequisite to developing collection and disposition rules for AI paradata, including identifying what to transfer to an archival repository for long-term preservation. Archivists will also need to undertake the curation activities that allow paradata to remain accessible and contextualized, with the specifications for such work currently being investigated in the literature (Van der Knaap, 2020; Hodges & Trace, 2023; Trace & Hodges, 2023).