1 Introduction

When a critical incident happens—a fire, medical event, natural disaster, crime in progress—we expect to be able to take out our phone, dial a short number, and almost immediately be paired with a person who will quickly send us the help we need. In fact, there is increasing demand for a broader range of emergency services communications systems (ESCS), broadly named “next generation systems.” The public expects these tools to be fully current and compatible with available technology. Call receivers and dispatchers are now expected not only to answer wireless and landline calls but also to take in video, photos, and text. To evaluate these systems, however, it is critically important to first understand what exists at present, how it functions, how it can be optimized and improved upon, and where its most critical vulnerabilities and points of failure lie (cybersecurity threats, natural disasters, technical problems, etc.).

Our research, Emergency Communications and Critical Infrastructure, aims to both understand and optimize existing emergency call and response processes by coupling large-scale simulations, existing real-world datasets generated from government records management systems and practices, and artificial-intelligence-driven data analytics tools. Through this multiyear effort, we are developing generalized abstract models of ESCS, focusing initially on the “911” system in North America (more specifically, the United States); building a computer environment capable of simulating ESCS at large scale (city to national level); validating these simulations against real-world data; using the resulting software for training and “what if” scenario exploration; and developing tools that examine call data in real time to perform tasks such as prioritizing calls when systems are overloaded and monitoring community health changes. Aspects of this work are supported by government (critical infrastructure) and by information science collaborators (data provenance and applications of AI in records and archives).

Like most, if not all, government functions, emergency communication management has workflow and documentation built into its process. Records Management Systems are a necessary and inherent byproduct of the activity, legally required for evidentiary and other public records purposes. This can mean that records may act as evidence of crimes committed, but also as evidence that the activity (in this case, emergency phone calls) itself is happening, or that the agency is functioning at all. These records can be compiled into large datasets that form the basis of our research.

These two activities—academic research and ESCS operations—each generate their own sets of paradata. Thus, this chapter is concerned not only with these two forms of paradata but also with how both of them inform the modeling and analysis efforts.

While ESCS in many North American jurisdictions look similar, they are not identical. For the first phase of our research, we focused only on priority-based, critical-incident, nonbusiness calls (911 calls from citizens) in King County, Washington, USA, which was selected via a process discussed later in this chapter.

2 Paradata in Archival Science

The term “paradata” first appeared in the social sciences as a way to describe data collected during surveys and interviews that were, in some way, extra (Kreuter, 2013). This data, often described as marginalia or annotations, is generally created and used by researchers in the process of first collecting and later analyzing their data and has long been recognized as a critical element of research in both qualitative interviews and survey methodologies (O’Connor & Goodwin, 2017; Schenk & Reuß, 2023). Because of the way it is created, paradata is often referred to simply as “process data.”

As computers became a common tool for conducting surveys, the kinds of information that could be considered paradata changed. Common examples of these new kinds include mouse movement information and keystroke response time data, which can only be collected by a computer and which have been found to be so meaningful that they have warranted research in and of themselves (Fahmy & Bell, 2017). Paradata has even been used as the basis for machine-learning-based predictive analyses (Fernández-Fontelo et al., 2020). While paradata is an accepted phenomenon in some humanities and social sciences, it has not yet taken hold in archival science, where the distinctions among data, metadata, and paradata remain unclear.

In part, this is because many of the original examples of paradata, such as annotations and marginalia, have long been recognized as elements of records in the study of archival diplomatics (as discussed also in Trace and Hodges, 2023). Archival theory has a great deal to say about the role of annotations in the execution, handling, and management of records (Duranti, 1991). But while annotations have traditionally been considered an extrinsic element of records, defined and examined by form rather than content, the term “paradata” instead considers annotations by their intrinsic and qualitative value, by content rather than form. We can see this in the InterPARES Project’s definition of paradata: “information about procedures and tools used to create and process information resources, along with information about the persons carrying out those procedures” (Davet et al., 2022). Archival science traditionally has very little to say about the content of a given annotation or information resource, and this departure is significant.

Paradata, inherently, is less relevant to the organization of an information resource than it is to the interpretation and reuse of that resource, either by its creator or by a third party at a later date. Most examples of paradata (field notes, annotations, mouse click data, etc.) are in the form of supplementary data within an initial collection process—marginalia, for example, must inherently be in the margins of another, likely more official or formal, document. Paradata, if defined as any information relevant to the creation processes or persons, is a far broader category of information than early definitions of “extra” or “exhaust” data (Pomerantz, 2015). It is entirely reasonable that information intentionally collected in the regular course of government activities could inform the acts and persons of creation. (For example, the identity of the call technician in an emergency call is intentionally recorded and stored and is paradata.) Conceiving paradata as something exclusively extra or unintended, therefore, is a mistake.

Now the lines between data, metadata, and paradata are blurry to the point of being indistinct. That is not a bug but a feature—if data and metadata are defined by their process of creation, and paradata is defined by its informational content, there will inherently be some overlap.

Some data is paradata, and some metadata is paradata.

3 Paradata in Modeling the Real World

The researchers here are not data creators, at least in terms of the actual ESCS operation (“real world data”). Instead, researchers were granted access to a (modified) pre-existing dataset to manipulate and interpret. We needed paradata, either what was interpretable from the metadata or what could be learned through interviews and ongoing contact with government agencies, to inform the modeling process.

Modeling and simulation not only seek to understand the world but also to construct a formally defined, precise representation of the world and to implement that representation in software. Thus, there is a sequence of complex links in the chain from the real world to the software; moreover, that final step to software brooks no ambiguity: any discrepancies that have crept into the research process are implemented exactly.

As a result, this chapter focuses on two separate forms of paradata within this research. First is paradata generated by researchers over the course of the research itself. Examples include documentation of the iterative development of the modeling workflow and meeting minutes among researchers. The second is paradata that the researchers seek out, created by others via processes beyond the researchers’ control, to understand the context of creation of the records and dataset that they are modeling.

The goal of the models in this research is to, as much as possible, recreate the real-world conditions and transactions that generated the dataset. The data itself is not sufficient to accomplish this; we need to understand the processes, policies, decision-making procedures, and people involved in the process. This information might be called “forensic reconstruction paradata.”

3.1 The Map Is Not the Territory: Forensic Reconstruction of the Real World

Over 240 million 911 calls are made in the USA each year. The private sector drives the business of 911 in terms of technology and hardware and software infrastructure. Federal, state, and local governments drive the day-to-day implementation and operation of the system. While trade and quasi-regulatory organizations such as the US National Emergency Number Association (NENA) capture, coordinate, and catalog the national operation and policies around 911 generally, there is no centralized and coordinated US clearinghouse for baseline hardware and software standards, data management, R&D, or privacy standards, among other 911 policy issues.

Unlike the European Union and its harmonized regulations, or Israel and its single, national emergency call receiving and dispatch system, the USA and its over 6,100 call centers represent a patchwork of independent policies specific to each unique unit-of-government structure. For 911 call data alone, the cascading effects run from states’ rights down through regional (county) and local government (municipal) regulatory structures (cities, townships, fire districts, etc.), compounded by the philosophical, political, and mostly emotional concept of “local control.”

Thus, the paradata collected in the process of operating 911 primarily serves 911 call centers’ needs. Moreover, regardless of current technology, 911 call and dispatch is first and foremost reliant on human behavior. Call takers’ actions, responses, and interactions, both formal and informal, are grounded in their training. Paradata within the call intake and dispatch process can be as simple as the screen notes made by a single individual call taker who is also dispatching. These “notes” may or may not become part of a call record—either due to technical limitations or local data retention policies.

This paradata may also be incredibly complex. A single event may involve many call takers and dispatchers in the same public safety answering point, or PSAP, who may communicate informally by voice or email. While these notes and interactions among PSAP members may have a direct influence on decision-making in real time, they are not likely to be captured as paradata for research purposes.

However, the value of paradata in emergency response is certainly recognized as critical to “after action reporting” and to day-to-day call center management. Given that call centers operate 24/7 and top management is not present for two of three shifts, the notes and recollections of supervisors, call receivers, and dispatchers directly influence personnel management as well as operations, call processing, evaluation, and policy.

Therefore, while the formal records can still suit the needs of research, these records are not enough to create the forensic reconstruction that will actually address the research questions. It is more likely that activities such as interview “deep dives” will be needed to understand their relevance to the research process (Simpson, 2020).

3.2 Iterative Workflow

Since this research does not require researchers to generate their own real-world datasets but rather to collect existing ones, the workflow and process for obtaining data have evolved over time. This is integral to the process of constructing an abstract model and then realizing that model as simulation algorithms and software in some programming language. Each stage forces the researcher to make explicit what might have been implicit and reveals gaps in understanding or in a collection of real-world data, metadata, or paradata. This is one of the primary benefits of modeling and simulation: it admits no fuzziness of thought. The current workflow has evolved and may well evolve going forward as we interact with more government agencies, seek out different datasets, and have new gaps in our understanding revealed by the modeling and simulation process. Our workflow, like our understanding of the datasets we work with and the needs of our simulations, develops iteratively.

This also means that while the workflow below is presented in a specific order, that order has not always been the case. For example, some interviews of emergency management officials were conducted before any jurisdiction was selected to request data from, simply so that researchers could begin to understand processes surrounding emergency communications record creation.

Workflow

1. Identify geographic region of interest:

   (a) Determine characteristics of desirable geographic area (for example, urban, suburban, rural).

   (b) Determine critical metadata elements for simulation.

2. Determine legal restrictions on collecting 911 data, including federal, state, and local frameworks:

   (a) Research federal, state/provincial, and local regulations.

3. Determine formal processes for collecting data, and determine the difference between the formal process and the “way it is done”:

   (a) Conduct outreach and interviews with employees and managers to understand the processes of creation and capture of records, of creating and storing large datasets, and of access by the public.

   (b) Select storage methods and access restrictions according to legal and ethical standards.

4. Obtain and begin work with a given dataset.

Each of the steps along this workflow produces, in turn, paradata of its own in various forms. For example, the process of selecting a given region for data collection is a group decision and takes place during a group meeting, where formal minutes are taken by the research lead. Those minutes are sent to each member of the team for future reference. In addition, individuals may take their own notes, which may or may not be shared with other researchers. In some cases, decisions in this step have been formalized with documentation that shows a more academic, or literature-justified, intention behind the decision. Step two is also documented. The researcher who finds the information puts the sources and a summary of the research into a document or perhaps presents the information at a group meeting.

Step three, the interview process, is where recordkeeping and documentation become more complicated and nuanced. Interviews necessarily generate a lot of documentation. There may be emails, for example, to set up the meeting time. Researchers come to interviews with prepared questions that they have each put together, and they may send these in emails to the interviewee ahead of time. Researchers take their own notes, in addition to the formal write-up created by the head researcher. There may also be emails sent after the interview to forward resources, send thanks, or follow up for clarification purposes.

A final step in this process that is not outlined in the workflow but is critical to the success of the research is the evaluation of the data that comes once it has been worked with (for example, to create, refine, or expand a model or simulation design). What we learn as a group is based on how usable the dataset is, whether or not we got the metadata we needed to create an interpretable and implementable model, or if there were any unforeseen lessons from working with a given set. These meetings and conversations can take place over weeks and happen between different clusters of researchers, which means that some kind of documentation is critical to the iterative improvement of data collection processes (in addition to the extensive documentation that is generated by any modeling or software development processes).

3.3 Obtaining King County 911 Call Data

Early in this research, King County was selected as a region that could be helpful to work with. It was selected for a few reasons beyond simple convenience. King County is made up of multiple municipalities and various independent units of government (for example, some PSAPs include fire districts with separately elected boards and officials). It includes a major city (Seattle, Washington) as well as less densely populated areas. It is a region with a potential for high variability of data. There are twelve separate 911 PSAPs, operated by entities including a university, fire districts, fire departments, police departments, police and fire cooperatives, and a county Sheriff’s office. There are unique and distinct units of governance involved (such as municipalities, universities, fire districts, the Sheriff, etc.) that in turn represent separate and independent call data collection policies, methods, and databases.

This decision was made in the autumn of 2020 but was not recorded in those terms until almost a year later, when additional researchers had joined and were searching for other jurisdictions to begin soliciting data from. The justification for choosing King County was ultimately recorded in one researcher’s personal, paper notes.

The identification of preferred metadata elements took place over a number of meetings between different combinations of group members and was recorded entirely on personal devices or as personal notes, likely by multiple members of the team.

Research into the legal requirements and regulations surrounding emergency communications data created a few pieces of paradata. Some of it came from notes taken during interviews with county officials. Some came from formal research and documentation undertaken by individual researchers.

Through the aforementioned interviews, researchers learned how to get access to a King County dataset and did so. Record of this process was also kept in meeting minutes, or in personal notes. The logic behind decisions on where the King County dataset was to be kept, and to whom access would be granted, was not recorded.

Finally, paradata that refers to the progress being made by working with the dataset shows up in a large number of places: in meeting minutes, in private notes, in plenary reports, where researchers present regularly about the status of their findings, and even in conference posters. Lessons learned through this process can also be found in the secondary and tertiary iterations—for example, the lists of desired data, metadata, and paradata have changed as researchers have had more working time with datasets and developed more refined models and software.

4 Paradata and Interdisciplinarity

A pattern noted above is a certain amount of chaos in the paradata from this research. No small amount of this comes from the number of different researchers and their differing areas of expertise. This research is inherently interdisciplinary, bringing together experts in the fields of computer science, machine learning, archival science, cybersecurity, emergency systems management, and critical infrastructure management. These fields are diverse and bring diverse concepts and taxonomies into an already complex effort.

Initial meetings with the full research team (including those directly associated with the InterPARES Trust) were overcomplicated by linguistic confusion among group members. For example, archival science, as noted earlier in this chapter, has a very specific definition for what counts as a record and how a record can be differentiated from a datum or a dataset. Terminology had to be discussed at length, over multiple meetings, before archivists and computer scientists could communicate effectively. Similar discussions also occurred between the computer-focused researchers and those with experience in critical incident response communications management. These problems are not specific to paradata but are an inevitable element of interdisciplinary research that spilled into the paradata: not all of the notes, marginalia, minutes, etc., are necessarily written in a manner that is intuitive to every group member.

What is more, since this is a long-term effort spanning multiple years, not every researcher was involved when it began, nor will every researcher still be working on it when it ends. When this work began, before InterPARES was involved, the scope of work and expertise involved was much smaller. The duration of this work and the potential rotation of researchers, research assistants, and other participants will surely complicate the documentation process across the waves of anticipated data gathering and collaborations.

Another critical element of this diversity is a diversity of physical locations. Researchers meet mostly online, communicate online, and keep research materials in online data storage. Data and paradata are stored in mutually accessible digital spaces for all to access. That does, however, inherently limit what paradata can look like. For example, while Google Docs does have a “comment” function on documents, those comments are harder to write than typical marginalia might be and can be later deleted by any party with editing authority, meaning some amount of this paradata is inevitably lost to efforts to keep documents “clean.” In the cases where individuals keep their own notes in a freehand manner, this may not be such an issue, but the geographical disparity between researchers means that those notes are not likely to be of help to any other members of the team.

As the data gathering in this research is iterative, our paradata is being actively reused as part of ongoing data retrieval efforts. We are not done yet, and this means that our paradata is particularly useful to us as we get deeper into the research. It also means that we cannot yet grasp the totality of the role our paradata will play in our results.

5 Paradata in Developing Simulations

In fiction, a character like Hari Seldon (in Asimov’s Foundation series) may be able to accurately predict the fall of a galactic empire using an algorithmic science of his own making. In real life, we cannot yet predict the fall of a civilization, but we do have models that allow us to forecast the weather with some accuracy. Most people are familiar with this predictive nature of modeling. But what is a model? And how does a computer scientist, a stranger to the inner workings of the 911 system, develop a mathematical model for it?

A pertinent definition of a model is that offered by Pidd (1999) within the context of Operations Research and Management Science: “A model is an external and explicit representation of part of reality as seen by the people who wish to use that model to understand, to change, to manage, and to control that part of reality in some way or other.” This definition grants us some important characteristics; a model:

  • Can be examined, challenged, and formally defined

  • Is a partial and simplified representation of reality

  • Is dependent on the viewpoint of its stakeholders, and

  • Is fitted to a specific purpose and is therefore goal-oriented

Given those characteristics, it is not surprising that there is no prescriptive methodology for modeling, a process described by Holland (2000) as the act of extracting regularities from incidental and irrelevant details. In the context of this research, our first instinct might be to dive into a 911 call log looking for regularities, but without an understanding of the processes that generated it, our prospective insights would be limited. For instance, we can determine the pattern of call arrivals, service times, and wait times from a call log, but we would not be able to infer the staffing policy of a given PSAP. Unlike machine learning (ML) models, which are black boxes where prediction is the primary interest, simulation models are meant to emulate the behavior of the system being modeled in order to build greater understanding of that system.

To illustrate this modeling process, let us take a look at the six principles of modeling suggested by Pidd (1999):

1. Model simple; think complicated.

2. Be parsimonious; start small and add.

3. Divide and conquer; avoid megamodels.

4. Use metaphors, analogies, and similarities.

5. Do not fall in love with data.

6. Model building may feel like muddling through.

Not surprisingly reminiscent of the empirical model of science, these six principles outline an iterative process in which abstraction, simplicity, and decomposition are key aspects; careful thought and analysis drive the collection of data, and not the other way around. As we learn more about the 911 system, we are able to formulate better questions and determine what datasets we need. At the time of writing, the relevant datasets of interest include caller data, responder data, and Geographic Information System (GIS) datasets. The identification of these arose from conversations with stakeholders in addition to technical documents that outline the policies and procedures of the 911 system. For instance, the decision to look into the GIS datasets was driven by the call routing and dispatching procedures: calls are routed based on their geographic location and the service boundaries of PSAPs, and dispatching depends on first responder proximity to the emergency event.

5.1 Modeling Emergency Services Communication Systems

Human activity systems, such as ESCS, have the following characteristics: boundaries, components, behavior, an internal organization, human activity, human intent, openness to the environment, limited life, and self-regulation (Pidd, 2007). Following the previously discussed modeling principles, we address these characteristics in our models through simplification and abstraction. We begin by identifying the main components of the system and then move into modeling three main characteristics:

1. The internal organization of the system

2. The internal behavior of its main components, and

3. The interactions among these components

Although ESCS are complex multinetwork systems, we must shear away details because, to be useful, a model must be simpler than the real system. The seemingly simple act of dialing 911 triggers a sequence of steps that involves many layers of technology and emergency personnel, generating data and governed by paradata at each layer. However, we have identified three main types of entities: Caller Regions (CR), Public Safety Answering Points (PSAPs), and Responders. In our models, a Caller Region (CR) denotes a geographic area where calls originate, a PSAP is an emergency call center responsible for answering emergency calls and dispatching first responders, and a Responder denotes a headquarters from which first responders are dispatched (e.g., police and fire stations). Moreover, these components are arranged in a network that underlies the GIS-based call routing and dispatching dynamics.

To model ESCS networks, we leverage mathematical constructs known as graphs. Graphs are used to model network nodes (called vertices) and their connections (called edges) in a pairwise relationship. This particular model is a directed graph where vertices denote the aforementioned ESCS entities and edges represent the communication channels between them. In a directed graph, the relationship expressed by the edges has a direction (for example, every call has a caller and an answerer, and thus the communication is asymmetrical). Because the connectivity is based on geographic coordinates and boundaries of some of the components, we extract the network topology from GIS datasets by encoding the components’ jurisdictional and neighboring relationships in a directed graph.
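To make the graph construct concrete, the following minimal Python sketch (illustrative only; the entity names and adjacency-list representation are our own, not the project’s actual code) encodes ESCS entities as vertices of a directed graph, with communication channels as directed edges.

```python
# A minimal sketch (not the project's code) of ESCS entities as a directed graph.
from dataclasses import dataclass, field
from enum import Enum, auto

class EntityType(Enum):
    CALLER_REGION = auto()  # CR: geographic area where calls originate
    PSAP = auto()           # public safety answering point
    RESPONDER = auto()      # HQ from which first responders are dispatched

@dataclass
class Vertex:
    name: str
    kind: EntityType
    out_edges: list = field(default_factory=list)  # names of downstream vertices

class EscsGraph:
    def __init__(self):
        self.vertices = {}

    def add_vertex(self, name, kind):
        self.vertices[name] = Vertex(name, kind)

    def add_edge(self, src, dst):
        # Directed: communication is asymmetrical (caller vs. answerer).
        self.vertices[src].out_edges.append(dst)

# Toy topology: one caller region served by one PSAP, which can dispatch
# to two responder stations. Real topologies come from GIS boundaries.
g = EscsGraph()
g.add_vertex("CR-1", EntityType.CALLER_REGION)
g.add_vertex("PSAP-A", EntityType.PSAP)
g.add_vertex("Fire-1", EntityType.RESPONDER)
g.add_vertex("Police-1", EntityType.RESPONDER)
g.add_edge("CR-1", "PSAP-A")
g.add_edge("PSAP-A", "Fire-1")
g.add_edge("PSAP-A", "Police-1")
print({v.name: v.out_edges for v in g.vertices.values()})
```

A CR-to-PSAP edge models call delivery, while PSAP-to-Responder edges model dispatch; in practice, this topology would be extracted from GIS datasets as described above.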

The abstract concept for modeling the internal behavior of ESCS entities comes from the realization that PSAPs are specialized call centers. Discrete Event Queuing Models, extensively used in the management of call centers, are suitable for modeling the processing of emergency calls by PSAPs. Furthermore, the same concept can be applied on the responders’ side for modeling dispatching and response actions (not discussed here, as this chapter is not meant to be a complete model description).

To illustrate the concept, we will use the general queuing representation of a call center, shown in Fig. 1. Conceptually, a call center contains k trunk lines with up to the same number of workstations (\(w \leq k\)) and agents (\(n \leq k\)). One of three scenarios occurs when a call arrives: it is answered right away if there is an available agent, it is placed in a queue if there are no available agents, or the caller receives a busy signal if there are no trunks available. In this model, we think of an agent as a resource that is occupied while a call is being answered and immediately released once the call has been served. Calls are lost due to blocking when all trunks are busy or when a caller abandons the queue due to impatience, possibly redialing soon after. Consequently, arriving calls come from those who make an initial call, those who got a busy signal, or those who abandoned the queue after waiting.

Fig. 1 Call center as a queuing system. The diagram represents a system with 3 agents (n) and 8 trunk lines (k); therefore, the size of the waiting queue is 5 (\(k - n\)). Arriving calls wait in a queue until an agent serves them; a busy signal results in lost calls, while an abandoned call may lead to redialing.
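As an illustration of the discrete event queuing concept, here is a minimal sketch that simulates the trunk/agent logic of Fig. 1. The exponential interarrival and service times are placeholder assumptions (real 911 arrivals are burstier, as discussed next), and caller abandonment and redialing are omitted for brevity.

```python
# A minimal discrete-event sketch of Fig. 1 (assumed parameters, not fitted
# to real data): k trunk lines and n agents, so at most k - n callers wait.
import heapq
import random

def simulate(n_agents=3, k_trunks=8, arrival_rate=1.0, mean_service=2.0,
             horizon=1000.0, seed=1):
    rng = random.Random(seed)
    events = [(rng.expovariate(arrival_rate), "arrival")]  # (time, kind) min-heap
    busy = waiting = answered = blocked = 0
    while events:
        t, kind = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "arrival":
            # Schedule the next arrival.
            heapq.heappush(events, (t + rng.expovariate(arrival_rate), "arrival"))
            if busy < n_agents:                  # a free agent answers now
                busy += 1
                answered += 1
                heapq.heappush(events, (t + rng.expovariate(1 / mean_service), "done"))
            elif busy + waiting < k_trunks:      # a trunk is free: join the queue
                waiting += 1                     # queue holds at most k - n callers
            else:                                # all k trunks busy
                blocked += 1                     # caller hears a busy signal
        else:  # "done": the freed agent takes the next waiting caller, if any
            if waiting:
                waiting -= 1
                answered += 1
                heapq.heappush(events, (t + rng.expovariate(1 / mean_service), "done"))
            else:
                busy -= 1
    return answered, blocked

answered, blocked = simulate()
print(f"answered={answered}, blocked={blocked}")
```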

Stochastic processes such as call arrivals, service time, and customer impatience must be modeled through random variables drawn from a suitable probability distribution. Existing research in this area provides possible candidates, but their goodness of fit has to be evaluated based on data obtained from the real system. Call arrivals, in particular, exhibit burstiness—intraday, interday, and seasonal variability—that requires extra modeling effort. One idea for modeling call arrivals comes from the realization that calls are the consequence of emergency events; following this logic, one can model the arrivals as a cluster point process (Cox & Isham, 1980), sketched in code after the list below, characterized by:

  • A primary process that defines the emergency events as the realization of a stochastic process

  • A subsidiary process that defines the number of calls triggered by each emergency event and the separation among them through discrete probability densities; and

  • A pooling that consists of the superposition of all clusters that results in the cluster point process
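The following sketch implements such a cluster point process under assumed placeholder distributions: a homogeneous Poisson primary process for events, a geometric number of calls per event, and exponential separations within a cluster. None of these choices are claims about the fitted model; as noted above, goodness of fit must be evaluated against real data.

```python
# A sketch of cluster-point-process call arrivals under assumed distributions.
import random

def cluster_arrivals(event_rate=0.2, mean_calls=3.0, mean_gap=0.5,
                     horizon=100.0, seed=7):
    rng = random.Random(seed)
    arrivals = []
    t = rng.expovariate(event_rate)          # primary process: emergency events
    while t < horizon:
        # Subsidiary process: geometric number of calls (mean = mean_calls).
        n_calls = 1
        while rng.random() < 1 - 1 / mean_calls:
            n_calls += 1
        offset = 0.0
        for _ in range(n_calls):             # calls separated by random gaps
            arrivals.append(t + offset)
            offset += rng.expovariate(1 / mean_gap)
        t += rng.expovariate(event_rate)     # next emergency event
    arrivals.sort()                          # pooling: superpose all clusters
    return arrivals

calls = cluster_arrivals()
print(f"{len(calls)} calls; first five at {[round(c, 2) for c in calls[:5]]}")
```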

The last abstract concept in the model is that of Communicating Finite State Machines (CFSM). A simulation starts with an initial setup with defined parameters and runs for a defined number of time steps; the values of parameters and simulation variables at a given time step are known as the state of the system. In the model being discussed, the state of the system is the compound state of every vertex and edge in the graph. At every time step, ESCS entities consume inputs and undergo state changes that might cause them to send outputs to other vertices.
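A toy sketch of the CFSM abstraction follows: at every time step, each machine consumes its queued input messages, updates its state, and may emit messages along its outgoing edges, with the simulation state being the compound state of all machines and in-flight messages. The idle/busy behavior here is a stand-in, not the project’s actual entity logic.

```python
# A toy sketch of Communicating Finite State Machines on a graph:
# synchronous time steps, messages delivered along edges on the next step.
from collections import defaultdict

class Machine:
    def __init__(self, name, out_edges):
        self.name, self.out_edges = name, out_edges
        self.state = "idle"

    def step(self, inputs):
        """Consume inputs, update state, return (destination, message) pairs."""
        if self.state == "idle" and inputs:
            self.state = "busy"                # e.g., begin handling a call
            return [(dst, f"from {self.name}") for dst in self.out_edges]
        if self.state == "busy" and not inputs:
            self.state = "idle"                # finish handling, free up
        return []

machines = {
    "CR-1": Machine("CR-1", ["PSAP-A"]),
    "PSAP-A": Machine("PSAP-A", ["Fire-1"]),
    "Fire-1": Machine("Fire-1", []),
}
inbox = defaultdict(list)
inbox["CR-1"].append("call")                   # seed: a call originates in CR-1

for t in range(4):                             # fixed number of time steps
    outbox = defaultdict(list)
    for name, m in machines.items():
        for dst, msg in m.step(inbox.pop(name, [])):
            outbox[dst].append(msg)            # in flight until next step
    inbox = outbox
    print(t, {n: m.state for n, m in machines.items()})
```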

For performance purposes, our in-house simulator (Stiber et al., 2017; O’Keefe et al., 2022) is designed to facilitate the implementation of simulation models on Graphic Processing Units (GPUs). These GPU implementations allow us to achieve high-performance simulations of large complex systems, but at the same time, their high parallelization presents challenges for interconnections. Modeling the interaction between ESCS entities as Finite State Machines that communicate with each other by transferring event messages through their connecting edges provides a useful abstraction for highly parallelized processes such as our GPU implementations.

5.2 Iterative Paradata

Thus, we see the close of the loop in the iterative process described herein. What at first seems straightforward—ESCS involves emergency calls—becomes more involved as specific decisions are required in the model and its implementation. Roughly speaking, the road traveled has been:

Paradata Iteration Workflow (Reconstructed)

1. Start with a basic understanding of the system and initial questions/goals for inquiry.

2. Consult with stakeholders, refine the conceptual model of system operation, and secure sample call data.

3. This new information reveals new questions for the simulation to answer and shows that the data cannot be treated from a simple “black box” point of view:

   (a) Stakeholders are interested in inferring the state of the world in real time as calls come in, and so a simple statistical model of calls as a sequence of random events is insufficient. Instead, the model of call generation should include the “primary process” of events, each of which can generate one or more calls.

   (b) Call data includes substantial information, such as locations, unique call identifiers (substituted for phone numbers, to identify repeat calls), call type, etc. The metadata surrounding such information is not interpretable without understanding the ESCS processes that govern such things: how caller location is determined, how calls are categorized by PSAPs (free-form text, selection from a limited set of choices, decision tree), etc.

   (c) Original call data may include personally identifiable information. Local laws and procedures (which may vary from one jurisdiction to another) control what parts, if any, are public information and the processes for data sharing (Coyle & Whitenack, 2019). The investigators must develop a data management plan that applies sufficient control to satisfy external stakeholders so that memoranda of agreement for data sharing can be created and agreed to by both organizations.

4. The model is expanded to include first responder dispatch, and conversations with stakeholders reveal details of how this dispatch occurs and what data is collected as part of that process.

5. The responder dispatch process is based on the geographic locations of entities (callers and responders) and PSAP service boundaries. Therefore, GIS data must also be incorporated into the simulation, and agreements must be reached with external stakeholders to share such data. Luckily, there are widely adopted standards for GIS (a proximity-based dispatch sketch follows this list).
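As a hypothetical illustration of step 5, the sketch below dispatches the responder headquarters nearest to an incident using great-circle distance. The coordinates and station names are made up, and real dispatch would also honor PSAP service boundaries and unit availability.

```python
# A sketch of proximity-based dispatch: pick the responder HQ closest to an
# incident by great-circle (haversine) distance. Data below is hypothetical.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = p2 - p1, radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

stations = {  # hypothetical responder HQs near Seattle
    "Fire-1": (47.61, -122.33),
    "Fire-2": (47.67, -122.38),
    "Fire-3": (47.55, -122.27),
}
incident = (47.62, -122.35)
nearest = min(stations, key=lambda s: haversine_km(*incident, *stations[s]))
print("dispatch:", nearest)
```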

6 Conclusion

The traditional archival understanding of the contexts of records or collections is composed of two concepts: respect des fonds, the idea that every set of records can be traced to a single creator or creating body, and original order, the idea that the order the creator put records into is meaningful and should be preserved. By preserving the identity and order of the creator, the context and meaning of records can be preserved. This forms the archival concept of provenance, a tool that facilitates the arrangement and description of records for archival preservation and access. Provenance focuses, again, not on the content or information within materials but on their contexts of creation and storage by creators and, for electronic records, perhaps on static structure or metadata.

This traditional approach may be most useful for archivists who manage records like those of government agencies, which tend to fit neatly into static structures and corporate ideas of creatorship. Forms of records and information, however, have changed and continue to change in the digital age. Records and information are so often decontextualized into data, where they can be easily analyzed but not so easily understood. Analysis can combine and recombine data in a vast number of ways, including via AI algorithms that render provenance opaque (as also discussed in Trace and Hodges, 2023). But what is the purpose of analysis if not to create understanding? And shouldn’t the construction of models of systems’ operation, the implementation of such models in software, and the analysis of simulation results be considered core to our process of understanding?

Records are representative of the thing they document. In the case of the work described in this chapter, everything we are modeling is itself a process—making a phone call happens over time, emergency response vehicles take time to get from A to B. The model we have created, therefore, is itself an embodiment of paradata, process data. Such models can allow researchers to ask questions not possible to ask of the original dataset, and there is unique utility in using models to recreate processes as opposed to querying a database or spreadsheet.

While paradata can help record creators, it is especially important for users, particularly if those users are not the original creators. Our research group is a good example here, with diverse backgrounds and needs that caused us to focus on different elements of ESCS. The archival understanding of context, creation, and provenance needs to expand.

Paradata in archival science could come to add a third dimension to provenance, as archivists come to realize that structural metadata, the name of the creating agency, and the order in which records are stored may not be enough context to make records usable, either to actors within a given agency or, as in this case, to researchers.

There is limited utility in litigating whether a given piece of information is data, metadata, or paradata. These terms, as previously discussed, are not mutually exclusive. This project shows how this can be a feature. A flexible mindset can allow users of information resources to adapt those resources to answer their specific questions. In our case, where our initial questions concern real-life processes rather than outcomes, paradata is essential. The records themselves may not answer the questions that a model can. This can help drive change, in this case framing a more structured process in ESCS to evaluate, adjust, improve, advance, and ultimately uptrain emergency call receivers and first responders. Respect des fonds, original order, and paradata together create a much more complete view of the process of how a record or dataset came to be than the first two alone.