Analyzing archives and finding facts: use and users of digital data records
- Cite this article as: Adams, M.O. Arch Sci (2007) 7: 21. doi:10.1007/s10502-007-9056-4
This article focuses on the use and users of data from the U.S. National Archives and Records Administration (NARA). Who is using archival electronic records, and why are they using them? It describes the changes in use, and consequently in user groups, over the last 30 years. The changes in use are related to the evolution of reference services for electronic records at NARA, as well as to growth in the types of electronic records accessioned by NARA. The first user group consisted mainly of researchers with a social science background, who usually expected to handle the data themselves. The user community expanded when electronic records with personal value, like casualty records, were transferred to NARA, and broadened yet again when a selection of NARA’s electronic records became available online. Archivists trying to develop user services for electronic records will find that the needs and expectations of fact- or information-seeking data users differ from those of researchers using and analyzing data files.
Keywords: Archives, Research data, NARA, Electronic records, Archival reference services, Users of electronic records
An introductory backdrop
More than a decade ago, a reference archivist at the U.S. National Archives and Records Administration (NARA) who works with electronic records received a call from a colleague who works with textual paper records. He had a request from a researcher seeking to identify artists among the Japanese Americans who were evacuated to relocation camps at the outset of U.S. participation in World War II. The colleague had found paper records in Record Group (R.G.) 210, Records of the War Relocation Authority (WRA), that included occupational codes and mimeographed reports with tables categorizing the occupations of persons in the camps.
The Preliminary Inventory for R.G. 210 predated NARA’s receipt of any electronic records in this Record Group. Despite this, the textual records archivist was generally aware that NARA had data on each person relocated to a WRA camp preserved in a contemporary computer-readable format. He wondered if the occupational codes corresponded to the electronic data. The series he had in mind, known as the [electronic] Records about Japanese Americans relocated during World War II, contains demographic, educational, and occupational data collected on a Form WRA 26 from evacuees as each arrived at a camp in 1942. In offices of the WRA, coders, verifiers, and keypunch operators recorded information from the forms so that the data could be analyzed using the punch card tabulating machines then found throughout federal government offices (Paulauskas 1973). A couple of decades later, a copy of the punch card records was migrated to magnetic tape. After two more decades, the records were a primary source in a federal program to compensate the evacuees, after which the tape copy was transferred to NARA’s custody (Adams 1995, p. 198, note 57).
The electronic records archivist replied to her colleague that in all likelihood someone could readily identify evacuee-artists through computer analysis of the data. After reviewing the technical information for this series, however, she realized that doing so might be more complex than the extant reports on the occupations of the evacuees suggested. Depending upon the criteria used to produce the reports, reanalysis of the data might also lead to a revised count of the Japanese American artist population. This was because for each person the data recorded not only a primary occupation but also secondary and tertiary occupations, along with two fields for “potential” occupations. The reports did not say which levels of occupation were used for the tabulations.
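A minimal sketch of why the choice of occupation levels matters, assuming a simplified record structure: the field names (`primary_occ`, `secondary_occ`, `tertiary_occ`, `potential_occ_1`, `potential_occ_2`), the code `"ART"`, and the four sample records are all hypothetical and do not reproduce the actual Form WRA 26 layout or occupational coding scheme. The same records yield different counts depending on which fields a tabulation consults.

```python
from typing import Iterable

# Hypothetical field names: NOT the documented Form WRA 26 record layout.
OCCUPATION_CRITERIA = {
    "primary only": ["primary_occ"],
    "primary, secondary, tertiary": ["primary_occ", "secondary_occ", "tertiary_occ"],
    "all levels incl. potential": [
        "primary_occ", "secondary_occ", "tertiary_occ",
        "potential_occ_1", "potential_occ_2",
    ],
}

# Hypothetical stand-in for whatever code(s) denote an artist occupation.
ARTIST_CODES = {"ART"}

# Toy records; fields that are absent are simply treated as empty.
records = [
    {"primary_occ": "ART"},
    {"primary_occ": "FARM", "secondary_occ": "ART"},
    {"primary_occ": "CLERK", "potential_occ_1": "ART"},
    {"primary_occ": "FARM"},
]


def count_matches(records: Iterable[dict], fields: list[str], codes: set[str]) -> int:
    """Count records in which at least one of the given fields holds a target code."""
    return sum(1 for r in records if any(r.get(f) in codes for f in fields))


for label, fields in OCCUPATION_CRITERIA.items():
    print(f"{label}: {count_matches(records, fields, ARTIST_CODES)}")
```

With these toy records, the primary field alone finds one match, adding the secondary and tertiary fields finds two, and including the potential-occupation fields finds three, which is the kind of divergence that unstated tabulation criteria can hide.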
Even if the researcher had visited NARA to examine the approximately 109,400 Forms WRA 26 in the case files that NARA preserves among its textual paper holdings, his research could have been frustrating. The case files are arranged alphabetically by evacuee surname. Given their volume, there was no efficient way for the researcher to obtain the information he sought using any traditional access mode. As a practical matter, easily finding analog records in a voluminous series depends upon the way the records are arranged. In contrast, and regardless of their volume, neither the preserved order of digital records within a file nor the ordering of fields within data records affects their accessibility, as long as technical information identifying their structure is available. For this reason, some records in digital form, especially in data files, can be significantly more useful than related analog records.
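The role of technical information can be sketched in a few lines of Python, assuming a hypothetical fixed-width layout of the kind a record layout or codebook might document (the field names, offsets, and placeholder records are invented for illustration). Once each line is sliced into named fields, records can be selected or reordered by any field, independent of the order in which they are stored in the file.

```python
# Hypothetical fixed-width layout: field -> (0-based start offset, width).
# Invented for illustration; a real file's technical documentation defines these.
LAYOUT: dict[str, tuple[int, int]] = {
    "surname": (0, 10),
    "given_name": (10, 8),
    "camp_code": (18, 2),
    "primary_occ": (20, 4),
}


def parse_record(line: str, layout: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Slice one fixed-width line into named, whitespace-trimmed fields."""
    return {name: line[start:start + width].strip()
            for name, (start, width) in layout.items()}


# Build placeholder fixed-width lines, stored alphabetically by surname,
# much as the paper case files are arranged.
raw_lines = [
    f"{'NAME01':<10}{'GIVEN01':<8}{'01':<2}{'FARM':<4}",
    f"{'NAME02':<10}{'GIVEN02':<8}{'03':<2}{'ART':<4}",
    f"{'NAME03':<10}{'GIVEN03':<8}{'02':<2}{'ART':<4}",
]

records = [parse_record(line, LAYOUT) for line in raw_lines]

# Select on one field and reorder on another; storage order constrains neither.
artists = [r for r in records if r["primary_occ"] == "ART"]
by_camp = sorted(artists, key=lambda r: r["camp_code"])

for r in by_camp:
    print(r["camp_code"], r["surname"], r["given_name"])
```

Without the layout, the same lines are just strings of characters; with it, any field becomes an access point, which is the sense in which the technical information rather than the arrangement governs accessibility.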
The researcher had posed a fairly straightforward request and had not expressed any interest in analyzing electronic data records. Nonetheless, because the data are preserved in a contemporary digital format, anyone using standard data processing software can create a subset of the records on the basis of the various fields with occupational information, or of any other combination of fields. Since the records are a primary source directly related to the researcher’s inquiry, even if of a type and in a format he did not seem to have anticipated, the archivists agreed that they needed to tell him about the electronic records series.
When the researcher phoned the electronic records reference unit, he said that he was a Professor of Art History at a public research university and was interested in WRA evacuee-artists for a study of Japanese Americans who had artistic careers after World War II. He indicated that he was nearing retirement and had never used a computer, and so was unfamiliar with data processing or analysis. He added, almost in passing, that he had lived in a WRA camp.
As the archivist described the WRA digital data, it seemed that the professor might have remembered the kinds of biographical information the WRA collected from evacuees. Upon hearing that the records with these data were available as an electronic records series, he became more interested and asked how he could use them. The archivist explained that he could order a copy of the digital file on removable computer media of his choice, and hinted at the kinds of analysis he might attempt, depending upon his goals and the type of computer and software he or an aide would use. At the time, the only access mode that NARA supported for digital records was to provide copies of files on removable media of the user’s choice.
The archivist also suggested that, since the researcher had said he was unfamiliar with computer-based data analysis, he might wish to contact one of the social science departments or the computer center on his campus to see if either had consultants available who could help him retrieve and analyze the data records of interest. She explained that, before placing an order, he needed to know the technical specifications of the computer hardware and software he would be using for his project, as these could affect the specifications for his copies. Finally, she encouraged him to order a copy of the technical information for the file so that he could review it and decide for himself whether the digital records seemed potentially useful for his research. The order for the documentation arrived within days; only a couple of weeks later, he ordered a copy of the electronic records file on magnetic tape.
Subsequently, the researcher told the archivist that he had identified 60 artists on the basis of his defined occupational criteria and was linking these data with other material he had found in his research. He was also contemplating a project based on the WRA records of more than 1,000 additional evacuees whom he had identified through the secondary, tertiary, or “potential” occupations coded in their data records. In addition, now that he had access to what he called “a concrete record of the Japanese and Japanese American population in the U.S. in 1942,” he was considering compiling a biographical dictionary of artists of Japanese ancestry working in the U.S. from 1893 to 1946 (S. Omoto 1993, personal communication).
This saga serves as an informative backdrop for our essay. Initially, WRA managers had used the Form WRA 26 punch card data to learn about and plan for the relocated population. The migration of the records to magnetic tape meant that when the U.S. Department of Justice began implementing the Civil Liberties Act of 1988, its staff could load them directly from tape into a network of databases that served as the source for finding and verifying the Act’s beneficiaries. Fifty years after the creation of the punch cards, their archival preservation in contemporary digital form facilitated research on artists among the evacuees. Copies of the data file have been ordered many times since its transfer to NARA; academics have placed about half of the orders, presumably for a variety of research purposes. More recently, someone organizing a reunion for persons who completed high school in the WRA camps wrote to NARA about how effective the online Access to Archival Databases (AAD) resource had been in identifying that cohort. She was among those who, having discovered the records online through AAD and with little or no guidance from NARA’s staff, collectively ran almost 30,000 successful queries for records from this file during the first 2 years of the public AAD.1
Describing the multiplicity of uses of this one archival electronic records file, among the thousands of files, also frames our preliminary and general commentary. These particular records date from the punch card era, during which relatively few series of this record type were retained. Their contemporary accessibility offers a context for the frequently noted complexities and challenges related to primary source documentation in a technological society. In our day, many commentators sound the alarm about the challenges of preserving the contemporary digital heritage. These challenges are real, and all of us have an interest in any effort to raise awareness about them, but some historical perspective may also help. The digital records on WRA evacuees and the few other digital series remaining from that period provide evidence that, over more than half a century, the principles and practices that guide archivists and users of archives have evolved and adapted to “information society” realities. As an example of a blending of personal information and professional research motivations, the art historian’s experience reminds us that finding documentation that matches the information sought may lead researchers to learn about and use new types of records. The example also suggests that archivists need to understand both the distinctions and the complex complementarities among record types and formats in order to help researchers benefit from the expanding universe of primary source documentation in the 21st century.
The context of reference services for archival electronic records
Twenty-first-century discussions among archivists reflect the new technologies. In his consideration of the changes of recent decades, John Fleckner identified one aspect that is central for archival reference services. Information and communication technologies have, he wrote, “so increased access to archives (and changed archival methods) that our entire enterprise has been transformed” (Fleckner 2004, p. 9). Helen Tibbo argues that the “rise of a ubiquitous networked information environment has revolutionized archival descriptive practice” (Tibbo 2003, p. 10). Both assessments are well founded. Yet perhaps the technology-induced expansion in the varieties and volumes of records with archival value will ultimately be even more transformative and revolutionary.
All the information processing and communication tools of desktop technology are producing digital variations of previously analog records, as well as new kinds of records. Perhaps because at least some of these newer forms of electronic records are narrative or visual, and thus familiar record types, they have been under the archival spotlight, becoming a focus for archivists and traditional communities of archival users. Yet most of the literature of the archives profession dealing with what Elizabeth Yakel has called “those pesky electronic records” (Yakel 2000, p. 149) does not distinguish between types of electronic records, nor does it ask whether reference services for digital records are similar to or different from those for analog records. Rarely, if ever, does it specifically consider digital data, nor does it look to the accomplishments of the data archives as models. A quick review of the “Digital Records Bibliography, 1995–2003,” maintained on the webpage of the National Archives of Australia (2003), suggests that only one article related to the work of any of the academically based social science data archives was judged sufficiently relevant to be included (a 1998 report from the Data Archive at the University of Essex (England) to the British Library Research and Innovation Centre). Richard Cox’s literature review of archivists and electronic records was likewise limited to four journals in the traditional archival profession (Cox 2000). Helen Tibbo’s article on the types of material used by American historians allowed for the possibility that the respondents might identify electronic databases or electronic texts. Her analysis did not specify the time period of research interest for the historians, but only historians whose research concerns World War II or later could be considered likely candidates to use “born digital” data records as a primary source.
Nonetheless, she reports that electronic databases were identified as a source by 67 of the 152 respondents (44%), although only 20 (13%) ranked them among their top three types of materials used (Tibbo 2003). Given that the primary intent of her study was to examine historians’ information-seeking behaviors “since the advent of the World Wide Web, electronic finding aids, digitized collections, and an increasingly pervasive networked scholarly environment,” it is likely that the databases identified were either digitized or bibliographic sources rather than “born digital” primary sources.
To date, few archival institutions report accessioning many electronic records, and only a few traditional archives have accessioned digital data records. NARA has the oldest and largest program for archival electronic records of any traditional archives; Ambacher (2003) provides its history. The program’s holdings are “born digital” federal records; using them always involves some computer hardware and software. As with the WRA example, a small volume originated as punch card records in the early to mid-20th century that were later migrated to more contemporary computer-readable formats and media. Sometimes U.S. federal agencies migrated them to tape for their own needs; in a few cases NARA staff did this in order to preserve their usability. A much larger volume of NARA’s electronic records comes from data processing applications that used mid-20th century and later mainframe computers. They are structured data files with records that are neither narrative nor eye-readable; quantitative data predominate. Yet, as Thomas Brown (1996, pp. 235–237) has shown, it is inaccurate to generalize that all records accessioned from the mainframe era were statistical records or that all data records are statistical in origin. Some textual electronic records produced with mainframe text processing applications, which can be displayed in an eye-readable fashion with appropriate software but without any “translation” of codes, were also transferred into the National Archives from this era (e.g., electronic records of the Presidential Commission on the Space Shuttle Challenger Accident).
For three decades, electronic records have been transferred to archival custody on magnetic tape. With the decline of mainframe computing dominance, data processing now most often takes place on desktop computers. Thus, since the 1990s, many data processing records transferred to NARA have come from desktop applications, and a declining proportion are transferred on magnetic tape, although some still are. A similar evolution occurred in the media used to fulfill orders for digital data files; NARA now ships most copies of digital data on optical media. As work in U.S. federal government agencies migrated to the desktop, and as staff adopted office automation and the new communication systems that became societally ubiquitous in the 1990s, the types of digital records that agencies identify for permanent preservation likewise multiplied in variety as well as volume.
Currently, the organization with custodial responsibility for NARA’s federal archival electronic records is the Electronic and Special Media Records Services Division, Modern Records Programs, Office of Records Services, Washington, DC. This division and its predecessors have long used administrative records from their reference program as, in the words of Helen Tibbo and Brian Dietz, an internal “data-rich foundation for improving archival operations and administration... [that] support organizational decision-making, program initiatives, design of user services...” (Tibbo and Dietz 2004, p. 8).
In keeping with that practice, this essay uses administrative records from NARA’s electronic records reference program and attempts to answer some familiar questions: who is using archival electronic records, and why are they using them? Are the uses and users of digital records similar to or different from the uses and users of analog archival materials? Have they changed over time? An underlying assumption is that the answers to these questions may have general applicability for archives now and into the future. This essay is also a practitioner’s attempt to address some of the challenges of 21st-century archival reference services. The author assumes that sharing experiences of services for the first type of digital records may help inform user services for other types as they come into the sphere of archival responsibility.
Finally, writing for this journal’s special issue on digital archiving of research data provides an opportunity to share a perspective on this topic. Some of the digital data NARA preserves and makes accessible are a by-product of governmental research, and all of the holdings are potential primary documentation for research. By highlighting selected research uses of some of NARA’s unique digital data, we also provide a basis for readers to consider how NARA’s reference services for electronic records complement not only NARA’s reference services for analog records but also the services offered through the community of social science data archives. The work of some of those institutions is reflected in articles elsewhere in this issue.
As a group, these institutions, varied though they are, have similar goals and share many of the basic functions of traditional archives. Foremost are preservation, standards-based description, and user services. The commonalities between traditional archival institutions and data archives point to the value of sharing lessons learned. The arena of digital research data and governmentally produced digital data used for research provides an intersection for doing this.
The professional organization of the data archives community is the International Association for Social Science Information Service and Technology (IASSIST). In the IASSIST context, archival materials are computer-readable data produced through and maintained for research and, at some member institutions, also used for teaching in the social sciences, broadly considered. The collections of some data archives, especially those with national or international clientele, include selected governmentally produced data that may, depending on the country, also be within the purview of the nation’s traditional archives. Differences in laws and archival traditions, as well as a variety of pragmatic decisions by the varied researcher groups the data archives support, have led to variation in their missions and clientele. The archival responsibilities of a few of the national data archives overlap with or are part of their traditional national archival infrastructures; many preserve and provide access to data not available elsewhere. The late Per Nielsen recounted the fascinating history of the evolution of the Danish Data Archives (DDA) from an academic research data organization to an independent unit of the Danish State Archives Group (Nielsen 1995). There have been further changes at the DDA in the decade since publication of Nielsen’s article.
Users of archival digital data
In reviewing three recent research projects on archival users, Barbara Craig reflected upon their commonalities. The projects had focused on two key archival user groups, historians and genealogists, who she posits collectively represent “archives’ most numerous clients.” The projects were set in traditional archives, where textual narrative materials dominate, be they on paper, microform, or potentially digitized or digital text (Craig 2003, p. 98). Can we similarly describe users of non-narrative archival source materials, in particular digital data?
The research clientele of NARA’s electronic records program has never included many historians. Yet experience with reference services for federal digital data records also supports the concept of a dichotomy among researchers. We generalize one group as archival users involved in “original” research projects, a category that defines much of the archival research of academics but is not solely their province. The other general category consists of those who seek archival materials as a source in their quest for any kind of factual or personal information. The numbers reflect the nature of the differences. Information-seeking archival users tend to far outnumber “research” users, and the sources they consult tend to be similar. Research users tend to seek a far broader range of archival records, pursuing uniquely defined objectives. The two user groups are often distinct although not mutually exclusive. Some archival series are relied upon repeatedly by both types, yet used in disparate ways. Perhaps these multi-layered dualities describe archival use and users as a whole, whether the records are analog or digital; whether narrative, cartographic, photographic, audio, moving images, other visual record types, or alpha-numeric data. If so, understanding archival use and users requires distinguishing between the communities for each of the various types of records, whether the archival institution has a small focused collection or is national in scope.
The depiction of users of structured computer-readable data files by two leaders in the social science data community, Ann Gray and Diane Geraci, offers further perspective. The above categorization echoes their experience; they distinguished informational use of data from research data use. Informational users employ facts from data directly as recorded. By contrast, in research use, data are the primary source material for studying a phenomenon or the relationship between phenomena to produce “new knowledge” that will be learned in the future (Gray and Geraci 1995). The Gray–Geraci distinctions consciously built upon the earlier ideas of archivist George Chalou, who described two types of traditional archival reference services: information service and document service (Chalou 1984, pp. 48–50). Chalou, in turn, may have been blending his own experience with some of the ideas of Theodore Schellenberg, who had distinguished between records with “individual research value in relation to particular persons or things” and public records whose “research values are usually derived from the importance of information in aggregates of records, not from information in single items” (Schellenberg 1956).
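The Gray–Geraci distinction can be illustrated with a short sketch over a handful of hypothetical records (the names, fields, and values are invented placeholders, not drawn from any NARA file). Informational use retrieves the facts recorded about one individual, while research use derives something, here a simple frequency table, that exists only in the aggregate, echoing Schellenberg’s contrast between single items and aggregates of records.

```python
from collections import Counter

# Hypothetical placeholder records; not drawn from any NARA file.
records = [
    {"name": "PERSON A", "home_state": "CA", "year": 1967},
    {"name": "PERSON B", "home_state": "TX", "year": 1968},
    {"name": "PERSON C", "home_state": "CA", "year": 1968},
]


def informational_lookup(records: list[dict], name: str) -> list[dict]:
    """Informational use: return the facts recorded for one individual, as recorded."""
    return [r for r in records if r["name"] == name]


def research_tabulation(records: list[dict], field: str) -> Counter:
    """Research use: derive an aggregate (a frequency table) across all records."""
    return Counter(r[field] for r in records)


print(informational_lookup(records, "PERSON B"))
print(research_tabulation(records, "home_state"))
```

The lookup returns a record that already existed in the file; the tabulation produces a figure that no single record contains.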
In information service, the reference archivist does one of two things when responding to requests—offers brief, factual information from records, or provides information about them. In the latter case, the researcher does not expect the archivist to know all the factual information within the records in his or her custody, but expects the archivist to be familiar with the finding aids that describe the archives’ holdings, guiding the researcher to the records most likely relevant for the purpose at hand. With automated finding aids, similar expectations influence a user’s satisfaction with and confidence in the information gleaned through such tools.
In Chalou’s framework, document service makes primary source materials available for original analysis, either by retrieving the records so that they can be examined in a research room or by sending reproductions to wherever the researcher requests. In the NARA experience, until recently research services for electronic records were almost exclusively via reproduced copies of electronic records files on removable computer-readable media sent upon request, i.e., a distance-service akin to Chalou’s document service. The Internet and the World Wide Web have introduced the potential for new variations.
Only rarely have those wishing to use digital archival records for original research traveled to NARA for this purpose. When they do, their onsite work is limited: reviewing the paper technical information for the records or consulting with an archivist. As a result, NARA’s electronic records reference staff are adept in supporting users whom they only rarely meet in person, a skill that many archivists can anticipate needing in an era of increased reliance on online modes of reference service. Since 1991, e-mail has been the dominant mode for communicating with requestors who contact the NARA electronic records program (Hull 1995, p. 75).
Responsive information service and successful research service depend upon comprehensive and accurate archival descriptions or other finding aids and/or the availability of an archivist with knowledge and experience related to the primary sources. Automated or online access to archival descriptions or other finding aids that enable enhanced information service about primary sources of any type, like that available from NARA’s Archival Research Catalog (ARC) (2006b), is usually distinct from online access to records. Nonetheless, increasingly available enhancements link selections of records and their descriptions, blurring the distinctions. Online access can support either information service from records or document/research service. In recent years, onsite NARA researchers have been able to use NARA’s public access computers for searching and retrieving a selection of archival “born digital” data records, digitized records, or digital descriptions of all types of records. This is the same work that they can do from anywhere on their own computers if they have Internet access.
Craig also notes that each archival user community has unique expectations, independent of the differences between first-time archives users and experienced “professionals.” Further, each community within the two larger groups she identified has its own “language,” reflecting both explicit and tacit knowledge acquired by discipline, profession, or life experiences. Extrapolating, we apply similar concepts to our more generalized categories of users. In particular, users of archival records whose “language” differs from that of traditional users, such as social science data analysts, may also have different expectations for archival user services. Responsiveness may require some degree of specialized services within the distinctions of information service and research service.
As recently as 10 years ago, the category “users of electronic records” fairly clearly delimited a community in which most were applying some form of social science research methodology in their use of archival electronic records. For them, electronic records in structured data files that U.S. federal agencies transferred to NARA were and are primary sources of choice. For a sense of the expectations of researchers who advocated creation of a federal data center before there was a program for preserving and providing access to computer-readable data records at the National Archives, see Ruggles et al. (1965). The ability to order copies of data files and retain them indefinitely constitutes exactly the form of access this community expected—then and now.
As noted earlier, electronic records in structured data files rarely are eye-readable; they do not usually consist primarily of words, and so most often are not narrative. They are not visually comprehensible in any meaningful sense, and as digital objects they have no immutable physicality. All of these characteristics set them apart from most other types of archival records. This form of primary source nonetheless is the basis for much of the social scientific research undertaken since at least the second half of the 20th century.
Throughout the first decades of NARA’s custodial electronic records program, the researchers contacting it generally came from communities distinct from those who consulted all the other types of accessioned records in NARA’s custody. During this time, use of structured, computer-readable data records was limited to those who “did” data analysis, grounded in the methodologies of anthropology, business and finance, demography, economics, geography, international affairs, journalism, political science, public policy, quantitative history, sociology, or the natural or physical sciences. Such use assumed that each researcher knew how to use computer hardware, its operating system, and the software designed for whatever type of analysis he or she intended, and had appropriate methodological expertise. To the best of our knowledge, none of the work undertaken using NARA’s electronic records was ever intended to replicate the functionality that the records had while the originating federal agency maintained them.
Researchers who ordered copies of electronic records files from NARA in the early decades in large part mirrored the researcher community supported by the academically based social science data archives. This is also true of contemporary researchers. In fact, those institutions themselves were and continue to be a part of the researcher base for NARA’s electronic records program. By contrast, people seeking factual information preserved in electronic data files are typically the same kinds of people, and sometimes the very same people, who come to NARA seeking specific information in all other types of records. Until the advent of the Internet, the ubiquity of personal computing technologies, and NARA’s development of an online search and retrieval tool, there was no efficient way to satisfy requests for information from electronic records when they came from people unfamiliar with computer-based data analysis.
Nonetheless, as noted elsewhere, variations among the user communities for federal archival electronic data records began to emerge even before personal computing or the World Wide Web. In the early 1980s, the accessioning of electronic data records of U.S. military casualties of the Korean and Vietnam wars precipitated an expansion of NARA’s electronic records clientele to include Chalou’s category of seekers of information from records or, in the Gray–Geraci framework, information users of data. To respond to at least some of this demand, NARA’s electronic records reference staff extracted records from digital files on behalf of researchers, especially when a file was the only known source for the information sought (Adams 2003, p. 72).
One especially memorable example of customized service from an electronic records series, in the era before personal computing, came after a plaintive telephone call from a veteran: “I lost my leg but not my life, and the ranger said you could help me.”2 A search in a printout of the U.S. Army’s digital casualty records helped to ease the shock the veteran had experienced the night before, when he found his own name inscribed on the Vietnam Veterans Memorial, familiarly known as the Wall. The names memorialized on the Wall are arranged by date of death. The park ranger on duty that night referred the veteran to NARA’s electronic records program because he knew that it preserved, and offered reference services for, records of Vietnam-era U.S. military casualties. It turned out that an error combined with missing data in the veteran’s digital casualty record had apparently led the volunteers who made the initial decisions about which names to inscribe on the memorial to include his name next to that of his buddy, who died in the incident in which the veteran was wounded.
Expanding the use of archival digital data
Factual information service from electronic casualty or prisoner of war records was an exceptional, though frequently requested and provided, service in the years before NARA unveiled the online Access to Archival Databases (AAD) resource in 2003. Without a tool like AAD, information service from NARA’s electronic records was labor- and resource-intensive. In the years before AAD, and in keeping with the Electronic Freedom of Information Act (E-FOIA), staff used an in-house automated verification tool and developed an in-house search and retrieval application so that they could respond to requests for specific records from electronic data.
Most recently, as Fleckner suggests, the combined ubiquity of personal computing and the Internet has fostered society-wide expectations of more or less instantaneous access to information, including archival records. Today’s users, the “future” of the past, come to archives in increasing numbers, virtually, through the Internet, because contemporary communication technologies make this possible. In this environment, many people assume that technology will enable them to use any electronic records in archival holdings in a manner meaningful to them. AAD is one NARA response to such expectations. Through it, a burgeoning number of people now search for and retrieve selections of archival data records online, 24 hours a day, 7 days a week. The average of several thousand AAD queries per day indicates a scale of use of archival electronic records without precedent.
Genealogists and others seeking personal information constitute a substantial part of the category of archival users who seek factual information. The roles that respondents to voluntary online AAD customer surveys report for themselves indicate that AAD users fit this categorization: the majority are genealogists, family historians, or veterans or their representatives.3 Query patterns confirm this, since the vast majority of AAD queries have been in series whose records identify persons. Slightly more than 94% of the approximately 1.6 million valid queries run in AAD in its first 2 years of public accessibility were in the 17 series whose records identify individuals, although the records of these series accounted for only about 24% of the 72 million electronic records then in AAD.
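The concentration of queries in person-identifying series can be made concrete with a little arithmetic. The Python sketch below uses only the rounded figures given in the text, so its results are approximations rather than counts from AAD logs; the variable names are illustrative, and the "concentration" value is simply the person-identifying series' share of queries divided by their share of records.

```python
# Rounded figures reported for AAD's first 2 years of public access.
total_queries = 1.6e6        # approximate number of valid queries
total_records = 72e6         # electronic records then in AAD
person_query_share = 0.94    # "slightly more than 94%" of queries (a lower bound)
person_record_share = 0.24   # "approximately 24%" of records

# Approximate absolute numbers implied by the shares.
person_queries = person_query_share * total_queries
person_records = person_record_share * total_records

# How many times more heavily the person-identifying series were queried
# than their share of the records alone would predict.
concentration = person_query_share / person_record_share

print(f"{person_queries:,.0f} queries over {person_records:,.0f} records; "
      f"concentration {concentration:.1f}x")
```

Because 94% is a lower bound, the true concentration is, if anything, slightly higher than the value computed here.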
In the period immediately following the public release of AAD, the electronic records reference program experienced, as expected, some increase in requests directed to staff. Yet by the second anniversary of its online availability, only about 15% of all requests received directly by NARA’s electronic records reference staff related in some way to AAD. The overall level of interpersonal demand in this period remained relatively stable, increasing only somewhat. Most AAD-related requests, which come from all user communities, ask about the content of records retrieved through AAD, seek information about additional documentation on the subject of records found in AAD, or concern ordering copies of the full files from which the user has retrieved some records via AAD. A small number report a problem or ask questions about the tool. Perhaps most importantly, since AAD became available, electronic records reference staff have referred requestors to records accessible through AAD as the most appropriate reply to approximately 40% of all the reference requests they have received. This closely mirrors the results of an analysis of the types of requests received by the electronic records reference staff in fiscal year 1999, done in preparation for the development of AAD. That review indicated that 45% of all requests received were for information from records (Adams 2003, pp. 82–83).
These percentages take on added meaning when set against the number of requests the reference staff received and against the volume of valid AAD queries or the number of AAD “virtual visitors.” Any query submitted in AAD that searches for records at the file level is counted as valid if no error results. During the first 2 years of AAD availability, beginning in February 2003, the electronic records reference staff received 3,878 direct requests. During the same period, almost 1.3 million “virtual visitors” ran approximately 1.6 million successful queries in AAD. Although comparing either AAD virtual visitors or AAD queries with interpersonal reference requests is a comparison of apples and oranges, we can conclude that the information in NARA’s archival electronic records is reaching a vastly expanded number of people through AAD. Most of these users, we assume, had neither previous knowledge of nor access to these records. All of this activity is shifting one part of NARA’s electronic records reference program, its information service from records, from an interpersonal exchange between archivist and user to a user self-service mode.
At the same time, requests for information about NARA’s electronic records and orders for reproductions of files, including files whose records are accessible through AAD, continue. Perhaps the most surprising development since AAD’s release is that a sizeable number of the files ordered as reproductions on a cost-recovery basis are files whose records can be searched and retrieved through AAD at no charge. During AAD’s first 2 years, NARA staff reproduced 268 (i.e., 61%) of the 438 unique files in AAD, a total of 509 times. These copies accounted for 28% of all the files reproduced for a fee during the period.4 Academics ordered many of them, and in discussions with reference staff about their orders, several suggested that the online availability of data through AAD had brought the research potential of the archival electronic records to their attention.
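As a worked check, the per-file distribution given in the note to this paragraph can be tallied to recover the 268 files, 509 reproductions, and 61% share reported above. The sketch assumes the note's counts cover every AAD file reproduced during AAD's first 2 years; the dictionary and variable names are hypothetical.

```python
# Distribution of fee reproductions of AAD files during AAD's first 2 years,
# as listed in the accompanying note: {times a file was copied: number of files}.
copies_per_file = {8: 1, 7: 1, 6: 2, 5: 1, 4: 21, 3: 15, 2: 121, 1: 106}

TOTAL_AAD_FILES = 438  # unique files available through AAD in the period

# Number of distinct files reproduced at least once.
files_reproduced = sum(copies_per_file.values())
# Total reproductions: each file counted once per copy made.
total_copies = sum(times * n for times, n in copies_per_file.items())
# Share of all AAD files that were reproduced at least once.
share_reproduced = files_reproduced / TOTAL_AAD_FILES

print(files_reproduced, total_copies, f"{share_reproduced:.0%}")
```

The two sums agree with the figures in the text, which suggests the note's distribution is complete for the period.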
Within 6 weeks of AAD’s online debut, a professor of finance who is frequently cited in the business pages of major national newspapers for his analysis of U.S. stock market trends ordered copies of an AAD file from the Records of the Securities and Exchange Commission (SEC). In years past he had ordered copies of most of the files from a related SEC series. The two series are databases that enable the SEC to regulate “insider trading” of common stocks. They are fully public records of insiders’ required, transaction-by-transaction reports of their acquisitions and sales of common stock, and they have been digital for more than 25 years. Both series are accessible through AAD. During 2000–2004, researchers ordered one of the SEC files as many as nine times in a single year, making it the most frequently requested electronic records file in NARA’s holdings in any of those 5 years.
The continuing fee orders for copies of full files whose records are freely accessible online may parallel the experience of the Open Book Project of the National Academy Press (NAP). Roy Rosenzweig reports that the press, the publishing arm of the National Academy of Sciences, has found that putting all its current publications and many from its backlist online at no charge has “increased NAP’s sales because people now order books that they have browsed online but want to own in hardcopy” (Rosenzweig 2001, p. 569). Federal electronic records are generally not subject to copyright and are easily reproduced. Perhaps some researchers who use federal archival digital data for analytical rather than information-seeking purposes share traits with colleagues who read monographs online and then order hardcopies. They “discover” archival data online, “browse” sample records retrieved through an AAD query, and, when they find that the data suit their research needs, order copies of the full files for further analysis. AAD was not designed as an analytical tool and includes no statistical software. To ensure its stability and real-time availability to all users, it also limits the volume of data that a single query can retrieve and download.
Research uses of archival digital data
As mentioned, historians have been a minority of the research users of NARA’s electronic records. Instead, academics and researchers from the government and private sectors, including the media, who analyze data using various social science research methodologies continue to be the dominant group ordering copies of archival electronic data files. In fiscal years 2001–2004, these research groups accounted for almost 90% of the reproductions of electronic records that NARA shipped. The percentages are almost identical whether calculated by number of files (5,323 of 5,992 files) or by logical data records (434 million of 487 million). In the years immediately preceding, these research groups dominated orders for electronic files even more strongly (Adams 2003, p. 85). The following sampling of social scientific analysis undertaken in recent years, which focuses on a small selection of unique record series, supplements the examples of research use already described and provides some perspective on these uses.
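The two bases of calculation can be compared directly. This short sketch uses the file and record counts given above; the record counts are the rounded millions reported in the text, so small rounding differences from the underlying tallies are to be expected.

```python
# Reproductions of electronic records shipped in fiscal years 2001-2004,
# as reported in the text (record counts rounded to millions).
research_files, all_files = 5_323, 5_992
research_records, all_records = 434e6, 487e6  # logical data records

# Share attributable to the social science research groups on each basis.
file_share = research_files / all_files
record_share = research_records / all_records

print(f"files: {file_share:.1%}, records: {record_share:.1%}")
```

Both shares fall just under 90% and within about a third of a percentage point of each other, consistent with the statement that the two bases give almost identical results.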
A group of academic public health researchers reanalyzed the spraying of Agent Orange and other herbicides during the war in Southeast Asia. Their work drew upon archival electronic data on herbicide use from the Office of the Secretary of Defense (OSD); population and other electronic data from OSD’s (South Vietnam) Hamlet Evaluation System (HES); HES electronic gazetteer data from the National Police Infrastructure Analysis Subsystem in the Records of the Joint Commands; electronic records from the Joint Chiefs of Staff on air sorties flown by the U.S. military and its allies over Southeast Asia; and electronic records on the locations of U.S. military battalions at various times during the war, compiled by the U.S. military for research on this subject. They also reviewed records held elsewhere and series in NARA’s textual holdings, at least some of which were either reports produced from the digital data while the military agencies were actively using it or, conversely, input records for the digital data. Their revised estimates of herbicide use almost doubled previous public estimates of the amount of dioxin sprayed in South Vietnam (Stellman et al. 2003; Butler 2003). When these findings were first released, news accounts described some of the source material as “files from the dark days of the Vietnam War that had lain forgotten in the National Archives for nearly 30 years” (Perlman 2003). The researchers themselves downplayed the creative ways in which they wove disparate sources together: “we have unexpectedly come upon primary data which expanded existing herbicide spraying databases and could help guide the design of human health and of environmental studies” (Stellman et al. 2003, p. 686).
An article about efforts to locate and neutralize unexploded ordnance in Laos reports an Air Force historian’s “find” of some of the same Vietnam-era archival air sortie data. The historian located “two useful tape databases at the National Archives ... about missions flown in Southeast Asia from October 1965 through ... August 1975” (Lovering 2001). The archival preservation and availability of the air sortie data enabled both groups of researchers, and others, to “find” them. Their discovery of the records, of which they subsequently ordered copies, proceeded in the manner of most primary source research: reference archivists, the technical information accompanying the records, various finding aids and citations, and each researcher’s knowledge, ingenuity, and experience all played a role in identifying the particular files. The non-standard formats of many of the military operational series from the Vietnam War have made them challenging to use today, but NARA’s preservation efforts and researchers’ programming work have overcome these challenges. As the title of Lovering’s article suggests, the digital records are a primary source for the database of a contemporary geographic information system that supports efforts to locate unexploded ordnance in Laos. In the case of the air sortie data, however, the technical information transferred to NARA with the digital data reflects the latest release of the systems manual. If the coding schemes changed over time, that information may not explain code meanings from the earlier years in which the two database systems were used, a point Lovering also makes (2001, p. 69).
A team of political scientists used the Hamlet Evaluation System (HES) data as input for a geographic information system-based study of the determinants of violence and control in civil wars (Kalyvas and Kocher 2003). They cite work by Elliott that also relied on HES data, some of it from the holdings of the National Archives (Elliott 2003). An earlier team of geographers also used the HES data, along with several other operational series from the Vietnam War, to test what the U.S. military might have learned from all the operational data it collected during the war had contemporary geographic information systems technology been readily available at the time (B. Foust and H. Botts c. 1992, personal communication). The HES data represented a major effort by the U.S. Military Assistance Command Vietnam (MACV) to collect periodic data on the population of South Vietnamese hamlets and villages, 1967–1974, and included MACV’s ratings, by location, of the success of rural “pacification” or counterinsurgency programs.
Agencies of the U.S. government both conduct nationally significant research and sponsor research by others. The records of government research, including research done under contract, are frequently identified (scheduled) in keeping with the tenets of the Federal Records Act and appraised as having sufficient long-term value to warrant preservation in the National Archives. Data generated with grant funds from a federal agency but not obtained by the agency, however, are not considered federal agency records and do not fall under the requirements of the Federal Records Act (Brown 1982). Some funding agencies now require that such data be deposited in an archival repository, and academic data archives often preserve data created with grant support.
The National Collaborative Perinatal Project (NCPP), 1959–1974, is an example of major federal research, conducted by one of the National Institutes of Health (NIH), whose primary data are preserved by NARA. The data on a sample of women and their children are an important national resource for biomedical and behavioral research in, among other fields, obstetrics, perinatology, pediatrics, and developmental psychology. In the period 2000–2004, NARA’s custodial electronic records program filled 10 orders for copies of a total of 244 files from this series, which comprises 32 public use data files at NARA. Academics placed eight of the orders. As a result of one of them, the National Bureau of Economic Research, a major private, non-profit, non-partisan research organization with which a number of academics are affiliated, scanned the voluminous NCPP technical documentation and now makes it freely available on its website. This is a potent example of the multiplier effect of the reproductions of digital data and documentation that NARA provides. A search for ‘Collaborative Perinatal Project’ on the then-beta Google Scholar site on March 28, 2005, returned 535 citations, most of them for articles dated after 1985, when NIH transferred the NCPP records to NARA.
This brief survey has passed over the research use of NARA’s vast holdings of federal economic and demographic census, survey, and other quantitative digital data. Social scientists know their research value well, and they are the kinds of governmental digital data that academically based data archives also maintain selectively. Researchers from foreign governments, U.S. federal agencies, U.S. state and local governments, non-academic research institutes and firms, and occasionally the media regularly contact NARA as a source for U.S. federal digital data records in this category. Space limitations also prevent us from describing, for example, the research use by social scientists outside the U.S. of records from the large collection of public opinion surveys that the U.S. government has sponsored throughout the world, or of other digital data from U.S. government programs abroad.
Taken together, these uses of archival digital data reflect the variety of records that governments create, compile, or collect. Selected federal digital data constitute a treasury of factual information much sought after by the public. Because academic and other grant- or foundation-sponsored research less frequently produces data of comparable coverage or scope, these data are also extremely valuable source material for academic research. Greater collaboration between traditional archives and social science data archives could enhance the long-term availability of both governmental and privately produced research data. This suggestion echoes ideas Martin David (1980) espoused several decades ago. One of the five forms of “utopia for the researcher” that he envisioned was collaboration between academic social science data archives and public state archives to ensure that valuable machine-readable records of subnational governments and industry were preserved for posterity. Unfortunately, few, if any, state archives in the U.S. have a formal partnership with an academic social science data archive, but this remains a possibility worth pursuing. A partnership in which the U.S. National Archives now participates may provide a model for doing so. NARA has joined several major U.S. social science data archives in the Data Preservation Alliance for the Social Sciences (DataPASS, nd). The alliance operates under the auspices of the Digital Preservation Program of the National Digital Information Infrastructure and Preservation Program, which is sponsored by the Library of Congress (NDIIPP, nd).
We expect archival reference requests to evolve over time to encompass new types of digital records as these are transferred into archives in greater volume. This demand has not yet materialized to any significant degree at NARA. Even in advance of it, the experience of NARA’s electronic records program indicates that efforts to meet user expectations related to digital data have significantly expanded the communities of users of primary source archival records. Having concluded that the general categories of digital data users, analysts and fact-finders, parallel the general categories of users of analog records, we can anticipate that these patterns will extend into the future. When archivists seek models for user services for each new type of digital record, their first step may be to recognize how, for that type, the needs and expectations of fact- or information-seeking requestors differ from those of persons engaged in original research. Inevitably, doing this will mean adding new types of archival reference services.
The increasing variety of types of digital records will also likely make archival reference services more complex, even as technological innovations continue to expand the services that are possible. In such an environment, reference archivists may wish to blend the lessons learned from reference services for digital data with traditional archival user services for analog records. What will presumably emerge are reference services that draw on past experience while ultimately forging new paradigms.
Since February 2003, NARA has offered online search and retrieval access to a selection of its archival electronic records, including the Records about Japanese Americans relocated during World War II, through the Access to Archival Databases (AAD) resource (NARA 2006a). AAD was developed under the auspices of NARA’s Electronic Records Archives (ERA) Program (NARA 2006c). Information on the volume of queries run in specific AAD series, and other statistics from the administration of NARA’s Electronic and Special Media Records Services Division, is available upon request.
The author’s first-hand experience, March 1990. The veteran subsequently told his story to a syndicated newspaper reporter, and it was published widely in the U.S. media.
For example, an unpublished AAD customer survey by ForeSee Results, Inc., covering May 2004 to July 2004 (N = 930), found that 82% of respondents identified their role as either genealogist/family historian or veteran/veteran’s representative. In a later sample of 793 respondents (November 2004 to March 2005), the comparable self-identification of these groups was 75%.
One AAD file was copied eight times, one seven times, two six times, one five times, 21 four times, 15 three times, 121 twice, and 106 AAD files once. Source: unpublished reports prepared by Lee A. Gladwin.