1 Introduction

My conversations with Ted Nelson began in earnest in 2004 when we shared an office at the Oxford Internet Institute (OII). He was working on Xanadu, and I was working on Scholarship in the Digital Age: Information, Infrastructure, and the Internet [7]. My work had been in conversation with Ted’s since I was a graduate student, having read Computer Lib early on. Ted signed my copy of Literary Machines [25] at a talk in the mid-1990s, so I was in awe of the man when Bill Dutton put us together as visiting scholars in the OII attic, a wonderful space overlooking the Ashmolean Museum.

Ted and I arrived at concepts of data and metadata from very different paths. He brought his schooling in the theater and literary theory to the pioneer days of personal computing. I brought my schooling in mathematics, information retrieval, documentation, libraries, and communication to the study of scholarship. While Ted was sketching personal computers to revolutionize written communication [24], I was learning how to pry data out of card catalogs and move them into the first generation of online catalogs [6]. Our discussions that began 30 years later revealed the interaction of these threads, which have since converged.

2 Collecting and Organizing Data

Ted overwhelms himself with data; hence he needs metadata to manage his collections. He drapes himself in data collection devices (Fig. 10.1). On any given day, he carries some combination of paper notebooks, a packet of colored marker pens hung on a string over his shoulder, a video camera, a still camera, an audio recorder, and other recording devices.

Fig. 10.1 Ted Nelson, 2005, carrying data collection devices at the Oxford Internet Institute (Photo by Christine L. Borgman)

Ted’s data immersion is not simply about recording one’s life experiences, as in Gordon Bell’s MyLifeBits project [5]. Rather, Ted’s data collection encompasses information relevant to documentation, writing, networks, and hypertext – anything that could possibly inform the design of Xanadu and related technologies. The common thread in the data collection projects of Ted Nelson and Gordon Bell is that both acquire heterogeneous data types that must be integrated. Bell, a distinguished computer scientist at Microsoft, has the resources to build a testbed for studying and exploiting those data (Gemmell et al. [15]). Ted, for whom necessity is the mother of invention, takes a much more informal approach to capturing, describing, and integrating the content he gathers. One of our first conversations was about metadata. He asked me to explain it, and as I started to do so, he asked me to stop and wait a moment. He pulled an audiocassette recorder from his jacket pocket, turned it on, and said, “Christine Borgman on metadata.” Then he turned to me and said, “now talk about metadata” … and we did! At the end of that conversation, he made an entry in his daily diary noting the conversation and which cassette held it. Thus, Ted created a document (the recording), assigned a subject heading (“metadata”) and a personal name entry (“Christine Borgman”) as metadata about the document, and created a catalog record (the entry in his notebook). In this case his action was recursive, as he created a metadata record about metadata.

2.1 Theoretical Traditions

Formally, metadata is “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource” [23]. The NISO definition breaks metadata into the three general categories of descriptive, structural, and administrative. Other definitions of metadata make finer distinctions among types [2, 17].

Ted developed a fundamental understanding of data, metadata, and documentation through his work on hypertext and literary machines, despite his lack of familiarity with the field of information studies. He recognized that documents do not stand alone, even if they look like independent objects. Rather, they are deeply connected to many other objects. These relationships can be abstract, as in the influence of one text on the meaning of another – known as “intertextuality” in semiotics and literary studies. Relationships also can be explicit, when one document cites another, includes portions of other documents (“transclusions”), or makes any other direct link. These explicit relationships are the basis for hypertext and hypermedia, terms coined by Ted in the 1960s. The body of relationships among documents is sometimes known as “hypertextuality.”

In the field of documentation, usually dated to the Belgian Paul Otlet in the early twentieth century, texts are deconstructed into component parts and linked together. In the information sciences, Otlet’s work is considered to be the precursor to hypertext [29–31]. Building upon the complex history of bibliography, documentation, identity, and philosophy of information, modern cataloging rules link together nodes of documents, authors, publishers, and other entities as a network [35]. The model known as FRBR, for Functional Requirements for Bibliographic Records, establishes four levels of entities: work, expression, manifestation, and item [36]. The work is the distinct intellectual creation, such as Shakespeare’s play King Lear. The expression is the specific form, such as the text of the play as published in Shakespeare’s First Folio. The manifestation is a physical embodiment of an expression, such as the Royal Shakespeare Company’s 2007 production of King Lear in Stratford-upon-Avon starring Ian McKellen. The item is a single exemplar and a concrete entity, such as a specific copy of the program for a performance of that 2007 production. FRBR also establishes relationships among persons, corporate bodies, concepts, objects, events, and places.
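The four FRBR levels can be read as a simple chain in which each entity points to the one above it. The following is a minimal sketch of that reading, not part of the FRBR specification: the class name `Entity`, the helper `lineage`, and the rule that each level must attach to the level directly above it are assumptions made for illustration. The labels reuse the King Lear examples given above.

```python
from dataclasses import dataclass
from typing import Optional

# The four FRBR levels named in the text, from most abstract to most concrete.
LEVELS = ("work", "expression", "manifestation", "item")


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for one FRBR entity (illustrative only)."""
    level: str                         # one of LEVELS
    label: str                         # human-readable description
    parent: Optional["Entity"] = None  # entity one level up; None for a work

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"unknown FRBR level: {self.level}")
        above = LEVELS.index(self.level) - 1
        if above < 0:
            # A work sits at the top of the chain.
            if self.parent is not None:
                raise ValueError("a work has no parent entity")
        elif self.parent is None or self.parent.level != LEVELS[above]:
            raise ValueError(f"{self.level} must attach to a {LEVELS[above]}")


def lineage(entity):
    """Return (level, label) pairs from the work down to the given entity."""
    chain = []
    node = entity
    while node is not None:
        chain.append((node.level, node.label))
        node = node.parent
    return list(reversed(chain))


work = Entity("work", "King Lear")
expression = Entity("expression", "text of the play in the First Folio", work)
manifestation = Entity(
    "manifestation", "Royal Shakespeare Company production, 2007", expression
)
item = Entity("item", "one copy of the program for that production", manifestation)

for level, label in lineage(item):
    print(f"{level}: {label}")
```

Storing only the upward pointer keeps the sketch free of cycles. A real catalog would also need the downward direction (all expressions of a work, all items of a manifestation) and the relationships to persons, corporate bodies, and other entities.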

2.2 Practical Consequences

Metadata, such as the familiar entities in a catalog record—author, title, publisher, date, place, physical description, subject, and classification—are essential descriptions of documents and other entities. Without metadata, a library would be no more than rooms full of books and documents shorn of their title pages. Metadata describes, enables access, and provides links to other documents. Some forms of metadata creation can be automated, such as extracting keywords and citations from a text, and others are created by human experts, such as descriptions of the intellectual content and history of an object.

Having stumbled upon the concept of metadata in our conversations, Ted was an eager student of knowledge organization. I introduced him to Ann O’Brien of the Department of Information Science at Loughborough University, one of Britain’s experts on knowledge organization [20, 37]. Dr. O’Brien specialized in multi-media documentation, a particular challenge for Xanadu. While she was at first daunted by Ted’s style of inquiry (Fig. 10.2), they quickly became able sparring partners. Ted, Ann, and I explored many aspects of metadata that might be applied in Xanadu.

Fig. 10.2 Ted Nelson and Ann O’Brien, Oxford, 2006 (Photo by Christine L. Borgman)

Among the challenges that Ted encountered, long known to Ann and other experts in knowledge organization, is that the apparatus necessary to represent relationships between documents can be very large. Data, including texts, can be the tip of the iceberg. The metadata required to manage, to find, and to follow relationships amongst documents is often much more voluminous than the documents themselves. Furthermore, as networks grow in size, they become more complex, requiring other layers of representation and more sophisticated tools for navigation. Ted’s concept of hypertext supports multi-directional links between documents (Fig. 10.3). His approach is aligned with semiotics, philosophy, and information science thinking about relationships between works [14]. However, multi-directional links are complex to implement computationally, which was especially true in the early days of personal computing. Technical compromises made in the early days of the World Wide Web undermined Ted’s ability to implement hypertext on a large scale. He continues to rail at this constraint. Forty years after Computer Lib, computers are far more sophisticated and the networks among digital objects are much richer and more complex. It is time to revisit fundamental assumptions of networked computing, such as the directionality of links, a point made by multiple speakers at the symposium, among them Wendy Hall, Jaron Lanier, Steve Wozniak, and Rob Akscyn (Footnote 1).

Fig. 10.3 Ordinary hypertext, with multi-directional links. From Literary Machines (Used with permission)
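The difference between one-way and multi-directional links can be illustrated with a small sketch. It is not Xanadu’s design, and the class name `LinkStore`, its methods, and the document names are hypothetical. The point it shows is that a link followable from either end has to be recorded outside the documents, in an index that grows with every link added.

```python
from collections import defaultdict


class LinkStore:
    """Keeps every link in two indexes so it can be followed from either end."""

    def __init__(self):
        self._outgoing = defaultdict(set)  # document -> documents it points to
        self._incoming = defaultdict(set)  # document -> documents pointing to it

    def add_link(self, source, target):
        # One logical link produces two index entries, one per direction.
        self._outgoing[source].add(target)
        self._incoming[target].add(source)

    def links_from(self, doc):
        """Documents that `doc` points to (the only direction a one-way link offers)."""
        return sorted(self._outgoing.get(doc, ()))

    def links_to(self, doc):
        """Documents that point to `doc` (the direction a one-way link cannot answer)."""
        return sorted(self._incoming.get(doc, ()))


store = LinkStore()
store.add_link("doc_a", "doc_b")
store.add_link("doc_c", "doc_b")
store.add_link("doc_b", "doc_d")

print(store.links_from("doc_b"))
print(store.links_to("doc_b"))
```

With links embedded only in the source document, answering `links_to` would require scanning every document in the network. Keeping the second index is what makes the link apparatus larger than a simple embedded pointer, and someone has to maintain it as documents change.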

2.3 Managing Research Data

Managing research data is similarly a problem of defining and maintaining relationships amongst multi-media objects. Research data do not stand alone. They are complex objects that can be understood only in relation to their context, which often includes software, protocols, documentation, and other entities scattered over time and space [8]. The need to model these complex relationships stimulated technical research in persistence, identity, and linking of research objects [4, 26, 28, 38]. These approaches build upon—and are limited by—the technical capabilities of the World Wide Web.

As research data become valued as objects to be maintained, reused, and repurposed, many stakeholders are coming together to address questions of linking, identity, and stewardship. These concerns cross boundaries of scholarly communication, computer science, publishing, research funding, libraries, archives, data repositories, and education [8, 9, 13, 34]. Breakthroughs on these data problems may contribute to understanding hypertextuality, and vice versa.

3 Provenance and Pluralism

Provenance, another fancy word that was unfamiliar to Ted but basic to his ideas, has meanings both narrower and broader than metadata. The term was borrowed from French in the eighteenth century to indicate the origin or source of something. It can mean simply the fact of the origin or the history of something and the documentation of that record. In the narrower sense, provenance can be a type of metadata that describes the origin of an object. Provenance on the World Wide Web includes aspects such as the attribution of an object, who takes responsibility for it, its origin, processes applied to the object over time, and version control [16, 21]. The ability to establish the provenance of a dataset, for example, may influence whether a result is deemed trustworthy, is reproducible, is admissible as evidence, or to whom credit is assigned [10, 22].

Provenance is particularly difficult in hypertext because it requires not only establishing authoritative links between objects, but also sustaining those links and information about the links over long periods of time. These links remain reliable only if the identity of the object can be established uniquely at the item level [1, 32, 33]. Unique and persistent identifiers need an institutional home, whether an International Standard Book Number, which is maintained by national libraries [19]; a Digital Object Identifier (DOI), which is maintained by the DOI Foundation and stored in interconnected registries (“Digital Object Identifier System” [11]); an Open Researcher and Contributor Identifier (ORCID) for author names, which is maintained by a non-profit foundation and stored in interconnected registries [18]; or domain-specific identifiers, such as those for genomics, chemistry, and so on. Lighter weight solutions, such as Linked Open Data, can be used to establish rich sets of relationships among objects, but these are not intended for long-term stability [3, 27]. In scholarship and in research data, stable linking is essential to follow chains of evidence. The apparatus to establish and to maintain those links cannot exist in a vacuum. Rather, it is part of a larger knowledge infrastructure, one that is now being imagined anew [8, 12].

Ted’s notion of “pluralism” is that “anyone may revise anything – harmlessly” ([25], 2/61). Pluralism expresses today’s notion of use and reuse of digital objects. The social movement toward open access is predicated on the ability to borrow and reuse content, with attribution to the original source. Authors and other creators are more willing to share their works openly if they can expect credit for that work. Both credit and harmlessness thus depend on provenance. The original object must stay intact and later references to those originals must be sustained.
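One way to picture how harmless revision and credit both rest on provenance is an archive in which nothing is overwritten: each revision is stored as a new object that points back to the version it revises. The sketch below assumes this design for illustration only. The names `Version` and `Archive` are hypothetical, and the hash-based identifier is a stand-in for a persistent identifier, not a description of how DOIs or any registry actually work.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """An immutable version of an object, with a pointer to what it revises."""
    content: str
    author: str
    derived_from: Optional[str] = None  # identifier of the revised version

    @property
    def identifier(self):
        # Hypothetical stand-in for a persistent identifier.
        payload = f"{self.author}\n{self.derived_from}\n{self.content}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


class Archive:
    """Stores versions without ever overwriting an earlier one."""

    def __init__(self):
        self._versions = {}

    def _store(self, version):
        self._versions.setdefault(version.identifier, version)
        return version.identifier

    def deposit(self, content, author):
        return self._store(Version(content, author))

    def revise(self, identifier, new_content, author):
        if identifier not in self._versions:
            raise KeyError(identifier)
        return self._store(Version(new_content, author, derived_from=identifier))

    def get(self, identifier):
        return self._versions[identifier]

    def provenance(self, identifier):
        """Return the authors credited along the chain, original first."""
        chain = []
        while identifier is not None:
            version = self._versions[identifier]
            chain.append(version.author)
            identifier = version.derived_from
        return list(reversed(chain))


archive = Archive()
original_id = archive.deposit("first draft", "Author A")
revised_id = archive.revise(original_id, "second draft", "Author B")

print(archive.get(original_id).content)
print(archive.provenance(revised_id))
```

The revision leaves the original retrievable under its own identifier, and the chain of `derived_from` pointers carries the attribution. Both properties hold only as long as the identifiers and the archive itself are sustained.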

4 Conclusion

Ted has tackled—head on—some of the thorniest known problems of information organization. He lacked the background in the information sciences to know how hard these problems were. Yet hard problems often are solved by those who approach unaware of the littered path of failure. Ted brought fresh ideas to knowledge organization and stimulated those inside the field to revisit fundamental premises. The challenges that have stymied Ted are those that frustrated many who came before. Ted, like Paul Otlet, tried to develop a pure new system that did not depend on the technologies and bureaucracies of the day. Reinventing infrastructure is even harder than reinventing literature, and he has tried to do both. Ted has a large following in the library world because he dared to reimagine the library. Everything is indeed intertwingled, another provocative term of Ted’s invention. Xanadu, the hypertext system, is related to Samuel Taylor Coleridge’s 1797 poem about the summer palace of Kublai Khan, is related to the Yuan dynasty, is related to the ruins of Shangdu in Inner Mongolia, is related to … the many other paths of inquiry to be pursued in the ideal world of comprehensively networked knowledge.