Introduction

Among the many innovations in the field of digital humanities (DH), the development of digital archives has been one of the most exciting for accessing historical materials. “Mass digitization” initiatives such as Google Books, Project Gutenberg, and HathiTrust allow users to access documents at a large scale (Dahlström et al. 2012). Whereas many digital archives are massive in scope, others are more qualitative and focused on particular scholarly projects, infusing the archival process with a critical orientation in which designing and building the archive itself becomes an act of scholarly intervention. Dahlström et al. describe this approach as “critical digitization”, in which “scholarly research is embedded in the objects themselves” (2012, s 4). Such archives are especially significant for guiding scholars and students in accessing materials for a particular research project.

Literary archives such as the Modernist Journals Project, The Walt Whitman Archive, English Broadside Ballad Archive, and our project, The Modernist Archives Publishing Project (MAPP), represent some of the possibilities for “critical digital archives” in which materials are curated with a particular scholarly intervention in mind from the beginning (Battershill et al. 2017, pp 72–3). Users who visit MAPP can expect to access a significant trove of documents related to twentieth-century publishing and to export metadata regarding the circuits of correspondence and collaboration central to the business practices of important publishers in the history of modern literature, like Leonard and Virginia Woolf’s Hogarth Press (1917–1946). Unlike Google Books or HathiTrust, which provide access to objects at scale without critical comment, thereby facilitating large-scale computational analysis, projects such as MAPP intervene in the scholarship by making selection, digitization, curation, and digital content production strategies transparent and legible within a particular scholarly context. Such critical interventions are also ideological, because the act of curation and selection is always political and interpretative (Wallace 2020; Adler 2017; Fox 2016). For the MAPP team, this means an explicit commitment to feminist collaborative practice in building the archive, curating materials, expanding metadata and, most importantly, recovering marginalized voices and women’s labor in literary history and in the history of the book.

In this paper, we outline the archival practices and critical ethos that have informed the collaborative creation of MAPP (2012-). We address issues of selection that arise in creating a critical digital archive; feminist critical metadata practices; our approach to copyright; and conclude with an example of an archival document type (author photos) in which the issues of feminist critical curation and copyright collide. Our approach throughout is informed by a lineage of feminist thinking and writing that draws from the first-wave feminist works present in the archive and the Woolfs’ own praxis as radical, anti-colonial publishers, running through to a feminist organizational practice that aspires to what Caswell and Cifor have termed “radical empathy” (p 23). For us, the critical digital archive is a site of exchange between multiple archivists, cultural heritage institutions, academics, students and users, all of whom inform the character of, and access to, the resource.

Laura Mandell and others working in feminist DH and bibliography have argued that feminist digital projects must “perform structural work” in advancing a feminist intervention at both systemic and conceptual levels (2015, p 512). We concur with Jacqueline Wernimont that DH can and should feature more “sustained inquiry into the evolving relationships between feminist theory and DH work” (2015). As with other feminist archival and bibliographic projects (including Orlando: Women’s Writing in the British Isles (Brown et al. 2009); Women Writers Online; The Women in Book History Bibliography and The Women’s Print History Project), we center feminist interventions in both the structural elements of the project itself, attending to questions about expansions of metadata, copyright strategies, and data models that reflect our feminist ethos, but also at the level of support and recognition of our collaborators who include students, library partners, peer reviewers, business archivists/records managers and literary estates. The complex steps to pursuing copyright clearance for digitized items that we spell out in this essay—something we hope will be helpful on a practical level to other teams—are part of our feminist contribution to digital humanities, archival curation and recovery. As Sara Ahmed (2017), Katherine Holland and Susan Brown remind us—the latter drawing on the critical work of Moya Bailey—citation and recognition are essential to the feminist archival project (2018). For us on MAPP, recognizing the heretofore invisible labor of female publishing staff is an essential component of our scholarly intervention, and we have developed expanded metadata fields to capture the group of (almost entirely women) typists and secretaries who remain unacknowledged in the historical record. 
Re-centering these silenced participants in literary publishing and production changes the landscape of digital archives in the field of book history by making discoverable, and accessible, a whole layer of overlooked (female) labor. Similarly, making visible and legible the labor involved in copyright matters in the creation of digital archives in our contemporary moment offers an important intervention in valuing the often underrecognized labor of archivists in preparing historical materials for digital presentation. In this way, our data model and scholarly foundations reflect both the Woolfs’ and our own investment in critical feminist praxis, and are part of a broader shift in the contemporary landscape of book history, archival studies, library and information science, and critical digital humanities that foregrounds gender, labor and affect (Wernimont and Losh 2018; D’Ignazio and Klein 2020).

Critical digitization: selection and discrimination

“Access is a function of selection, because without selection there can be no access” (Astle and Muir 2002, p 68).

MAPP is grounded in the fields of book and publishing studies, rooted in understandings of textual production and the history of the book as theorised in Robert Darnton’s “communications circuit” (1982) and Pierre Bourdieu’s “field of cultural production” (1993) (see Battershill et al. 2017, pp 23–5). Our data model consists of eight key content types with attached metadata: Work; Edition; Primary Object; Correspondence; Related Material; Person; Business; and Library (see “Our Data Model,” Modernist Archives Publishing Project). These content types allow us to animate digitally what Johanna Drucker calls the book “as a distributed object, not a thing, but a set of intersecting events, material conditions, and activities” (2014, para. 2). The data model allows us to show the book as “event space within an ecology of changing conditions” (Drucker para. 2). By viewing a particular literary Work in the digital archive (Mrs Dalloway, for example), one is linked to a digitized copy of the Hogarth Press first British edition of the work, archival correspondence relating to its publishing history, and a born-digital peer-reviewed essay on Virginia Woolf and one on the Hogarth Press. Links to other materials, people, presses, and library collections in the MAPP resource are also provided along the way.
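To make the shape of such a data model concrete, the following is a minimal sketch in Python. It is not MAPP's actual implementation: the class names, the identifier scheme and the `related` helper are all assumptions introduced for illustration. Only the eight content types and the Mrs Dalloway example are taken from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class ContentType(Enum):
    """The eight content types named in the data model."""
    WORK = "Work"
    EDITION = "Edition"
    PRIMARY_OBJECT = "Primary Object"
    CORRESPONDENCE = "Correspondence"
    RELATED_MATERIAL = "Related Material"
    PERSON = "Person"
    BUSINESS = "Business"
    LIBRARY = "Library"


@dataclass
class Record:
    record_id: str                 # hypothetical identifier scheme
    content_type: ContentType
    title: str
    links: set[str] = field(default_factory=set)  # ids of related records


class Archive:
    """A toy relational store: records plus two-way links between them."""

    def __init__(self) -> None:
        self.records: dict[str, Record] = {}

    def add(self, record: Record) -> None:
        self.records[record.record_id] = record

    def link(self, a: str, b: str) -> None:
        # Store the relationship in both directions so that a Work leads
        # to its Person and the Person leads back to the Work.
        self.records[a].links.add(b)
        self.records[b].links.add(a)

    def related(self, record_id: str, content_type: ContentType) -> list[str]:
        # Titles of linked records of one content type, sorted for stable output.
        linked = (self.records[i] for i in self.records[record_id].links)
        return sorted(r.title for r in linked if r.content_type is content_type)


archive = Archive()
archive.add(Record("w1", ContentType.WORK, "Mrs Dalloway"))
archive.add(Record("e1", ContentType.EDITION,
                   "Mrs Dalloway (Hogarth Press, first British edition)"))
archive.add(Record("p1", ContentType.PERSON, "Virginia Woolf"))
archive.add(Record("b1", ContentType.BUSINESS, "The Hogarth Press"))
for other in ("e1", "p1", "b1"):
    archive.link("w1", other)

print(archive.related("w1", ContentType.PERSON))
```

Storing links symmetrically is one simple way to model the book as a set of intersecting relationships: a user can arrive at the same cluster of records from the Work, the Person or the Business.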

Our data model draws on the Functional Requirements for Bibliographic Records (FRBR) standard, with some modifications to terminology. We have extended FRBR to include two more types of archival records found in publishers’ archives: Related Materials (including production documents, ephemera and financial records) and Correspondence. Archival materials are catalogued to ISAD (G) standards, with the addition of spatial data suitable for geo-coding. In order to contextualise the archival records within the relational database and data model we have built, we create further item-level metadata to animate the relationships between different entities. As Dahlström et al. note, critical digitization projects have sometimes been criticized as an “over-costly luxury undertaking” (s 4), necessitating working practices that would be too labor-intensive in most archival settings and can cause shortages in other areas. In a survey of 20 UK public libraries, Astle and Muir (2002) point to the tensions and “disparity between funding for access to a very small proportion of library holdings through digitization, and the funding of traditional preservation for the remainder of collections” (p 67). But working collaboratively with cultural heritage institutions and a wider group of users—especially with an international team of invested students—we have been able to attach useful, critical metadata to a publisher’s archive at item-level in a way that would not be possible for many archival institutions working in-house. The education and training elements of projects like ours cannot be overstated in determining and measuring their value: these pedagogical outcomes justify a more patient and even “luxurious” approach to metadata and curatorial materials.

In its first phase (2012–2020), MAPP began with the records, book objects, and persons associated with the Hogarth Press, the independent publishing house set up by writers Leonard and Virginia Woolf in London in the midst of war in 1917. The Hogarth Press has long been associated with the Bloomsbury group and the idea of the Woolfs printing (as Virginia said in a letter of 1916) “all our friends stories” (Woolf 1976, p 120). But the Press grew throughout the 1920s from a small start-up to a medium-sized publishing house, able (in Virginia’s words of 1930) to “manage a best seller as well as Heinemann” (Woolf 1978, p 177). By 1946, when it became a subsidiary of Chatto & Windus, the Hogarth Press had published over 400 works by a diverse group of writers. This was our first dataset (created by founding member, Battershill) and involved an initial upload of approximately 1000 Works and Persons into the database. Dust jackets from two partner libraries with Hogarth Press book collections (The Bruce Peel Special Collections at the University of Alberta in Edmonton and the E. J. Pratt Library at Victoria University in the University of Toronto) were attached to Work entries, and basic metadata for Person records was attached by the team and students drawing on international authority records (VIAF).

Much of the work of Phase 1 has involved working with transatlantic partners in cultural heritage institutions to create, select, ingest, preserve and access data relevant to the critical resource. In our next phase, we are working towards reuse and making the data more open (see the DCC Curation Lifecycle Model). In our case, this work has involved working closely with Special Collections at the University of Reading (UoR), which holds the business archives of the Hogarth Press, alongside E. J. Pratt Library at the University of Toronto; Bruce Peel Library; Smith College; Washington State University Pullman; and the Harry Ransom Center at the University of Texas at Austin. MAPP is a “diasporic” archive: a collection of related materials from geographically disparate archives (Battershill et al. 2017, p 71). Each digitized object includes detailed library metadata, so users know where that specific object has come from, and our library partners include digitized images (and in some cases our additional, collaboratively-produced, metadata) in their own digital asset management systems for sustainability and preservation.

The Hogarth Press Business Archive held on deposit at UoR Special Collections is an author-based publishers’ archive (Wilson 2014, p 83), with loose leaves of paper—a mixture of editorial and production correspondence, publicity material, and publishing ephemera—deposited in specific book files relating to the individual Works the Woolfs published. As such, it is ideal for digitization and complements our text and author-based data model. It is also a gift organizationally in the discipline of literary studies, since users of the archive interested in a particular author can easily locate complete sets of material relating to a given text. This differs substantially from large publishers’ archives organized chronologically as business records, collections which require different navigation methods in order to provide a comprehensive view of a particular text’s production cycle (Ryder 2000). The archival materials contained within the book files are curated in our database as Correspondence and Related Materials. But the sheer amount of archival material poses a significant challenge. Business correspondence between a publisher and an author, for example, may include dozens of individuals, some with existing estates holding copyright on the materials and some without. Negotiating copyright with hundreds of estates raises complex challenges for project management and sustainability, in addition to the technical challenges of content management.

Our initial criterion for selecting book files to include in MAPP was chronological, limited in scale by funding and personnel. In Phase 1 (2012–2020) we digitized and created metadata for surviving publishing records at UoR for the period 1917–1925 (a total of 38 book files, representing just under half of the total number of works the Hogarth Press published during these years). Folders vary in size, with some containing tens of letters, others hundreds.

In MAPP we begin describing at specific item level instead of general file level. Item-level metadata requires item-level description, item-level catalogue reference codes, and item-level rights information; all are attached to each item of Correspondence or Related Material in the files. The work of attaching metadata to archival objects has evolved through various stages of the project but its keynote has been that of academics, students, and archivists working together to create this material. To guide the non-archivists through MAPP’s processes, training materials have been devised with guidance on all relevant archival standards. All of their labor is credited on the MAPP site and the metadata created whilst cataloguing to make the item accessible—and discoverable—in MAPP’s relational database is also added to the UoR library catalogue (Adlib), enhancing the top-level descriptions. Digital assets created while digitizing are given unique image reference numbers (matching the item reference codes created when cataloguing) and then uploaded and held in the University’s digital assets management system (Asset Bank). This then links back to Adlib.
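The item-level requirements described above can be sketched as a simple completeness check. This is an illustrative sketch only: the field names, the item-level code `MS 2750/38/1` and the image naming convention are hypothetical, and are not the formats actually used in Adlib or Asset Bank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemRecord:
    reference_code: str   # item-level catalogue reference code
    description: str      # item-level description
    rights: str           # item-level rights information


def missing_fields(item: ItemRecord) -> list[str]:
    """Names of the required item-level fields that are still empty."""
    required = ("reference_code", "description", "rights")
    return [name for name in required if not getattr(item, name).strip()]


def image_reference(item: ItemRecord, page: int) -> str:
    """Derive an image reference that matches the item reference code.

    Hypothetical convention: spaces and slashes become underscores and a
    zero-padded page number is appended.
    """
    stem = item.reference_code.replace(" ", "_").replace("/", "_")
    return f"{stem}_{page:03d}"


# Invented example item; the code below the file level is a placeholder.
letter = ItemRecord(
    reference_code="MS 2750/38/1",
    description="Typed letter concerning publication arrangements",
    rights="",
)

print(missing_fields(letter))
print(image_reference(letter, 1))
```

A check of this kind could be run before an item is published, so that no Correspondence or Related Material record goes live without its rights information.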

As the resource grows, our selection criteria are evolving. Recognising the scale of potential digitization if we were to continue proceeding chronologically, we have moved in Phase 2 (2021–2024) to more explicit criteria of thematic, project-based and feminist interest. The complexity of our metadata workflows and the amount of archival material preserved would make a continued chronological approach to the publisher’s archive untenable (there was better record-keeping at the Hogarth Press after its early years and more has survived, with almost complete book file folders preserved from 1924 on). As a research team, we have worked in our individual scholarship to challenge myths about the insularity of the Hogarth Press and so are prioritising book files relating to a variety of middlebrow, working-class, anti-colonial and overtly political authors (Willson Gordon 2009; Southworth (ed) 2010; Wilson 2012; Battershill 2018). Through these choices, we are seeking to confirm that the Hogarth Press was “a radical left-wing publisher” (Nasta in Battershill and Wilson 2018, p vii) and “key disseminator of anti-colonial thought” (Snaith p 103), offering avenues for research on less-explored but historically significant authors. We now have RCUK funding to explore the transatlantic connections of the Hogarth Press. This involves onboarding new partners (Washington State University Pullman; the University of Sussex; New York Public Library) and will enable us to include papers relating to the Alfred A. Knopf publishing house (focalised through correspondence by Blanche Knopf, Alfred’s wife, to reveal her contribution to the firm), the William A. Bradley literary agency (highlighting the work of Jenny Bradley), and Harcourt, Brace (the Woolfs’ American publishers of choice). This stress on transatlantic modernism, literary networks, and women in publishing leads the next phase of data creation.

Critical feminist metadata: typists

Buurma and Shaw (2020) point out that metadata may provide important context for a project “as a work in progress with a literary character of its own” (p 188). For critical digitization projects like MAPP, metadata needs are tied to a critical orientation toward our objects of study, an orientation that seeks to intervene in literary history and the history of publishing. Although our objects are not necessarily unusual, our emphasis on feminist recovery and critical curation requires a more expansive metadata element set that can describe the unseen labor of secretaries, personal assistants, and staff beyond the figures of interest within traditional literary or bibliographic scholarship. Publishing houses (such as the Hogarth Press), printers (such as R. & R. Clark) or literary agents and authors often relied on secretarial staff to type or respond to correspondence. While such figures have long been marginalized in the history of literature and print culture, they were instrumental to the success of literary culture and publishing (Price and Thurschwell 2005). Throughout the documents we are curating at MAPP, we see clues to the identities of these people, often women. By recording and crediting their labor—making visible their contributions in the history of twentieth-century publishing—we hope to contribute to scholarly efforts around making and descriptive tagging in digital archives as feminist practice (Mandell 2015; Holland and Brown 2018; Sharren et al. 2021).

Expanding our metadata to include typists, secretaries, and press staff allows us to credit and recognize the voices that often go unnoticed in the archive. Archival silences are often the result of curatorial emphases; as Rodney Carter (2006) argues, archives are replete with such gaps: “Silences haunt every archive” (p 217). Metadata schemas can tend to replicate these silences by not including fields that capture the labor processes of textual production. Such gaps can reflect power imbalances but can result from misrecognition too. Schwartz and Cook (2002) argue that power imbalances demonstrate that the archive is not an objective, apolitical site for collection but rather “a site for the contestation of power, memory, and identity” (p 6). As a result, archivists and digital humanists have implemented critical interventions in cataloguing and archiving, imagining new, feminist archival practices and protocols (Moravec 2017). In “A Manifesto for Feminist Archiving,” Jenna Ashton (2017) argues for a “reorganisation and re-selection of knowledge” in resistance to “the authority and structures of the regulated physical archive” but also to “unite threads of tech and materiality” (n.p.). While Ashton focuses on the physical archive here, the construction and curation of feminist digital archives can and indeed must respond to the same authoritative structures (Wernimont and Flanders 2010; Wernimont 2013; Holland and Brown 2018). Such a confluence sparks exciting opportunities for feminist scholarship, literary criticism, and DH. Feminist data work is crucial, as D’Ignazio and Klein (2020) argue, intervening, interrogating and changing “not just who gets digitized but how such digitization necessitates new data models and interfaces” (Caughie, Datskou and Parker p 230). “The dual aspect of feminist digital humanities—subject and design—is recursive and mutually sustaining” (Caughie, Datskou and Parker, pp 230–31). Our work foregrounding “typists” as a new aspect of our metadata is one example of this.

When the business archives of the Hogarth Press were deposited at UoR Special Collections in the 1980s, the names of unknown, as opposed to better-known, employees were not included as creators within authority records, descriptions or scope and content. So for instance, whereas the names of John Lehmann, Ian Parsons and Norah Smallwood were included in the file descriptions, little-known, generally women workers at the Press were not. As University archivist Michael Bott’s introduction to the original paper catalogue noted:

Much of the correspondence on the Hogarth Press side was conducted by Managers and Secretaries including Aline Burch, Norah Nicholls, Margaret West, Barbara Hepworth and Cherrill Newman. Letters from them are not noted in the descriptions, but carbon letters from Leonard Woolf, John Lehmann, Norah Smallwood, Ian Parsons, Harold Raymond and Piers Raymond have been noted. Many of the early carbon copies have signatures or are initialled; later copies have a typed reference at the head of the letter which identify the writer (1986, p i).

Staveley draws on this note to explore how cataloguing methods have contributed to obscuring the presence of women workers in the modernist archive (2009). This bias has now been addressed in new iterations of the catalogue online, but when thinking about our encoding of correspondence for MAPP we were concerned not to recreate such omissions inadvertently. As we started to tag correspondence in terms of the ISAD (G) field “Name of Creator,” we realised that various points of critical debate were being opened up. Letters written by Leonard Woolf at the Hogarth Press, for instance, are often either “dictated but not signed by” him or simply signed by another press employee. In still other cases, authorship is not indicated at all in a letter, and it is unclear whether the “creator” should be tagged as Leonard Woolf, the corporate entity of “The Hogarth Press,” or one of the press managers of the time, like Margaret West. Sometimes letters would be signed by the press worker directly (as in Fig. 1); at other times the typist would include their initials at the top of a letter (as Bott describes) and then sign it as, for instance, Leonard Woolf or the corporate “The Hogarth Press” (Fig. 2).

Fig. 1

Letter from The Hogarth Press to Thomas Seltzer (14/12/1923) | Modernist Archives Publishing Project, with manuscript signature by M. Joad (Marjorie Thomson Joad). University of Reading, Special Collections, Hogarth Press Business Archives, MS 2750/38. Permission by Penguin Random House Archive and Library

Fig. 2

Letter from Aline Burch to Samuel Solomonovich Koteliansky (18/02/1948) | Modernist Archives Publishing Project, with typed reference, AB (Aline Burch). University of Reading, Special Collections, Hogarth Press Business Archives, MS 2750/483. Permission by Penguin Random House Archive and Library

As we contemplated our metadata creation in terms of our research questions and what would be useful for the users of MAPP, this was a clear example of where looking at the data anew enabled us to expose an overlooked layer of professional creativity, opening the way for further research on women working at the Hogarth Press (Southworth and Wilson 2022). For some records, we have added an additional (second) Creator to the field, but this did not always capture the nuance of the work involved. So we have added a further field, “typist”, to our Correspondence metadata, which captures the layer of professional women’s work we are interested in and enables users, for instance, to discover all of the correspondence included (so far) in the database that was typed by a particular worker at the Press. Recent literary history has become interested in the staff that made literary production possible (McGrath 2021), and MAPP opens possibilities for further scholarship to recover the labor of typists, agents, secretaries, printers, and more.
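The effect of a dedicated typist field can be illustrated with a short sketch. The record structure and the `typed_by` helper are assumptions made for this example, not MAPP's schema. The first record borrows its details from the Fig. 2 caption, and the other two are invented placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Letter:
    reference_code: str
    creators: tuple[str, ...]      # one or more "Name of Creator" values
    recipient: str
    typist: Optional[str] = None   # the additional field, empty when unknown


def typed_by(letters: list[Letter], typist: str) -> list[Letter]:
    """All correspondence typed by one named worker."""
    return [letter for letter in letters if letter.typist == typist]


letters = [
    # Details from the Fig. 2 caption (file-level reference code).
    Letter("MS 2750/483", ("Aline Burch",),
           "Samuel Solomonovich Koteliansky", typist="Aline Burch"),
    # Invented placeholders, for illustration only.
    Letter("EXAMPLE/1", ("Leonard Woolf", "The Hogarth Press"),
           "Example Author", typist="Aline Burch"),
    Letter("EXAMPLE/2", ("The Hogarth Press",), "Example Printer"),
]

for letter in typed_by(letters, "Aline Burch"):
    print(letter.reference_code, "->", letter.recipient)
```

Keeping the typist separate from the creators means that a letter dictated by one person and typed by another can credit both, without forcing a choice about who counts as the author.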

Copyright as critical curation

While feminist questions about the recovery of labor and disciplinary debates concerning the formation of modernist literary culture through specific publishing relationships motivate much of our relationship to our data, there are, of course, also practical considerations that inform our curation practices. As we are working with collections at item level, copyright as a selection tool has had to be embedded into the project’s strategy for content imported into the MAPP site. But using copyright in determining content selection is also important in the sense that MAPP needs to make timely, risk-based decisions based on copyright law. MAPP is an international, collaborative DH project with transatlantic partners and transatlantic funding, but we need to work within and respect national parameters and provisions of copyright law depending on where we and our materials are based. Within the UK for instance, where much of our current archival material resides, copyright of much unpublished material produced before 1989 lasts until 31 December, 2039. In this section, we highlight our copyright categories, search strategies, and copyright documentation as crucial parts of our infrastructure that have fundamental bearings on users’ access to materials in the site.

A major reason copyright is such an important issue and a barrier for MAPP can be linked to what is often referred to as the “20th Century black hole” (Boyle 2009; Europeana project 2015; Korn 2016). Generally speaking, collections dealing with the twentieth century are less likely to be made available online due to copyright restrictions, and a further hesitancy arises from the desire to protect an organisation’s reputation from potential litigation over alleged copyright infringement. MAPP’s material is modern by definition and resides firmly in the twentieth century, so copyright is a significant issue for us to negotiate. But we have chosen not to be deterred from creating the resource just because the materials are not all automatically open access. Instead, we follow clear protocols for establishing and obtaining clearance from estates. In turn, MAPP has become a resource for other users with questions about copyright or who are seeking information about authors’, illustrators’, or other businesses’ estates.

In the first instance, permission to digitize Hogarth Press publishing materials has been sought and granted from the Penguin Random House (PRH) Archive and Library UK (the legal owner, thanks to various late twentieth-century mergers, of the Hogarth Press collection). As a working business with historical materials deposited in a public institution, their agreement with MAPP rests on the project clearing copyrights with estate holders of the main authors of chosen archival material, which can be a laborious process. However, permission to access material (which for researchers using the physical reading room can be hard to obtain for modern publishing records, especially those that are still owned by a publishing house) is pre-authorised. This means that researchers, and eventually users, accessing the resources in MAPP are spared not only a physical visit but also the access protocols that can accompany archives of this type.

As MAPP’s workstreams for digitization and item-level cataloguing are labor intensive, so too are rights clearances. The efforts involved in pursuing copyright holders, and the ways in which copyright dictates selection methods, are discussed extensively by Dryden (2008) and Akmon (2010), based on studies conducted in America and Canada. In deciding whether any exceptions might cut down on intensive rights research, it is important to discuss the term “fair use” (Akmon p 46). “Fair use” is not in practice at UoR, as it is a US legal principle and is not applicable in the UK. “Fair” is not defined in UK law. It is also worth noting that “fair use” is a different concept from “fair dealing”: though both refer to exceptions to copyright, these exceptions cannot be applied when seeking permission to publish archival material online. “Fair dealing,” which can be applied in the UK, is more limited than “fair use”, and although it can apply to private and non-commercial study, it extends only to the individual who has gained access, unlike MAPP, which makes material available globally to multiple users. As a rule, for UK content, MAPP works on the basis that unpublished materials remain in copyright in the UK until 2039. This applies to most materials in MAPP, as most of the archival literary materials are unpublished letters. For published literary and artistic works, copyright lasts 70 years after the year of the author’s death. This fundamental difference between published and unpublished material is the main obstacle for MAPP to navigate, requiring permissions for the vast majority of the material.
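The two duration rules stated above can be written out as a short worked example. This is a deliberately simplified sketch that encodes only those two rules as this essay states them. The function names are hypothetical, and the many special cases in UK copyright law are not modelled.

```python
from datetime import date
from typing import Optional

# Rule stated above: unpublished material remains in copyright until the end of 2039.
UNPUBLISHED_EXPIRY = date(2039, 12, 31)


def copyright_expiry(published: bool,
                     author_death_year: Optional[int] = None) -> date:
    """Last day of copyright under the two simplified rules."""
    if not published:
        return UNPUBLISHED_EXPIRY
    if author_death_year is None:
        raise ValueError("published works need the author's year of death")
    # Rule stated above: 70 years after the year of the author's death.
    return date(author_death_year + 70, 12, 31)


def needs_permission(published: bool,
                     author_death_year: Optional[int] = None,
                     today: Optional[date] = None) -> bool:
    """True while the item is still in copyright on the given day."""
    today = today or date.today()
    return today <= copyright_expiry(published, author_death_year)


# A published work by an author who died in 1941 (Virginia Woolf's year of death).
print(copyright_expiry(published=True, author_death_year=1941))
# An unpublished letter, whoever wrote it and whenever its author died.
print(copyright_expiry(published=False))
```

The contrast in the two results is the obstacle described above: a published work by an author who died in 1941 left copyright at the end of 2011, while an unpublished letter from the same files stays in copyright until the end of 2039.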

As Akmon (2010) also points out, one of the major obstacles regarding copyright is the time and resources required to contact rights holders—if they can even be determined—and there is little empirical evidence on these processes (pp 46–7). Documenting this information is of high importance to MAPP due to twentieth-century material being more likely to be subject to copyright restrictions and the fact that, as previously stated, the material is unpublished. Building time into the project for permissions is essential: of a full time working week, roughly 2–3 days of our project archivist’s time is spent tracing and researching permissions, with the remaining time allocated to managing cataloguing activities and digitising.

The importance we place on particular items of correspondence from the publisher’s files is determined by the volume of correspondence between publishers (and individuals) and the press, and by our thematic interests in readership and distribution; the volume of letters can also indicate the significance of a business relationship, and this influences how much time we spend chasing a particular permission. This also affects selection decisions, as it is clearly beneficial to gain permission from the key, frequently contacted estates in order to get valuable research content into the site.

Responses from records and permissions managers, individuals and estates vary: some are immediate (i.e. within 1 day), while others need to be chased several times, a process which in extreme cases can take over a year. Some publishers have clear wait times for permissions; follow-up emails are sent after 1 month of no response or after the specified wait time has passed. This does not account for business disruption due to the pandemic, which has caused further delays. The speed of responses can also affect selection decisions as it is far easier to work with responsive organisations and estates. Specific search strategies for MAPP are discussed below.
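The follow-up rule described above (chase after one month of silence, or after a rights holder's own stated wait time) could be tracked with something like the following sketch. It assumes that "1 month" is approximated as 30 days. The function names and dates are invented for illustration.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "1 month" is approximated as 30 days.
DEFAULT_WAIT = timedelta(days=30)


def follow_up_due(sent_on: date, stated_wait_days: Optional[int] = None) -> date:
    """Date from which a follow-up email is due.

    Uses the rights holder's stated wait time when one is known,
    otherwise the default of roughly one month.
    """
    if stated_wait_days is not None:
        return sent_on + timedelta(days=stated_wait_days)
    return sent_on + DEFAULT_WAIT


def needs_chasing(sent_on: date, today: date, replied: bool,
                  stated_wait_days: Optional[int] = None) -> bool:
    """True when no reply has arrived and the wait period has passed."""
    return not replied and today >= follow_up_due(sent_on, stated_wait_days)


request_sent = date(2021, 3, 1)          # invented example date
print(follow_up_due(request_sent))       # default wait
print(follow_up_due(request_sent, 56))   # publisher states an eight-week wait
```

Recording the date sent and any stated wait time for each request also produces the kind of empirical record of the permissions process that Akmon notes is scarce.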

Except where otherwise stated: copyright language

Copyright language is a major facet of our copyright workstream. Due to evolving issues and priorities, the language crafted for use on the site has been honed over time: blanket statements are too constraining, and we cannot simply rely on the wording of MAPP’s chosen Creative Commons licence (CC-BY-NC-ND), which allows users to share work for non-commercial purposes, with due credit, but not to share adaptations. There are clear gaps with this form of licence: there are instances where we believe we are able to share material online despite not locating the copyright holder, and circumstances where a rights holder will not agree to CC licence terms and requires their own different terms. MAPP has needed to attend to these nuances and create clear distinctions. Working with UoR’s legal department and seeking advice from the WATCH File database, we have made sure our language covers four categories:

  • Where copyright status and holders have been identified and permission granted

  • Where copyright status and holders have been identified and permission not granted

  • Where copyright status and holders have not been identified

  • Where copyright status and holders could or could not be identified, but after diligent searches minimal risk has been determined.

Overarching copyright statements and information about CC licensing are detailed on MAPP’s Terms and Conditions page, but individual statements are placed with individual documents and materials where we have needed to note exceptions. Current industry efforts to standardise rights statements, like those of rightsstatements.org, which has created 12 standard copyright statements for heritage institutions, have been invaluable. While rightsstatements.org makes clear that its statements are “top level summaries” that may not address “all existing scenarios”, its section on “other rights statements” (rightsstatements.org, n.d) has been useful to MAPP when considering the last two categories mentioned above, particularly the statement “copyright undetermined” (rightsstatements.org, n.d). However, we have had to create bespoke statements of our own, either at the wish of the copyright holder or, as mentioned in the above categories, where a copyright holder has been identified but we are unable to confirm this confidently.

The last exceptions to discuss in connection with copyright language are instances when MAPP shares items after applying for an Orphan Works Licence through the Orphan Works Licensing Scheme (OWLS) and must then follow the OWLS terms when sharing the content. Though this can be onerous, MAPP cannot always rely on take-down statements (and after Brexit the UK can no longer rely on the EUIPO’s EU Orphan Works Database or the EU Orphan Works exception). Where we can, we also share works covered by an Open Government Licence (OGL). This is applied in cases of Crown Copyright where works have been published by the UK government: we must confirm that the works are indeed Crown Copyright and check that no exemption (technically a “delegation”) is in force, which would allow a Crown body to charge permission fees and to license copyright under its own terms. If the OGL does apply, we attach a link to the licence, along with the correct rights statement. Lastly, there is the instance of Bona Vacantia. Bona Vacantia arises where a UK company is liquidated or dissolved, or where an individual dies intestate: assets belonging to these companies or individuals (including any copyrights) may have automatically vested in the Crown. In this situation MAPP investigates through The Treasury Solicitors (depending on whether the particular case is English or Welsh) and with Companies House (UK) to see if a notice has been lodged that may disclaim any copyrights. There is often very little information, and the situation can be highly complex compared with the Crown Copyright/OGL scenario: there is no simple licence application process. Additional legal advice may therefore have to be sought, and ultimately a risk-based decision may be required.

Determining strategic elements using copyright

Our strategy at MAPP is to approach the identified rights holders with consent forms to sign, which also request authorisation to use the CC licence under which we have chosen to share content. Our approach is firstly dictated by the identified archival folders after authors and works are chosen. This can be a slow and iterative process, but it enables us to work with individual estates and organisations more closely. It also allows for the development of positive relationships between all parties on the project (the archive, the scholars, and the estates) and the establishment of a community of interest around these documents and their preservation, digitization, and scholarly framing. There are also those estates and organisations that will set their own terms, rather than sign a blanket permission form created by MAPP.

Determining copyright in a publisher’s archive at item level is complex. Publishers’ book and editorial files sometimes contain no letters from the named author of a book file and the copyright in correspondence written by publishing employees at work—as part of their regular employment—rests with their employer (Owen 2010). For employees of the Hogarth Press (the bulk of our current publishing correspondence), permission is covered by our agreement with Penguin Random House. Where those employees are also authors with literary estates (as in the case of John Lehmann and Leonard Woolf) we have also secured permissions from the literary estates involved. But publishers’ production and editorial files contain much more correspondence than with the named author or publisher of a work. This includes translators, readers, other publishers, and speculative requests from little-known individuals for later reprints or permissions for excerpts, as well as (in our era) the day-to-day publishing traffic with printers, binders, and paper makers. All these individuals need rights clearance before materials go online.

With larger organisations, locating copyright holders can be a procedural matter. It is often a long process to get through to the correct department and then to follow their permission processes (especially where we are dealing with unpublished archival material). As historical publishing companies have often merged or may have passed through many hands, it is very likely that the current organisation does not know whether it owns the copyright. In these instances the chain of ownership has become unclear, and the organisation cannot grant permissions or is simply not confident enough to do so. There is also the possibility that a business has been liquidated, in which instance any copyright could pass to the Crown as Bona Vacantia.

Where estates have passed into the hands of family members, we embrace family histories, and here there is a living element to MAPP. The authors of the past remain connected to the present through their descendants, who sometimes share lively anecdotes and are proud and excited to have a tangible link to their family history. Often, there is warmth in these interactions, which are particularly positive when the author has become unknown and information about them has been lost over time. This element offers a compelling and effective argument for putting in the labor and time required to find and engage with copyright holders: these interactions are often rewarding for all concerned.

There are four categories prioritised for rights searches in the book and editorial files: the literary author on file; the individual or creator; the employees of the Hogarth Press/Random House; and the organisations that interact with the Hogarth Press (see Fig. 3). After these categories have been identified, the strategies employed in searching for and contacting rights holders can begin.

Fig. 3 MAPP digitisation, archival and copyright workflows. Image created by Helena Clarkson

The search strategy

This uses the following steps:

  • To follow the steps laid out for “diligent search narrative” in relation to the process of applying for a UK orphan works licence (Orphan Works Diligent Search Guidance for Applicants 2021)

  • To then consult resources listed in diligent search guides including: ancestry.co.uk/.com, online obituaries, the government will and probate records database, and the WATCH file (Writers, Artists, and their Copyright Holders) or FOB (Firms out of Business) database.

  • To determine the amount of time to be spent chasing a copyright holder for a response once they are found

  • To send off consent/permission forms or to complete forms sent to us

  • To determine whether the search has sufficiently eliminated risk, enabling use of MAPP’s risk-managed approach, which may involve a take-down statement

The search strategy is largely bespoke and tailored to each search or batches of similar correspondence.
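The steps above can be sketched as a simple per-holder search log. This is a minimal illustration in Python rather than a description of MAPP's actual tooling: the class, field and method names are hypothetical, and 30 days is an assumed stand-in for the one-month follow-up interval mentioned earlier.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

# Resources named in the search steps above.
SEARCH_RESOURCES = [
    "ancestry.co.uk/.com",
    "online obituaries",
    "will and probate records",
    "WATCH file",
    "FOB database",
]

DEFAULT_WAIT_DAYS = 30  # assumption: "one month" approximated as 30 days


@dataclass
class RightsSearch:
    """Hypothetical log of one search for a rights holder."""
    holder_name: str
    resources_checked: List[str] = field(default_factory=list)
    contacted_on: List[date] = field(default_factory=list)
    stated_wait_days: Optional[int] = None  # holder's own wait time, if stated
    response_received: bool = False

    def record_check(self, resource: str) -> None:
        """Note that a resource has been consulted (no duplicates)."""
        if resource not in self.resources_checked:
            self.resources_checked.append(resource)

    def unchecked_resources(self) -> List[str]:
        """Resources from the list above not yet consulted."""
        return [r for r in SEARCH_RESOURCES if r not in self.resources_checked]

    def follow_up_due(self, today: date) -> bool:
        """True once the wait period since the latest contact has elapsed."""
        if self.response_received or not self.contacted_on:
            return False
        wait = (self.stated_wait_days
                if self.stated_wait_days is not None
                else DEFAULT_WAIT_DAYS)
        return today >= max(self.contacted_on) + timedelta(days=wait)
```

A log of this kind would also supply the record of who was contacted, and how many times, that the documentation step requires.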

Documenting copyright metadata

As a final step, all information gathered when attempting to contact copyright holders must be recorded.

When copyright status and the holder have been identified by asking the aforementioned questions (or have been determined too difficult to trace), records of all research correspondence need to be maintained (for example: who was contacted and how many times). Once copyright is ascertained, and if permission has been granted, MAPP must retain copies of any licences granted and document information such as the date granted, the cost, and the length of the licence. Finally, this culminates in a decision about which copyright statement to attach to a particular document or letter. Any further notes gathered in the copyright research and deemed relevant are also recorded and maintained in spreadsheets.

These points have been considered in line with the “PREMIS Data Dictionary for Preservation Metadata”, used as a base guide, particularly the section termed “Rights Entity” and a selection of the associated “semantic units” listed within this section (PREMIS Data Dictionary for preservation metadata, 2015, p 178). “Semantic units” are elements relating to areas of copyright that are advised to be recorded; examples include rights statements, copyright status, copyright jurisdiction, applicable dates and relevant licence information (pp 178–9). We also looked for guidance within the ISAD (G) standard, where information on copyright is to be documented under “conditions governing access” (ISAD (G), 2000). MAPP needed to extend the ISAD (G) guidance, which provided brief notes centring on asking archivists to document whether copyright was known or unknown and how this restricted reproduction. The PREMIS Data Dictionary (though expressed in XML) was therefore considered a far more extensive resource regarding rights, and was followed when it came to creating documentation.
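To make the shape of such documentation concrete, the following is a minimal sketch of a per-item rights record that could be flattened into a spreadsheet row. It is illustrative only: the field names are hypothetical and are not PREMIS element names, the category labels paraphrase the four categories of copyright language described earlier, and the structure is an assumption rather than MAPP's actual spreadsheet layout.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Dict, List, Optional


class ClearanceCategory(Enum):
    """Paraphrase of the four categories of copyright language."""
    PERMISSION_GRANTED = "holder identified, permission granted"
    PERMISSION_NOT_GRANTED = "holder identified, permission not granted"
    HOLDER_NOT_IDENTIFIED = "status and holder not identified"
    MINIMAL_RISK = "minimal risk determined after diligent search"


@dataclass
class ContactAttempt:
    contacted: str  # who was contacted
    on: date        # when


@dataclass
class RightsRecord:
    """Hypothetical rights record for one document or letter."""
    item_id: str
    category: ClearanceCategory
    copyright_status: str
    jurisdiction: str
    rights_statement: str
    contact_attempts: List[ContactAttempt] = field(default_factory=list)
    licence_granted_on: Optional[date] = None
    licence_cost: str = ""
    licence_length: str = ""
    notes: str = ""

    def times_contacted(self, who: str) -> int:
        """How many times a given person or body was contacted."""
        return sum(1 for a in self.contact_attempts if a.contacted == who)

    def to_row(self) -> Dict[str, str]:
        """Flatten the record into string values for a spreadsheet row."""
        granted = (self.licence_granted_on.isoformat()
                   if self.licence_granted_on else "")
        return {
            "item_id": self.item_id,
            "category": self.category.value,
            "copyright_status": self.copyright_status,
            "jurisdiction": self.jurisdiction,
            "rights_statement": self.rights_statement,
            "contact_attempts": str(len(self.contact_attempts)),
            "licence_granted_on": granted,
            "licence_cost": self.licence_cost,
            "licence_length": self.licence_length,
            "notes": self.notes,
        }
```

Keeping the clearance category and the rights statement as separate fields reflects the point made above that the decision about which statement to attach comes only at the end of the research.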

Figure 3 offers an overview of these complex work processes.

Copyright and critical archival praxis

As discussed above, working with twentieth-century materials requires engagement with copyright at almost every step and there are numerous instances when copyright concerns clash with MAPP’s “feminism of archival recuperation” (Battershill et al. 2018, par. 6). As demonstrated in the discussion of copyright as critical curation, considerations regarding copyright can be resource-consuming and constrictive, limiting the amount and types of materials that can be included in the archive and, at times, preventing the inclusion of materials that would strengthen an archive’s critical arguments. Pragmatic concerns about copyright can also prove fruitful insofar as they encourage critical digital archives like MAPP to reflect upon their values and priorities and make their critical philosophies explicit. However, the difficulty of documenting omissions (as opposed to materials that are present in digital archives) means that, despite archives’ attempts at transparency, copyright restrictions often create and perpetuate archival silences. One arena where the clash between feminist digital archiving ideology and copyright restrictions often results in archival silence is in the selection of photographs that MAPP displays for author biographies.

The most significant born-digital scholarly content on MAPP consists of the biographies of people and businesses. These are short essays introducing the lives or business histories of key figures and companies whose materials are included in the archive. The entries are authored by students, emerging scholars, and prominent experts and go through a peer review process before publication as open-access materials on MAPP’s website. Whenever possible, MAPP includes a photograph or portrait of the subject for each of our Person biographies. If it is not possible to find an image of the subject, we use an image of the dust jacket of one of their works instead. This image is also used to promote the biography on the MAPP homepage. Much like a book’s dust jacket, the image creates the first impression that MAPP users have of the biography. In cases where the biography subject is not well-known, the image also often serves as the first impression that users have of the biography’s subject. Like the decisions publishers make in selecting a dust jacket for a new book, the selection of the subject image is a political act that reflects MAPP’s own values and arguments as a critical digital archive.

Selecting an image for a biography often involves balancing MAPP’s ideological stance with concerns regarding copyright. MAPP attempts to select images that portray the individual subjects as professional figures. This is in keeping with the focus of most MAPP biographies, which emphasize subjects’ publishing histories and careers rather than their personal lives. Portraying biography subjects as professionals in the world of publishing is often a deliberate act of feminist recuperation and intervention. Many of the subjects of MAPP biographies encountered significant gender-based obstacles in establishing their careers and reputations as writers, editors, or press workers. Using photographs from a domestic or childhood context then risks recreating the stereotypes that many subjects of MAPP biographies worked to reject. Instead, MAPP seeks to use biographies and the images that accompany them to present biography subjects as publishing professionals.

Another consideration in the selection of biography subject portraits is whether it is best to choose a portrait in which the subject is on their own or with others. The latter has the advantage of highlighting the subject’s connections to other historical figures. The modernist period is no stranger to photographs that capture interpersonal connections, such as the 1942 London Calling photograph in the BBC recording studio, featuring Mulk Raj Anand, Nancy Barrett, Una Marson, T.S. Eliot, and others. Such photographs have played an important role in our understanding of the complex connections between different modernist figures and organizations. MAPP is certainly interested in drawing connections between writers, publishers, press workers, and others. As part of each biography, MAPP captures metadata about the relationships between person and business entities with the view of enabling and potentially creating network visualizations in the future. However, each MAPP biography is focused on a single person or business. To reflect this focus, we try to select subject images in which the subject is portrayed on their own. The politics of this decision become amplified in biographies of less well-known figures and those who are primarily known by their association to other, more canonical names. In such circumstances, selecting a single-person photograph works to highlight the biography subject’s own contributions to modernism and/or publishing rather than simply reinforcing their associations with other, better-known names.

For international projects like MAPP, copyright difficulties are compounded. For example, many of MAPP’s biography subject portraits are out of copyright in Canada, where copyright typically expires 50 years after the death of the photographer (Murray 2005, p 651). There is some legal debate about how the 2012 Copyright Modernization Act (CMA) impacts copyright for historic photographs. Prior to the implementation of the CMA, all photographs taken before January 1st 1949 were in the public domain in Canada. However, some legal scholars have argued that the CMA brought copyright protections back to photographs whose creators died less than 50 years ago (Wilkinson et al. 2015, p 2). Even in cases where the photographs are out of copyright in Canada, they may still be under copyright in other jurisdictions, such as the United States, where copyright does not expire until 70 years after the author’s (or photographer’s) death (Hirtle 2019, p 1). The differences and inconsistencies in different nations’ copyright laws create additional complications for critical digital archives like ours, which may receive funding from multiple government sources and whose materials can be accessed from all over the world.

Having to be selective about which images we include has caused MAPP to be more self-reflexive about the criteria we use to select images for biographies. We also attempt to be transparent in documenting our decisions and explaining our ideological approach to users through our “About” pages and blog posts. Documentation for images and other materials that are left out of archives due to copyright concerns is typically sparse. For example, MAPP makes note of copyright holders and sources for any biography subject photographs that it uses. But MAPP does not refer users to photographs that we are not able to feature due to copyright restrictions. While doing so would be possible, using a link to a photograph on another archive (or in a print publication) rather than an actual image would negate much of the dust-jacket-like functionality of the biography subject images. General explanations for how MAPP deals with copyrighted images are also unsatisfying, as the copyright status of each biography subject’s photographs can vary significantly depending not only on the subject’s lifespan, but also on the activities of the subject’s estate.

The gaps caused by copyright restrictions often exist simultaneously at the level of archival content and at the level of metadata and documentation. The duality of these silences poses particular challenges for projects like MAPP, which see openness and transparency about archival selection and decision-making processes as part of a feminist ethos (Battershill et al. 2017, p 88). The interpretive multiplicity of absence also poses challenges: users may assume that a photograph they had encountered elsewhere was not used by MAPP because it did not align with MAPP’s ideological stance rather than because of copyright restrictions, an assumption that would perhaps not be unreasonable given the explicitly feminist nature of our archive’s curation and selection processes. Copyright restrictions, then, pose dual challenges to critical digital archives, not only by limiting access but also by making it difficult to adequately draw the contours of those absences.

Conclusion

Projects such as MAPP offer examples of rigorously political engagement with literary history with attendant dedication to feminist collaboration and archival practice that critically responds to research needs, aiming to further feminist research. The MAPP team has developed principles—such as a commitment to free, public access as well as crediting labor and publicly acknowledging contributions of all those who help build the resource—as part of the ethics of developing and managing a feminist project. In this we have been influenced by the “Student Collaborators’ Bill Of Rights” as well as by other projects in the field, including The Orlando Project, and The Map of Early Modern London. D’Ignazio and Klein have recently called for challenging power, embracing emotion, and making labor visible as a cornerstone of “data feminism” and digital feminist projects (2020). In an early intervention on feminist digital archives, Wernimont and Flanders argued that “we need ways of reintroducing an awareness of discipline, theory and method into our understanding of ‘access’, rather than seeing digital production as a form of transparency” (p 433). Ensuring that we see digitization as a critically mediated practice and not as a merely material transformation is crucial. “‘[A]ccess’ is not a simple problem of inclusion”, Wernimont and Flanders remind us, “but a complex adjudication of competing views” (p 433). Creating critical digital archives requires balancing the needs of a variety of stakeholders: archivists, scholars, students, copyright holders, and general users are all groups who interact with these resources and shape their precise character. 
Considering the ways in which each of the elements that shape digital archives interact with one another (how, as we have discussed in this paper, copyright informs curatorial practice; how theoretical models might drive the alteration and adaptation of metadata schema; and how labour and time are apportioned in a project) is one way of making transparent the layers of mediation that go into the creation of critical digital archives.

This article is our intervention into a field of “competing views” about the value of digital archival resources, based on over ten years’ experience of working with cultural heritage institutions to give greater access to publishing collections that are currently restricted, diasporic, and crucially still largely untapped in our scholarly fields. Curation of both the born-digital and digitized archival materials included within the resource also requires sustained effort around questions of intellectual property. We have therefore explored how copyright and feminist praxis are fundamental to our workflows and to how we select, describe and display our data in the digital resource. Through a collaborative approach that crosses professional and disciplinary boundaries; a careful and considered development of copyright practices for all materials; and an ethos of honouring and recovering labor both past and present, we seek with MAPP to create a critical digital archive that embodies feminist principles at all levels of practice.