Canon-Making Versus Large-Scale Patterns of Cultural Production

This chapter offers a data-driven approach to constructing and examining the English canon (ca. 1500–1800). This is part of a long-term project started in 2013, which involves extracting data from The English Short Title Catalogue (ESTC), a database that contains records of works published between 1473 and 1800 in Britain and its former colonies. The project has been undertaken by the Helsinki Computational History Group (COMHIS) to which the authors of this chapter belong. COMHIS is an integrated multidisciplinary team that combines big data approaches with expert subject knowledge in intellectual and book history to study the early modern period.Footnote 1

For this chapter, we have quantitatively constructed a canon of works that were published in the greatest numbers, most frequently, and over the longest period of time by making use of a processed version of the ESTC and analyzing it in terms of time, people, places, and materiality.Footnote 2 Importantly, our aim has not been to curate a canon but to extract one based on a systematic quantitative investigation of publishing patterns. Therefore, this chapter provides both methodological insights into such a task and historical insights into the history of (mainly English) printed works. One crucial aim of this study is to demonstrate the enormous analytical potential of harmonized metadata catalogs. For us, this study functions first and foremost as a proof of conceptFootnote 3 and lays out groundwork for a series of case studies developed with the ESTC.

Defining the Canon: A Brief History

When speaking of written works, the canon commonly refers to “lists of approved authors” of literature.Footnote 4 In his important monograph on the making of the English canon in the eighteenth century, Jonathan Kramnick calls it “a pantheon of high-cultural works from the past.”Footnote 5 What works are to be included in such a pantheon, however, is far from unambiguous. Samuel Johnson believed that canonical works of literature are able to communicate timeless human experiences due to the uniformity of human nature; these are works which transcend the particular.Footnote 6 Similarly, David Hume believed in a “universal” form of art which is superior to lesser forms, such as those rooted in the mundane and the particular.Footnote 7 It is from this perspective that the literary critic emerges as a moderator of the canon, or, as Ernst Robert Curtius put it, “the artificial safeguard of a tradition.”Footnote 8 Conversely, there are those who dismiss the idea of objective canonicity, and argue, instead, that the canon is a reflection of contemporary or past values.Footnote 9 While our age, 250 years after Johnson’s time, is “as much post-canonical as post-modern,”Footnote 10 this is not a recent view. In the late eighteenth century, Isaac Disraeli wrote that “different times … are regulated by different tastes. What makes a strong impression on the public at one time, ceases to interest it at another … and every age of modern literature might, perhaps, admit of a new classification, by dividing it into its periods of fashionable literature.”Footnote 11 Other thinkers, such as Jean Jacques Rousseau, went even further, arguing that the very idea of culturally celebrated works being universally good is a dangerous fiction.Footnote 12 More recently, Richard Proudfoot has argued that “‘canons’ were abhorred as restrictive practices by critics and theorists [of Shakespeare’s time] who wanted to impose other kinds of restriction, of their own choosing, on the study of literature.”Footnote 13 It is unsurprising, then, that defenses of particular values, such as those found in Harold Bloom’s The Western Canon, continue to raise questions of inclusion and exclusion.Footnote 14 While, for Johnson and Hume, it was the task of the literary critic to identify canon-worthy authors and works, for critical theorists and post-colonial advocates, building canons upon individual judgments is open to, and perhaps deserving of, criticism.Footnote 15

It is this ambiguity around the very idea of canonicity that this chapter aims to overcome through a quantitative analysis of patterns of large-scale cultural production. By moving away from debates around style, subjectivity, objectivity, and universality, and turning, instead, to availability, this chapter hopes to sidestep this debate. While most scholarship on early modern and eighteenth-century British canon formation approaches this subject from the perspective of canon-makers like Samuel Johnson or John Dryden, a key aim of this chapter is to introduce a systematic method for studying the canon and the process of canon formation which does not focus on particular authors or literary critics.Footnote 16 To achieve this goal and move beyond ontological debates on what the canon actually is, we have also expanded the temporal limits of canon formation.

Much of the debate around the early modern English canon focuses on whether literature, in its modern sense, was born in the eighteenth century or earlier.Footnote 17 As Kramnick argues, “the decisive reception of the English literary past was settled during the mid-eighteenth century. Years of critical discussion coalesced then into a durable model of literary history and aesthetic value.”Footnote 18 However, in a similar vein to the individual-centered approach, this assessment starts from the position that, before one can define the canon, one must define its contents. To avoid this question, we take the preliminary position that printing itself was the starting point for a work to be potentially included in the canon. While we do not deny that the eighteenth century, and particular types of written works, are key to the canon, we think that defining the canon by these features is to put the cart before the horse.

Since Elizabeth Eisenstein’s seminal study, The Printing Press as an Agent of Change, it has been clear that the history of the book is an integral part of the formation of early modern civil society.Footnote 19 On the whole, however, the role of the printing industry in the process of canon formation during the early modern period has remained understudied, especially from a quantitative perspective, despite scholars from outside of book history highlighting the benefits of such an approach. Benedict Anderson, for example, coined the term “print capitalism” to describe the complex dialectics of modernization through literacy and commerce tied to the book trade.Footnote 20 Similarly, Jürgen Habermas identified print and its growing mass consumption as key to the emergence of the public sphere.Footnote 21 In the same tradition, this chapter acknowledges that large-scale cultural production is related to the formation of common values, but it starts from the premise that to engage with the canon at this level requires a wider view of what the canon could be.

To understand what the canon could be, we turn to Alastair Fowler. Fowler’s taxonomy identifies six kinds of canon: the potential, the accessible, the selective, the official, the personal, and the critical.Footnote 22 While the selective, official, personal, and critical canons are not directly within the scope of our study of large-scale cultural production, the other two get closer to what we aim to engage with here. The potential canon “comprises the entire written corpus, together with all surviving oral literature”Footnote 23; that is, an ideal canon is made up of the totality of the written corpus, including what has not survived, as well as oral literature which is potentially lost and often scattered when not. Because such a totality cannot be recovered in full, we turn, for our purposes, to the “accessible” canon, or the portion of the potential canon which is available at a given time. An important upshot of this approach to the canon is that “creative writings [are] not dissociated from referential ones such as history, oratory, letter writing, and preaching.”Footnote 24 To be printed is to be potentially canonical (we could call this the commercial canon) and, importantly, we have a robust record of these works in library catalogs, such as the ESTC.

This study takes the ESTC and treats it as a record of cultural production, and thus as the “accessible” canon. Its contents (which we will return to in more detail shortly) are taken as data points which can be quantified and tracked over time. Based on these records, some works emerge as having been printed more frequently than others; some subjects and topics come into and fall out of fashion; and some authors gain or lose audiences at different times. When approached from this perspective, decisions about who should and can be canonical are no longer made by us, but by the cultural environment they emerge from. We disconnect ourselves from individual canon-makers and focus, instead, on large-scale cultural production. Thus, Shakespeare and cookbooks are treated as equals in their potentiality to be part of the canon. Our underlying principle is that these are works which particular people decided should be printed at particular moments in time.

By focusing on products that were printed over an extended period of time, our approach to the canon could run the risk of being exploratory, mundane, intrinsic, and perhaps even positivist. This should not be regarded as a weakness, however. Instead, it is, perhaps ironically, in line with the tradition initiated by Quentin Skinner and the Cambridge School: it allows us to create a canon which is born out of its own historical context, similar to the one that would have been accessible to those living at a given time.Footnote 25 Moreover, by looking at all works in the ESTC, and comparing them across subjects and genres, a new way of seeing how cultural capital is formed emerges. That is, by comparing the popularity of aesthetically valuable literature with that of more mundane works of domestic economy, we may be able to recognize things about literature that definitions of aesthetic value miss. As Alex Thomson suggestively put it, “the very idea of literature might be a function of the way that we look at the past. What has been seen as literary in the past has often been treated dismissively by subsequent generations, so it seems perfectly reasonable to say that a book can be literary at one time and not at another.”Footnote 26

Defining the Canon: Available Data

The ESTC is a comprehensive union catalog listing early modern books, serials, newspapers, pamphlets, broadsides, and other ephemera printed between 1475 and 1801.Footnote 27 Covering over 480,000 documents held by more than 2000 libraries, it is an essential record of early English print culture. However, like other library catalogs, it is also “a greatly underestimated source of knowledge.”Footnote 28

It is important to note that the original purpose of an analytical bibliography is not to support quantitative research, but to preserve as much original information regarding the printed material as possible. However, at the same time, each ESTC record represents a unique printed document, edition, reprint, or variant. Theoretically, every known variant of a work should have its own distinct record in the ESTC. When we combine this fact with the realities of the hand-press printing period, a technology which remained remarkably stable until the nineteenth century, it becomes possible, through careful harmonization,Footnote 29 to treat these records as comparable units. For example, estimating the popularity of a particular work based on the number of its editions and reprints makes sense due to the relative stability of print run counts (something which, following technological innovations in the print industry during the nineteenth century, becomes more difficult due to the larger variations between print runs). Thus, the ESTC allows us to examine the canon from a data-driven perspective.Footnote 30

It is also important to understand that, even though an immense amount of work has gone into processing the ESTC, it remains imperfect both as a catalog and a dataset.Footnote 31 Like other early modern catalogs, it cannot be considered as comprehensive because it does not contain information about every existing early modern British publication. Some works have been lost while others have been collected with concern for posterity.Footnote 32 Thus, while a quantitative analysis of the early modern period is exciting, we must recognize the limits of the data we have access to. Although there are statistical approaches to estimating lost works, this chapter will focus, instead, on one aspect of the dataset which we can more confidently explore: the most popular works recorded in the ESTC.Footnote 33 These are works which are both historically more likely to survive and quantitatively more representative of the overall publication output.Footnote 34 With these reservations in mind, one of the strengths of the ESTC’s records lies in their robustness: they contain as many as 420 fields, each with its own attributes, ranging from a work’s physical features to information on the libraries holding copies of the work. Their weakness, however, lies in the significant effort needed to extract this information at scale. While the ESTC records use the Machine-Readable Cataloging (MARC) 21 standard, making use of this data requires extensive processing.Footnote 35 A brief overview of the process clearly shows this challenge.Footnote 36


The data for actors connected to entries in the ESTC are contained in a wide variety of MARC fields.Footnote 37 The extraction process varies depending on the type of actor (author, printer, publisher, editor, etc.) and the amount of information about that actor documented in the metadata. In an ideal case, we are provided with the actors’ names, years of birth and death, and their role in a particular printed work; in less ideal cases, we may only be provided with a set of initials or a verbatim repeat of a work’s imprint.Footnote 38 As each record is distinct and there is no explicit information linking the actors or the works recorded, a large amount of cleaning and unification is needed. Strings representing data points must be extracted, cleaned, corrected, compared (internally and externally, by making use of other databases), unified (when appropriate), and finally harmonized. Incomplete or varying spellings make the task even more difficult as identifying and grouping mentions of the same historical actors requires extensive post-processing. We have currently compiled a database of 144,399 unique actors (of which 56,693 are primarily book trade actors and 67,924 primarily authors), with 1,107,777 links to titles documented in the ESTC.Footnote 39


An additional issue with the ESTC is the lack of unifying links between multiple editions of the same work. If one searches for Romeo and Juliet, for instance, there may be 80 results based on exact string matching, but no explicit links among these works. Moreover, when one accounts for title variations, repetitions, or commentaries, connecting the correct items becomes a very difficult task. To address this issue, we have used a work-field dataset as the foundation of our process. Since this is an integral part of the workflow that enabled us to start discussing the canon based on ESTC records, it is necessary to explain this process in more detail.

Our aim was to draw out relations between discrete records and link them as single works; in other words, to create a relational model. The fields we chose for matching different records included, first, the edition statement, which provides a specific edition number for each record. Currently, only a small subset of the whole ESTC contains information in this field (around 44,000 records). Second, we used the title and title remainder fields to identify the complete title of a record, while the title uniform field provided a representative title for the work. Lastly, we used the publication year of the record to provide a chronological ordering of the various editions. Distinguishing information, such as publication date or edition number (if available), was then used to determine more precisely what a particular edition is and organize all the editions of a work in chronological order.

The dataset was created through a multi-stage harmonization process, which began with an initial cleaning of the various fields needed for this task. Any unwanted characters and stop words were removed, and the text was converted to lowercase. An initial dataset was then created by combining the actors responsible for works with harmonized titles; these actors were used to differentiate among different works with the same title. This provided a unique work-field identifier for each record. The dataset was then segmented and harmonized on a per-actor basis.Footnote 40 This was performed by specialized algorithms, which were able to determine the representative work-fields of these editions. This allowed for an effective grouping of editions despite differences in title lengths and spelling. Finally, as some works have multiple actors attached to them (often at different times), another harmonization step was necessary to properly collate these duplicated works into single, unified work-fields.
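The cleaning and grouping steps described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the stop-word list, the record fields (`id`, `actor`, `title`, `year`), and the sample records are all assumptions made for illustration, and exact matching on a normalized key stands in for the specialized algorithms mentioned in the text.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the chapter does not specify the one actually used.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "in", "to"}

def normalize_title(title):
    """Lowercase, replace anything that is not a-z, 0-9, or whitespace, drop stop words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(w for w in cleaned.split() if w not in STOP_WORDS)

def work_field_key(actor, title):
    """Pair an actor with a normalized title so same-titled works by different actors stay apart."""
    return (actor.strip().lower(), normalize_title(title))

def group_editions(records):
    """Group records by work-field key and order each group by publication year."""
    works = defaultdict(list)
    for rec in records:
        works[work_field_key(rec["actor"], rec["title"])].append(rec)
    return {key: sorted(eds, key=lambda r: r["year"]) for key, eds in works.items()}

# Invented sample records, for illustration only.
records = [
    {"id": "r2", "actor": "Shakespeare, William", "title": "The Tragedy of Romeo and Juliet.", "year": 1599},
    {"id": "r1", "actor": "Shakespeare, William", "title": "Tragedy of Romeo & Juliet", "year": 1597},
    {"id": "r3", "actor": "Anon.", "title": "The Tragedy of Romeo and Juliet", "year": 1700},
]
works = group_editions(records)
```

In this sketch the two Shakespeare records collapse into one work with two chronologically ordered editions, while the anonymous record remains a separate work. Real title variation would require fuzzy rather than exact matching, which is why the process described above includes further harmonization steps.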

Figure 3.1 shows a comparison among the original record counts from the ESTC (including first editions and reprints), the processed unique works as derived from the work-field dataset (where the reprints have been removed), and the first prints of the works included in the canon that we extracted. The original records include prints without any actor information, which raises the number of records to over 480,000. However, by processing unique works information, we effectively normalized the whole ESTC dataset, which resulted in a significantly lower number of works for each decade. As the canon dataset is based on the work-field dataset, it follows a similar trend.

Fig. 3.1 Total documents, works, and canon items in the ESTC per year (1500–1800)

It is important to note that the work-field harmonization and dataset creation is an ongoing process, which is iteratively and continuously being improved for more accurate grouping of editions. At the moment, however, the work-field dataset consists of 200,378 works covering a total of 361,245 records.


It was important for this study to include information about the subject-topics (such as “Religion,” “Literature,” and “History and Geography”) of the works examined.Footnote 41 While other approaches may see this as a necessary step for determining what to include in the canon (i.e., which subject-topics are more aesthetically or culturally valuable and, therefore, worthy of being included), for our purposes it allowed for further qualitative reflections, such as temporal changes within the canon. This information, however, has not been recorded in the ESTC comprehensively (roughly only half, or 266,207 documents in the ESTC, have subject headings) or systematically (there are 12,553 unique subjects).Footnote 42 It was, therefore, necessary for us to modify and enrich this data.

There are numerous models proposed for categorizing subject-topics, and they range from classification systems developed by ancient authors, to early modern attempts to revise such systems, to current efforts to create appropriate models for quantitative analysis.Footnote 43 Because the approach in this study is computational, we sought a hierarchical subject classification system. To this end, we chose the Dewey Decimal Classification (DDC) system.Footnote 44 As classifying every single work in the ESTC was not an option, we chose, instead, to hand-classify a selection of works and then used this information with the existing subject headings in the ESTC to create a conversion table which was used to further extrapolate the data. That is, each entry with subject-topics in the ESTC was compared with the manually entered data, and this training dataset was used to find the most common equivalent ESTC subject-topic for each DDC category. The resulting DDC-to-ESTC translation table was then used to assign Dewey-style subject-topics to non-hand-categorized ESTC entries. In total, we hand-classified 1153 works, which represent a total of 47,041 individual documents. From these, we were able to classify another 62,342 documents with the conversion table.
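A conversion table of this kind might be sketched as follows. This is an illustrative assumption rather than the procedure actually used: the table here is keyed by ESTC subject heading (the direction needed for lookup), ties are not handled specially, and the field names and sample headings are hypothetical.

```python
from collections import Counter, defaultdict

def build_conversion_table(hand_classified):
    """Map each ESTC subject heading to the DDC class it most often co-occurs with."""
    counts = defaultdict(Counter)
    for rec in hand_classified:
        for heading in rec["estc_subjects"]:
            counts[heading][rec["ddc"]] += 1
    return {heading: c.most_common(1)[0][0] for heading, c in counts.items()}

def assign_ddc(record, table):
    """Vote among the record's known headings; return None when no heading is in the table."""
    votes = Counter(table[h] for h in record["estc_subjects"] if h in table)
    return votes.most_common(1)[0][0] if votes else None

# Invented hand-classified examples, for illustration only.
hand_classified = [
    {"estc_subjects": ["Sermons, English"], "ddc": "Religion"},
    {"estc_subjects": ["Sermons, English", "Church of England"], "ddc": "Religion"},
    {"estc_subjects": ["English drama"], "ddc": "Literature"},
]
table = build_conversion_table(hand_classified)
known = assign_ddc({"estc_subjects": ["Church of England"]}, table)
unknown = assign_ddc({"estc_subjects": ["Granby (Race horse)"]}, table)
```

A record whose headings never occur in the hand-classified set stays unclassified (`None`), which corresponds to the untranslated subject-topics discussed in the text.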

We should note that the catalog has many unique and rare topics: 7957 topics were used in only one instance and they range from individual psalms to specific years to “Granby (Race horse).” In the end, we identified 53,683 works with a subject-topic in the original ESTC but with no equivalent in the DDC. This is a typical example of the diminishing returns of manual work in digital humanities: the remaining untranslated subject-topics are increasingly rare, so the payoff for each additional manual entry decreases.

Defining the Canon: Methods

As stated, our aim in this study is to construct a canon using a data-driven approach and analyze the works contained within it. As this is a data-driven approach, there is no qualitative judgment made at the curation stage. Instead, we aim to construct a list of canonical works that is born out of historic publication records. To do so, it is necessary to define a set of features which could be considered representative of canonical works and examine the entirety of the ESTC for these features. The features we have chosen include the total publication count and publication frequency, which have been further normalized by the overall publishing activity during the same period. Based on this canon index,Footnote 45 we identified the top 1000 works that were printed in relatively high numbers, relatively frequently, and over a relatively long period of time.Footnote 46
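The general idea of ranking works by publication activity normalized against overall output can be illustrated with a short Python sketch. The scoring rule below is a stand-in assumed for illustration, not the canon index referred to above: each work receives, for every decade in which it appears, its share of all records printed in that decade, and the shares are summed. The work identifiers and years are invented.

```python
from collections import Counter

def decade(year):
    """Return the first year of the decade containing `year`."""
    return year - year % 10

def canon_scores(editions):
    """editions: iterable of (work_id, year) pairs, one pair per catalog record."""
    editions = list(editions)
    totals = Counter(decade(y) for _, y in editions)         # overall activity per decade
    per_work = Counter((w, decade(y)) for w, y in editions)  # each work's records per decade
    scores = Counter()
    for (work, dec), n in per_work.items():
        scores[work] += n / totals[dec]                      # share of that decade's output
    return scores

def top_works(editions, n):
    """Return the n highest-scoring work identifiers."""
    return [w for w, _ in canon_scores(editions).most_common(n)]

# Invented records: two works with three editions each, spread differently over time.
editions = [
    ("prayer_book", 1550), ("prayer_book", 1650), ("prayer_book", 1750),
    ("novel", 1750), ("novel", 1751), ("novel", 1752),
    ("pamphlet", 1650),
]
ranking = top_works(editions, 2)
```

Under this assumed rule, a work reprinted steadily across three centuries outranks one with the same raw edition count concentrated in a single busy decade, which is the kind of temporal balancing that normalization by overall publishing activity is meant to achieve.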

This method has resulted in the inclusion of works that qualitatively would never be considered as canonical but, nevertheless, have been included in the data-driven index as they have identical data profiles to works that we would expect to be included. For example, while almanacs may be considered less historically important by many, their frequent and stable reprinting patterns mean that they feature in our list. Their relevance, of course, depends on the context of publication. While an almanac lacks literary significance, from a commercial standpoint it is very significant, a point which can also be made about commercial catalogs that emerged in the latter part of the eighteenth century. Similarly, while grammar books played a key cultural role at the time of their publication, some may argue for their exclusion from lists of literary bestsellers.Footnote 47 Indeed, for the most part, these works do not fit the category of printed cultural material we are interested in. For this reason, printed versions of laws and political reports, liturgies and local church documents, catalogs, almanacs, curricula, periodicals not published as collected works, and annual reports from various associations have been excluded from our canon,Footnote 48 leaving us with 856 works.Footnote 49

The Canon: Works, Time, and Subject-Topics

When treated as one continuous historical canon, the top 20 works we have extracted based on the current data are displayed in Table 3.1.Footnote 50

Table 3.1 Top 20 canonical works (1500–1800)

One can immediately notice the diversity of these works. Included are the expected works of poetry and fiction, but also devotional literature and language grammars, works which would normally be excluded from a literary canon. However, as we have stated, our aim is not to curate a canon but to extract one. It is possible to purge works post hoc, but before such a step is taken, it is worth noting and investigating what does make the list. Questioning the works and authors who make up the extracted canon will be a recurring theme throughout this chapter, but to give one example, we will turn now to William Vickers’s A Companion to the Altar.

A relatively obscure work today, this short text was written to spiritually prepare readers for holy communion. During the eighteenth century and beyond, the work was often offered as a gift to those preparing for confirmation and was frequently printed along with The Book of Common Prayer. This practice can be considered as an explanation for its 131 records, and perhaps be seen as a good reason to exclude this work from our canon. However, the strength of our approach lies in confronting and reflecting on works such as these. To have been published in such large numbers means it reached a large audience, and it is, therefore, worth reflecting on the number of people who would have been familiar with its contents. We know, for example, that it was one of the 20 volumes owned by Jane Austen and that “she made constant use of the devotions contained in it.”Footnote 51 Thus, while the work itself may be seen as distinctly uncanonical by other definitions, its relationship to both the canon and the historical-cultural moment of its publication should encourage further reflection.

Of course, raw statistics offer only one view of many, especially if we are looking at the canon atemporally. For example, when one looks at the frequency of publication for works included in the ESTC, the emphasis will be on the latter part of the eighteenth century, when most printing activity took place. However, by constructing a data-driven canon which takes into consideration the relative longevity and publication frequency of these works, we can also provide a temporally representative selection. While this temporality can be seen in the overall distribution of these works (Fig. 3.2), this approach also allows us to make more specific analyses. For example, in Fig. 3.3, we can also see the works which were most frequently printed per decade.

Fig. 3.2 The Full Canon (1500–1800). These canonical works have been sorted by the first publication year. Individual dots indicate the publishing year for the initial publication and all subsequent reprints

Fig. 3.3 Most frequently printed works (1500–1800). The point size indicates the number of reprints for each work (rows) during a given decade (columns)

As previously noted, we have also assigned subject categories to many of the works in the ESTC, as well as to all the works which make up our canon. By categorizing these works, we can also examine the canon by subject. For example, the top-ten literary works in each category can be seen in Table 3.2.Footnote 52

Table 3.2 Top 10 canonical literary works (1500–1800)

This data also allows us to examine the changes in the distribution of subject-topics in the canon over time and recognize that subject-topics emerge and subside at particular historical moments. These shifts are not entirely surprising. As Fowler noted, “the complete range of genres is by no means equally, let alone fully, available in any one period … Moreover, each age makes new deletions from the potential repertoire.”Footnote 53 We can see this in Fig. 3.4: by including more than strictly literary works in the canon, we can see the emergence of new types of printed documents from the mid-seventeenth century onward.

Fig. 3.4 Temporal variation in the relative frequency of the most common subject-topics in the canon (1500–1800). The relative variability in publishing frequency is higher in earlier time periods due to the lower number of total published works

Religion and Literature

By combining data covering top-works, temporality, and subject-topics, we can begin to construct more complicated versions of the English canon. One can see, for example, the importance of grammar books during the early printing era (under the category “language”), and then recognize the dominance of religious works up to the late seventeenth century, followed by the rise of literary genres (drama, fiction, and poetry) at the beginning of the eighteenth century.Footnote 54 On the whole, it becomes possible to recognize epistemological shifts between the Renaissance and the eighteenth century.Footnote 55 In particular, frequently printed late Renaissance works are largely religious tracts, classical works, and grammar books. While the classics, especially Aesop’s Fables and Cicero’s On Duties, continue to be printed until the end of the eighteenth century, religious works, such as The Book of Hours, The Book of Common Prayer, and The Book of Psalms, become relatively less apparent. When looking at the eighteenth century, however, works that are more traditionally considered canonical do emerge, particularly with respect to literature. Thus, while literature clearly had a central place in the canon, its volume in terms of printed works compared to religious works is cemented by 1700. When the subject-topic distribution in the canon before and after the eighteenth century is analyzed, the decline of religious and grammar books and the rise of literary genres, especially drama, become apparent. This pattern holds true for all the works in the ESTC that include subject-topic information (although, overall, religion holds its place better than history and geography, for example).

Thus, confirming some scholars’ historical expectations, the eighteenth century emerges as a key moment in the history of the canon. We would hesitate to state that this makes the eighteenth century representative of the canon, however. We would, instead, note that there is an epistemological shift regarding what is considered canonical at this point in time. This does indicate that, depending on one’s analytical aims, the year 1700 is a potentially useful marker.Footnote 56 Other potential markers do exist, however, as we will see below.

Donaldson v. Becket and the Importance of 1774

It has been claimed that, “[o]n 22 February 1774, literature in its modern sense began.”Footnote 57 With Donaldson v. Becket, the House of Lords ended perpetual copyright and, consequently, London’s monopoly over print in Britain, allowing for a back catalog of cultural goods to enter the public domain. The impact of this event on the print industry was profound. As we have noted in previous quantitative research into the ESTC, the relationship between the London printers and publishers in the 1770s changed radically.Footnote 58 Our concern in this chapter, however, is to identify what was offered to the public at a large scale after this ruling. While there is clear evidence that Ross’s thesis of radical change should be visible in our data, the question is whether our canon diverges from this expectation or not.

To answer this question, we created a post-1774 canon which was compared with the complete data-driven model. We extracted the 1000 most printed works originally published before 1746 (thus making them potentially public domain), reprinted after 1774, and printed in Great Britain (where the law applied).Footnote 59 We then compared these works with those included in the larger canon, looking for substantial differences. It must be noted, however, that the limits placed on the post-1774 canon mean that it is a substantially smaller subset. While the entire processed ESTC has currently 361,245 harmonized recorded documents (representing 200,378 works), only 8925 of those documents (1997 works) meet the post-1774 criteria. Therefore, to compare the two directly is not entirely meaningful as a substantial number of works in the data-driven canon are missing. However, we can still examine how the top-works in the post-1774 canon are distributed in our data-driven canon.
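The selection criteria for the post-1774 canon can be expressed as a simple filter. The sketch below is a hedged illustration: the data layout, the country label, the treatment of 1774 itself as outside the "after 1774" window, and the choice to rank by the number of British reprints after 1774 are all assumptions, and the sample works and years are placeholders.

```python
def post_1774_works(works, top_n=1000):
    """works: dict mapping work_id -> list of (year, country) edition records."""
    eligible = {}
    for work_id, eds in works.items():
        first_year = min(year for year, _ in eds)
        # Reprints after 1774 printed in Great Britain (assumed country label).
        british_reprints = [y for y, c in eds if y > 1774 and c == "Great Britain"]
        if first_year < 1746 and british_reprints:
            eligible[work_id] = len(british_reprints)
    # Rank by reprint count, breaking ties alphabetically for a stable order.
    ranked = sorted(eligible, key=lambda w: (-eligible[w], w))
    return ranked[:top_n]

# Placeholder data, for illustration only.
works = {
    "aesop": [(1484, "Great Britain"), (1780, "Great Britain"), (1790, "Great Britain")],
    "pilgrims_progress": [(1678, "Great Britain"), (1776, "Great Britain")],
    "tom_jones": [(1749, "Great Britain"), (1780, "Great Britain")],
    "old_tract": [(1600, "Great Britain")],
    "colonial_reprint": [(1700, "Great Britain"), (1785, "United States")],
}
selected = post_1774_works(works)
```

Here a work first published in 1749 is excluded because it falls after the 1746 cutoff, as are a work never reprinted after 1774 and one reprinted only outside Great Britain.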

The results show that almost a quarter (23.2%) of all post-1774 works can still be found in the larger data-driven canon. Moreover, the overall distribution of the highest ranking post-1774 works (i.e., most printed) overwhelmingly falls at the top end of the larger canon. This is not entirely surprising, though, as printing frequency is one of the defining features of the data-driven canon. This comparison does, however, verify that this data-driven approach recovers works similar to those in the post-1774 canon with substantial coverage and accuracy. What is more important to this study, however, is what is not captured in the post-1774 canon.

Amongst the top 500 post-1774 works, only 41 are not in the data-driven canon. Of these, 16 are works of literature. These 41 titles are spread across subject-topics, but the most common type of work is drama (eight titles). On the other hand, amongst the top 500 data-driven canonical works which were in the public domain, 59 are not in the post-1774 list. Of these, 21 are works of literature. Importantly, the works which did not make the larger data-driven canon but are in the post-1774 list are generally printed less frequently. When including all prints (not just those in Britain), the mean number of prints per work in the data-driven canon is 19, while the mean in the post-1774 canon is 7. If we only look at prints in Britain, 90% of post-1774 works have 10 or fewer reprints.
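A comparison of this kind reduces to set operations on two ranked lists of work identifiers. The following sketch shows one way to compute the share of one list found in the other and the top-ranked works missing from it; the function name, the dictionary keys, and the toy identifiers are hypothetical and chosen only for illustration.

```python
def overlap_report(reference, candidate, top_k):
    """Compare two ranked lists of work identifiers (best first).

    Returns the share of `candidate` works that also occur in `reference`
    and the works among the top `top_k` of `candidate` that do not.
    """
    ref_set = set(reference)
    shared = sum(1 for w in candidate if w in ref_set)
    missing = [w for w in candidate[:top_k] if w not in ref_set]
    return {
        "share_of_candidate_in_reference": shared / len(candidate),
        "top_k_missing_from_reference": missing,
    }


# Invented identifiers standing in for the two canons.
data_driven = ["a", "b", "c", "d", "e"]
post_1774 = ["b", "x", "a", "y"]
report = overlap_report(data_driven, post_1774, top_k=2)
```

Swapping the two arguments gives the comparison in the opposite direction, that is, the works in the larger canon that are absent from the post-1774 list.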

Interestingly, a total of 145 literary works found in the larger data-driven canon are missing from the post-1774 canon. A significant number of these works (50) are works of fiction. These include works which were still covered by copyright in 1774, such as Henry Fielding’s Tom Jones (1749), Samuel Johnson’s Rasselas (1759), and Oliver Goldsmith’s The Vicar of Wakefield (1766), as well as older works which, while still published in the eighteenth century, did not reach the same edition count as some newer works, such as Boccaccio’s Decameron (first recorded in the ESTC in 1562), Robert Greene’s Pandosto (1588), and Philip Sidney’s Arcadia (first recorded in the ESTC in 1590).Footnote 60 Additionally, 34 works of poetry, 34 works of drama, and 27 miscellaneous literary works are not included in the post-1774 canon. Authors of works that did not make the post-1774 cut include Jonathan Swift, Hannah More, Jean-Jacques Rousseau, Voltaire, Michel de Montaigne, Alexander Pope, Ned Ward, John Dryden, Benjamin Franklin, Thomas Paine, and Richard Steele.Footnote 61

Overall, while there is clear evidence that the 1774 legal changes did have a substantial impact on publishing, especially of works that one would consider canonical, this is an event which takes place so late in our dataset that its impact is likely to be seen more in the following periods not covered by our dataset. This includes works which, for various reasons, fell out of favor toward the end of the eighteenth century, as well as works which were still under copyright in 1774. As we are interested in the canon as it developed and was available over the entirety of the period covered by the ESTC, the actions of a temporally specific group of people should not be over-represented. However, it is worth noting one important upshot of the comparison between the two datasets: the general overlap of coverage and the opportunity to understand the causes of the missing works provides further verification of the methods used to construct our canon.

People: Authors

When discussing the people responsible for the canon (with particular emphasis on authors), we must acknowledge that the importance we place on them today is radically different from the importance their contemporaries had placed on these authors in their time. As Adam Rounce has noted with regard to Samuel Johnson, “[t]he desire to be recognized as an author and to profit from it coexists with an awareness that much work in the burgeoning world of print and literary journalism was not especially intended to be handed down to posterity; the sparsity of works with Johnson’s name on the title page in his lifetime indicate his pragmatism.”Footnote 62 In fact, it was only toward the end of the eighteenth century that authorship began to take the form that we recognize as typical today and, therefore, for the majority of the era covered by the ESTC the notion of authorship was quite different.

One of the clearest examples of this is anonymity. Up until the eighteenth century it was quite common for authors not to be credited, by choice or by practice, for their work. For instance, between 1679 and 1800, the ESTC has 239 records with authorship attributed to a “Lady.” Many similar examples can be found in our canon: multiple works by Defoe were initially published without any attributed authorship, Philip Francis’ criticisms of George III’s government were penned under the name “Junius” for obvious reasons, and Richard Steele used the nom de plume Isaac Bickerstaff (itself borrowed from Jonathan Swift) in The Tatler.Footnote 63 Bickerstaff is indicative of another aspect related to authorship: the practice of collaborative writing. While Steele can be credited with most issues of The Tatler, he was not the sole author: Joseph Addison and Jonathan Swift also contributed pieces to the periodical, a practice which would be taken further with The Spectator. For many, in fact, authorship was not something that was held in any regard. The “hack” author, paid by the page or word, willing to put his or her skills to use for any topic or patron, was a common presence during the eighteenth century and was successfully immortalized in Pope’s Dunciad (1728, 1729, 1743). In other words, when thinking about the canon, it is important, on the one hand, not to put undue weight on authorship, and, on the other, to recognize the exceptionality of those who were able to step into the authorial spotlight.

Based on our records, 833 works in the canon have a person or organization as author and 556 of these are unique. There are different ways of speaking about who the top authors were or may have been: one can count the authors who published the most canonical works, the authors who published the most editions of these works, or the authors who have the most records in the ESTC. In each case, a different picture emerges, although we see repetition (Table 3.3).

Table 3.3 Top authors and works, canon editions, and total works recorded in the ESTC (1500–1800)

When we look at the most published works per decade (Fig. 3.3), we can generate a complementary list that attempts to estimate which works were “bestsellers.” In this case, no author who makes the top-ten-per-decade list has more than four works included. Additionally, the list emphasizes early authors, such as John Stanbridge (1463–1510), Robert Whittington (approximately 1560), Richard Allestree (1619–1681), John Brinsley (active 1581–1624), and Edward Coke (1552–1634), none of whom make the general list. This is likely due both to the lack of competition when printing first emerged and to the high number of reprints of grammars (see also Fig. 3.5). In contrast, authors like Isaac Watts and Daniel Defoe have only two works that make the top-ten-per-decade list. This is an important insight: it shows that there are various ways to integrate the data that allow for both contextual and longue durée insights.

Fig. 3.5 The most popular subject-topics for the ten most printed works in each decade from 1500 to 1800

Among the subject-topics covered by the top-ten canon authors (Table 3.4), literary works dominate the list, although there are a few outliers, such as Defoe’s familial advice, The Family Instructor (1715), his conduct piece, Religious Courtship (1722), and Wesley’s medical textbook, The Primitive Physick (1747). While temporality plays an obvious role in these results, overall there is a decent spread of authors, covering various time periods and genres (Fig. 3.7).

Table 3.4 Distribution of subject-topics among works by top authors (1500–1800)

In addition to identifying canonical authors, we have also generated some general statistics regarding the lives of the authors working during the ESTC era. By looking at the authors whose birth and death years are available, we notice that the average age of authors when publishing a first work is quite high (over 40). Additionally, many authors, and especially canonical authors, continue to be published after their death, and, with the advent of the public domain, more and more frequently (Fig. 3.6).

Fig. 3.6 Top authors (1500–1800). The point size indicates the number of publications for each author, including reprints (rows), per year (columns). The color indicates publication before (red) and after (blue) death, respectively. The authors have been sorted by their death year

Analysis of posthumous publication frequency indicates that being published after death is more common for authors during the early modern period, ancient authors excluded (Fig. 3.6). According to our data, the median number of years to the first publication after death is one; however, for 2.7% of the 1455 authors who were included in this analysis, the first posthumous publication appears over 100 years after their death. The frequency of posthumous publications within the first 50 years after death shows a steadily declining pattern over time.Footnote 64 For the first half of the sample (until the 1650s) the percentage is higher, but we should remember that at the time it was more difficult to be printed due to limitations in the print industry. Thus, existing resources were most likely directed toward printing canonical works, or works by established authors (Fig. 3.7).
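The two summary figures in this paragraph (the median lag to the first posthumous publication and the share of authors whose first posthumous publication comes more than 100 years after death) can be sketched as follows. The input structure, the function names, and the toy authors are assumptions for illustration; in particular, a publication is treated as posthumous only if its year is strictly later than the death year, which is one possible convention rather than a documented one.

```python
from statistics import median


def first_posthumous_lag(death_year, publication_years):
    """Years from death to the first later publication, or None if none exists.

    Assumption: a publication in the death year itself is not counted
    as posthumous, so the smallest possible lag is one year.
    """
    later = [y - death_year for y in publication_years if y > death_year]
    return min(later) if later else None


def summarize_lags(authors, threshold=100):
    """authors: mapping name -> (death_year, list of publication years).

    Returns (median lag, share of authors whose lag exceeds `threshold`),
    computed over authors with at least one posthumous publication.
    """
    lags = []
    for death_year, years in authors.values():
        lag = first_posthumous_lag(death_year, years)
        if lag is not None:
            lags.append(lag)
    share_late = sum(1 for lag in lags if lag > threshold) / len(lags)
    return median(lags), share_late


# Invented authors, not ESTC records.
toy_authors = {
    "author_a": (1600, [1590, 1601, 1650]),  # lag 1
    "author_b": (1650, [1640, 1652]),        # lag 2
    "author_c": (1550, [1540, 1700]),        # lag 150
    "author_d": (1700, [1690]),              # no posthumous publication
}
median_lag, share_over_100 = summarize_lags(toy_authors)
```

Authors without any posthumous publication are excluded from both statistics, matching the restriction stated in the caption of Fig. 3.7.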

Fig. 3.7 Posthumous publication frequency. The percentage of authors in the ESTC with posthumous publications during the first 50 years after death, grouped by decade (1470–1800). This analysis includes the 1544 authors whose lifetime data is available for the investigated period with one or more posthumous publications

As noted previously, while most authors we have extracted from the ESTC can uncontroversially be labeled as canonical, there are some whose inclusion could be contentious. However, as was the case with William Vicker, we may want to reflect on their inclusion before deciding to purge them from the canon. William Lily, for example, is worth noting. Strictly in terms of publication counts, it is understandable why he ranks so high: his Latin Grammar (although written by many hands, including Erasmus) was granted a royal monopoly as the only Latin textbook to be used in schools from 1540 onward. However, while Lily was mainly known as a schoolteacher and grammarian, his contribution to humanist education should not be overlooked.Footnote 65 Instead, we should acknowledge that his work was a cultural constant amongst all “upper-class British males” for over two centuries, and that its contents, including hundreds of quotes from Roman writers, were known by most educated readers by heart.Footnote 66 This fact was disapprovingly recorded by John Locke in Some Thoughts Concerning Education (1692): “Custom serves for reason, and has, to those who take it for reason, so consecrated this method, that it is almost religiously observed by them, and they stick to it, as if their children had scarce an orthodox education unless they learned Lilly’s grammar.”Footnote 67 If one of the aims of a data-driven approach to canon formation is to highlight works that were essential to the cultural space of the time, Lily’s inclusion is an important one—the impact of his grammar was profound, influencing a host of canonical authors, including John Lyly, Ben Jonson, Thomas Fuller, George Borrow, Charles Lamb, Edgar Allan Poe, and, of course, William Shakespeare (Fig. 3.8).Footnote 68

Fig. 3.8 Timeline of Shakespeare’s publications included in canon. The point size indicates the share of the publisher with most prints of the indicated work (rows) per decade (columns)

Finally, it is worth turning to, perhaps, a more conservative canonical author who emerges from the above analysis. One benefit of a data-driven approach to the canon is that it enables us to shed new light on the printing and publishing history of specific authors. If we focus on Shakespeare’s publications, for instance, a few points of interest emerge. While popular during his lifetime and continuously published throughout the seventeenth century, Shakespeare was printed less in the mid-seventeenth century. This was partially caused by censorship during the English Civil War and Interregnum years (1642–1659), which was not the most propitious time for printing and performing plays.Footnote 69 However, the data also shows that Shakespeare reemerged most strongly in the eighteenth century. This is directly related to the impact of several publishers who, by printing individual plays by Shakespeare, encouraged a growth in the popularity of his works and initiated a newly invented canon-making business (in which Pope, Dryden, Johnson, and others were an important part). Thus, we can see in our data the effect that the publishing efforts of Robert Walker, Jacob Tonson, John Bell, Edward Harding, and others, had on Shakespeare’s canonization. It is, therefore, important that we also discuss these actors.

People: Printers and Publishers

Developments in the literary canon went hand in hand with developments in the book trade itself. This trade was transformed and driven by the economic expansion that occurred during the seventeenth and eighteenth centuries, when the scale and volume of all kinds of printings greatly expanded. By the end of the eighteenth century, this led to significant changes in the structure of the book trade, one of which was the increased specialization of the people involved.Footnote 70

Within the existing data regarding book trade actors (printers, publishers, and booksellers), the data on booksellers is the most sporadic. Canonical works, however, provide better information on publishers than the dataset as a whole (Fig. 3.9). Of all the titles cataloged in the ESTC, 27% do not mention any book trade actors; in the subset of titles included in the data-driven canon, this number is 13%. Out of the three categories of book trade actors, publishers are the best represented in the data while booksellers are the least represented, with 85% of titles not mentioning them. The printer information is also missing from 64% of titles. Overall, the number of specialized roles linked to canonical publications increases toward the end of the period, as expected.

Fig. 3.9 Titles with missing book trade actor data. This figure charts out the coverage of the book trade actor data, as found in the catalog records (1500–1800)

An observation relating to data quality should be made here. The ESTC joins two major catalogs, STC and Wing, with the dividing line between the original catalogs running at the year 1640. The pre-1640 period seems to have more carefully documented metadata but, at the same time, the number of entries in the database immediately shoots up after the divide. Part of this can be attributed to an increase in the publication activity, but this is also partially due to the better coverage of the published material in the latter half of the catalog. The number of unique book trade actors mentioned in the catalog follows a more consistent curve compared to the absolute number of titles, which can be taken as an indication that the individual actors involved have been detected reasonably well.

Looking at the publications linked to individual book trade actors also reveals a wide variety of profiles. A publisher’s output can vary from a few to hundreds of titles, so it makes sense to create a rough categorization of the publishers based on this variable.

To explore this aspect of the book trade from a data-driven perspective, we divided the publishers into percentiles according to their publication output. The publishers were ranked yearly by their output, and as expected, the highest quantiles of the book trade dominate the data. What is immediately apparent is that the top 1–5% publishers account for over half of all publications (where the publishers are known), with no major variance over time (see Fig. 3.10 for a closer look at the first percentile’s share). This matches historical expectations: the discussion around the 1774 Donaldson v. Becket decision (and the propositions made to the Parliament by both established London booksellers and the “little low Stall Booksellers in Middle Row Holbourn”Footnote 71) emphasizes the division between the economic leaders of the book trade and the more numerous, but less established, actors. The dividing line runs between the top London publishers, who owned many of the more lucrative and valuable copyrights, and members of the London trade who were not part of this select circle, as well as the less central publishers in Scotland and provincial England.Footnote 72
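The yearly ranking described above can be sketched in a few lines. The input format (one `(year, publisher)` pair per publication with a known publisher), the function name, and the rule that the top group always contains at least one publisher are assumptions made for this illustration; the chapter does not specify how ties or very small yearly populations were handled.

```python
import math
from collections import Counter, defaultdict


def top_share_by_year(records, fraction=0.01):
    """records: iterable of (year, publisher) pairs, one per publication.

    Returns {year: share of that year's publications issued by the top
    `fraction` of publishers, ranked by output within the year}.
    """
    per_year = defaultdict(Counter)
    for year, publisher in records:
        per_year[year][publisher] += 1

    shares = {}
    for year, counts in per_year.items():
        ranked = sorted(counts.values(), reverse=True)
        # Assumption: always keep at least one publisher in the top group.
        k = max(1, math.ceil(len(ranked) * fraction))
        shares[year] = sum(ranked[:k]) / sum(ranked)
    return shares


# Invented records: four publishers in 1750, two in 1751.
toy_records = (
    [(1750, "A")] * 6
    + [(1750, "B")] * 2
    + [(1750, "C"), (1750, "D")]
    + [(1751, "A"), (1751, "B")]
)
shares = top_share_by_year(toy_records, fraction=0.25)
```

With real data, `fraction` would be set to 0.01 or 0.05 to reproduce the percentile bands discussed in the text, and the resulting yearly series could be plotted over time.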

Fig. 3.10 Share of publications by the largest publishers (top 1%)

The copyright battles of the eighteenth century were not over the right to print in general, but rather over the possession of intellectual property rights that had greater financial potential, such as canonical works. Here, too, the leading publishers come out ahead: their share of the data-driven canon is proportionally higher, as illustrated in Fig. 3.11.

Fig. 3.11 Share of the canon by largest publishers (top 1%) for unique works, as derived from the work-field dataset

It has been claimed that the end of the eighteenth century was pivotal in changing the nature of the publishing business, with publishers starting to rely less on profits from “safe” reprints and taking on a more modern, entrepreneurial character.Footnote 73 The increase in mentions of booksellers in the publishers’ statements could be taken as a confirmation of this claim, as it reflects the commercialization of the print industry in more than one way (see the overview of the actor data in Fig. 3.9). With increasing specialization within the trade, and a growth in the market for books, an increased need for advertising followed. Indeed, publisher statements often include advertising-like language, and provide practical details, such as bookshop locations, lists of other works available, and so on. On the other hand, however, the distribution of canonical and non-canonical works does not significantly change during the eighteenth century. The canon stayed in relatively few hands, even after 1774. While there was a gradual increase in the “fluidity” of the publication business during the eighteenth century, that is, works tended to change hands more often, this trend is relatively temperate, with no sudden hike in the 1770s. Additionally, while the relative number of new works and reprints by new actors compared to reprints by established actors does increase (Fig. 3.12), the changes are less dramatic than they must have appeared to the worried copyright-owning printing elite of the time.Footnote 74 In fact, consumers’ demand for books increased dramatically toward the end of the eighteenth century, which can be seen in the number of printings documented in the ESTC for that period.Footnote 75 Thus, even if the established elite was challenged by those seeking to make inroads into an expanding market, they were well positioned to defend their hegemony and exploit the new opportunities created by the public’s increased demand for books.

Fig. 3.12 Publishing and reprint patterns by publisher role in the printing sequence. This figure indicates changes in the reprint publishing patterns by different publishers over time and charts out the “fluidity” of the book trade. Each work had its publishers explored in chronological sequence to find out how often the publications changed hands. New publications (“New work”) and new editions of the same works by the same publishers (“Stable publisher”) were traced. Publications changing hands were traced both in the cases where the previous publisher disappeared from the book trade (“New publisher, old inactive”) and where the publication changed hands, but the previous publisher stayed active (“New publisher, old active”). Cases of the publication returning to the hands of a previous owner are relatively rare (“Return of earlier publisher”)
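The five categories named in the caption of Fig. 3.12 can be made concrete with a small classification sketch. The labels are taken from the caption, but the decision rules are assumptions: here a previous publisher counts as "active" if any of their recorded publications (of any work) falls in or after the year of the edition being classified, and each edition is assumed to have a single publisher. The input tuple layout and function name are likewise hypothetical.

```python
from collections import defaultdict


def classify_editions(editions):
    """editions: list of (work_id, year, publisher) tuples.

    Returns (work_id, year, publisher, label) tuples, with each work's
    editions in chronological order.
    """
    # Last year in which each publisher appears anywhere in the data.
    last_active = {}
    for _, year, publisher in editions:
        last_active[publisher] = max(year, last_active.get(publisher, year))

    by_work = defaultdict(list)
    for work_id, year, publisher in editions:
        by_work[work_id].append((year, publisher))

    labeled = []
    for work_id, seq in by_work.items():
        seq.sort()
        seen = []  # publishers of earlier editions of this work, in order
        for year, publisher in seq:
            if not seen:
                label = "New work"
            elif publisher == seen[-1]:
                label = "Stable publisher"
            elif publisher in seen:
                label = "Return of earlier publisher"
            elif last_active[seen[-1]] >= year:
                label = "New publisher, old active"
            else:
                label = "New publisher, old inactive"
            labeled.append((work_id, year, publisher, label))
            seen.append(publisher)
    return labeled


# Invented sequence of editions for two works and three publishers.
toy_editions = [
    ("w1", 1700, "P"), ("w1", 1710, "P"), ("w1", 1720, "Q"),
    ("w1", 1730, "P"), ("w2", 1725, "P"), ("w2", 1740, "R"),
]
labels = [row[3] for row in classify_editions(toy_editions)]
```

Counting the labels per decade would then give a series comparable in spirit to the one charted in the figure.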

It is also known that, after 1774, other publishing monopolies, such as those covering almanacs, grammars, and law books, were under attack.Footnote 76 This highlights the importance of specialization within the trade and the significance of printed materials outside the subject-topics generally seen as canonical. When looking at the division between these categories, we can see that many publishers specialized in relatively limited subject-topics (Fig. 3.13). At the same time, new subject-topics were clearly part of a broader portfolio of publications. This means that we have different types of publishers throughout the early modern era: those who specialized and those who published in response to demand. Literature, religion, social sciences, and, to some extent, information and general works (almanacs and the like) appear to be fields where the market was large enough to accommodate specialization, and a limited group of well-established publishers were occasionally able to dominate these markets based on monopoly rights. A good example of this phenomenon is the single most voluminous publisher of the time, the Stationers’ Company of London, which dominated the information and general works publishing market of eighteenth-century London.

Fig. 3.13 Publisher subject-topic specialization and canon share. This figure illustrates the differences and similarities in the publishing landscapes of the identified subject-topics (individual scatterplots). Each dot represents an individual publisher, the horizontal axis indicates the publisher’s topic specialization, the vertical axis indicates the portion of all publications by a publisher included in the canon, and the size of the dot indicates the publication volume of a publisher in a particular field. The large dot in the center right in “Information, general works” represents the Stationers’ Company

Gender and the Book Trade

While it would be exciting to claim that our data-driven approach has allowed for a reassessment of the gender imbalance (amongst other imbalances) in the history of the canon, this is not the case. As seen in Fig. 3.14, most canonical authors were male. In fact, only 1 in every 32 authors is female, and men have on average 38 reprints per work compared with only 31 by women. Within the book trade as a whole the gender imbalance is not as great, although it is still significant: 1 in roughly 14 book trade actors in our data is female.

Fig. 3.14 Works by female authors in the data-driven canon per decade

In total, 21 authors in the canon are female, and only three of them have multiple works included: Susanna Centlivre (1667?–1723) with three, and Elizabeth Singer Rowe (1674–1737) and Hannah More (1745–1833) with two each. The remaining authors have only one work each: Hannah Glasse (1708–1770), Lady Mary Wortley Montagu (1689–1762), Anne Fisher (1719?–1778), Mary Collyer (?–1763), Madame Marie-Catherine d’Aulnoy (1650 or 1651–1705), Elizabeth Raffald (1733–1781), Madame Jeanne-Marie Leprince de Beaumont (1711–1780), Hester Chapone (1727–1801), Catherine Talbot (1721–1770), Frances Brooke (1724?–1789), Eliza Smith (died ca. 1732), Frances Burney (1752–1840), Madame de Gomez (1684–1770), Madame de Graffigny (1695–1758), Elizabeth Moxon (life years not available), Mary Brook (ca. 1726–1782), Anna Letitia Barbauld (1743–1825), and Elizabeth Grey, Countess of Kent (1581–1651). There is, however, a clear growth in the number of female authors and in the number of works by female authors that make up the canon over time. In fact, if we only look at the eighteenth century (which is relevant in this case, as there is only one female author in the canon prior to this period, Elizabeth Grey, Countess of Kent), the split is less severe, albeit still unequal: the disparity between reprints drops by nearly two works, with one female author for every 18 male authors.

The subjects covered by female authors include domestic economy, drama, education and manners, fiction, language, miscellaneous literature, and religion. Since the number of female-authored works is quite small, it is somewhat difficult to compare their subject coverage with that of the larger number of male-authored works. However, we do see that, in general, more women write fiction, and fewer women write educational and religious works.Footnote 77


The centrality of London (and of the Stationers’ Company) to the early modern English book trade is well documented.Footnote 78 This reality is visible in the publication records extracted from the ESTC as well, with London publishing ten times as many works as its closest rival, Edinburgh. What is more, the non-London-based publication industry only begins to mature toward the end of the seventeenth century (and even then, works printed in London continue to dominate local markets).Footnote 79 As we see in Fig. 3.15, the early prints outside London are coming almost entirely from Paris, with Edinburgh, Cambridge, and Oxford dominating the seventeenth century, before Dublin and North American publishers begin to emerge in the eighteenth century.

Fig. 3.15 Fraction of publications by place for the top publication places excluding London (1500–1800)

When it comes to the canon, however, a different geographical picture emerges, as seen in Table 3.5. There are at least two findings revealed by this data: first, the importance of particular areas as producers of canonical works, and, second, the importance of particular areas as producers of reprints of popular works.

Table 3.5 Top 10 printing locations in the whole ESTC and in the canon (1500–1800)

Regarding the former, London unsurprisingly dominates the print market. In fact, of the 30 cities which are recorded as sources of first editions of canonical works, London is responsible for 606, or roughly 84% of all titles. With regard to the movement of individual prints of canonical works, this means that more than ten times as many works originally printed in London were subsequently printed elsewhere than the other way around. Of course, London was the political, financial, and cultural capital of Britain, so these findings may not be surprising. However, what is surprising is the magnitude of this imbalance, with London dwarfing the rest of the canon. Edinburgh, the center of the Scottish Enlightenment, comes a distant second as the source for first editions that will become canonical, with 38 titles (Table 3.6).Footnote 80

Table 3.6 Locations for the first printed editions and for subsequent editions of canonical works (1500–1800)

On the other hand, there are cities that are centers for reprinting editions of canonical works which were not originally printed there. Edinburgh fits this case, with the second highest number of reprints (288), followed by its Scottish compatriots Glasgow (252) and Aberdeen (57). The key player in reprints of canonical works, however, was Dublin. While Dublin did have a number of first editions (18), these are disproportionate to the number of subsequent editions it printed (427, the most of any city). This is almost certainly due to Dublin’s legal and cultural position: being beyond the reach of British law and the Stationers’ grasp, and having a large English-speaking population, placed Dublin’s printers in a privileged position. And while there was certainly a local market for these works, it is also clear that Dublin was not the only intended market, and many were exported to Britain and beyond. Thus, while we know that Dublin was an importer of canonical works first printed elsewhere, it was also an important exporter of these works.

A similar, yet smaller, pattern is also seen in the colonies, with Philadelphia, Boston, and New York reprinting numerous non-local works. Overall, the movement of works between Europe and the colonies remains largely unidirectional, according to the ESTC records (Fig. 3.16).Footnote 81 On the whole, however, there is a remarkable movement of various editions of these works among various locations. While books obviously traveled as any other material object would, the works themselves as less tangible things were recreated in various locations throughout the world, being reprinted by local book trade actors for commercial and intellectual profit.
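The distinction between first-edition locations and reprint locations, and the resulting movement of works between cities, can be derived from edition records along the following lines. This is a sketch under stated assumptions: the `(work_id, year, place)` tuple layout and the function name are hypothetical, the earliest recorded edition is treated as the first edition, and same-year editions are ordered alphabetically by place simply to keep the result deterministic.

```python
from collections import Counter, defaultdict


def edition_flows(editions):
    """editions: list of (work_id, year, place) tuples.

    Returns three Counters: first editions per place, subsequent editions
    per place, and (origin, destination) pairs for subsequent editions
    printed somewhere other than the place of the first edition.
    """
    by_work = defaultdict(list)
    for work_id, year, place in editions:
        by_work[work_id].append((year, place))

    first_editions, later_editions, flows = Counter(), Counter(), Counter()
    for seq in by_work.values():
        seq.sort()
        origin = seq[0][1]
        first_editions[origin] += 1
        for _, place in seq[1:]:
            later_editions[place] += 1
            if place != origin:
                flows[(origin, place)] += 1
    return first_editions, later_editions, flows


# Invented editions for two works.
toy_places = [
    ("w1", 1700, "London"), ("w1", 1710, "Dublin"),
    ("w1", 1720, "London"), ("w1", 1730, "Edinburgh"),
    ("w2", 1750, "Edinburgh"), ("w2", 1760, "London"),
]
firsts, laters, flows = edition_flows(toy_places)
```

The first two counters correspond to the two kinds of column reported in a table such as Table 3.6, while the third gives the directed pairs needed for a movement diagram.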

Fig. 3.16 Movement of canonical editions from original print location

Population must also be taken into consideration when looking at publication counts. When one considers the size of London compared to that of other cities at the time, its dominance may not be surprising. However, size does not appear to be the key contributing factor to the number of prints, relative to population size, that emerge from a city.

As seen in Fig. 3.17, smaller cities were capable of producing many more works than larger cities in both absolute and relative terms. The reasons for this vary. University cities like Oxford and Cambridge printed for specific niche markets, while colonial cities developed their own markets, which were less reliant on imports.Footnote 82 Dublin and Glasgow were also important reprint centers, although we can see a decline in their print output compared to London following the liberalization of the market in 1774. While we have already touched upon Dublin’s relationship to reprints, Glasgow is also worth noting given that its print history is arguably tied to the production of editions of “classics.”Footnote 83

Fig. 3.17 Number of prints per capita (1700–1800)

Glasgow’s dominance in canonical printing (Figs. 3.17 and 3.18) can be tied to a specific printing house run by the brothers Robert and Andrew Foulis.Footnote 84 Initially elected as printers to the University of Glasgow in 1743 and charged with printing works by ancient authors, by the 1750s the Foulis brothers had turned to printing Elzeviresque editions of English classics by Milton and Gray, before engaging in mass market reprints of dubious legality. With the deaths of Andrew in 1775 and Robert in 1776, the period of Glaswegian dominance ends, although Robert’s son, also Andrew, was able to revive the business to some extent from the 1780s on, with a renewed focus on exporting books to the American market.

Fig. 3.18 The fraction of canonical editions compared to all editions per city (1700–1800)

Importantly, the Foulis brothers’ genius lay not only in whom they printed but also in how they printed. While their editions of the ancients were often expensive folios or quartos, their reprints of contemporary English poets were duodecimos: they were cheaper to print and thus more affordable to a broader audience. As Thomas F. Bonnell notes, “they had crossed a divide from an old world of monumental scholarly and typographical ventures devoted to ancient Greek and Latin texts into a new world of selling multi-volume collections of modern vernacular classics to a larger and more diverse readership.”Footnote 85 It is this transformation of the print trade which we turn to next.


There is one further aspect of print that is worth taking into consideration when thinking about the canon: materiality, or the composition of a printed work, such as page size, format, number of pages, and print area.Footnote 86 This is, again, an aspect of print which is often overlooked when one engages with historic works quantitatively.Footnote 87 However, the material attributes of a printed work have meaning to a reader, as made explicit by Joseph Addison in issue 529 of The Spectator: “I have observed that the Author of a Folio, in all Companies and Conversations, sets himself above the Author of a Quarto; the Author of a Quarto above the Author of an Octavo; and so on, by a gradual Descent and Subordination, to an Author in Twenty Fours.”Footnote 88 In other words, within the materiality of a work, there were ingrained literary and social signifiers that contemporaries would have been familiar with.

When evaluating the most dominant book formats attached to subject-topics over time, the significance of the smaller book format emerges. In particular, we notice the growth of the octavo and duodecimo formats (Fig. 3.19), which were more portable, easily fitting in one’s pocket, and thus more easily perused or read on different occasions. While the trend can be seen across all subject-topics, it is particularly visible in the case of literary and philosophical works. Legal and public administration books, on the other hand, were slower to change, remaining in folio for much longer. Interestingly, we have also noticed a regional variation with respect to preferred formats, with duodecimo being the most popular format in the New World.

Fig. 3.19 Dominant book formats for the most frequent subject-topics
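To make the notion of a "dominant" format concrete, the following Python sketch counts editions per format within each subject-topic and time period and reports the most common one. The record layout `(year, topic, format)`, the function name `dominant_formats`, and the 50-year period are assumptions made for illustration only; they do not describe the procedure used to produce Fig. 3.19.

```python
from collections import Counter, defaultdict

def dominant_formats(editions, period=50):
    """Return the most common format per (topic, period start).

    `editions` is an iterable of (year, topic, format) tuples; this layout
    is a hypothetical simplification of a harmonized catalogue record.
    """
    counts = defaultdict(Counter)
    for year, topic, fmt in editions:
        start = year - year % period  # first year of the period containing `year`
        counts[(topic, start)][fmt] += 1
    # most_common(1) yields a one-item list holding the top (format, count) pair
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

# Invented toy records, not ESTC data
toy_editions = [
    (1610, "law", "folio"), (1620, "law", "folio"), (1630, "law", "quarto"),
    (1760, "literature", "duodecimo"), (1770, "literature", "octavo"),
    (1780, "literature", "duodecimo"),
]
dominant = dominant_formats(toy_editions)
```

Counting editions is only one possible measure; weighting by page count or by estimated paper use could change which format appears dominant.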

When looking at the changing materiality of the canon with respect to some of the most popular works published between 1500 and 1800, we notice that early publishing mixed folio, quarto, and octavo formats. This trend continues, in some cases, until the end of the eighteenth century (see, for instance, Paradise Lost, in Fig. 3.20), but with time the octavo and duodecimo formats become dominant (see Aesop’s Fables and Short Introduction to Grammar in Fig. 3.20). The choice of format, therefore, is tied to complex relations among economic viability, perceived importance, and pragmatics. A folio edition of Milton, for example, was a worthy and desirable endeavor in the late eighteenth century, while a grammar was much less likely to be imbued with the same subjective value and was, therefore, more convenient to own in a smaller format. Thus, this type of analysis allows us not only to touch upon the realities of printing practices but also to gain insight into the changing preferences of the public with respect to materiality and early modern reading habits.Footnote 89 At the same time, when looking at the overall paper consumption for different formats of books included in the canon, we can see that octavo and duodecimo formats started to dominate the book printing market only in the second half of the eighteenth century (Fig. 3.21).

Fig. 3.20 The distribution of common book formats for selected canon-works over time

Fig. 3.21 Estimated paper consumption for different formats over time for books included in the canon
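The idea of estimating paper consumption can be illustrated with standard bibliographical arithmetic: a sheet folded as a folio, quarto, octavo, or duodecimo yields 2, 4, 8, or 12 leaves, that is, 4, 8, 16, or 24 pages. The Python sketch below converts page counts into sheets per copy and sums them by decade and format. The record layout `(year, format, pages)` and the function names are hypothetical, and the sketch ignores print runs and sheet sizes; it should not be read as the estimation procedure behind Fig. 3.21.

```python
import math
from collections import defaultdict

# Pages obtained from one printed sheet (both sides) for each format
PAGES_PER_SHEET = {"folio": 4, "quarto": 8, "octavo": 16, "duodecimo": 24}

def sheets_per_copy(fmt, pages):
    """Sheets needed for one copy, rounding up to whole sheets."""
    return math.ceil(pages / PAGES_PER_SHEET[fmt])

def sheets_by_decade(editions):
    """Sum sheets per copy by (decade, format).

    `editions` is an iterable of (year, format, pages) tuples, an assumed
    simplification of a catalogue record. Unknown formats are skipped.
    """
    totals = defaultdict(int)
    for year, fmt, pages in editions:
        if fmt not in PAGES_PER_SHEET:
            continue
        decade = year // 10 * 10
        totals[(decade, fmt)] += sheets_per_copy(fmt, pages)
    return dict(totals)

# Invented toy records, not ESTC data
toy_totals = sheets_by_decade([
    (1755, "octavo", 320), (1758, "octavo", 160),
    (1755, "folio", 40), (1760, "unknown", 10),
])
```

The same text thus needs the same number of sheets in any format if the page area scales with the fold, which is why a realistic estimate would also need sheet dimensions and print-run figures.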


The goal of this study has been to extract from the ESTC a data-driven canon which could be used to demonstrate that quantitative investigations of this type are valuable for historical research. While quantitative analyses of the history of the book trade exist, there has been no attempt so far to engage with the complex process of canon formation on such a large scale.Footnote 90

To this effect, we have constructed a method for extracting a list of “canonical” works from the ESTC based on three publication features: count, frequency, and longevity. We have thus generated a data-driven list of canonical works that considers subject-topics, top-works, authors and publishers, publication place, and materiality from a historical perspective. While we believe this quantitative approach is in itself a methodological contribution worth reporting, it also allows us to make a number of historical claims worth studying further.
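As a rough illustration of how three publication features might be combined into a single ranking, the Python sketch below computes, for each work, the number of editions (count), the number of distinct decades with at least one edition (used here as a stand-in for frequency), and the span between first and last edition (longevity). Each feature is scaled by its maximum and the three are averaged. The record layout, the function names, the decade-based reading of "frequency," and the equal weighting are all assumptions made for this example; they are not the canon index used in this chapter.

```python
from collections import defaultdict

FEATURES = ("count", "frequency", "longevity")

def publication_features(records):
    """Compute the three features from (work_id, year) pairs, one per edition."""
    editions = defaultdict(list)
    for work, year in records:
        editions[work].append(year)
    return {
        work: {
            "count": len(years),                         # number of editions
            "frequency": len({y // 10 for y in years}),  # distinct decades in print
            "longevity": max(years) - min(years) + 1,    # years, first to last edition
        }
        for work, years in editions.items()
    }

def canon_scores(features):
    """Average of the max-scaled features; 1.0 means top on all three."""
    if not features:
        return {}
    maxima = {n: max(f[n] for f in features.values()) for n in FEATURES}
    return {
        work: sum(f[n] / maxima[n] for n in FEATURES) / len(FEATURES)
        for work, f in features.items()
    }

def top_works(records, k=10):
    """Return the k highest-scoring work identifiers."""
    scores = canon_scores(publication_features(records))
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Invented toy records, not ESTC data
toy_records = [("A", 1600), ("A", 1650), ("A", 1700), ("A", 1750),
               ("B", 1700), ("B", 1701), ("C", 1790)]
toy_scores = canon_scores(publication_features(toy_records))
```

Because the weighting and the definition of each feature are free choices, rerunning such a sketch with alternative definitions is one simple way to check how much a resulting list depends on them.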

At the same time, it is important to recognize the limitations of this type of analysis. Data reliability, representativeness, and completeness will improve over time, and this will influence all quantitative estimates derived from the data. Algorithmic questions, such as the exact definition of the canon index or genre classification, and choices made in model parameters, such as the investigated time window, will also affect this analysis. From our perspective, however, making such choices explicit allows one to evaluate them and propose alternative solutions. The reproducibility of the analysis will then allow us (and others) to explore how sensitive the qualitative conclusions are to different analytical choices. Here, however, we have primarily focused on broad historical patterns and trends that are expected to remain robust to variations in data and algorithmic details.

When examining the early modern English canon from this data-driven perspective, it becomes obvious that an epistemological shift takes place during the late seventeenth and early eighteenth centuries, when religious works lose their dominant position within the canon and are increasingly replaced by literary works (Figs. 3.4, 3.5, 3.7, and 3.8). Although literature in all its forms was historically an important part of the canon, changes in its production and consumption allowed for its growth in the eighteenth century.Footnote 91

Additionally, this analysis allows us to highlight the essential role played by the publisher in the process of canon formation, besides that of the literary critic. In particular, the role of the elite, London-based publishers (Fig. 3.11) and that of the arguably more dubious printers operating outside of London (Table 3.5, Figs. 3.17 and 3.18) become evident. Overall, we can now visualize the lengthy and arduous process of a work becoming canonical. While the total number of publications grew exponentially during the latter half of the eighteenth century, the distribution of canonical works remained relatively stable in comparison (Fig. 3.1), which indicates that the works that are most often reprinted over long stretches of time are comparatively few. This is perhaps a finding worth reflecting on further: this description of large-scale cultural production and competition within the literary market, which directs the canonization process, may allow scholars of the period to extrapolate further and use these statistics to develop prediction models for an author’s or a work’s likelihood of becoming canonical.

The broader claim of this chapter is that the development of the print market as a cultural producer has driven the changes we are able to witness in the ESTC when studied in a data-driven manner. This builds on previous work by Bourdieu, Anderson, and Habermas, who tied print capitalism to historical, cultural, political, and social changes.Footnote 92 Our contribution is to apply quantitative methods to demonstrate the accuracy of these qualitatively grounded studies in a manner which has not been attempted before.

As this analysis demonstrates, what Wendell V. Harris wrote more than thirty years ago is truer today than ever: “The ‘canon question’ … proves much more complex than contemporary ideological criticism admits.”Footnote 93 While large-scale cultural production is certainly a key factor in the canon-making process, were it to be taken as the only factor, we would dismiss numerous individual voices and would offer yet another version of the revisionist approach to canon formation. There is no such thing as an “absolute” canon, only different takes on it. While it is inevitable that different works matter in different ways, our main concern in this study is with the impact of print culture on canon formation. By considering these recorded works and their historical availability over extended periods of time, we hope to offer a more nuanced understanding not only of the history of the book trade but also of the cultural context from which it emerged.