1 Introduction

Why on earth should the study of space configurations and their structure be beneficial when dealing with meaning in language? - Dominic Widdows, Geometry and Meaning, 2004.

The aim of this contribution is to systematically describe, using a multi-dimensional conceptual space and a set of descriptive terms for recurring scenarios within that conceptual space, a specific research practice in the field of Digital Humanities (DH), a mode that, broadly defined, could be termed repetitive research (RR).

Firstly, the mode of research I’d like to describe is one that repeats, in the sense that studies following this mode actively seek to align their research questions or hypotheses, their datasets and/or their methods of analysis with research practiced and published earlier. This is done with the explicit aim of approximating an earlier study, but also in the awareness that perfectly identical repetition is virtually impossible to achieve. In many cases - and this might be a particularity of this kind of research in the humanities, where influential research may remain relevant for many decades - this also means that the earlier research that is to be repeated was practiced within the non-computational, or at least the non-digital, paradigm. In such a scenario, any attempt at exact repetition can in fact only result in a reenactment and an approximation of earlier research.

Secondly, this mode of research is repeatable, in the sense that it (typically) makes every effort to provide the data, code, and explanatory information that make it possible for others, at a later point in time, to perform the same (or very similar) research again. Of course, an earlier study that aimed to be repeatable will be more amenable to a later study repeating it than one that did not consider this an important goal. In that sense, not only does repeatability foster repetition, but also the other way around; the two issues are in fact two sides of the same coin. Indeed, most research that repeats earlier research also aims to be repeatable, because this kind of repeatability, closely related to transparency and sustainability, is something that researchers come to value once they have experienced the hardships of repeating earlier research.

Repetitive research has important functions in the research process and makes essential contributions to the production of knowledge that are closely related to core values of the scholarly enterprise such as reliability, trustworthiness, transparency and sustainability, as discussed further below. As formulated elsewhere: “In this way, this kind of research is located between past and future: a (never identical) reenactment of past research, and an invitation for (never identical) further reenactments in the future. This mode of research is practiced with the conviction, or at least in the hope, that this cycle of repetitions is not a futile treading in the same place, but a productive, insightful upwards spiral.” (Schöch, 2023b)

In terms of the scope of this contribution’s argument, the examples and use cases employed here for illustration and clarification pertain to the growing subfield within DH called Computational Literary Studies (CLS). Most of the arguments with respect to the role of data and methods in RR are valid primarily for research that operates with datasets representing the domain being investigated and with algorithmic implementations of the method of analysis. This means that the basic ideas are likely to be applicable only to those other fields within the Digital Humanities that apply algorithmic methods to evidence available in the form of digital data, sometimes collectively called Computational Humanities. Such studies are most amenable to RR but clearly do not represent all or even most research in DH.

The remainder of this paper is structured as follows. In Section 2, a trailblazing example of RR is provided as a first way of approaching the issue. Then, Section 3 briefly motivates the paper, mostly by way of a defense of RR as a useful and indeed necessary answer to the limited transparency, corroboration and sustainability that characterize much current research. In Section 4, earlier work on the topic is discussed, with a focus on the terminological issues surrounding RR. In Section 5, a solution to this situation is proposed, based on a systematic description of the practice of RR as a multi-dimensional semantic space. From this space, one can derive not only an understanding of the structure of RR, but also a clear and well-motivated terminology for recurring scenarios within RR. In Section 6, several such scenarios are defined, discussed and illustrated. By way of a conclusion, Section 7 discusses some of the benefits and limitations of the proposed terminology, but also of RR more generally.

2 Example

In 2015, Geoffrey Rockwell gave a talk at the University of Würzburg titled “Replication as a way of knowing” (Rockwell, 2015). In this talk, he presented work he had done together with Stéfan Sinclair on reenacting a classic, very early, stylometric study by Thomas C. Mendenhall. This talk is an early example of the idea of RR, not just as a practice, but as a programmatic research principle, in the domain of CLS.

The story starts in 1887, when the US-American scientist Thomas C. Mendenhall (1841-1924) published an article in Science titled “The Characteristic Curves of Composition” (Mendenhall, 1887). His fundamental idea was that it was possible to identify the author of a text by the characteristic distribution of word lengths in his or her texts. For example, Fig. 1 shows the word length distribution plot that Mendenhall obtained for the first 1000 words of the novel Oliver Twist by the British 19th-century author Charles Dickens.

Fig. 1: Plot from Thomas C. Mendenhall’s study “The Characteristic Curves of Composition”, here showing the word length distribution for the first 1000 words of Charles Dickens’s novel Oliver Twist

Fig. 2: Plot obtained by Sinclair and Rockwell for the word length distribution of Charles Dickens’s novel Oliver Twist, based on the first 1000 words

In their repeating study, Stéfan Sinclair and Geoffrey Rockwell started out with the idea of implementing Mendenhall’s study once more, this time using digital texts and some simple algorithms (Sinclair & Rockwell, 2015). When they did so, they obtained nearly identical results (Fig. 2). Then, however, they repeated the same analysis for the entire novel and obtained a distribution that, despite being recognizably related, clearly deviated from Mendenhall’s results as an effect of the much longer text being analysed (Fig. 3).

Fig. 3: Plot obtained by Sinclair and Rockwell for the word length distribution of Charles Dickens’s novel Oliver Twist, based on the entire novel

Many of the typical properties of RR are already present in this seemingly simple study. The starting point is a more or less famous example of early quantitative (in this case of course non-digital) research. We find a close (but most likely not perfect) alignment in terms of the data: the digital edition Rockwell and Sinclair used (from Project Gutenberg) is likely to be based on a print edition that is roughly contemporary with, and therefore very similar to, the one Mendenhall used. Given that the process of scanning and text recognition is likely to introduce errors and deviations even if exactly the same edition had been used, it seems fair to assume that the data is not strictly identical (at the level of the character sequence). Also, given that this repetition involves crossing the analog-digital divide, it shares the basic methodological approach of determining the number of words of any given length, but makes use of a functionally equivalent, yet fundamentally different, algorithmic implementation of this method: Mendenhall must have tabulated and visualized this information by hand, whereas Sinclair and Rockwell have of course implemented the same process algorithmically, in Python.
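
As a concrete illustration of what such an algorithmic re-implementation involves, the following is a minimal sketch in Python (not Sinclair and Rockwell’s actual notebook code; ‘oliver_twist.txt’ is a placeholder for a plain-text edition such as one from Project Gutenberg): it tabulates how often each word length occurs among the first 1000 word tokens of a text.

    import re
    from collections import Counter

    def word_length_distribution(text, limit=1000):
        """Count word lengths for the first `limit` word tokens of `text`."""
        # A deliberately simple tokenizer: sequences of alphabetic characters.
        # Decisions taken here (hyphens, apostrophes, numbers) are exactly the
        # kind of undocumented choice that can make two repetitions diverge.
        tokens = re.findall(r"[A-Za-z]+", text)[:limit]
        return Counter(len(token) for token in tokens)

    if __name__ == "__main__":
        with open("oliver_twist.txt", encoding="utf-8") as infile:  # placeholder file
            distribution = word_length_distribution(infile.read())
        for length in sorted(distribution):
            print(f"{length:2d} letters: {distribution[length]}")

Even at this level of simplicity, the tokenization choices commented in the sketch illustrate why two functionally equivalent implementations will rarely produce strictly identical counts.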

In addition, we can notice efforts to approximate the earlier results, but also an attention to the slight and inevitable deviations. Note in Fig. 2, for example, that there seem to be a few words with 13 letters that Sinclair and Rockwell found but Mendenhall did not. This might be an effect of differences in the underlying text (e.g. due to OCR errors, or to slightly deviating editions of the text), of differing understandings of what an individual word form or token is (Mendenhall says nothing about this), or of a (systematic or accidental) difference in the way that Mendenhall and Sinclair/Rockwell established the word length counts. It becomes clear that the authors employ this research strategy not so much as a way of checking Mendenhall’s work for flaws or errors (as would be the case in a strict replication study primarily designed as a verification of quality and integrity, basically inconceivable in this scenario), but in order to better understand his approach and, in the process, also reflect on their own computational methodology. One can also see a display of the advantages of the digital paradigm in terms of scale: once the code works with 1000 words, to which Mendenhall limited his investigation for obvious pragmatic reasons, it is trivial to expand the analysis to entire novels. This second, modified step in their approach, where the additional data used is purposefully distinct from but still closely related to Mendenhall’s original data, places Sinclair and Rockwell’s study even further away from a strict replication. Given that replicating was not the goal, and that Sinclair and Rockwell’s results were quite similar to those of Mendenhall, one may still say that they confirmed Mendenhall’s findings. In the terminology proposed below, this research would best be described as a (successful and relatively close) reinvestigation (of a question).

What Rockwell and Sinclair do not do, however, is follow Mendenhall’s further claims and actually try to perform authorship attribution with this kind of data. C.B. Williams had in fact determined, in 1975, that this does not work very well. The primary reason is that word length distributions appear to be at least as strongly affected by form (notably, by the distinction between texts in verse and in prose) as they are by authorship (Williams, 1975). So while Sinclair and Rockwell, by reimplementing his method, confirm that Mendenhall had worked with considerable accuracy, Williams had shown that the more general methodological conclusions regarding authorship attribution that Mendenhall had hoped to prove with his experiments do not hold up to scrutiny.

Finally, Sinclair and Rockwell understood very well the link between repeating earlier research and making one’s own research repeatable. They therefore made their research processes transparent and easily repeatable by providing a Jupyter notebook that contains not only the code and the required data, but also some explanatory prose that helps others understand how the study works and how to run the code themselves.

3 Motivation

One may ask, however, why a more systematic approach to RR, in terms both of practice and of theory, appears particularly useful today. The first motivation is the “reproducibility crisis” in a range of academic fields, as discussed further below. The term appeared around 2015 and has served to highlight findings from a study conducted by the Center for Open Science in Charlottesville (Open Science Collaboration, 2015). In this meta-study, the authors attempted to reproduce the results of 100 papers from several key journals in psychology, all first published in 2008. They were able to do so in just 40 percent of the cases. This was a shocking finding, despite the fact that there are many good reasons for such a result: not only is it hard to avoid any and all so-called “questionable research practices” threatening reproducibility; the repeating studies also gathered the relevant empirical data once more from scratch, rather than merely checking the integrity of the datasets and code provided. Finally, there are arguments that a certain level of non-reproducibility is to be expected as a matter of course (Hedges, 2019; Bird, 2021; Grieve, 2021). Still, a fundamental principle of the scientific method appears to be called into question by these results. The argument here is not that research as practiced today is not connected to the past and is not open to the future. Of course, we all read, learn from and quote earlier research extensively and hope that our own research will be read and quoted by others in the future. Nor is research as practiced today all unsound. In fact, it is an open question whether a field like DH can and should even be held to the kinds of standards relevant in the natural sciences (see, e.g., Peels & Bouter, 2018; Peels, 2019; Penders et al., 2019). Different answers probably apply to different areas within DH, with the more immediately computational areas such as CLS more closely aligned with RR than others. But it seems clear that RR can be an important way of making sure our research is well-integrated into the research tradition and will be useful, trusted and well-received in the future.

The second motivation, more specific to DH, is the fear of a disconnect between DH and the established Humanities. There is an ongoing discourse expressing fears of a disconnect or a mismatch between the contextualizing, interpretive, detail-oriented modes of research typical of disciplines in the Humanities, on the one hand, and formalizing, quantitative approaches in DH, on the other (e.g. Marche, 2012; Eyers, 2013; Da, 2019). One of the main reasons for this fear is the assumption that, for instance, literary texts, historical documents or cultural artefacts are simply not suitable for elucidation through formal modeling, algorithms and computational analysis, because they are complex, semiotic, non-deterministic, contextual artefacts. Inversely, this would mean that CLS and similar subfields within DH, which ultimately rely on counting surface features, can only ask and answer a specific set of questions (such as authorship attribution) but are unable to make meaningful contributions to established, qualitative research in these fields (such as literary history or textual interpretation). This is of course a highly disputed position, and the paradigm of RR in particular, with its explicit focus on modeling and operationalisation across an earlier and a later study, can be a useful bridge between established, qualitative research on the one hand and computational or quantitative research on the other.

The third motivating factor is the one most closely connected to CLS: the 2019 paper by Nan Z. Da in the influential journal Critical Inquiry, titled “The Computational Case Against Computational Literary Studies” (Da, 2019). The author argues that computational, quantitative approaches are fundamentally unsuited to investigations into literary texts, and that the selection of studies she examined either had statistically solid results that were meaningless, or seemingly meaningful results that were not statistically sound. This paper is highly problematic and has been commented on and criticized extensively, for a large number of good reasons. However, it also points to some serious and relevant challenges for the field of CLS: notably, the difficulty of reproducing work in this field, starting with issues of access to data and code, but also concerning lacking reporting standards, limited scholarly recognition, and missing community commitment and capacity that would all be needed to foster a culture of RR in CLS and beyond (Romero, 2018).

4 Earlier work

Moving on from an understanding of the general relevance of RR, it appears useful to consider more closely two key issues in selected contributions to the rather large body of existing conceptual work on RR: first, the terminological discussions around the definitions of repeatability, reproducibility, reproduction, replication, reanalysis and other terms; and second, the purposes, functions and epistemological added value of RR in the research process.

The disciplinary focus of the present paper notwithstanding, the principle of RR is of course not limited to this field, quite the contrary. The “reproducibility crisis” (Baker, 2016) first became highly visible in fields such as social psychology, biology and medicine (e.g. Open Science Collaboration, 2015; Freedman & Inglese, 2014; Hunter, 2017) and has more recently become a major issue in Artificial Intelligence, Natural Language Processing and Linguistics (e.g. Hutson, 2018; Cohen et al., 2018; Porte & McManus, 2019; Belz et al., 2021; Berez-Kroeker et al., 2022), three fields closely related to CLS. In recent years, the issue has also begun to be discussed in the (Digital) Humanities (e.g. KNAW, 2018; Peels & Bouter, 2018; Schöch et al., 2018; Herrmann et al., 2023), with the most prominent and controversial work applying replication in CLS certainly being the one, already mentioned, by Da (2019). However, the debate about the definition of relevant terms has occurred mostly outside the domain of DH.

The terminological situation, which can certainly be described as complex and confusing, is in itself another motivating factor for this paper, but earlier efforts to get to grips with the conceptual space and the terminology also present a substantial learning opportunity. The fact that the terminology has developed over time, and often in parallel in different fields, is part of the reason why it has become so unwieldy. Researchers have variously noted this, for example Goodman and colleagues, who observe that “the language and conceptual framework of ‘research reproducibility’ are nonstandard and unsettled across the sciences” (Goodman et al., 2016, 1). Cohen and colleagues remark that, in addition to using terms with varying definitions, “it is not uncommon to see reproducibility and replicability or repeatability used interchangeably in the same paper” (Cohen et al., 2018, 156). Hans Plesser describes his contribution as “a history of a confused terminology” (Plesser, 2018, 1). As Cohen and colleagues also note, “[t]his lack of consensus [in] definitions related to reproducibility is a problem because without them, we cannot compare studies of reproducibility” (Cohen et al., 2018, 156).

Hans Plesser has summarized the “Claerbout terminology”, in which reproducing means the exact repetition of earlier research (where data, implementation and results should be identical), whereas replicating means using a new implementation and reaching similar conclusions (Plesser, 2018). Plesser also explains that this is at odds with terminology in the sciences, for instance in Chemistry, where repeatability is understood as exact repetition within a lab under identical conditions (to measure within-run precision), and reproducibility means repetition of the same experiment, but performed by a different person in a different lab under different conditions (to measure between-run precision, that is, robustness). Since 2020, the ACM has used repeatability for the exact repetition of the same experimental setup by the same team, reproducibility for repetition of the same experimental setup by a different team, and replicability for repetition with a different experimental setup by a different team, swapping the latter two terms relative to earlier ACM usage (ACM, 2020). The ACM terminology, however, does not differentiate between several distinct aspects of the experimental setup, notably concerning the dataset and the method of analysis.

The contribution by Prasad Patil and colleagues is helpful not so much for the clarification of the terminology, but rather for a clear understanding of the conceptual space around reproducibility and replication (Patil et al., 2016). They address this issue by identifying a relatively large number of features (around a dozen) that can be used to describe the relationship between an earlier, original study and a later, repeating study. However, they do not attempt to directly relate the terms they propose to these descriptive dimensions.

Another contribution that explicitly aims to achieve a clarification of the terminological confusion is by Goodman et al. (2016). Their contribution is quite systematic in the way it considers key dimensions of RR, such as data, method and results. They propose research reproducibility as the cover term and note that an essential difference between different kinds of such research practices is whether or not they use new evidence, that is newly-gathered or otherwise additional data. They also note that the terms reproducibility and replication are frequently used to mark this difference, but without consistency as to which of the two terms is used for which scenario. Their proposal is to only use the term reproducibility, but to characterize it further depending on the specific scenario, in order to distinguish methods reproducibility (the exact repetition of earlier research), results reproducibility (using additional data to corroborate earlier results) and inferential reproducibility (where the methods of analysis and the conclusions drawn from results may be different). As far as I can see, this terminology has not been adopted widely.

A further terminological tradition, proposed for example by Drummond (2009) and explained very clearly by Huber et al. (2020) in the context of Natural Language Processing (NLP), is the following: “We use the term replication to refer to the activity of running the same code on the same dataset with the aim of producing the same (or sufficiently similar) measurements presented in the original paper. We use the term reproduction to refer to the activity of verifying the claims with experimental settings that are different from the ones in the original paper. For NLP experiments, this typically means reimplementation of the method(s) and the use of different datasets and/or languages” (Huber et al., 2020, 5604). In another contribution to the terminological debate within NLP, Cohen et al. (2018) follow this terminological tradition, but also use reproducibility as a relatively broad cover term. Another terminological strand, using replication as its point of departure, is the one found in applied linguistics, where strict, approximate and conceptual replication are sometimes distinguished (see e.g. Porte & McManus, 2019).

What also becomes clear from the research literature is that there are not only different kinds of RR, but that they also have different functions in the research process or make different contributions to the production of knowledge. Indeed, these functions and contributions are closely related to core values of the scholarly enterprise such as reliability, trustworthiness, transparency, sustainability and progress, values that are fundamental not only in the context of Open Science. As formulated by Freedman and Inglese: “Research advances build upon the validity and reproducibility of previously published data and findings” (Freedman & Inglese, 2014). As a consequence, functions of RR mentioned in the research literature include the “verification of a previously observed finding” (Gomez et al., 2010); “enhanc[ing] the reliability of research” (Gomez et al., 2010); the “corroboration” of earlier evidence, claims or findings (Babin et al., 2021; Goodman et al., 2016); converting “tentative belief to accepted knowledge” (Berthon et al., 2002), also in a Bayesian perspective of accumulating evidence (Sprenger, 2019); establishing the degree of robustness and/or generalizability of results (Goodman et al., 2016); and verifying the quality and integrity of research, for example with respect to identifying “(i) fraud, falsification, and plagiarism, (ii) questionable research practices, partly due to unhealthy research systems with perverse publication incentives, (iii) human error, (iv) changes in conditions and circumstances, (v) lack of effective peer review, and (vi) lack of rigor” (Peels, 2019, 1). In summary, one may say with the authors of the Open Science Collaboration report: “Replication can increase certainty when findings are reproduced and promote innovation when they are not” (Open Science Collaboration, 2015, 943). Being aware of these important functions may also help address what has been called a “publication bias” in replication studies, where (among other effects) research corroborating earlier work has a lesser chance of being submitted and published than work revising earlier conclusions (Berinsky et al., 2021).

Several conclusions can be drawn from this brief discussion: First of all, given that NLP is a field closely related to CLS, and for further reasons explained below, I propose to accept the terminology in the tradition of Drummond, where replication refers to the exact repetition of earlier research, adding further terms as necessary to describe research scenarios that differ, in some defined way, from replication. Second, it appears useful to use a limited number of relevant dimensions to describe the conceptual space of RR, notably research question, dataset, method, team, and results. And third, the discussion of the dimensions of the conceptual space and the distinct scenarios of RR should not only describe them in terms of their position in the conceptual space, but also in terms of their functions or purpose in the research process.

5 The conceptual space of RR

In this section, I would like to propose a typology of, and terminology for, RR that is based on a simplifying but useful multi-dimensional description of the conceptual space of RR. Describing the conceptual space in a systematic manner helps clarify which aspects of RR are relevant and what any term proposed in the terminology means, because its meaning can be described by specifying the subspace of the overall conceptual space that is covered by the term. It also shows the similarities, differences and distances between the meanings of specific pairs of terms and, of course, the relationships between studies that can be described using those terms.

The fundamental assumption behind this description of the conceptual space is that any research can be described, not fully of course, but fundamentally and usefully in the context of RR, by five dimensions:

  • (Q) The key research question being studied (or the key hypothesis or claim to be verified)

  • (D) The dataset used (or more generally, the empirical basis of enquiry)

  • (M) The research method employed (and its implementation, e.g. in a code-based algorithm or tool)

  • (T) The team performing the research (including, of course, the case of a one-person team)

  • (R) The result of the research (and the claims or conclusions supported by the results)

In addition, this model of the conceptual space assumes that the relationship between an earlier study that is being repeated and a later study that repeats it can, for any of these five aspects, be described as corresponding to one out of three simplifying types, conceived not so much as distinct categories, but as contiguous areas on a gradient from perfect identity to complete unrelatedness:

  • (1) Identical (exactly or virtually the same)

  • (2) Similar (more or less closely related)

  • (3) Unrelated (largely dissimilar or entirely different)

It is true, of course, that assuming just five dimensions and three possible values for each dimension is a simplification. It is also true that it may not always be possible to clearly distinguish between a scenario, for instance, where the dataset is functionally identical and one where the dataset is very similar to that of an earlier study, as the three values are meant to describe a gradient rather than sharply distinct categories. However, considering that this conceptual space defines no less than 3^5 = 243 theoretically possible positions, I would argue that it provides more than enough differentiation to support the definition of a clear descriptive vocabulary that makes it possible to characterize any repeating study with relatively little ambiguity. For some recurrent scenarios, we can define labels that function as shortcuts. For others, we may prefer to describe their exact position in the conceptual space. This gives us three ways to describe an instance of RR: we can use one of the terms proposed below, if one is applicable, for recurring scenarios of RR; we can describe a particular instance using a verbal description containing the five dimensions and three values; and we can express such a verbal description in the condensed form of a vector-like representation.

As a first illustration, we could describe Sinclair and Rockwell’s study repeating earlier research by Mendenhall (discussed above) in the following way, focusing in a first step on the initial part of their study: it pursues an identical research question or hypothesis (Q=1), uses very similar data (D=2) and a method that, while aiming to be functionally identical, is realized in quite a different manner (M=2), was performed by an entirely unrelated team (T=3) and obtained very similar, though not entirely identical, results (R=1). The vector-like representation of this scenario would therefore be RR(Q,D,M,T,R) = (1,2,2,3,1). With regard to the set of terms proposed below, this scenario is best characterized as a (rather close) reinvestigation (of the question). One could further deduce from this description that the repeating study likely corroborated the earlier study because, with additional relevant data, it came to a very similar conclusion on the same research question.
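
To make the vector-like notation concrete, here is a minimal sketch in Python (purely illustrative, not part of any of the studies discussed) that encodes the five dimensions and the three values, using Sinclair and Rockwell’s repetition of Mendenhall as the example.

    from typing import NamedTuple

    IDENTICAL, SIMILAR, UNRELATED = 1, 2, 3

    class RRVector(NamedTuple):
        """Relationship between an earlier and a later study on five dimensions."""
        question: int  # Q
        dataset: int   # D
        method: int    # M
        team: int      # T
        results: int   # R

    # Sinclair and Rockwell's repetition of Mendenhall, first step, as described above
    sinclair_rockwell = RRVector(
        question=IDENTICAL, dataset=SIMILAR, method=SIMILAR,
        team=UNRELATED, results=IDENTICAL,
    )
    print(tuple(sinclair_rockwell))  # (1, 2, 2, 3, 1)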

6 A terminology of RR

The question that arises from the description of the multi-dimensional conceptual space is how some of the recurrent scenarios within this semantic space can be delimited and labeled. This is likely to always remain controversial, but once the conceptual space is clearly defined, the choice of terminology in fact becomes less critical, because it is always possible to define terms with respect to the conceptual space, or to describe a specific scenario independently of a given term. In this perspective, the terms become convenient shortcuts that have their usefulness and importance, but that are ultimately not essential.

As mentioned, 3^5 = 243 possible combinations or different scenarios of RR are clearly more than we would care to qualify using individual terms. One part of the terminological complexity in the field stems from the fact that, even if one were to agree on the structure of this conceptual space, it can be divided up in multiple ways; another stems from the fact that the inventory of semantically suitable terms for different areas of the conceptual space contains many terms that are very similar to each other in form and meaning and, as a consequence, have been used interchangeably at different times and in different fields (see Section 4).

However, if we set aside for a moment dimensions four and five (team and results) - important descriptive aspects that do not, however, need to be included in the distinctions between recurring scenarios of RR - we obtain a three-dimensional conceptual space that can be visualized as a cube and that, with its three possible values in each dimension, distinguishes just 3^3 = 27 distinct subspaces (see Fig. 4).

Fig. 4: The three-dimensional conceptual space of repeating research. - The corner at the front, bottom, left side of the cube represents the closest possible relationship between an earlier and a later study (1,1,1 - replication), whereas the corner at the back, top, right side of the cube represents the area of maximum difference between two studies (3,3,3 - unrelated research)

Within this cube, we can define specific subspaces and provide terms for them. I propose to distinguish the following terms, as shown in Table 1 with a short definition and their vector shortcut, to provide an overview before describing them in more detail in the following sections:

Table 1: Overview of terms proposed for specific subspaces of the conceptual space of repeating and repeatable research. - The vector concerns only the research question (Q), dataset (D) and method of analysis (M), whereas the issues of team (T) and results (R) are left to be verbally specified in addition to each term

These terms and the scenarios of RR they designate fall into three groups: (1) Replication (of research), the only scenario that implies an exact (strictly identical or very close) repetition of earlier research; (2) reproduction (of results), revision (of method) and reinvestigation (of the question), in which the research question remains the same but the data, or the method, or both deviate to a limited degree and which can be understood to be approximate repetitions, due to the clear and close relationship that they establish between an original study and a repeating study; and (3) the remaining scenarios, reanalysis (of data), reuse (of data), reuse (of method) and follow-up research, scenarios that can still be called related research, but are further removed from the original study than the previous group, given that they deal with a differing research question and include additional differences. The exact subspace that each term covers as well as the best term for a given subspace may be subject to debate and revision, of course; a debate that is, however, supported by the explicit conceptual space that underpins the terminology.

Note that the recurring scenarios of RR described here do not exhaust the space defined by the three dimensions and three values. Certain combinations do not make a lot of sense and are unlikely to be a recurring practice in research. For example, investigating an entirely different or unrelated research question with the same data and method, while a theoretical possibility of the conceptual space (3,1,1), does not appear to be easily feasible. Similarly, the scenario where question and data are identical, but the method is unrelated (1,1,3), appears simply unrealistic and would probably rather be a case of revision (of method) (1,1,2). Finally, it is of course possible to envision research that would employ a question, data and method that are all unrelated to earlier research (3,3,3), but in the absence of any systematic relationship to earlier work, this scenario cannot be meaningfully included under the category of RR.
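
As a complement to Table 1, the following minimal sketch in Python (an illustration, not a complete rendering of the table) maps those (Q,D,M) vector shortcuts that are explicitly given in the text to the corresponding terms; the remaining scenarios, such as reuse (of method) and follow-up research, as well as the fuzzy boundaries between subspaces, are not modelled here.

    SCENARIOS = {
        (1, 1, 1): "replication (of research)",
        (1, 2, 1): "reproduction (of results)",
        (1, 1, 2): "revision (of method)",
        (1, 2, 2): "reinvestigation (of the question)",
        (2, 1, 2): "reanalysis (of data)",
        (3, 1, 3): "reuse (of data)",
    }

    def describe(q: int, d: int, m: int) -> str:
        """Return the proposed term for a (Q, D, M) position, if one is defined."""
        return SCENARIOS.get((q, d, m), f"no dedicated term for ({q},{d},{m})")

    print(describe(1, 2, 2))  # reinvestigation (of the question)
    print(describe(3, 3, 3))  # no dedicated term: unrelated research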

Note also that the terms proposed here for the recurring scenarios of repeating research, as defined in Table 1, are meant to describe the relationship between an earlier, original study and a later, repeating study. However, if the concern is to characterize a given study with respect to the degree to which it enables repetition, rather than performing it, then the corresponding terms expressing the ability to perform a certain type of RR can be used, that is replicability (of research), reproducibility (of results), reanalysability (of data) and reusability (of data or code). More than one term may be used to describe a given study, in this case, for example to say that the reusability of its data is high (e.g. because the data is encoded in a widespread data format and is well-documented), but the replicability of the research as a whole is expected to be low (e.g. because the code has not been provided). Finally, the terms or values, respectively, may be further characterized by describing a study, for instance, as being a ‘close reproduction’ or a dataset as being ‘broadly similar’, etc.

The following discussions of the scenarios describe each scenario’s location in the conceptual space (including by way of a visualization), explain what constitutes a successful repetition in that scenario, comment on its functions or purpose in the research process, name the requirements for enabling it, and provide one or several examples from CLS.

6.1 Replication (of research)

Within the terminology proposed here, the term replication (of research) designates practices of RR in which the research question, the dataset and the method of analysis of the repeating study are all identical (or virtually identical) to the original study (1,1,1) (see Fig. 5). The term can be used irrespective of the team and the results. The term replication is preferred for this configuration of strict repetition without any significant modification because it is etymologically related to Italian ‘replica’ (copy, repetition) and Latin ‘replicare’ (to duplicate, to repeat) and because this is the accepted meaning of the term in Natural Language Processing, a field closely related to CLS (as discussed in Section 4).

Fig. 5: The scenario replication (of research) (1,1,1) within the conceptual space of RR. - Note that the boundaries are meant to be fuzzy, and that even a single cell within the cube encompasses a range of degrees of similarity and difference

Within a strict replication, only if the results of the repeating study are exactly identical to those of the original study can the replication be said to be successful; all other outcomes would signal some flaw in the data and/or the code and would mean the replication was unsuccessful (but see the remarks below). In this sense, the purpose of replication is to check the integrity and correctness of the code when applied to the data. Generally speaking, then, such a replication does not add new knowledge about the domain or research question but rather serves as a quality check. If the team performing the replication is identical or very similar, for example coming from the same research group, the function of this replication would amount to an internal check of the research as a whole, ensuring for example the completeness of the dataset, the functionality of the implementation and the correct match between data and code. If the team is unrelated, for example when a replication is performed as part of a peer-review process, the function of this replication would be quite similar, but it would likely also be helpful in order to ensure the completeness of the documentation that allows a third party to run the code on the dataset; in addition, it would serve to ensure the integrity of the research process in the sense of verifying that no manual manipulation has occurred in the original study that would affect reported numerical results or data visualisations. Compared to other scenarios, strict replication, then, is important for quality assurance rather than for advancing knowledge.

Note, however, that even within this relatively narrow scenario, some degree of variation is conceivable. Entirely exact replication will frequently not be feasible, and strictly identical results can therefore often not be expected. The implementation and the results obtained from applying it to the data may deviate from the original results, for example because of the properties of the statistical and/or probabilistic methods employed, or because the algorithms are run in an environment whose fundamental technical properties related to hardware and/or operating system are distinct from those of the original study. In any case, the greatest epistemological value lies not in strict replication, but in subtle and controlled departures from the original study design, as also noted, e.g., by Porte and McManus (2019) or Hedges (2019), and as described in the following scenarios. Research designed for repetition can also purposefully enable such insight by opening up paths for controlled deviations, for example by making it easy and explicit, in the code, to explore alternative choices with respect to, e.g., the selection of data or features, preprocessing parameters, or the use of specific measures or calculations.

The requirements for enabling replication (of research) are quite high, as the complete dataset and all code need to be available in a form that allows running the code without modifications or a new implementation. It can be noted that providing data (and code) with publications - supporting transparency, replicability and sustainability - is becoming increasingly common, and sometimes even expected, in CLS. If data and code are available and can be run, then performing the replication itself is comparatively trivial. This usually requires some degree of documentation, although at a minimum, or in a first step, a replication can treat the code and data as a black box, focusing on the comparison between the results reported in the original study and those obtained by running the code on the data once more. However, as soon as something does not quite work as expected, documentation and insight into the code and data become crucial, of course. Also, sustainability is an issue here, when underlying libraries and environments become incompatible over time. Depending on how strict the replication is meant to be, specific versions of operating systems, programming languages and packages may need to be used, or data and code may even need to be packaged within a containerized environment (Börner et al., 2023).
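
One possible, deliberately minimal way of supporting such stricter forms of replication is to record the computational environment alongside data and code. The following Python sketch is one such convention, not a prescribed standard; the package list is a placeholder for a study’s actual dependencies.

    import json
    import platform
    import sys
    from importlib import metadata

    PACKAGES = ["numpy", "pandas", "matplotlib"]  # placeholder list of dependencies

    def environment_record(packages=PACKAGES):
        """Collect Python, platform and package versions for later replication."""
        record = {"python": sys.version, "platform": platform.platform(), "packages": {}}
        for name in packages:
            try:
                record["packages"][name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                record["packages"][name] = "not installed"
        return record

    if __name__ == "__main__":
        with open("environment.json", "w", encoding="utf-8") as outfile:
            json.dump(environment_record(), outfile, indent=2)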

The study by Nan Z. Da mentioned above can be understood to have been intended, in part, as a replication study, although it is not quite clear from the paper and the relevant GitHub repositories to what extent individual studies were actually replicated in their entirety. It appears that for the most part, either data or code, or both, were not available and a full replication of the research was therefore not possible. In any case, such complete replications have not been documented online. As mentioned, strict replication also happens either as a project-internal quality-assurance measure or, more formally, as part of the peer-review process. However, a full replication step in the peer-review process is still the exception in DH-related journals, and in most cases there would be no publicly available record of this process, except in an open peer-review scenario.

6.2 Reproduction (of results)

Another recurring configuration is the one in which the research question and method of analysis remain identical with respect to the original study, but in which the dataset, instead of also being identical, may be more or less similar, though not unrelated (1,2,1). (This and the following scenarios, all instances of approximate repetition departing from replication in a particular way, are shown in Fig. 6.) Again, this is independent of the team and the results. The term that appears best suited to describe this configuration is reproduction (of results), because this is a frequent scenario and the term is well-established, albeit not always in this precise meaning. In addition, one may motivate the term by its origin in biological reproduction which, at least in the case of sexual reproduction, does not imply the production of organisms that are identical to their parents, but rather the production of somewhat modified organisms.

Fig. 6: Three scenarios of approximate repetition within the conceptual space of RR. - Reproduction (of results), revision (of method) and reinvestigation (of the question) are distinct ways of a limited departure from the replication scenario

In this scenario, the function or purpose is to verify whether or not, with another set of similar (i.e., relevant but not identical) data, the results of the original study can be confirmed. If the reproduction of the results is successful, then the results of the original study are corroborated, in the sense that the initial results, being valid across more than one dataset, are shown to hold more generally or more broadly than if they were true only for one particular dataset. Such a success is also a confirmation that the method of analysis is robust across multiple datasets or that a theory or claim holds up even when it is tested on another relevant dataset (on theory testing, see Brendel et al., 2021). The more dissimilar the data, the stronger the corroboration and generalizability, because the results can be understood to be more robust and supported by more evidence.

The requirements for enabling reproduction (of results) are relatively high, in the sense that either the original implementation of the method needs to be available in order for a (very close) reproduction to be able to reuse it, or the method needs to be documented with sufficient detail and precision for a functionally-identical reimplementation and thus a somewhat looser reproduction to be possible. In addition, because it is important to be able to determine the exact degree of similarity between the dataset of the original study and that of the repeating study, in order to correctly interpret any differences in results, the dataset needs to be available or at least described in sufficient detail.

This scenario appears to be frequent in the sciences, for example in the famous case of the Open Science Collaboration cited above (Open Science Collaboration, 2015), where new but equivalent data was obtained empirically in order to verify whether applying the same analysis to the data would lead to a confirmation of the results obtained in the original studies. In CLS, where the conditions for obtaining new but equivalent data are very different, studies that can be described as a reproduction (of results) occur especially in the context of the development and evaluation of methods. For example, in stylometric authorship attribution, methods and measures have typically been proposed using English-language corpora (classic cases being Burrows’ Delta and Zeta; see Burrows, 2002, 2007). There are quite a number of studies implementing these measures as closely as possible (given that Burrows described the measures, but did not provide a code-based implementation) or reusing a reference implementation in a tool such as stylo (Eder, 2016) and evaluating the measures using additional datasets containing different literary genres and/or different languages, both for Delta and Zeta (see Hoover, 2004; Rybicki & Eder, 2011; Evert et al., 2017; Craig & Kinney, 2009; Schöch et al., 2018). A different example of such a study is Du (2023), in which the author performed an identical evaluation study twice, once with a dataset consisting of newspaper texts, and once more with a comparable dataset that, however, consists of literary texts. The purpose of this approach is clearly to reach increased generalizability of the results.
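
For readers unfamiliar with the distance-based paradigm mentioned here, the following is a minimal Python sketch loosely following the logic of Burrows’ Delta (z-scored relative frequencies of the most frequent words, compared via mean absolute difference); it is an illustration only, not the reference implementation provided by tools such as stylo.

    from collections import Counter
    import numpy as np

    def relative_frequencies(tokens, vocabulary):
        """Relative frequencies of the vocabulary words in one tokenized text."""
        counts = Counter(tokens)
        total = len(tokens)
        return np.array([counts[w] / total for w in vocabulary])

    def burrows_delta(corpus_tokens, text_a, text_b, n_mfw=100):
        """Delta between two token lists, relative to a reference corpus of token lists."""
        # 1. Most frequent words across the reference corpus
        all_counts = Counter(t for tokens in corpus_tokens for t in tokens)
        vocabulary = [w for w, _ in all_counts.most_common(n_mfw)]
        # 2. Relative frequencies per text, then z-scores per word across the corpus
        freqs = np.array([relative_frequencies(t, vocabulary) for t in corpus_tokens])
        means, stds = freqs.mean(axis=0), freqs.std(axis=0)
        stds[stds == 0] = 1.0  # avoid division by zero for constant features
        z = lambda tokens: (relative_frequencies(tokens, vocabulary) - means) / stds
        # 3. Delta is the mean absolute difference of the two z-score profiles
        return float(np.mean(np.abs(z(text_a) - z(text_b))))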

6.3 Revision (of method)

This scenario involves an identical research question and an identical dataset, but the use of a method of analysis that is only more or less similar, rather than identical, to the original one. This scenario can be termed revision (of method), whether it concerns a functionally-similar reimplementation of an earlier method or a revised but closely-related method or measure. Again, this is irrespective of the team or the results. The idea here is to use a similar method to investigate the same question using the same data, as a way of verifying whether or not it is possible to arrive at the same conclusions using a new, but functionally-similar implementation or a different, possibly superior, version of the same method of inquiry.

The requirements for enabling such a revision of an earlier method are similar to those for reanalysis (of data) described below, namely that the dataset be made available. In addition, it is essential in this scenario that the research question be defined with considerable precision, so that exactly the same issue can be investigated. In case the method used in the original study was not implemented algorithmically or the code is not made available, a very detailed, step-by-step description of the method is required for an equivalent reimplementation to be feasible.

The purpose of a revision of the method using the same data is to investigate the robustness of earlier results or to propose an improved method. If a non-identical but similar method (for example involving a distinct but related statistical measure) investigating the same features of the same data nevertheless produces the same findings or solves an issue with comparable performance, then the results of the earlier research are corroborated. Similarly, if the reimplemented method involves the selection of dissimilar features in the same dataset, and still comes to the same results, this is again a corroboration of the earlier results. Note that in terms of systematic progress in scientific knowledge construction, the controlled departures from replication that are constituted by reproduction and revision are probably the most useful approaches. Scenarios that depart from the original study in more than one dimension at a time make it harder to clearly identify the source of any differences in results.

Papers in stylometric authorship attribution that propose a new measure of textual similarity (following the same distance-based methodological paradigm) and use one or several evaluation datasets identical to those used in earlier research on similar methods are good examples of revision (of method). A surprisingly rare example of such a systematic study is Smith and Aldridge (2011). The purpose of conducting such a study lies, in this case, in the fact that only by reusing the same dataset can results (e.g. the performance of a classifier or an attribution based on a distance measure) be compared across studies and differences be precisely attributed to the new measure. A very different example of RR that includes a clear instance of a revision (of method) is a review of Nicholas D. Paige’s book Technologies of the Novel (Paige, 2020) that I published myself (Schöch, 2023a). Paige has made the full dataset used for the book publicly available and describes his analyses in detail in the book. However, he has not provided the code used to analyse and visualize the data. In a first step, most repetitions that I performed were therefore attempts to reconstruct specific plots contained in the book by developing reverse-engineered code that would functionally approximate the analysis the author must have performed to obtain the plots, in order to verify that the plots are really based precisely on the data provided. This is probably as close to a replication as one can get when the code is not provided, but my Python code is likely to be sufficiently different from the method Paige used to generate the plots to call this a revision (of method). In a second step, I purposefully departed from the original study by using slightly different analyses of the data and proposing alternative visualizations based on the same data, which places the study firmly in the domain of revision (of method).

6.4 Reinvestigation (of the question)

The term reinvestigation (of the question) is used for the scenario where the research question is identical, while both data and method are more or less similar rather than identical, though not unrelated or radically different (1,2,2). This scenario is therefore somewhat further removed from an exact replication than both reproduction (of results) (where only the data is not identical) and revision (of method) (where only the method is not identical), but because the research question is still the same, it is assigned to the group of approximate repetition.

This is a rather frequent scenario, even though its purpose and usefulness are somewhat complicated by the fact that, with the repeating study differing from the original study in two key factors rather than just one, differences in results are difficult to relate specifically to either one of these two factors. This means it can serve neither as a quality check nor to evaluate a new method or to support generalization. However, it still allows researchers to approximate an earlier study and to acquire further information on the earlier study’s research question. The reason for this scenario’s frequency is most likely pragmatic: with proper benchmarking datasets still being rare in CLS, both the original data and the original method are often not readily available for reuse. In such cases, they have to be reconstructed with more or less accuracy from verbal descriptions or documentation. Also, a new dataset is often of greater interest to the researchers than earlier ones, for example because of its language or genre. In such cases, a strict reproduction (of results) or a revision (of method) is excluded.

As spelled out above, the study by Rockwell and Sinclair presented initially falls into this category. This approach is also typical of studies that propose, for example, a new method of stylometric authorship attribution and evaluate it on a new dataset, rather than on a dataset that had already been used in similar authorship attribution studies. An example is Evert et al. (2017), who used datasets previously not analysed in distance-based authorship attribution evaluation. The recommended best practice when introducing a new method or measure would be to perform three scenarios: test the earlier method on a dataset used in earlier work, to prove the equivalence of one’s own implementation (replication); test the new method on the earlier dataset, for a comparison of performance (revision); and test the new method on a new dataset, to demonstrate the generalizability of the new method’s usefulness. A rare example actually following this best practice is, again, Smith and Aldridge (2011).
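
This three-step protocol can be summarized schematically as follows (a hypothetical Python sketch, not code from any of the cited studies; ‘evaluate’, ‘old_method’ and the other names are placeholders):

    def evaluate(method, dataset):
        """Placeholder: run `method` on `dataset` and return a performance score."""
        return method(dataset)

    def introduce_new_method(old_method, new_method, old_dataset, new_dataset):
        """Report the three evaluation scenarios side by side."""
        return {
            "replication (old method, old data)": evaluate(old_method, old_dataset),
            "revision (new method, old data)": evaluate(new_method, old_dataset),
            "generalization (new method, new data)": evaluate(new_method, new_dataset),
        }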

Outside of the development of measures and methods for authorship attribution, a study by a French team on new methods for direct speech recognition, reusing but also modifying multiple earlier, suitably-annotated literary corpora, falls into this scenario (Durandard et al., 2023). Also, my own study on sentence length in books by the Belgian writer Georges Simenon and in books by contemporary authors (Schöch, 2016) can be understood as an attempt to repeat earlier work on sentence length by Richaudeau (1982), who used a non-digital but clearly quantitative approach. The research question was the same, namely whether Simenon’s success could be explained by his use of particularly short sentences. In a first step, I aimed to approximate the corpus of the original study, with limited success, then used a substantially expanded corpus of texts. Also, with the (manual) method of establishing sentence length not documented in detail by Richaudeau, my algorithmic procedure for this task was almost certainly quite different, even if the basic methodological approach was still fundamentally the same. This means the study combined a relatively close reinvestigation with a somewhat looser one.

6.5 Reanalysis (of data)

The scenario I propose to call reanalysis (of data) differs from revision (of method) in that, in addition to the method of analysis of the repeating study not being identical to the original study’s method, the question being investigated may also diverge to some degree, while the data remains the same (2,1,2). In fact, a change in the method of analysis may of course induce the research question to shift to some extent, whether or not this is intended by the researchers conducting the reanalysis. The term reanalysis appears fitting because the data remains the same and the overall perspective is still closely related to the earlier study. In contrast to this scenario, I propose the term reuse (of data) (described in Section 6.6) for a scenario in which question and method are clearly distinct from the original context and the same dataset is hence used for a largely unrelated purpose. (For a visual overview of three of the scenarios grouped under the term related research, see Fig. 7.)

Fig. 7: Three scenarios of related research within the conceptual space of RR: reanalysis (of data), reuse (of data) and reuse (of method)

The function of reanalysis (of data) is primarily to examine a similar research question from a new but related methodological angle. If the reanalysis is successful in the sense of producing results supporting conclusions that are identical to those of the original study, then these conclusions are corroborated and their robustness is confirmed. Depending on how closely related the method used in such a reanalysis is to the original method, more or less similar results are to be expected. If the results turn out to be different, the reanalysis can be said to have been unsuccessful, pointing, however, to a potential flaw not only in the original study, but possibly also in the reanalysis of the data. Only once this can be ruled out would a differing result point to a potential flaw in the original study’s method and results.

The requirements for enabling reanalysis (of data) are clearly lower than for replication (of research), because strictly speaking, only the dataset needs to be available in identical form, whereas the other aspects of the research will deviate from the original study in any case (but see the remarks on reusable datasets in Section 6.6).

The practice of reanalysis is typically based on well-established datasets or corpora that have been available for a considerable amount of time. For example, a substantial number of studies broadly concerned with distinctions of text types or genres use the Brown Corpus (Francis & Kucera, 1979) while employing a wide range of more or less similar methods and approaches for a range of distinct, albeit related, research questions. Karlgren and Cutting (1994), for instance, used discriminant analysis to support a classification mechanism for genres, with a focus on identifying discriminatory features. Some years later, Kessler et al. (1997) again used the Brown Corpus for genre classification, but with a different set of features and classification methods (logistic regression and neural networks), and with a focus on classification accuracy. More recently, Kazmi (2022) once more used the Brown Corpus for genre classification, but focusing on the fiction/non-fiction distinction and using logistic regression as the key method. Each time, the same dataset is analysed with a similar question and a related method, and with varying degrees of success.
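A minimal sketch of a reanalysis in this spirit, loosely modelled on the fiction/non-fiction setting of Kazmi (2022) but not reproducing the setup of any of the studies cited above, could look as follows; the mapping of Brown Corpus categories to the fiction class and the feature set are simplifying assumptions.

```python
# Illustrative fiction vs. non-fiction classification on the Brown Corpus
# using logistic regression. Requires NLTK's 'brown' corpus data and scikit-learn.
from nltk.corpus import brown
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Crude assumption: these Brown categories count as fiction, the rest as non-fiction.
FICTION = {"fiction", "mystery", "science_fiction", "adventure", "romance", "humor"}

documents, labels = [], []
for fileid in brown.fileids():
    category = brown.categories(fileid)[0]
    documents.append(" ".join(brown.words(fileid)))
    labels.append(int(category in FICTION))

classifier = make_pipeline(
    TfidfVectorizer(max_features=2000, sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(classifier, documents, labels, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```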

6.6 Reuse (of data)

For the scenario where the dataset used in a later study is identical to that of an earlier study, but the research question and method of analysis are very different or unrelated, I propose the term reuse (of data) (3,1,3). This scenario is therefore even further removed from strict replication than reanalysis (of data). Note, however, the adjacent position of the two scenarios in the conceptual space and the potential overlap between the two terms.

The requirements for enabling reuse (of data) are comparatively simple: the dataset needs to be publicly available. However, as in all other cases where the dataset is required, this seemingly simple condition hides considerable complexity. Not only does the dataset need to be available, it also needs to be understandable, interoperable, sufficiently well documented and suitable for an inquiry into the new research question. For example, corpora need extensive metadata and a detailed documentation of the provenance, encoding and annotation of the included texts, while tabular datasets need a clear documentation of how the data was obtained and of the meaning of the various column headers.Footnote 14 As a consequence, in many cases where reuse of data is the goal, considerable effort is first invested into augmenting, cleaning, annotating or otherwise enhancing the dataset. The upside of such efforts is twofold: potential attention to one's data from other researchers, and a contribution to the sustainability of research.
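One small, practical way of lowering the barrier to such reuse is to publish a tabular dataset together with an explicit data dictionary and to check reused data against it before analysis. The following sketch shows one possible, deliberately simple check of this kind; the file name, column names and expected types are hypothetical examples, not taken from any published dataset.

```python
# Sketch: verify a reused CSV file against its documented data dictionary.
# All names and expected types below are hypothetical placeholders.
import pandas as pd

# Documented schema: column name -> expected pandas dtype, per the dataset's documentation.
DATA_DICTIONARY = {
    "text_id": "object",                # unique identifier of the text
    "author": "object",                 # author name as given in the source corpus
    "year": "int64",                    # year of first publication
    "mean_sentence_length": "float64",  # mean sentence length in word tokens
}

df = pd.read_csv("dataset.csv")  # hypothetical file published with an earlier study

missing = set(DATA_DICTIONARY) - set(df.columns)
if missing:
    raise ValueError(f"columns documented but not present: {sorted(missing)}")

for column, expected in DATA_DICTIONARY.items():
    actual = str(df[column].dtype)
    if actual != expected:
        print(f"warning: column '{column}' has dtype {actual}, documentation says {expected}")
```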

These departures from the original study in terms of question and method are legitimate, in this scenario, because we are here in the domain of related research, where the function of reusing the data is not to check its quality or the quality of the study it was used in, but simply to save time and effort by reusing an existing dataset rather than creating a new one.

With the publication of corpora and datasets becoming increasingly common in CLS, there are many cases of reuse of data. While project-specific, research-driven datasets are sometimes not easily reused, others, in particular curation-driven corpora that are large in size, contain reliable texts, use standardized encoding, include rich metadata and provide detailed documentation, are routinely being reused. Examples of such corpora relevant to CLS include the Oxford Text Archive, the Deutsches Textarchiv, DraCor or the European Literary Text Collection (ELTeC).Footnote 15 Among the many examples of the reuse of these corpora in CLS, one may mention a study of the chorus in a Spanish-language drama corpus included in DraCor (Dabrowsa & Fernández, 2020) or a study of the titles of novels across multiple languages included in ELTeC (Patras et al., 2021). Reuse of data is not limited to analysis, of course, but can also occur when one or several existing corpora are used to create a new corpus, as in the case of the KOLIMO corpus, which was created using texts from the TextGrid Digital Library, the Deutsches Textarchiv and Gutenberg-DE (Herrmann & Lauer, 2018).

6.7 Reuse (of method)

When the research question is similar, different or unrelated and the dataset is different or unrelated with respect to earlier research, and only the method or code is used in identical or very similar form, then we may speak of reuse (of method) (or of code) (3,3,1). Depending on how similar the dataset is and on how flexibly the code can be used, the research question may of necessity be more or less closely related to that of the original study.

Again, this scenario does not fall into the realm of exact or approximate repetition of earlier research, but it is nevertheless a specific mode of RR. And it does have a specific function, which is to save implementation time and to increase the reliability and robustness of an implementation. In a sense, any use of tools or software packages developed by others constitutes reuse of this kind. Using existing tools such as stylo, MALLET or TXM (see Eder, 2016; McCallum, 2002; Heiden et al., 2010) for one's own purposes is, of course, a normal practice in Digital Humanities and CLS, so mentioning specific examples does not appear particularly useful in this case.
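Schematically, however, such reuse typically amounts to little more than calling an existing, well-documented implementation rather than writing one's own. The sketch below uses scikit-learn's LatentDirichletAllocation as a generic, off-the-shelf stand-in for a MALLET-style topic model (it is not MALLET itself), applied to a handful of placeholder documents.

```python
# Sketch of reuse (of method): relying on an existing topic-modeling
# implementation instead of a custom one. Documents are placeholders.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "the detective walked through the rainy streets of paris",
    "the commissioner questioned the suspect late at night",
    "the novel describes provincial life in the nineteenth century",
    "letters and diaries structure the epistolary narrative",
]  # placeholder texts standing in for a literary corpus

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(documents)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

vocabulary = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_words = [vocabulary[j] for j in topic.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top_words)}")
```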

The main requirements for enabling efficient reuse (of method) are the availability of the code, software package or tool with a minimum of (financial or technical) hurdles as well as a detailed and understandable documentation.

6.8 Follow-up research

If the research question remains similar, but the dataset used as well as the method of analysis are either similar or unrelated, then I propose to use the term follow-up research (2,2-3,2-3). This relatively broad scenario is clearly in the domain of related research, rather than exact or approximate repetition, because there is a comparatively distant relationship between earlier studies and the later study in this case, linked essentially by the similar research question (see Fig. 8, covering four subcubes). If the research question is different, but either data or method are identical, then reuse (of data) or reuse (of method) are more applicable scenarios. If all three dimensions are different, there is no (relevant) relationship anymore between the earlier and the later study and the scenario falls outside the scope of RR.

Fig. 8: The scenario follow-up research within the conceptual space of RR. The other scenarios of related research are provided for visual context

In terms of the functions of follow-up research, a later study on the same question that obtains similar results using distinct data and methods can certainly be understood as a corroboration of earlier studies on the topic, showing that these results are robust against variations in the dataset and in the method used to elucidate the question. However, because the relationship between the studies is much looser than in the scenarios of approximate repetition, differing results from a follow-up study do not necessarily indicate that anything is wrong with the earlier study.

The requirements for enabling follow-up research are comparatively low, as research question, data and method or code do not need to be, and usually are not, identical and therefore need not be readily available for reuse. Pushed even further, for example by focusing on a research question that is different from or unrelated to any earlier research question, this scenario can be understood either as breaking the chain of incremental increase in our knowledge about a domain, or as breaking entirely new ground.

My own study repeating a classic study by the stylistician Leo Spitzer on the French playwright Jean Racine is an example of both reinvestigation and follow-up research (Spitzer, 1931, 1969; Schöch, 2023b). The original intent was to perform a very close reenactment of this non-computational and qualitative study using digital data and methods, but it became clear rather quickly that a strict replication was impossible and that even a revision (of method) was hardly feasible, given the analog-digital and the qualitative-quantitative divides as well as the lack of information about the exact editions used by Spitzer. As a consequence, this instance of RR proceeded in multiple steps that became increasingly distant from the original study. It started out with a most likely very similar corpus and a closely related but distinct method (hence, reinvestigation, 1,2,2), but ended with a larger and more diverse corpus and a mixed-methods approach combining the modeling of stylistic devices with statistical analysis, which ultimately shifted the research question from Racine himself to Racine's position among his contemporaries: clearly follow-up research, rather than any more specific type of repetitive research.

7 Conclusion

As a way to conclude, it appears useful to briefly reflect on the affordances and limitations of the proposed conceptual space and terminology of RR.

The primary affordance of describing RR as structured into a five-dimensional conceptual space appears to be that it allows us to characterize, with considerable granularity, precision and transparency, and without necessarily using a given terminology, the relationship between an earlier study and a later, related study. The main advantage of the set of terms proposed above is that they are clearly defined with respect to the conceptual space and that they provide us with convenient and distinct labels for a certain number of recurring scenarios within the conceptual space of RR. Several studies that have been designated with the same term, within this terminology, can reasonably be expected to show a considerable amount of similarity. Taken together, especially considering that the conceptual space and the set of terms also come with a description of functions and requirements, the hope is that this conceptual work can help the community of researchers in CLS, and in DH more broadly, to understand RR more clearly, to value it more appropriately, and to practice it more frequently.

A possible limitation of the conceptual space is the choice of the five dimensions. One may argue that the concrete implementation of a method, for example as an executable algorithm implemented in a programming language, should have been separated out from the method understood broadly and turned into an additional dimension. A similar argument may be made for separating the claims or conclusions from the raw results, instead of treating them as one shared dimension of research. That is true, but doing so would come at the cost of additional complexity, not only of the conceptual space, but also of the terminology it supports. Another potential limitation of the terminology as proposed here is that the terms identical, similar and unrelated, while clear enough in everyday settings, are of course not categorically separate classes, but three rather fuzzy areas on a continuum. This fuzziness, however, is embraced here. Specifications like functionally or strictly identical, or like closely or broadly similar, may help clarify the usage in certain cases. A further possible limitation of the terminology is, of course, that despite efforts to minimize gaps between the terms, some gaps appear to be inevitable if the number of terms is to remain manageable and if terms are to be reserved for particularly frequent or important scenarios.Footnote 16 Finally, people may disagree with the choice of the terms themselves, in one or several cases. Fortunately, given that the terms and the conceptual space are defined quite explicitly, alternative ways of dividing up that space, alternative definitions of terms, or alternative terms for a given scenario can easily be proposed and exchanged.

With respect to the practice of RR itself, whether in CLS or in DH more broadly, there are some challenges, among them the considerable effort required to enable or perform RR, especially in the case of replication and the various forms of approximate repetition. In addition, there is the issue of a potential (or apparent) conflict with disciplinary values. Clearly, RR may be perceived to be at odds with the ways in which value is usually ascribed to research. Research usually needs to be original, innovative, ground-breaking, relevant and timely in order to be considered valuable. Apart from replication as quality control, the kind of research I advocate for here is, in contrast, fundamentally concerned with repeating research that was completed years or even decades ago. Can such research be said to be valuable in this sense, or to foster excellence in research? Of course it can, precisely because it serves such important and varied functions in the research process, whether it is quality assurance and building trust (as in strict replication), corroboration or generalization of results (as in several different scenarios), efficiency and sustainability (as in the reuse of data or tools), or incrementally but methodically pushing the boundaries of knowledge (as in reinvestigations and follow-up research). In addition, practicing RR appears to me to be a learning opportunity, because one comes to understand previous research, including its strengths and limitations, much better when trying to replicate, reproduce or otherwise repeat it. More generally speaking, we also need RR as a way of guaranteeing the continuity, over time, of the disciplinary context of our work, especially in the Digital Humanities. Finally, and maybe most importantly, many of the functions of RR constitute or support best practices from the perspective of Open Science.