Abstract
Drawing on the minimal computing paradigm, this article describes the creation of a "minimal research compendium" (MRC) and shows how it can help digital humanities scholars meet the need for robust statistical reproducibility and transparency. The MRC uses a streamlined, single-document approach to outline four key steps in the research process: (1) providing an overview of the computing environment; (2) documenting data and data processing; (3) applying the GLM for robust statistical analysis; and (4) assessing model assumptions using residual plots. The article demonstrates the practical implementation of this framework using R and Quarto and discusses its potential applications and adaptability for digital humanities research.
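As a rough illustration of the four steps, the following R sketch shows what the body of such a single document might contain. It is not the article's own compendium. The built-in `mtcars` dataset, the variables `mpg` and `wt`, and the Gaussian family are stand-in assumptions chosen only so the example runs without external files.

```r
# Minimal sketch of the four MRC steps. NOT the article's code.
# Assumptions: mtcars (built into R) stands in for the research data,
# and a Gaussian family stands in for whatever model the data call for.

# (1) Overview of the computing environment
print(sessionInfo())

# (2) Document the data and data processing
data(mtcars)
dat <- mtcars[, c("mpg", "wt")]   # keep only the variables analysed
str(dat)                          # record structure and variable types

# (3) Fit the model
fit <- glm(mpg ~ wt, data = dat, family = gaussian())
print(summary(fit))

# (4) Assess model assumptions with a residual plot
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)            # reference line at zero
```

In a Quarto document, each numbered step would typically sit in its own executable R chunk with explanatory prose around it, so that rendering the file reproduces both the analysis and its documentation.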
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42803-023-00074-x/MediaObjects/42803_2023_74_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42803-023-00074-x/MediaObjects/42803_2023_74_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42803-023-00074-x/MediaObjects/42803_2023_74_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42803-023-00074-x/MediaObjects/42803_2023_74_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42803-023-00074-x/MediaObjects/42803_2023_74_Fig5_HTML.png)
Data availability
The Zenodo repository containing a copy of the dataset used in the example and the minimal research compendium used in the article is available here: https://zenodo.org/record/8104629. The data used in the examples is also readily available through the Journal of Open Humanities Data: https://openhumanitiesdata.metajnl.com/articles/10.5334/johd.33.
Author information
Contributions
Article is solo authored.
Ethics declarations
Ethical approval
Not Applicable
Competing interests
The author declares no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Siddiqui, N. Minimal research compendiums: an approach to advance statistical validity and reproducibility in digital humanities research. Int J Digit Humanities 5, 405–429 (2023). https://doi.org/10.1007/s42803-023-00074-x
DOI: https://doi.org/10.1007/s42803-023-00074-x