Technology is global, but our use of it is subtly local. Digital scholarship in the humanities is no different. Where one is doing digital scholarship affects the types of methods and approaches one will find most fruitful for humanities research. These applications are profoundly cultural and are sculpted by historical factors such as colonialism, language, archiving practices, attitudes towards openness, teaching priorities, and economic development. In digital humanities (DH) this means that the digital methodologies most needed in Boston differ from those most important to Bogotá, Barcelona, or Bengaluru. This has implications for education, software development, and government skilling-up agendas in the humanities the world over.

The scholarship in digital humanities has recently started to take notice of this local factor, and in the past half-decade, particularly outside of the English-speaking West, publications have emerged asserting that ‘where’ matters in digital humanities. In very short order, we have had new histories, contextual essays, and personal reflections on digital scholarship in Portugal, Mexico, Latin America, the Caribbean, Australia, South Korea, Argentina, India, Ireland, Sweden, Brazil, Colombia, Ghana, Spain, Russia, China, and Poland (Alves, 2016; Arthur, 2019; Chen & Tsui, 2022; Crymble & Afanador-Llach, 2021; del Rio Riande, 2019; Exploring Digital Humanities in India, 2020; Golub et al., 2020; Josephs, 2019; Kajsa, 2021; Kizhner et al., 2022; Lee & Lee, 2019; Maryl, 2022; O’Sullivan, 2020; Ribeiro et al., 2020; Roig-Marín & Prieto, 2021; Sánchez & Pimentel, 2017; Sibaja & Balloffet, 2018). This includes book-length treatments: Exploring Digital Humanities in India (2020), Digital Humanities in Latin America (2020), and The Digital Black Atlantic (2021) to name a few. Notably under-represented in the geo-centric title list is the United States. There is no shortage of books that centre American ideas, examples, and assumptions without necessarily feeling the need to include ‘in the United States’ in the title or to reflect openly on their American positionality. Scholarship from Canada and the United Kingdom tends to suffer from this same Western cultural blind spot, including one of the co-authors’ own books, which looks in detail at local DH culture in the US, UK, and Canada without signalling it on the title page (Crymble, 2021).

These new works complement longer discussions about what ‘digital’ means for different disciplines (Bardiot, 2021; Carter, 2022; Crymble, 2021; Eve, 2022; Zaagsma, 2022) [Footnote 1], but the more recent geo-focused works have argued that what research questions people tackle depends on the cultural and linguistic influences of where they live, not just on the dogmas of their disciplines. Isabel Galina Russell, and more recently Paul Spence and Renata Farla Brandao, have argued for the need to address monolingual and English-dominated infrastructures in DH (Russell, 2014; Spence, 2021; Spence, 2021), while Roopika Risam and others have helpfully described this as a DH ‘accent’, doing DH in the way that makes sense locally (Lam & Wong, 2020; Risam, 2017).

This socio-spatial turn in DH is part of a wider trend of de-centring the Anglo-Western perspective in digital scholarship, and of addressing what Simon Mahony and Jin Gao have described as a ‘lack of engagement’ with global culture and texts by English-speaking DH scholars (Mahony & Gao, 2018). Notably, it builds very closely upon the advice of linguist Emily Bender, whose #BenderRule rightly demands that English-speaking linguistic scholars state the language they are working in, even if it’s English – especially if it’s English (Bender, 2019). She argues we have to stop letting English be the unspoken default, just as we need to stop allowing American or British examples to act as globally neutral standards of the human experience, and must instead view them as locally-situated and limited perspectives. Perhaps what we are seeing in this socio-spatial turn is an adoption of a #GeoBenderRule in DH: state the cultural context or location in which you are working, even if it’s the Anglo-West – especially if it’s the Anglo-West.

This paper takes those local discussions further by laying some groundwork to evidence these different local needs as they relate to skills development agendas in regional DH. It does so by comparing web traffic patterns of visitors to DH self-learning tutorials at Programming Historian, focusing on visitors from the top six countries by traffic volume (United States, India, United Kingdom, Spain, Mexico, and Colombia). This list of countries, which includes both wealthy Western states and emerging economies, and which is spread across four continents, provides a helpful test case for the hypothesis that there are varied local needs for DH skills. It means we can ask whether there is a greater demand for network analysis skills in India than in the United States (it does not seem so), or whether Colombia shows a greater interest in digital mapping skills than Spain (probably, a little). This knowledge of local variation can be used to better understand and thus support the development of digital skills in different parts of the world.

1 The Programming Historian case study & study design

As an open project devoted to digital humanities skills development through online text-based tutorials, Programming Historian usage is one of many possible proxies for digital skills demand as it relates to the humanities. By looking at the types of tutorials that are most popular with learners, we can discern what areas scholars are collectively focusing on for their professional development years before their research findings appear in journals. The project sits in a wider ecosystem of DH skills initiatives, including DariahTeach and TEI By Example, as well as in-person summer schools, such as the Digital Humanities at Oxford Summer School in the UK and the Digital Humanities Summer Institute in Canada, which help scholars learn new digital skills to support their research (DariahTeach, 2015; Digital Humanities at Oxford Summer School, 2000; Digital Humanities Summer Institute, 2001; TEI By Example, 2006). Programming Historian was founded as a series of blogposts in 2008 by William J. Turkel and originally aimed at teaching Python programming. It was one of the earlier resources of the Internet era dedicated to remotely teaching technical skills specifically to historians (Turkel, 2008). That made it distinct from the examples above, which were more aligned to the needs of the scholarly editing and linguistic communities. Turkel’s approach was to provide free written step-by-step instructions that introduced new programming concepts, but for the purpose of solving the types of problems that historians faced in their research: downloading primary sources, or organising materials for further study. This was a step-change from the tasks in most programming how-to books of the day, which often involved building a digital inventory at one’s widget factory. 
The large catalogue from the O’Reilly group is typical of that generalist approach to tech education (Tidwell, 2008) [Footnote 2]. Instead, Turkel drew upon Canadiana.org, a historical database historians used to collect materials, and he taught them how to scrape historically themed text from the website using only code, with the hopes that they could automate some of the steps of their research processes (Canadiana.org, 1999).

Programming Historian underwent a transformation in 2012, launching as an open access, peer-reviewed publication under the control of an editorial board (Turkel, 2011). The project openly solicited new tutorials from the DH community, which significantly diversified the offering, moving well beyond Python programming and into a wider range of DH skills, including but not limited to topic modelling, regular expressions, transliteration, digital mapping, network analysis, and a range of data handling and data cleaning skills. Shortly thereafter, the project began a multilingual journey after receiving an email from Victor Gayol of el Colegio de Michoacán in Mexico suggesting translations of the tutorials. The team agreed, first building a Spanish editorial board that included Gayol in 2016, to translate and then solicit Spanish-first tutorials to serve the world’s 400-million Spanish speakers. French (2019) and Portuguese (2021) teams followed shortly thereafter.

Until relatively recently, the Spanish, French, and Portuguese editorial teams focused on translating and localising English-language lessons. Localising sometimes included swapping out English-language primary sources and examples for ones that were more relevant to the local audience, or adding extra context as required, with the aim of best serving the local needs of the community through culturally relevant learning materials (Isasi & Castro, 2021). For example, a Spanish-language tutorial by Jennifer Isasi on sentiment analysis uses a Spanish novel, Miau by Benito Pérez Galdós (1889), as its case study corpus, but the Portuguese translation has opted for a Brazilian novel, Dom Casmurro by Machado de Assis (1899), which the editors and translators felt was a better learning example for Lusophones (Isasi, 2021, 2022).

The editorial teams also had oversight over which existing tutorials to prioritise for translation, based on their understanding of local demand, and limited time available to devote to the project. Many of the earliest translations were undertaken in-house by the board members, whose selection of tutorials has sculpted the shape of the offering to this day. This has since been supplemented by Spanish-first tutorials, acquired via the open submission process, which reflect the interests of the authors, but were deemed of enough value to the community to justify the time and expense of developing and improving a tutorial. The English-language publication has only just started its journey in translating original lessons from other languages, which will have begun to appear by the time this article is available. While taking different approaches, this means that the various language publications were actively seeking what they believed was the best way to serve the learning needs of DH scholars who worked in their language community. In both cases, as publications that accepted submissions from the public, this was a blend between what authors offered to write, and what editors encouraged the community to contribute via special calls or in-house translation work.

The breakdown of original tutorials versus translations by language publication can be seen in Table 1. Celebrating its tenth year as an open resource in 2022, Programming Historian has thus played a long-term and ongoing role of promoting remote multilingual digital skills development in DH and in de-centring the hegemony of the English language in digital pedagogy, while also providing an infrastructure for editorial boards operating in other languages.

Table 1 Number of original lessons and translations of lessons from other Programming Historian language publications, to 30 May 2022

Like most websites, the project owns data that details usage, and in this case has Google Analytics web traffic data dating back to June 2012. The tool tracks which webpages a reader visited, as well as some basic demographic data that includes the country where someone was visiting from. These data have been collected in line with Programming Historian’s privacy policy for the purpose of understanding visitor needs (Privacy Policy, 2021). Given their potential to shed light on regional patterns of DH skill seeking, an anonymised subset of those data for the period 31 May 2019 – 30 May 2022 has been provided to the authors for this study. This period includes approximately one year leading up to the Covid-19 pandemic, which drastically affected lives in 2020–21, and includes roughly the same amount of time after many countries eased the strictest laws on gathering in public (Bessette, 2020; Horton, 2022).

The data have been automatically collected via Google Analytics, a Google-owned tool launched in 2005 to help website owners track and make sense of web traffic patterns, and often associated with the search engine optimisation and web marketing industries. A website owner must consciously apply Google Analytics to their site, and Programming Historian made this choice in 2012. When a visitor’s computer attempts to visit a page, it sends a request to a server that includes information on which page they want to read, as well as their IP address and information such as which type of web browser they are using, or their screen resolution. If Google Analytics is installed, it collects a copy of this request information.

These data are then made available to the website owner via an online dashboard (Google Analytics for Beginners, n.d.). While it is possible to opt out of being tracked by Google Analytics, the authors believe that the traffic data we have acquired, which include records of 3.6 million distinct users, are representative of the full picture of traffic to the site over the study period, and that visitors who have opted out of tracking are not disproportionately likely to visit certain pages. To protect individual privacy, only country-level summary statistics are shared in this paper, and figures are rounded or discussed in relative terms so as not to imply a greater number of significant figures than is defensible. Google estimates the analytics figures are accurate to within 2% (How users are identified for user metrics, 2017).

This study approach, which relies heavily on passively-collected data, is widely used in digital user experience studies, and has the advantage of quietly monitoring what web users actually have done, rather than what they might think they want or need (Crymble, 2016; Porsche et al., 2022; Warwick, 2012; Young, 2014). This necessarily gives a different set of evidence than if users had been actively surveyed, but provides an opportunity to look for patterns across a much larger population. The passive data approach does come with limits. The results are inherently positivist, only showing patterns that exist, and not unfulfilled needs of users looking for skills not available on Programming Historian. For example, if the editorial board has mis-judged local need and failed to produce tutorials on needed skills in a given country, that mis-judgment would not be visible from these data. The traffic is also largely driven by Google and other search engine algorithms, which programmatically decide whose site ranks highest in search results, and thus affect incoming traffic patterns, even for expert DH users, in ways that cannot be fully accounted for (Mao et al., 2018). Quality of visit is also difficult to discern at scale, as web traffic statistics cannot tell us if a visitor found a tutorial useful (Jansen et al., 2022). The authors believe that across broad trends and millions of users, the patterns are probably indicative of interest in the themes and keywords offered in various Programming Historian tutorials, and can provide a starting point for conversations on local DH needs, if not the full picture.

To provide focus, this paper only looks at traffic patterns of the top six countries by source of visitors to Programming Historian, which account for just over 2-million visitors (55% of all unique visitors in the period). These represent the top three countries whose primary browsing language on the site, according to the data analysed, is English (United States, India, United Kingdom) and the top three whose primary browsing language is Spanish (Spain, Mexico, Colombia) (see Table 2). Not all readers in any country read tutorials in the dominant local language, though most did. For example, in the United States there was a significant minority of readers accessing Spanish language tutorials. Meanwhile, India is the second largest source of Programming Historian traffic both overall and to the English publication, even though English is a second or perhaps third language for most of those people in India who can read it [Footnote 3]. The focus of the study is to consider the consumption of tutorials in the ‘local’ language, and where multilingual readership is discussed, that will be made evident.
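As a rough illustration of the summary step described above, the following sketch selects the largest countries by user count, rounds each figure to the nearest 1,000, and reports their combined share of all users. The function names and the counts are hypothetical and are not the study's data or code. With the figures quoted in the text, just over 2 million of 3.6 million users works out to roughly 55%.

```python
def round_to_thousand(n: int) -> int:
    """Round a count to the nearest 1,000 (halves round up)."""
    return ((n + 500) // 1000) * 1000


def top_countries(users_by_country: dict[str, int], total_users: int, n: int = 6):
    """Return the n largest countries (rounded) and their combined share of all users."""
    ranked = sorted(users_by_country.items(), key=lambda kv: kv[1], reverse=True)[:n]
    combined = sum(count for _, count in ranked)
    share = combined / total_users  # computed on unrounded counts
    rounded = [(country, round_to_thousand(count)) for country, count in ranked]
    return rounded, share


# Hypothetical counts for illustration only; not the study's data.
example = {"A": 120_499, "B": 80_500, "C": 40_000, "D": 9_999}
rounded, share = top_countries(example, total_users=400_000, n=2)
print(rounded, f"{share:.1%}")
```

Computing the share before rounding keeps the percentage from inheriting the rounding error of the individual country figures.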

Table 2 Web traffic ‘users’ visiting Programming Historian by country, 31 May 2019 – 30 May 2022, rounded to the nearest 1,000

As the total readership of Programming Historian (7-million site visitors in ten years) outnumbers the likely population of humanities or digital humanities scholars, the site evidently attracts users beyond its target audience. To better distinguish the interests of generic learners interested in technology from those of the more targeted DH users, as well as to understand what lesson themes were most popular in different regions, the authors coded each lesson in the English and Spanish publications into one of six categories based on the primary learning outcome or key skills offered by the tutorial (Table 3):

  1. Data Handling (cleaning, manipulating, or downloading data and source material)

  2. Research Analysis (putting data or sources through a computational process to generate new knowledge)

  3. Web Development (skills and technologies that are commonly used to build websites or apps)

  4. Programming or Encoding (programming skills in R or Python, or encoding skills such as markdown or XML)

  5. Audio/Visual (working with non-textual sources such as sound or video)

  6. Operating System Basics (skills for working effectively with your computer, such as command line skills).

Table 3 Counts of English and Spanish language lessons coded by the authors based on primary learning outcome

Most lessons can fairly objectively be coded into one of those categories. Where more than one categorisation was viable, for example, ‘Introduction to MySQL with R’, which teaches database skills (commonly used in web development) but also a form of data handling (structured database) and skills using the R programming language (programming), the authors attempted to opt for the learning outcome (MySQL as web development) rather than the tool used to achieve it (R). The full classification is provided as Appendix Tables 6 and 7 and we acknowledge a degree of subjectivity. Coding was done before analysis, and was not changed to alter the narrative.
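The coding rule described above can be sketched as a small data structure. This is a minimal illustration rather than the authors' procedure: the `Candidate` class and the `role` labels ("outcome", "secondary", "tool") are hypothetical names introduced here to express the preference for learning outcome over tool.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    DATA_HANDLING = "Data Handling"
    RESEARCH_ANALYSIS = "Research Analysis"
    WEB_DEVELOPMENT = "Web Development"
    PROGRAMMING_OR_ENCODING = "Programming or Encoding"
    AUDIO_VISUAL = "Audio/Visual"
    OS_BASICS = "Operating System Basics"


@dataclass(frozen=True)
class Candidate:
    category: Category
    role: str  # hypothetical label: "outcome", "secondary", or "tool"


def primary_category(candidates: list[Candidate]) -> Category:
    """Prefer the category tied to the learning outcome; else fall back to the first one."""
    for candidate in candidates:
        if candidate.role == "outcome":
            return candidate.category
    return candidates[0].category


# 'Introduction to MySQL with R', with the three viable codings named in the text.
mysql_with_r = [
    Candidate(Category.PROGRAMMING_OR_ENCODING, "tool"),  # R is the tool used
    Candidate(Category.DATA_HANDLING, "secondary"),       # structured database
    Candidate(Category.WEB_DEVELOPMENT, "outcome"),       # MySQL as web development
]
print(primary_category(mysql_with_r).value)
```

Recording the competing candidates alongside the chosen category would make the acknowledged subjectivity auditable after the fact.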

The snapshot in Table 3 shows that the Spanish language editorial board has placed greater proportional effort into lessons that provide programming or encoding skills than the English board (+ 12%), including several lessons on the Text Encoding Initiative, which are only in Spanish. These collective decisions over time were made by a Spanish language editorial board which has been chaired by scholars in Colombia and Chile up to 2022, and which included editors who live or work in Mexico, Spain, Colombia, Belgium, Germany, Brazil, and the United States, providing a broad and balanced understanding of digital scholarship in a range of Spanish-speaking countries. By comparison, the English publication has proportionately more lessons that offer research analysis skills (+ 10%). The English editorial board has been chaired by scholars in Canada, the United Kingdom, and more recently, the United States. Its editorial membership has skewed towards editors in the United States, particularly under American managing editors, and its decisions may reflect that local bias. In the next section we explore if the users have made the same decisions as the editorial board, by analysing their traffic patterns, and find that in those two examples, the answer is largely yes.
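The proportional comparison between the two publications can be sketched as follows, on the assumption that the differences quoted above are gaps in percentage points between each board's category shares (the text does not say so explicitly). The lesson codes below are hypothetical and do not reproduce Table 3.

```python
from collections import Counter


def category_shares(codes: list[str]) -> dict[str, float]:
    """Percentage of a publication's lessons falling in each category."""
    counts = Counter(codes)
    total = len(codes)
    return {category: 100 * n / total for category, n in counts.items()}


def share_gap(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Percentage-point gap (a minus b) for every category seen in either publication."""
    categories = set(a) | set(b)
    return {c: a.get(c, 0.0) - b.get(c, 0.0) for c in categories}


# Hypothetical codings for illustration only; not the counts in Table 3.
spanish = ["Programming or Encoding"] * 3 + ["Research Analysis"]
english = ["Programming or Encoding"] * 2 + ["Research Analysis"] * 2

gap = share_gap(category_shares(spanish), category_shares(english))
print(gap)
```

Working in shares rather than raw counts keeps the comparison fair when the two publications have different numbers of lessons.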

2 Macro patterns by country & language

Perhaps because of the strategies and focus of the respective English and Spanish editorial boards, there were clear differences in visitor statistics between the two publications at category level, but fewer obvious ones within them, which may appear to undermine the authors’ claims of local variation of need. The pattern can be seen in Fig. 1, which shows a breakdown of visitors from each of the six countries, across each of the six skills categories. The three English-audience countries are on the left half of each stack of columns.

Fig. 1 Percentage of visitors from each of the USA, India, UK (English tutorial traffic only), Spain, Mexico, and Colombia (Spanish tutorial traffic only) visiting Programming Historian tutorials of each categorisation (Data Handling, Research Analysis, Web Development, Programming or Encoding, Audio/Visual, and Operating System Basics). Data from 31 May 2019 to 30 May 2022.

Fig. 1 makes clear that visitors to English language tutorials, regardless of geographic origin, were overwhelmingly using Programming Historian to learn web development skills. In the case of India, this was even more so, with six in ten unique readers clicking onto a web development-related tutorial, compared to just under half of those from the USA and UK. This includes skills such as building Web APIs with Python and Flask, using JSON and jq, creating augmented reality experiences, and building static websites using Jekyll and GitHub (Smyth, 2018; Lincoln, 2016; Greene, 2018; Visconti, 2016). The skills within these lessons have obvious relevance to humanities scholars who are building web-based outputs, but also provide key skills of broad relevance to anyone interested in building both simple and advanced websites. India has a very large web development and tech industry, particularly in the Bengaluru area of southern India, and the accessible and open learning resources on Programming Historian are evidently being used widely by professionals or would-be professionals seeking to pick up new web skills (Parthasarathy, 2004). Given the scale of the use of this type of lesson, the authors would posit that a majority of readers of web development tutorials at the English version of Programming Historian are not digital humanities scholars or humanities researchers, but are instead individuals with a much broader interest in learning various technological skills.

Also prevalent amongst these English-site users were data handling skills, which accounted for about one in five unique visitors to the site. This included skills such as downloading web content with Python, transliterating text with code, cleaning data with OpenRefine, or using regular expressions to clean up machine-transcriptions (Crymble, 2012; Bernstein, 2013; van Hooland et al., 2013; O’Hara, 2013). These tutorials, particularly those with the highest traffic volume, tended to be amongst the oldest published by Programming Historian. Despite in some cases having already spent a decade online, the skills within these tutorials, which mostly focus on working with textual content at scale, have stood the test of time with English-site readers. As these skills are very generalisable to any data, again, the authors would suggest that these readers probably include humanities scholars and students, as well as a wider readership outside of the target audience.

The final significant group of readers of the English-language site were those studying research methods, and skills that are most directly relevant to the analysis of data. These skills could in some cases be more broadly applied to data-related questions, but of the tutorials on the site, this set is the most likely to appeal specifically to humanities or digital humanities researchers, as well as data scientists and scholars from other academic disciplines. They include lessons on network analysis, corpus analysis, topic modelling, sentiment analysis, and stylometry, to name a few (Brey, 2018; Froehlich, 2015; Graham et al., 2012; Saldaña, 2018; Laramée, 2018). These skills do have commercial and political value in specialist industries, but are used less commonly than basic programming or web development skills (Liu, 2012; Kumari & Singh, 2016; Casas-Valadez et al., 2020). Notably, while still used in India (10%), this research-focused set of tutorials was about half as prevalent there as it was in the UK (19%) or USA (17%). This was one of the biggest measurable differences between Indian visitors and those from the UK and USA at category level and may reflect a greater emphasis on publishing analysis-based research in Western DH working cultures within universities.

Other types of lessons, including those focusing on audio or visual material, those aimed at teaching programming or encoding skills as their primary focus, and those that introduced the command line, offered important skills, but were not at that time collectively the primary draw of traffic to the English Programming Historian site. It may be the case that audio and visual processing skills will grow in the coming years, as those forms of data continue to proliferate, but in the period covered by this paper, that was still an area for future growth both for the publication and for visitors seeking new skills. Or perhaps learners looking for those skills were not using Programming Historian to build them.

The broad patterns of use on Programming Historian en español were quite different. Web development tutorials were only a fraction as popular in Spain, Mexico, and Colombia as they were on the English site. About a tenth of readers came to learn web development skills in Spanish compared to about half of English readers. Instead, visitors to the Spanish site overwhelmingly read tutorials on data handling skills. These tutorials were three times more popular in Spanish than in English (accounting for readership size differences). Some of the more popular tutorials in this category were translations of tutorials written by Turkel back in 2008 and since revised, including a series of lessons that used Python to manipulate and clean up textual content downloaded from the Internet (Turkel & Crymble, 2017a). Data handling tutorials made up the same proportion of published lessons in both English and Spanish, so the much higher percentage of readers using them in Spanish is notable.

The next most commonly read lessons were basic programming or encoding tutorials, including some generic advice on setting up your computer for Python programming, or installing library packages using ‘pip’, which were more popular than some of the more advanced skills, suggesting a need for ground-setting tutorials that may not be widely available in Spanish (Gibbs, 2017; Turkel & Crymble, 2017b). This set of lessons also included a series about the Text Encoding Initiative (TEI) that were written in Spanish and that are not currently available in English (Calarco and del Río Riande, 2021; Vaughan, 2021a). As the TEI documentation is primarily in English (Learn the TEI, n.d.), this choice by the editorial board to encourage Spanish-language resources shows an awareness of local needs, which was reflected in the traffic statistics. As Fig. 1 shows, tutorials in this category attracted about three times as much of the total Programming Historian en español audience as they did of the English publication.

The category most notably lacking in readers was ‘Research Analysis’. Despite nearly a quarter of Spanish tutorials covering some form of analysis skill, the readership hovered around the 4% mark in all three of the Spanish-speaking countries examined. At least in the case of Colombia, this may be down to history. Maria José Afanador-Llach has previously pointed out the effects of Colombia’s long history of colonialism, the fact that much of its early history, which one might like to datamine or digitise, is still kept in the Spanish state archives in Seville, and the fact that Colombian archives must prioritise preservation over digitisation, all reasons why machine-intensive research analysis may not be at the top of the agenda in Colombia (Crymble & Afanador-Llach, 2021). That does not, however, explain why research analysis tutorials are equally under-read in Spain, which holds the very records Colombian researchers might want to study at scale.

While users of the English Programming Historian and Spanish Programming Historian en español sites have very different patterns of reading from one another, with a few exceptions, at the macro-level there are few key differences between people of different countries reading the same publication. To identify those differences, one must look much closer at the individual tutorials most and least popular in different regions.

3 The micro view in English

Traffic to Programming Historian lessons was not evenly distributed across tutorials. A small number of English language lessons between 2019 and 2022 outperformed the rest from a web traffic perspective. This ‘outperformance’ in traffic terms should of course not be confused with their value to the field of DH.

With 89 English language tutorials, a lesson with more than about 1% of total traffic was punching above its weight in terms of popularity (1.12% to be precise). Four lessons far exceeded this threshold, each capturing greater than 5% of the total English audience. Another sixteen had more than a 1% share, meaning twenty out of eighty-nine lessons could be described as very popular. These were joined by a long tail of tutorials with more specialist appeal. This section briefly looks at both the popular and long tail ends of traffic, considering geographic differences. These more micro-variations between demand for different skills and technologies will undoubtedly change by the time many readers access this article. What is important to note is that they do exist, and that some form of local variation will continue to exist in future.
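The threshold quoted above is simply the share each lesson would receive if traffic were spread evenly across all lessons. A minimal sketch of that arithmetic follows; the function names and the per-lesson shares are hypothetical and are not drawn from the study's data.

```python
def even_share_threshold(n_lessons: int) -> float:
    """Percentage of traffic each lesson would receive under a perfectly even split."""
    return 100 / n_lessons


def above_even_share(shares: dict[str, float], n_lessons: int) -> list[str]:
    """Lessons whose share of traffic exceeds the even-split threshold, largest first."""
    threshold = even_share_threshold(n_lessons)
    popular = [lesson for lesson, share in shares.items() if share > threshold]
    return sorted(popular, key=lambda lesson: shares[lesson], reverse=True)


# 89 English-language lessons, as stated in the text.
print(round(even_share_threshold(89), 2))

# Hypothetical shares (percent of total traffic) for illustration only.
example_shares = {"lesson_a": 6.0, "lesson_b": 1.5, "lesson_c": 0.4}
print(above_even_share(example_shares, n_lessons=89))
```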

If an English language lesson was popular in one country, it was likely popular in all three of them. For the period covered by this study (2019–2022), the top ten lessons were nearly all shared, though their rank order was not necessarily the same (Table 4). At the top of the list by a wide margin was Patrick Smyth’s ‘Creating Web APIs with Python and Flask’ (2018). The lesson promises readers that they will ‘Learn how to set up a basic Application Programming Interface (API) to make your data more accessible to users. This lesson also discusses principles of API design and the benefits of APIs for digital projects’ (Smyth, 2018). Because readers could swap their own data into the lesson after learning these new skills, the tutorial is broadly useful to users around the world interested in building an API, and not just to DH scholars. Its enduring popularity is a testament to Smyth’s written pedagogical skill. Although it was the most popular lesson in all three countries, it was far more popular in India, drawing about a third of all Indian traffic to Programming Historian, which is twice as many proportionately as in the USA or UK. A second API-related lesson, by Go Sugimoto, which introduced how to pull data from an API and use it to populate a website, also cracked the top ten most popular in both India and the UK (Sugimoto, 2019). This shows a great interest in API implementation, but perhaps a greater need in India, as its web development industry continues to grow and more people seek to build marketable skills in those areas.

Table 4 Rank Popularity of top ten English-language Programming Historian lessons, showing key theme of lesson, differentiated by country, 31 May 2019—30 May 2022

Also in the top ten were a number of web development and data handling tutorials, such as Matthew Lincoln’s ‘Reshaping JSON with jq’ (2016), and a few of Turkel’s original lessons about working with textual data using Python (both 2012), providing broad transferable skills (Lincoln, 2016; Turkel & Crymble, 2012a, b). These similarities across all three countries show a fairly global need for certain skills, and if projects like Programming Historian were chasing web traffic, this popular set of very repurposable lessons could guide them towards those figures by highlighting skills in demand.

However, what is more interesting for a conversation about regional differences in DH and the regional needs of the field lies in the long tail. These are the lessons that cover important DH research skills, but that may never appeal to hundreds of thousands of learners the way generic web or data skills may do. Given their importance to DH research, and the gap between the UK & USA (19% / 17%), and India (10% of readers), the relative popularity of these lessons is a good way to identify regional differences in interest (Fig. 2). This section considers the relative ranking of a lesson’s popularity, as a way of ironing out population and total user differences between the three countries (for numbers of visitors, see Appendix Table 6). Each lesson was ranked by the number of unique visitors, for each of the three countries. This means a lesson could have a higher number of visitors in the US than the same lesson had in the UK, yet still rank lower among lessons in the US than it did among lessons in the UK.
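The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used for the study: the lesson names and visitor counts are hypothetical, and ties are broken by input order, whereas the study itself reports some tied ranks.

```python
# Hypothetical unique-visitor counts per country (illustrative only,
# not figures from the study).
visitors = {
    "USA": {"lesson-a": 900, "lesson-b": 1200, "lesson-c": 400},
    "UK": {"lesson-a": 300, "lesson-b": 250, "lesson-c": 100},
}


def rank_within_country(counts):
    """Return {lesson: rank}, where rank 1 is the most-visited lesson."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {lesson: position for position, lesson in enumerate(ordered, start=1)}


# Rank each country's lessons separately, so totals are never compared
# across countries with very different numbers of users.
ranks = {country: rank_within_country(counts) for country, counts in visitors.items()}

for country, country_ranks in ranks.items():
    print(country, country_ranks)
```

In this made-up example, lesson-a has three times as many visitors in the USA as in the UK, yet it ranks second in the USA and first in the UK, which is the situation the relative ranking is designed to expose.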

Fig. 2 Page rank of ‘Research Analysis’ lessons on Programming Historian by country of visitor. Taller bars show lessons with bigger differences in rank between USA, India, and UK, showing different patterns of use. Data from 31 May 2019 to 30 May 2022

The 30 lessons in this category are perhaps the most uniquely ‘DH’ focused of the lessons on Programming Historian, covering topics such as stylometry, corpus linguistics, topic modelling, and geospatial skills, all of which have the potential to lead to new research knowledge on a range of humanities questions. Despite the fairly large number of lessons in this set, the average publication date was later than those in the data handling and web development categories, with two-thirds appearing since the start of 2017. That may mean that their readership had not yet had time to percolate into systems that drive traffic, such as class syllabi or library workshops.

The UK stood out in this category, having the highest ranking for 18 of 30 ‘research analysis’ lessons (with two tied). Britain was at the bottom of the list for only 3 of those lessons. This suggests that for DH skill-seekers in the UK, skills that can lead to peer-reviewed humanities research findings are of greater importance than in the US or (particularly) India. This may speak to the UK’s emphasis on research outputs as well as the ‘Research Excellence Framework’, which periodically ranks the research quality of all university academics in the country and financially rewards or punishes their employer according to the results (REF, 2021). It also aligns with the nature of the British humanities PhD, which is a relatively short period of intensive research study leading to a dissertation, often completed in 3–4 years, compared to 6–9 years in the USA where students put a greater emphasis on coursework (Kehm, 2006).

Lessons in this category were relatively less popular in India, and half of the lessons were ranked lowest in India. In a few cases this ranking was substantially lower, as can be seen by the taller bars on Fig. 2, which show a wider spread in ranking for certain lessons. The biggest outlier is a tutorial on corpus linguistics basics using AntConc, an easy-to-use tool aimed at aspiring linguists (Froehlich, 2015). The lesson was popular in Britain (ranked 21st) but was much less used in India (ranked 58th). In real terms, four British visitors used that tutorial for every one Indian visitor, despite a much larger Indian user base overall. It is not immediately clear why, as the tool itself is a good candidate for Indian researchers. AntConc is equipped to handle a range of character sets, including those used for Indian languages such as Hindi and Urdu, so its relevance to Indian scholarship is high (Anthony, 2022). It may be the case that AntConc use has become more popular in Britain through other means, such as being taught on courses or recommended by research supervisors, and that learners have merely found their way to Programming Historian to learn how to use it effectively, rather than discovering it through the website.

Case studies used in lessons may have some impact on regional use. A lesson with a case study and learning outcomes tied quite closely to the HathiTrust collection had a much higher use in the United States (ranked 62nd) than in either the UK (78th) or India (82nd) (HathiTrust, 2008; Organisciak & Capitanu, 2016). This collection was compiled in 2008 to bring together out-of-copyright works held in a series of American university libraries. The skills in the lesson focus on learning to make the most of advanced features in HathiTrust materials, and this was evidently enough to make it more popular in the US.

In a few cases, interest in India was highest, but this may be for economic reasons. For example, a tutorial on sentiment analysis was more popular in India than elsewhere, and this could be linked to the outsourcing of data processing in the ‘opinion mining’ industry, in which companies seek to capture the pulse of social media chatter about their company or political candidates (Saldaña, 2018). This tendency to outsource certain types of data processing, including both optical character recognition work and data cleaning, may also explain why a lesson called ‘Cleaning OCR’d Text with Regular Expressions’ was much more popular in India (ranked 30th) than in the other countries (47th & 49th) (Gray & Suri, 2019; Risam, 2019). Overall, these variations, some of which can be linked to historic, bureaucratic, and economic factors, show that an important degree of regional skills variation can be identified in the traffic patterns of English language Programming Historian lessons.

4 The micro view in Spanish

The view in Spanish shows different patterns, but local patterns, nonetheless.

The first notable difference was that Spanish, Mexican, and Colombian readers seemed to be more capable of or willing to learn in a second language. Perhaps unsurprisingly, Spanish lessons were very rarely used in the UK or India, with the United States the only exception among the English-speaking countries. By contrast, many of the more popular lessons accessed in the Spanish-speaking countries were English tutorials. This included the very popular lesson on APIs that topped the list in all three English-speaking countries and was also in the top ten in all three of the Spanish-speaking ones. It also included the popular lesson on JSON with jq, and one of the network analysis lessons, none of which are yet available as a Spanish translation (Ladd et al., 2017; Lincoln, 2016; Smyth, 2018). This shows demand for these skills within Spanish-speaking countries that was not yet met by Spanish-language materials, and could be a signal to the Spanish editorial board that they could better use these traffic statistics to prioritise publication agendas for in-demand themes.

Other English lessons were well-used in Spanish-speaking countries, including those already translated into Spanish but accessed in English instead, suggesting both a higher level of bilingualism, and a level of comfort searching the web in English and working with technology jargon in a second language. For example, in Spain, ‘Counting Word Frequencies with Python’ was the 16th most popular across all lessons, and its Spanish translation was 6th most popular, meaning both were well-used within Spain (Turkel & Crymble, 2012b). This use of English materials in a Spanish-speaking country aligns with the findings of Alison Hicks on the bilingual pressures faced by many native Spanish speakers operating in English-dominated fields, who have thus adapted to sifting through English-language content on search results pages designed for English audience needs instead of searching through Spanish materials (Hicks, 2014).

Focusing only on Spanish-language tutorials, as in English, a small number of lessons drew particularly high traffic. With 55 Spanish lessons in total, an equal share of traffic would be 1/55, so a lesson attracting more than 1.8% of total Spanish traffic would be above average. Nine Spanish-language tutorials meet this threshold of popularity, with the rest forming the long tail. Also as in English, if a lesson was popular in one Spanish-speaking country, it likely was in all three (Table 5). This set was different, however, to the lessons most popular in English, skewing much more towards data handling, data cleaning, data manipulation, and advanced computing skills, rather than web development. Translations are thus far generally more popular than Spanish original lessons, but the translated materials have been published for longer. The difference is likely a product of the time needed to build an audience rather than evidence of greater interest, as Programming Historian tutorials need to embed in educational settings and do not generally go viral.
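The above-average threshold can be made concrete with a short Python sketch. It assumes that 'average' means an equal share of traffic across all lessons (1/55 for the Spanish set); the function name and the four-lesson traffic figures are hypothetical and serve only to illustrate the calculation.

```python
def above_average_lessons(traffic_by_lesson):
    """Return lessons whose share of total traffic exceeds an equal share.

    An equal share is 1 / (number of lessons); this definition of
    'above average' is an assumption made for illustration.
    """
    total = sum(traffic_by_lesson.values())
    equal_share = 1 / len(traffic_by_lesson)
    return sorted(
        lesson
        for lesson, visits in traffic_by_lesson.items()
        if visits / total > equal_share
    )


# With 55 lessons, the equal share works out to roughly 1.8%.
equal_share_55 = 1 / 55
print(f"{equal_share_55:.1%}")

# Hypothetical four-lesson example: the equal share is 25%.
example = {"lesson-a": 50, "lesson-b": 30, "lesson-c": 15, "lesson-d": 5}
popular = above_average_lessons(example)
print(popular)
```

Everything below the threshold forms the long tail, however many lessons that turns out to be.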

Table 5 Rank popularity of top five Spanish-language Programming Historian lessons, showing key theme of lesson, differentiated by country, 31 May 2019 to 30 May 2022

Although most lessons had closely matching popularity across all three countries, nearly a quarter of the Spanish lessons (13 out of 55) showed fairly marked regional differences. Each of the countries had one or more lessons that were significantly more popular than elsewhere, as well as one or more lessons that were less popular in that region. These differences are the clearest look into regional differences in learning needs (Fig. 3).

Fig. 3 Page rank of Spanish-language lessons on Programming Historian en español by country of visitor, showing only those with a spread of at least 10 places across the three countries. Taller bars show lessons with bigger differences in rank between Spain, Mexico, and Colombia, showing different patterns of use. Data from 31 May 2019 to 30 May 2022
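The selection rule behind Fig. 3 can be sketched as follows. The sketch assumes that 'spread' means the difference between a lesson's best and worst rank across the three countries, which is one plausible reading of the caption. The ranks for the linked open data, Google Maps, and Map Warper lessons are those reported in this section; the fourth lesson and all identifiers are hypothetical.

```python
def rank_spread(ranks_by_country):
    """Difference between the worst and best rank across countries."""
    return max(ranks_by_country.values()) - min(ranks_by_country.values())


def divergent_lessons(ranks, min_spread=10):
    """Keep only lessons whose rank spread is at least min_spread places."""
    return {
        lesson: rank_spread(by_country)
        for lesson, by_country in ranks.items()
        if rank_spread(by_country) >= min_spread
    }


ranks = {
    # Ranks reported in this section.
    "linked-open-data": {"Spain": 25, "Mexico": 42, "Colombia": 42},
    "google-maps": {"Spain": 24, "Mexico": 13, "Colombia": 21},
    "map-warper": {"Spain": 56, "Mexico": 48, "Colombia": 29},
    # Hypothetical lesson with similar popularity everywhere.
    "hypothetical-lesson": {"Spain": 5, "Mexico": 7, "Colombia": 6},
}

divergent = divergent_lessons(ranks)
print(divergent)
```

A lesson with near-identical ranks in all three countries drops out of the result, leaving only those where regional use diverges.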

Spain stood out for its use of a translated lesson about the basics of linked open data, which was the 25th most popular lesson in Spain, but 42nd in both Mexico and Colombia (Blaney, 2018). Linked open data and its most common query language, SPARQL, are fairly widely used in the European Union (Linked data and SPARQL, n.d.; Europeana SPARQL API, n.d.). Users can access European law and court judgments using linked open data queries, and can also query European-wide projects such as Europeana, a transnational catalogue of cultural heritage (Europeana, 2008). Without the same emphasis on linked open data in Latin America, it is perhaps not surprising that the greatest interest at present was in Spain.

Mexican readers had their own anomalies. An introductory lesson on using Google Maps and Google Earth was far more popular in Mexico (13th) than in either Colombia (21st) or Spain (24th) (Clifford et al., 2018). Meanwhile, two Spanish original lessons, one on the Text Encoding Initiative (TEI) and a second on machine translation, both written by Latin American authors, were significantly less popular in Mexico than in the other two countries (though both were recently published and that may correct itself with time as educators add them to their curriculums) (Luza, 2019; Vaughan, 2021a).

In Colombia’s case, the biggest anomaly was a home-grown lesson written by Colombian authors Anthony Picón and Miguel Cuadros on using ‘Map Warper’, an English-language tool for georeferencing and georectifying maps. The tutorial uses Colombian case study data, and by offering instructions in Spanish, it becomes more accessible to a Spanish-language audience. This lesson was the 29th most popular Spanish-language lesson in Colombia, but 48th in Mexico and 56th in Spain, suggesting that mapping-related skills and lessons written with Colombian expertise were both valued within that research community. The biggest Colombian anomaly in the other direction was a brief lesson on installing Python on a Mac. Similar tutorials are available for installing Python on Linux and Windows. In Colombia, the Mac tutorial stood out as the least popular of the three, used less even than the Linux tutorial. This was distinctive: in Spain, the Mac lesson was the most used of the three. This may indicate which computer operating systems are most accessible to scholars in Colombia, and it underlines the importance of considering incidental costs for tutorial activities (Turkel & Crymble, 2017d, e, f).

As in the three English-speaking countries, these geographical variations between Spain, Mexico, and Colombia provide evidence of differing local needs. These differences are subtle and sit alongside clear evidence of shared need for skills in certain areas, such as data handling and data manipulation. This paper does not wish to over-sell the variation, but to note that it exists and that it is an important factor in local education. To maximise the pedagogical benefits, both shared and divergent local needs must be met by educators.

5 Conclusion

This study presents a select comparison between two language communities (the US, India, and the UK; Spain, Mexico, and Colombia) and their digital humanities skill-seeking behaviours within a single digital pedagogy resource. It does so with web traffic data from Programming Historian in the period mid-2019–2022, offering an incomplete set of proxy evidence for the digital humanities skill seeking of 3.6 million web users. It cannot tell us about unmet learning needs, nor can it answer definitively whether what was published reflected the needs of the author who wrote the tutorial or of the audience they imagined might consume it. It is thus modest in its aims; future and ongoing research would be needed to further understand the full picture of regional digital skills needs, as well as to monitor changes over time. However, this study has provided clear evidence for not only similarities, but notable regional differences in DH skills exploration that are not easy to see through the existing scholarship or received practice. As a result of this new evidence, we suggest that the scholarship and professional practice in digital humanities pedagogy needs to evolve in two ways.

Firstly, pedagogical works must acknowledge their cultural context and should be careful about making global claims about skills development needs that have not been evidenced outside of one’s own cultural bubble. For example, scholars working and living in the US must be careful to inform audiences at workshops or in print that their knowledge may be best suited to the context of American DH unless they know otherwise, and they must strive to understand that context and its limits. Building on Emily Bender’s #BenderRule, this means stating the cultural context or location in which you are working, even if it’s the Anglo-West – especially if it’s the Anglo-West.

Secondly, while there is a significant set of shared knowledge that is in demand around the world (web development skills, data handling), different regions have some different digital humanities skills needs, which will continue to evolve. This is already widely understood in regions where English is not the dominant language, and has been argued by Ernesto Priani Saisó et al., and the many authors of the geographically focused DH books and articles cited in the introduction (Saisó et al., 2015).

The new evidence in this article should be a loud signal particularly to English-speaking educators who teach students from multiple regions, and to digital pedagogy projects that seek to serve global audiences. Both need to be aware of these regional differences, and to work to build understanding of how different communities need to be served and how curriculums can be developed to provide flexibility and culturally-relevant pathways for learners. This is particularly important if one’s students or readers come from a cultural context that significantly differs from one’s own, such as programmes with significant international student cohorts. Some of these needs can be addressed by following good practice in diversity and inclusion on project teams, editorial boards, and in academic hiring, to ensure the team behind the teaching or resources represents the target audiences (Özlem & DiAngelo, 2017; Sichani et al., 2019). This also means identifying and working to dismantle Anglo-centric power structures in DH, as Urszula Pawlicka-Deger and others have advocated for previously (Pawlicka-Deger, 2022).

Without this diversity of perspective, we risk flooding the internet with the skills that are most needed in the West, and will miss the levelling-up opportunity that open learning resources can offer to users around the world. The specifics of the regional skills needed will change quickly, but this paper has reinforced what much of the recent scholarship has started to show: that where matters in DH, and that if we are to promote learning of digital skills, we need to listen to those local voices whispering at us about their needs, rather than assuming the needs of the metropole will suit us all.