Introduction

The digital revolution and the widespread use of the internet have influenced and changed many realms of empirical social science research. Novel digital sources of data are becoming popular to gain new insights into old and new questions of the social sciences [7, 16, 21, 22, 38]. In today’s internet landscape, the online encyclopaedia Wikipedia takes on a key role: it is one of the most visited websites worldwide, forms the backbone of many modern technologies, and functions as an important source of information for the general public. It has also been met with strong interest by the scientific community, as the encyclopaedia and its specific software structure have created a rich and freely accessible data source, offering the opportunity to study large-scale, self-organising collaboration networks. In addition to the online activity of Wikipedians, the platform also features a notable offline component which has been largely neglected in previous research. Wikipedia is characterised by regular local offline meetups which give editors a time and place to get to know each other personally. In the typical spirit of Wikipedia, these meetings are organised publicly and are well documented with lists of attendees and minutes. This data brief presents the newly published dewiki meetup dataset containing 4418 meetups that have been organised on the German-language version of Wikipedia. The dataset contains information on the date and place of each meetup, the users attending it, the users excusing their absence, the minutes recorded, and the source of the information. The dataset thus provides a rich and granular view of the offline component of Wikipedia. It not only sheds light on the scale of meetup activity but also documents the social dynamics of these meetups.

The dataset covers almost 20 years of offline activity of the German-language Wikipedia, one of the largest and most active language versions (featuring, as of January 2023, over 2.7 million content pages, having been edited a total of over 225 million times).Footnote 1 From its launch in 2001 to March 2020, when face-to-face meetings came to a halt due to the outbreak of the Coronavirus pandemic, almost all meetups were collected and can now be easily merged with online activity data. These data make it possible to shed light on a part of Wikipedia that has previously been overlooked. Wikipedia has been a popular source of data for scholars of various disciplines: from computer science to linguistics and the social sciences, researchers have used data from Wikipedia to answer a multitude of scientific questions. For example, Wikipedia’s structured and large body of text in multiple languages has been used to develop, improve, and showcase new algorithms, particularly in the field of natural language processing and supervised machine learning in general [see e.g. 8, 9, 14, 17]. Other studies focus on Wikipedia as an online phenomenon in itself and do not treat it merely as a data-generating platform. Such research includes studies about the (determinants of) quality and credibility of article content [see, e.g. 1, 10, 18, 20, 24, 25, 34, 37, 42, 46], or tries to better understand the community of Wikipedians as contributors towards an online public good with remarkable sustainability [see, e.g. 11, 26, 30, 31, 32, 44]. In a broader societal context, Wikipedia data have also been used to predict real-world events such as election outcomes [47], movie success [27], or stock market moves [28], and cultural differences have been investigated by making use of different language versions [4, 12].
The rich source of data and its public nature also allow us to investigate fundamental questions of the social sciences, such as the role of networks when explaining norm enforcement [33] or how group size affects contributions to a public good [48]. The dewiki meetup dataset is a valuable source of data for social science research: it captures how the offline network of a community producing one of the most sustainable online public goods has developed over time. It allows previous research on Wikipedia to be extended with the important dimension of offline interactions.

This article is structured as follows: in the next section, Wikipedia and the community of Wikipedians are described in more detail. The following section describes how offline meetings were collected and how the dewiki meetup dataset can be combined with online data to unlock its full potential. The collected data are then described, and novel research opportunities for computational social scientists are highlighted. In the last section, conclusions are drawn, and access to and limitations of the data are discussed.

Wikipedia: more than just an encyclopaedia

There are specific challenges to using digital trace data for research. It is important to know the underlying data-generating process and context, meaning the dynamics of the site where the data were produced; understanding this context further helps to contextualise and understand potential ethical, legal, and methodological issues which come with the data. This section will introduce Wikipedia as a community creating an encyclopaedia.

Wikipedia was founded in January 2001 and has been dedicated to the building of free encyclopaedias in all languages of the world. The goal of Wikipedia was and is to engage ordinary, uncredentialed people using a wiki format. The focus on creating an encyclopaedia provides the common task, and the open content license works as a motivating force for people to work for the good of the world. By now, Wikipedia exists in almost 300 language versions. The German-language Wikipedia is the second-oldest language version, founded in March 2001, and is now one of the largest. While Wikipedia can be read and edited freely and anonymously, the German-language Wikipedia features over 3 million registered users, of which over 17,000 are currently actively contributing.

With the encyclopaedia, a community has grown. Wikipedia features not only encyclopaedic contents, but also offers spaces for interaction between contributors. Articles are generally the result of a collaborative effort of multiple users [see, e.g. 2]. Contributors enforce norms through positive and negative sanctions, for example through awards [see, e.g. 36] and through reverting unsuitable article edits [see e.g. 11, 33]. Both the article pages and the profile pages of registered users feature designated discussion pages where Wikipedians debate [see e.g. 19]. Users can also take part in democratically organised polls and election processes in which new guidelines are set or new administrators selected [see e.g. 3, 15]. Offline meetups between Wikipedians are one further avenue for interaction and are important to the community: as long-term member and active Wikipedian Richter [35: 132–136] writes, such face-to-face meetings allow users to connect to others and help in times of conflict; they can fulfil a Wikipedian’s needs for social contacts, community and personal exchange, in the same way as other local associations. Richter [35: 148] further states that personal acquaintances are central to a project that is based on anonymous contributions. These ties help to create a net of trust, making the collaborative labour easier.

Much research has focussed on the online component of Wikipedia with the online activity data generally coming in a readily available, process-generated, structured format through the Wikimedia data dumpsFootnote 2 or, at least, being available online in a consistent format, making it straightforward to web-scrape [see e.g. the election data in 23]. A systematic study of offline gatherings proves to be more difficult with the user-generated data being inconsistent and messy. Data on (almost) all meetups organised in the German-language Wikipedia were collected and published in the dewiki meetup dataset to open up this avenue of research. The collection process will be outlined in the next section.

Collection of meetup data

Offline meetups between Wikipedians are generally organised online. Meeting data are thus publicly available on Wikipedia; however, the data are lacking a clear and consistent structure as they are user-written. The process of collecting these data is explained in detail in the following.

Scraped pages

The starting point of the meetup collection was an overview list of meetings between Wikipedians.Footnote 3 This list includes the links to over one hundred subpages relating to regions and cities where meetups between German-speaking Wikipedians are organised and archived. These regionally based meetups are most often informal meetings with the point of socialising in public spaces (the so-called Stammtische). In addition, all editathons and open editing events were collected.Footnote 4 These are events where (potentially new) editors of Wikipedia meet to edit and improve a specific topic or type of content. They generally include basic editing training for new editors and are often combined with a social meetup. There are both online and in-person editathons; however, virtual editathons were not collected as part of this dataset.

Furthermore, all events listed on two overview event pages were collected.Footnote 5 These events include activities such as attending and looking after stalls representing Wikipedia at fairs, partaking in workshops about photography, and similar events. This also includes events organised as part of the GLAM initiative (GLAM stands for Galleries, Libraries, Archives, Museums)Footnote 6 in which cultural institutions are supported through collaborative projects with experienced Wikipedia editors. Similar to these are the so-called KulTourenFootnote 7 which were also collected: these are smaller-scale events where Wikipedians visit exhibitions or take part in excursions. Lastly, all WikiProjectsFootnote 8 and task forcesFootnote 9 were checked for meetings. WikiProjects and task forces are central places for discussing specific content; they are used to communicate, collect sources, and provide summaries on specific topics. They form a sort of virtual gathering place for Wikipedia editors interested in working on a specific cluster of topics.

Throughout the scraping of all these pages, a snowballing approach was followed: when one meetup page was linked to other ones, these were additionally collected until no new pages were found. Still, there is no guarantee that all pages with meetups were visited and scraped. The approach of data collection has been outlined in a Wikimedia Learning Pattern—a Wikimedia specific guideline that provides advice, shares insights and is easily accessible to people and researchers in the Wikimedia community—including a code example to foster future data collection across different language versions.Footnote 10
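
The snowballing logic described above can be illustrated with a short breadth-first traversal. This is a minimal Python sketch for illustration only and is not the code published with the Learning Pattern; the function name snowball, the get_links callback, and the toy link structure are hypothetical stand-ins for the real page-fetching step.

```python
from collections import deque
from typing import Callable, Iterable


def snowball(seeds: Iterable[str],
             get_links: Callable[[str], Iterable[str]]) -> list[str]:
    """Visit pages breadth-first until no previously unseen page is linked."""
    seen: set[str] = set()
    queue: deque[str] = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    visited_in_order: list[str] = []
    while queue:
        page = queue.popleft()
        visited_in_order.append(page)
        # get_links is assumed to return the meetup pages linked from `page`.
        for linked in get_links(page):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return visited_in_order


# Hypothetical link structure; real titles and links would come from the wiki.
TOY_LINKS = {
    "Overview": ["Munich", "Berlin"],
    "Munich": ["Berlin", "Augsburg"],
    "Berlin": ["Overview"],
    "Augsburg": [],
}

visited = snowball(["Overview"], lambda page: TOY_LINKS.get(page, []))
print(visited)  # ['Overview', 'Munich', 'Berlin', 'Augsburg']
```

The traversal stops once the queue is empty, which mirrors the stopping rule of collecting pages until no new ones are found. As noted above, pages that are never linked from any visited page remain out of reach of such an approach.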

Excluded pages and excluded meetings

Some pages and meetings were excluded from the data and/or the data collection process. First, all meetups that took place only virtually were skipped. In the observation period, only a small number of editathons took place online (this rapidly changed after the outbreak of the Coronavirus pandemic). Next, portalsFootnote 11 were not checked for meetups unless they covered regional entities (portals about cities or regions). Portals are somewhat similar to WikiProjects and task forces but are directed towards readers instead of editors. Portals provide well-maintained introductory landing pages into the encyclopaedia; they provide an overview of the most important articles in a certain topic area. As portals are thus not places where authors gather, it was not expected that any meetings would be organised in their context.

This dataset is further restricted to meetings organised on the German-language version of Wikipedia. Some meetings which are also directed towards German-speaking Wikipedians are organised on other platforms maintained by Wikimedia, such as commonsFootnote 12 or meta.Footnote 13 However, anything that is not organised on the German-language Wikipedia was excluded from data collection.

In addition, very regular meetings taking place in community spaces were excluded from the dataset. Such community spaces exist in some cities in the German-speaking region and offer headquarters for staff members and engaged volunteers. They are places of often extraordinarily high Wikipedia activity: in some spaces, multiple meetings and open editing events per week take place. These meetings are often attended by the same very small, core group of editors. In many cases, the attendees stop recording their attendance. As these community spaces exhibit a very different dynamic to other meetups and as it was often impossible to reliably collect data on the attendees, the following very regular meetings were excluded from the dataset/data collection:

  • Berlin, Tempelufer bureau: exclusion of general open editing events, exclusion of open editing events for women.

  • Berlin, WikiBär: exclusion of general open editing events, exclusion of open editing events for women.

  • Berlin, WikiWedding: exclusion of open Sundays, office hours.

  • Hamburg, Kontor: weekly events are not organised on Wikipedia and were thus not collected in the first place.

  • Hanover: exclusion of Tuesday editing events, open editing events, office hours.

  • Cologne, Lokal K: weekly events are not organised on Wikipedia and were thus not collected in the first place.

  • Munich, WikiMUC: all regular events are excluded, such as open evenings, introductory workshops to Wikipedia, board game Fridays, monthly meetings of another organisation, monthly work meetings,Footnote 14 cleaning events and other internal office events.

  • Stuttgart, Stadtbibliothek: exclusion of monthly open editing events.

  • Ulm, Verschwörhaus: exclusion of monthly open editing events.

  • Vienna: exclusion of Wikipedia Tuesdays.

Other events taking place in these amenities were not excluded. These non-excluded events are those that are still considered regular meetups by the community (for example all events organised in the technik.cafe in Lörrach), are irregular ones taking place in the community spaces (such as specially organised workshops), and include other irregular events which make use of the location but are organised externally, such as meetings by project teams.Footnote 15

The exclusion of certain meetups is not ideal, but nevertheless necessary as those excluded do not allow for a reliable collection of data, specifically on the attendees.

Collection of information: automatic scraping and manual extraction

The data collection aimed to collect information on at least the date, place/venue, and attendees of all offline gatherings, except those excluded, since the launch of Wikipedia until March 2020. In most cases, the data collected also included apologies and minutes about the meetup.

The way meetings were organised is depicted in Fig. 1; the example is taken from the Rhine-Hessian regional organisation page.

Fig. 1 Translation of exemplar organisational page, see https://de.wikipedia.org/wiki/Wikipedia:Rheinhessen/Archiv

It varied how meetings were organised and archived. Generally, there were the following two approaches:

  1. An organised archive of all meetings with a consistent structure was created.Footnote 16 In these cases, every meetup was recorded, and data on at least the attendees, date and place/venue of the meeting are available. In terms of data collection, this was the best-case scenario as it allowed writing an automated script for the archive.

  2. Meetings were not archived at all. The organisational pages were used to organise only the most recent meeting.Footnote 17 Due to Wikipedia’s technical structure, it is still possible to retrieve information about past meetings using the version history. In these cases, it was necessary to scan through the complete version history to find past meetings before they had been deleted in favour of the next meeting.

These are two ideal types. In reality, they occurred in different sub- and hybrid forms. In the best version of case 1, all meetings were archived on a single page and all meetings were recorded following a consistent structure. In a less ideal version of case 1, all meetings were recorded in archives, but there were separate archives for single years and the structure and format of the archives varied between years.Footnote 18 In some cases, organisational pages did not maintain an archive, but at least provided an overview of all meetings and linked to the respective pages in the version history.Footnote 19 In other cases, only some meetings were archived; for example, in the case of Berlin, only meetings up to 2010 and then again in 2016 were archived, but not in all other years. In Cologne, most meetings were recorded, but some were simply left out. This was the most unfortunate case as the skipped meetings could only be noticed when the rhythm of meetings was broken (i.e. they seemed to have monthly meetings but there was a month without one) and then the version history needed to be checked manually. Overall, when possible, an automatic scraper was written to extract the information. Still, most meeting data were collected manually.
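
The check for a broken meeting rhythm can be sketched in a few lines of Python. This is an illustration under the assumption of a monthly rhythm and is not the procedure actually used, which relied on manual inspection; the function name missing_months and the example dates are invented.

```python
from datetime import date


def missing_months(meetup_dates: list[date]) -> list[tuple[int, int]]:
    """Return (year, month) pairs without a recorded meetup between the
    first and the last recorded date."""
    if not meetup_dates:
        return []
    covered = {(d.year, d.month) for d in meetup_dates}
    first, last = min(meetup_dates), max(meetup_dates)

    gaps: list[tuple[int, int]] = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        if (year, month) not in covered:
            gaps.append((year, month))
        # Advance by one calendar month.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


# Invented dates for illustration: no meetup is recorded for December 2012.
recorded = [date(2012, 10, 4), date(2012, 11, 8),
            date(2013, 1, 10), date(2013, 2, 7)]
gaps = missing_months(recorded)
print(gaps)  # [(2012, 12)]
```

A flagged month is only a candidate for a skipped record: the meeting may genuinely not have taken place, so the version history would still need to be checked by hand.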

Combining offline and online data and the challenge of usernames

The offline meetup data collected can be combined with online activity data available on Wikipedia. Structured data about the online activity are readily available: all online actions contributors undertake on Wikipedia are logged. Any changes made (whether it is the creation of a completely new article, the addition of a word, the deletion of a source, or the restructuring of a sentence) are registered in the revision history. The revision history of an article makes it possible to trace the contributions and reverts of authors. It is possible to obtain information on all revisions of all pages, which allows for an extremely detailed analysis of contributing behaviour. For example, it could be traced how users have co-authored specific articles, which sentences stemming from which users have been kept in the document for how long, etc. For computationally less expensive analyses, it is also possible to extract only the metadata, which provide information on which users have changed which article and to what extent.
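
Revision metadata of this kind can be requested through the MediaWiki Action API (action=query with prop=revisions). The following Python sketch only builds the request URL and flattens a response, without sending a network request. The function names, the sample response, and all user names and revision IDs in it are invented for illustration; the parsing assumes the formatversion=2 response layout, and this is not the author's own retrieval code.

```python
from urllib.parse import urlencode

API_URL = "https://de.wikipedia.org/w/api.php"


def revision_query_url(title: str, limit: int = 50) -> str:
    """Build a URL asking for revision IDs, timestamps and users of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user",
        "rvlimit": limit,
        "rvdir": "newer",       # oldest revisions first
        "format": "json",
        "formatversion": 2,     # pages are returned as a list
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_revisions(response: dict) -> list[dict]:
    """Flatten a formatversion=2 response into one row per revision."""
    rows = []
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            rows.append({
                "title": page["title"],
                "revid": rev["revid"],
                "user": rev.get("user"),
                "timestamp": rev["timestamp"],
            })
    return rows


# Hand-written sample response with invented values.
SAMPLE_RESPONSE = {
    "query": {"pages": [{
        "pageid": 1,
        "title": "Wikipedia:Example meetup page",
        "revisions": [
            {"revid": 10, "parentid": 0, "user": "Example user",
             "timestamp": "2010-01-05T18:00:00Z"},
            {"revid": 11, "parentid": 10, "user": "Another editor",
             "timestamp": "2010-02-02T19:30:00Z"},
        ],
    }]}
}

url = revision_query_url("Wikipedia:Rheinhessen/Archiv", limit=10)
rows = extract_revisions(SAMPLE_RESPONSE)
print(len(rows), rows[0]["user"])
```

Long version histories are returned in batches, so a complete retrieval would additionally have to follow the continuation values the API returns, which the sketch leaves out.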

To combine the dewiki meetup dataset with the rich body of online data, it needs to be merged on the basis of usernames. This comes with one major challenge: Wikipedians can request name changes, which are usually granted, or sign up under a new name with a new account, which allows for greater anonymity.Footnote 20 Research focussing on online activity only is generally focussed on Wikipedia user accounts which have unique identifiers, but when working with meetup data, the interest lies in the people behind those accounts. As people changed their names or signed up with new accounts, it might well be that they have previously attended a meeting and signed up for it with their past name. The author aimed at consolidating these name changes.

All Wikipedia users and the redirection links linking to them were collected using a MediaWiki-supported API (application programming interface) call.Footnote 21 In addition, all renames as logged in the renaming logbookFootnote 22 were web-scraped with an automated browser using RSelenium [13]. This allowed the author to create a list of all current users and their redirections and previous usernames, insofar as they requested an official rename or linked to other accounts using redirection lists. In cases where users created a new account, potentially to gain more anonymity, it is impossible to link them to their previous name. In some rare cases concerning users who took part in meetings or appeared on other pages scraped during this project (such as in elections), previous usernames were explicitly mentioned and discussed. In these cases, those changes in usernames were also noted.

In this step of data preparation, substantial effort was spent on guaranteeing the matching of usernames from different sources. In the end, the author created a list of 1,751,808 different usernames (and variants of their encoding and spelling) belonging to 1,149,511 unique IDs (in the ideal case, this would be reflective of 1,149,511 different people). This information is saved within the dewiki meetup dataset, as file nametoid. This list can be merged with the usernames to replace usernames with IDs.
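
How such a lookup might be applied to a list of attendees is sketched below in Python. The sketch assumes that nametoid can be read as a mapping from username to ID; the actual file layout is described in the dataset documentation. The lookup excerpt, the usernames, and the IDs are invented, and the normalise helper only mirrors two general MediaWiki conventions (underscores are equivalent to spaces, and the first letter of a username is upper-cased).

```python
def normalise(username: str) -> str:
    """Treat underscores as spaces, collapse whitespace and upper-case the
    first letter; the remaining characters stay case-sensitive."""
    name = " ".join(username.replace("_", " ").split())
    return name[:1].upper() + name[1:]


# Hypothetical excerpt of a name-to-ID lookup: two names share one ID
# because the account was renamed.
NAME_TO_ID = {
    "Example user": 101,
    "Old name": 101,
    "Another editor": 202,
}


def attendee_ids(attendees: list[str],
                 lookup: dict[str, int]) -> tuple[set[int], list[str]]:
    """Replace usernames with IDs and report names that cannot be matched."""
    ids: set[int] = set()
    unmatched: list[str] = []
    for raw in attendees:
        key = normalise(raw)
        if key in lookup:
            ids.add(lookup[key])
        else:
            unmatched.append(raw)
    return ids, unmatched


ids, unmatched = attendee_ids(
    ["example_user", "Old name", "Unknown Person"], NAME_TO_ID)
print(ids, unmatched)  # {101} ['Unknown Person']
```

Collecting IDs in a set means that a person who appears under both a current and a former name is counted once. Keeping the unmatched names visible makes it easier to review spelling or encoding variants by hand.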

Collected data

The dewiki meetup dataset includes 4418 meetings. Table 1 provides a glimpse into the dataset and its most important variables. Full documentation of all variables is available with the dataset. The aim of the data collection was to collect the place/venue, date, type of meetup, attendees, apologies (if available), and minutes (if available) of (almost) all offline meetings organised on the German-language version of Wikipedia since the launch of Wikipedia until March 2020.

Table 1 Glimpse of the dewiki meetup dataset

Meetings take place in a specific venue and place. The venue is recorded and the longitude and latitude of the place were retrieved using the OpenStreetMap API via geo_osm() of the tidygeocoder R package [5]. The date meetings took place is recorded as well.

Further, meetings are of different types. While a categorisation of the meeting types can depend on the research question at hand, the dataset contains a categorisation into work-oriented meetings and more socially oriented meetings. Meeting types such as informal meetups (Stammtische), parties and celebrations, yearly meetings, biking, walking or hiking tours, funerals, visits to the cinema or a festival, barbecues, outings for breakfast, or other spontaneous meetings can be considered social meetings. Meetings classified as work meetings cover all editathons or open editing events, field trips and guided tours (which tend to be to places which are of interest to the Wikipedia community), administrator conventions, workshops, photo tours, meetups of juries judging Wikipedian competitions, supporting Wikipedia booths at events and fairs, or other meetings directly oriented towards a Wikipedian initiative such as GLAM or KulTour, as well as meetings of authors collaborating within the so-called Wikipedia portals and task forces, and any office hours (of Wikimedia or sub-projects). In addition, meetups simply labelled as “meeting” without further detail are classified as work meetups by default, because meetings generally exhibit an underlying motivation of working on Wikipedia and improving it. When working with this dataset, it is important to reflect on the different types of meetings as well as their respective sizes, and to understand their implications for individual users and for the relationships that emerge between users. For example, if the focus of one’s research is on examining close face-to-face relationships, it might be sensible to exclude very large gatherings. This is in contrast to investigations centred around users who feel comfortable travelling to large, international meetups; however, for these meetings, the assumption that every attendee interacts with every other attendee does not hold.

The attendees of meetings were collected from protocols that were published after the meeting took place, if available. If not available, attendance was recorded from the registration list. Only users who were sure to attend were considered attendees. If available, this information was further double-checked with photo evidence in which users could be tagged, as well as with the minutes of later meetings if they referred to previous attendees. The usernames that editors signed up with were collected in the list of attendees. The list of apologies reflects the same for apologies sent. Users excusing their absence were thus collected in a separate column from the attendees; users who initially wanted to join the meetup but did not attend were added to this list of apologies. Further, the minutes recorded after the meeting were collected. They are generally in German and offer discussions and descriptions of who attended and what happened at meetings. Lastly, the source links to where the information was collected from are supplied.

Description of the dewiki meetup dataset

The dewiki meetup dataset includes 4418 meetings. Basic information about the meetups will be given in the following.

Temporal distribution

The collected data cover the time between 2001 and March 2020. The first recorded meeting took place on October 28th 2003 with five attendees in Munich, the last ones on March 13th 2020 (2 days after the World Health Organisation declared COVID-19 a global pandemic) with three attendees in Cologne and four attendees in Leipzig (many Wikipedians interested in these meetings sent their apologies shortly beforehand due to the epidemiological situation).

77.0% of the 4418 meetups are classified as mainly social, while the other 23.0% are considered work meetings under the aforementioned categorisation. The distribution of meetups over time is shown in Fig. 2. Please note that meetups happening in 2020 are not plotted, to allow for better comparability across years, as no meetups after March 2020 were collected (sparse meetups resumed in the summer of 2020). For 2020, a total of 67 meetups are in the dataset, 38 of which are social in nature (56.7%).

Fig. 2 Temporal distribution of meetups

As seen in Fig. 2, the number of meetups increased steadily in the first years after the launch of the German-language Wikipedia until 2009. Numbers have remained on a relatively stable level since then, roughly counting around 300 meetups every year. The proportion of work meetups has been increasing over the years. While the meetups in 2020 are not plotted and data were only collected until March, the number of meetups is expected to have reached a new low for the year.

Spatial distribution

Meetups organised on the German-language version of Wikipedia take place primarily in German-speaking countries. The global spatial distribution of meetups is plotted in Fig. 3, while Fig. 4 is restricted to meetups in Germany, Switzerland, Austria, and Liechtenstein (German-speaking area, GSA).

Fig. 3 Spatial distributions of meetups (world). The points are coloured according to their longitude and latitude with meetups being close to each other being similar in colour (normalised to the GSA region)

Fig. 4 Spatial distributions of meetups (GSA region). The points are coloured according to their longitude and latitude with meetups being close to each other being similar in colour (normalised to the GSA region)

The large majority of meetups, 88.8% (3922), took place in Germany, 5.5% (244) in Austria, 4.3% (188) in Switzerland and 0.023% (1) in Liechtenstein. While these four countries account for around 99% of the meetups, the remaining meetups took place in 20 other countries around the globe: Australia (5), Belgium (2), Canada (1), China (1), Czech Republic (4), Finland (6), France (3), Hungary (1), Italy (5), Japan (8), Mexico (1), the Netherlands (2), Poland (10), Slovakia (1), Slovenia (1), South Africa (1), Majorca in Spain (1), Sweden (2), the United Kingdom (7), and Ukraine (1). Meetings can include the more global Wikimanias and WikiConventions, as long as a German-speaking group of people organised themselves on the German-language Wikipedia. Figure 4, restricted to the German-speaking area, highlights how meetings take place all over the area, but most often happen in urban centres and highly populated areas such as Vienna, Berlin, Dresden, Hamburg, Munich, and the North Rhine-Westphalia region. In addition, the map reveals that meetings are also rather frequent in cities which house dedicated community spaces (as these often grew from an active offline community).

Meetup network

Through offline meetups, a network develops: users attending the same meetings get to know each other and develop a tie. In a first step, this leads to an affiliation network of users belonging to meetups [see 43: Chapter 8]. This is a non-dyadic, two-mode network, also known as a membership network. Figure 5 shows the network at different points in time; the plot uses the same colour scheme as the ones in Figs. 3 and 4, thus capturing geographical clusters. The network started in October 2003 with the first meetup in Munich. It then consisted of one cluster with a meeting and its five attendees. By the end of the year, one other meetup took place, giving an additional user the chance to join the meetup scene. By the end of 2004, there were 258 nodes, consisting of 202 users who attended 56 different meetings and belonging to one large component. The colour scheme highlights how three of those meetings took place in Austria and that there was an active meetup scene in, for example, Bavaria and Berlin. In 2005, the network grew to include 547 nodes (413 users who attended 144 different meetings), belonging to one large and one small component. This means that, even though different meetups happen all across the German-speaking area, they are not just visited by local Wikipedians; the large, interconnected component suggests that at least some Wikipedians must take part in meetups in different places, thus connecting potential local components to a large one. Some geographical clustering is, however, observable, highlighting that users tend to attend multiple meetings from one region (as they might live close by). At the end of data collection in 2020, the meeting-to-user network consists of 8540 nodes (with 4122 users who have attended 4418 meetings). The network features five components; most nodes belong to one large component, while the four other components are small, single meetups with eight, five, and two (2x) attendees, respectively.
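
The construction of such a two-mode network and the counting of its components can be illustrated with a small Python sketch. It is a toy example under stated assumptions and not the code used to produce the reported figures: the attendance records, the identifiers, and the function name components are invented, and only the standard library is used.

```python
from collections import defaultdict

# Invented attendance records: meetup identifier -> attending users.
ATTENDANCE = {
    "m1": ["A", "B", "C"],
    "m2": ["C", "D"],
    "m3": ["E", "F"],
}


def components(attendance: dict[str, list[str]]) -> list[set]:
    """Connected components of the meeting-to-user network, largest first.
    Nodes are tagged tuples so that meetups and users cannot collide."""
    adjacency: dict[tuple, set] = defaultdict(set)
    for meetup, users in attendance.items():
        m = ("meetup", meetup)
        adjacency[m]  # make sure every meetup appears as a node
        for user in users:
            u = ("user", user)
            adjacency[m].add(u)
            adjacency[u].add(m)

    seen: set = set()
    result: list[set] = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # depth-first search from the start node
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(component)
    return sorted(result, key=len, reverse=True)


comps = components(ATTENDANCE)
n_nodes = sum(len(c) for c in comps)
print(n_nodes, [len(c) for c in comps])  # 9 [6, 3]
```

In the toy data, user C attends both m1 and m2 and thereby joins the two meetups into one component, while m3 and its two attendees stay isolated. This is the same mechanism by which users who attend meetups in several places tie local scenes together.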

Fig. 5 Meeting-to-user network in (a) 2003, (b) 2004, and (c) 2005. The points are coloured according to their longitude and latitude with meetups being close to each other being similar in colour (normalised to the GSA region)

Using bipartite network projection, this meeting-to-user network can then be transformed into a user-to-user network, connecting those users that have met. In the meetup network as of 2020 (network connecting users with the meetups they attended), there are 8540 vertices sharing 38,314 edges (density of 0.0011). In the user network (network connecting users with other users who have attended the same meetup), there are 4122 vertices with 164,199 edges (density of 0.019). The global, unweighted transitivity (global clustering coefficient) of the user network is 0.41, i.e. the share of connected triples in the graph that are closed into triangles. On the level of nodes, the transitivity is on average 0.80 (median 0.92, standard deviation 0.24) with the theoretical minimum and maximum of 0 and 1, respectively, realised. The diameter of the user network, i.e. the longest shortest path between any two users, is 7, with an average distance of 2.76.
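
The projection step and the density measure can be sketched as follows in Python. The attendance records are invented, and the function names are illustrative rather than taken from the dataset's accompanying code. The density function uses the standard formula for a simple undirected graph, 2m / (n(n - 1)); that this is the formula behind the reported values is an assumption, although plugging in the node and edge counts given above reproduces both rounded densities.

```python
from collections import Counter
from itertools import combinations


def project_users(attendance: dict[str, list[str]]) -> Counter:
    """User-to-user projection: for every unordered pair of users, count the
    number of meetups both attended (the tie weight)."""
    ties: Counter = Counter()
    for users in attendance.values():
        for pair in combinations(sorted(set(users)), 2):
            ties[pair] += 1
    return ties


def density(n_nodes: int, n_edges: int) -> float:
    """Density of a simple undirected graph: 2m / (n(n - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Invented attendance records.
ATTENDANCE = {
    "m1": ["A", "B", "C"],
    "m2": ["B", "C"],
    "m3": ["C", "D"],
}

ties = project_users(ATTENDANCE)

# Unweighted degree: number of distinct other users each user has met.
degree: Counter = Counter()
for a, b in ties:
    degree[a] += 1
    degree[b] += 1

print(dict(ties))     # {('A', 'B'): 1, ('A', 'C'): 1, ('B', 'C'): 2, ('C', 'D'): 1}
print(dict(degree))   # {'A': 2, 'B': 2, 'C': 3, 'D': 1}
print(round(density(len(degree), len(ties)), 3))  # 0.667

# Node and edge counts as reported in the text.
print(round(density(8540, 38314), 4))   # 0.0011
print(round(density(4122, 164199), 3))  # 0.019
```

The tie weights kept in the Counter correspond to the number of times two users have met, so the same projection can serve both unweighted measures such as degree and weighted measures of tie strength.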

The average number of attendees per meetup is 8.71 (mean; median of 7, standard deviation 9.28) with a minimum of 1—meaning there were meetups where users were alone—and a maximum of 206—referring to very large, (supra-)national meetings; the distribution of attendees per meetup is displayed in Fig. 6.

Fig. 6 Attendees per meetup

Whilst the majority of Wikipedians never attended a meeting, those who did attended 9.30 meetups on average (mean; median of 2, standard deviation 21.42), with a minimum of 1 and a maximum of 294 meetups. The degree of a Wikipedian is the number of other users they have met through meetups. The average degree in the user network is 79.67 (mean; median is 24, standard deviation 136.74) with a minimum of 1 and a maximum of 1377. The strength of ties can further be measured by the number of times two users have met. The mean of the number of times users have met is 2.04 (median of 1, standard deviation 3.37), with a minimum of 1 and a maximum of 161.

These descriptions are only the starting point for more in-depth analyses using the dewiki meetup dataset. The 20 years it covers allow for longitudinal analyses. In addition, the data can be linked with online activity as outlined above.

Novel research opportunities

The dewiki meetup dataset opens the door to numerous novel research opportunities in the (computational) social sciences. It provides data on a (nearly) complete social network: almost all edges between all users are recorded. The full potential of the dewiki meetup dataset is realised when it is merged with data on online behaviour. It can then be used to bridge the gap between offline and online actions. Online relationships and interactions have previously been shown to be relevant in explaining the online behaviour of Wikipedians. For example, users who are invited to an editor support forum called the “Teahouse” are more likely to remain active on Wikipedia [29], users are more likely to support a candidate in an election to become administrator when they have previously interacted on Wikipedia talk pages [23], and the density of a user’s collaboration network is important in explaining their norm-relevant behaviour [33]. Wikipedia is more than an encyclopaedia; it is an online community where interactions matter. The dewiki meetup dataset now makes it possible to assess to what extent offline interactions matter as well.

In first applications, the author has used the data to discuss the meetups' (causal) effects on contributions to the collective good [for an early use of a smaller part of the meetup data on the same topic, see also 41: Chapter 15], to test Coleman’s mechanism of norm enforcement [6] using offline network density, and to explore how offline social ties matter in the context of explaining participation in governance activities [39]. It was found, for example, that attending offline meetups has a positive statistical effect on the contribution behaviour of users. Users do not necessarily contribute more after a meetup than before it; rather, their contributions decline less than those of similar control users who do not attend meetups. Further, when testing Coleman’s mechanism of norm enforcement, the author found that meetup attendees often behave significantly differently from non-attendees; however, contrary to Coleman’s proposition, network density did not seem to be relevant. Regarding elections, there are significant relationships between users’ offline meeting behaviour and their online voting behaviour: for example, the larger the proportion of voters a candidate running for administrator has met, the more likely they are to win, and the higher the proportion of other voters a user has met, the more likely they are to vote themselves (this also holds true for the direction of votes: the more pro-voters a user knows, the more likely they are to also vote supportively, and the more anti-voters they have met, the less likely they are to vote supportively). Users are also more likely to vote if they have met the candidate, and they tend to support those more central in the meetup network. Given that elections on Wikipedia resemble public assembly elections, such findings may also hold significance within the broader realm of political science.

Future research can now build upon this and make use of the dataset collected to explore more complex and nuanced questions. Researchers can delve deeper into core concepts of the social sciences like network structures, information diffusion, representation and bias, or community norms, while also addressing questions that are more specific to the context of Wikipedia and other online communities. For example, the dataset can be used to study the structure and patterns of interactions among Wikipedia editors. The network of face-to-face interactions can be analysed to identify the most central and influential editors and the formation of cliques or subgroups. Researchers could also use the dataset to study the patterns of interactions and collaborations among editors before and after meetups, and analyse whether these meetings lead to the formation of strong, cohesive social networks or whether they result in more diffuse networks. As the data cover a time span of 20 years and dozens of regional communities, it can also be studied how these patterns change over time and depend on the local context.

Further, the offline connections can be understood as a form of social capital which offers additional resources. Against this background, it is also of interest to understand how offline meetings work as places of socialisation where community norms are potentially learnt: combining the dewiki meetup dataset with data about norm-relevant online behaviour, it can be analysed to what extent behaviour before and after meetups is norm-conforming and who is rewarding and punishing whom. Focussing on user pages and adopting a psychological perspective, the meetup data can also be used to understand how offline meetings shape the formation and expression of editors' identities and group membership, and how they influence the way Wikipedians interact with and perceive other editors. Further, the dataset can help to gain insight into social influence and persuasion; researchers can use the data to analyse how offline interactions influence the persuasion and attitude change of editors in specific contexts, like different discussions and voting situations (for example, on articles for deletion, winners of contests, or positions of power within Wikipedia). Within the context of media studies, it is also interesting to explore how Wikipedia as a product and as an important source of information is affected by meetings, for example regarding the inclusion of underrepresented topics and perspectives. The data can further be used by researchers in the field of cultural studies to conduct cultural and linguistic analyses by asking how these often very local meetings contribute to the representation of different regional customs.

To extend previous work, future research can aim to assess the impact of face-to-face interactions on article quality. While previous research focussed on the quantity of edits, researchers could analyse whether articles that were edited by Wikipedians who attended offline meetups have higher levels of accuracy, completeness, or readability than articles that were edited by others. More generally, it can be studied if and how a user’s way of contributing towards Wikipedia changes or improves after attending a meetup. Future research could also focus on explaining meetup participation in the first place and explore how people who attend one or two meetups are different from people who attend twenty or fifty meetups. Furthermore, one could use the dataset to study whether face-to-face interactions have an impact on the location of Wikipedia articles. Researchers could analyse whether articles that were edited by editors who attended face-to-face meetings are more likely to be about places and locations that are close to the physical location of the meeting.

Overall, the dewiki meetup dataset presents a unique opportunity for researchers to study the dynamics of online collaboration and the role of social networks in the functioning of online communities. Wikipedia is one of the most impressive and sustainable online public goods. It is inherently open, and it explicitly informs users that setting up a registered user account allows different edits and actions on Wikipedia to be linked to a single account/person and that anyone might investigate and analyse these data in any way.Footnote 23 It is therefore a rewarding and accessible case for researchers from various disciplines.

Discussion and concluding remarks

Data access and future data improvement

The dewiki meetup dataset can be accessed via the Open Science Framework: https://doi.org/10.17605/OSF.IO/EHA4R [40]. It is shared under a Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0), allowing others to both share and adapt the dataset in any format and for any purpose. Appropriate credit must be given and any adaptations must be shared alike. The strengths of the dataset also come with some privacy concerns that must be kept in mind. For example, geographical information is recoverable from the dataset and can be linked to users. It is important to use the data fairly and reasonably.

The dewiki meetup dataset covers almost all meetings organised on the German-language Wikipedia up to the outbreak of the COVID-19 pandemic (except those outlined in the section on excluded meetings). The data are shared in a way that allows for updating and expanding the dataset. For one, the data collection can be continued into the future. For example, the major changes related to the COVID-19 outbreak offer possibilities for (natural) experimental research designs including and beyond offline meetups. Further, it would be of great interest to expand such analyses to other language versions of Wikipedia. This is not only of interest to the Wikimedia Foundation for better organising and evaluating its projects but also allows the comparison of different cultures with different user bases. In addition, more cumulative effort can be spent on consolidating name changes (namestoid).

Limitations

The dataset presented does not and cannot include all offline interactions which have occurred between Wikipedians. While the data collection process aimed to comprehensively capture all offline meetings, it is important to recognise that certain types of meetups were deliberately excluded from the dataset and the collection process (such as meetings in community spaces). Further, Wikipedia editors may meet each other in person through various means not documented within the online platform. These exclusions can influence the overall representativeness of the dataset; for example, the exclusion of frequent community meetings can lead to an underrepresentation of certain interaction patterns.

The documented interactions represent a distinct category of edges: this dataset covers the majority of Wikipedia meetings that are publicly organised on the German-language platform and the users who have signed up. It is important to acknowledge that the edges recorded in this dataset thus provide insights into a particular facet of offline interactions among Wikipedians and not the entirety of these interactions. Additional data collection (which goes beyond what is documented publicly on Wikipedia) would be needed to, for example, study the role of meetings in community spaces.

Conclusion

Wikipedia is an online free-content encyclopaedia and one of the largest and most successful examples of online peer-production. Wikipedia celebrated its twentieth birthday in 2021 and has developed into a key figure in the internet landscape. Known by most internet users, it provides the backbone of many information technologies, supplies answers to Alexa, Google, and Siri, and is also a phenomenon that has attracted considerable attention from researchers. Not only does Wikipedia provide a rich and valuable source of data, but it is also a peculiar community to research in itself, a fact often unknown to the end-user. The dataset introduced in this data brief focuses on one of the least known facts about the Wikipedia community: Wikipedians meet offline, often, regularly, and across the globe. In the typical spirit of Wikipedia, these meetings are organised publicly and are well documented with lists of attendees, minutes, and photo evidence. The dataset covers 20 years of offline activity which can be combined with online activity logged on Wikipedia. This offers valuable and new avenues for computational social scientists to bridge the gap between the offline and the online and to gain a deeper understanding of the Wikipedia community.