Diversity of editors and teams versus quality of cooperative work: experiments on wikipedia

Sydow, Marcin; Baraniak, Katarzyna; Teisseyre, Paweł

doi:10.1007/s10844-016-0428-1


Open access
Published: 07 October 2016

Volume 48, pages 601–632, (2017)

Journal of Intelligent Information Systems




Abstract

We study whether and how the diversity of editors and teams affects the quality of work in a virtual cooperative work environment, using Wikipedia as an example. We propose a measure of the interest diversity of an editor and several measures of team diversity in terms of members’ interests and experience. Statistical and machine learning methods are used to investigate the dependency between diversity and work quality. The presented experimental results confirm our hypothesis that the interest diversity of single editors and team diversity are positively related to the quality of their work. Interestingly, some of our experiments also indicate that diversity may be more important than attributes such as the productivity of an editor or the size or experience of the team. Our experimental results demonstrate that it is possible to predict work quality based on diversity, which is an additional statistical signal that diversity is correlated with work quality.


1 Introduction

Widespread access to the Internet has made virtual open-collaboration environments an important platform for massive collaborative work. A good example is Wikipedia, where editors work together on articles. However, the quality of such work varies significantly between particular articles, editors and teams of editors working together on articles. It is therefore important to study which factors of an editor or a team influence the quality of the outcome of such collaborative work. For example, it is interesting to study whether an editor who has diverse interests (i.e. is “versatile”) tends to create better Wikipedia articles. It is even more interesting whether teams that are diverse in terms of the interests or experience of their members tend to produce better articles.

In this article we study whether and how the interest diversity of editors, and the interest and experience diversity of editor teams, affect the quality of work in a virtual cooperative work environment, using Wikipedia as an example. In the future, such studies can help to develop and improve tools that support the team-building process in open collaboration.

Diversity has proved to play an important role in multiple fields of information science and its applications, such as text summarisation, web search (Agrawal et al. 2009), databases (Vee et al. 2008), recommender systems and semantic entity summarisation (Sydow et al. 2013). Recent research also indicates that the diversity of a population plays a positive role in evolutionary algorithms (Strzezek et al. 2015).

The hypothesis studied in this article is that the diversity of editors and teams is a factor that positively affects the quality of work in virtual cooperative environments.

To verify this hypothesis experimentally we statistically analyse data from the Polish and German Wikipedia.

We introduce several quantitative measures of diversity of a member of an open-collaboration environment or a whole team thereof. One of the proposed measures is based on the information-theoretic concept of entropy, whereas other measures are based on statistical standard deviation.

To study how these measures influence work quality, we use statistical and machine learning techniques, which are effective tools for investigating such dependencies. We demonstrate on Wikipedia data that the interest diversity of an editor seems to be correlated with the quality of the articles they co-edit. We also extend the concept of interest diversity to whole teams of authors and study how it affects work quality compared to the teams’ productivity and experience. For teams, the reported experimental findings are similar: team diversity is correlated with quality.

We also demonstrate that it is possible to use statistical machine learning tools to predict the quality of Wikipedia articles from attributes that model the level of editors’ diversity (and some other attributes), which can be interpreted as an additional statistical signal that diversity positively affects work quality in Wikipedia.

1.1 Motivation

Team diversity is one of the fundamental issues in social and organisational studies and has been broadly researched (e.g. Parnas 1972; Sanchez and Mahoney 1996; Langlois and Garzarelli 2008). It has also been extensively theorised about and tested on virtual communities. One of the most pressing questions concerns team coherence versus efficiency. There are two competing theories describing efficient team organisation: modularity and integrity (Parnas 1972; Sanchez and Mahoney 1996). The first was introduced by David Parnas, who suggested that co-dependence between “components” or “modules” (in our context, a module corresponds to an article on Wikipedia) should be eliminated by limiting the communication required between the modules (Parnas 1972). In this approach, participation in a module does not require knowledge about the whole system or about other modules; e.g., Wikipedia users can co-author articles about social science without knowing anything about life sciences or mathematics. This leads to higher specialisation and less diversity in individual performance. A modular approach enables more flexibility and decentralised management (Sanchez and Mahoney 1996).

In the integral mode, by contrast, team members have diverse knowledge and skills. We aim to study whether the modular (specialised) or the integral collaboration pattern is more successful in creating high-quality Wikipedia articles.

1.2 Contributions

The contributions of our work include:

the concept of editor’s “versatility” (interest diversity) based on information entropy and various measures of team diversity based on editor’s versatility and statistical standard deviation of selected attributes,
exploratory analysis of two datasets based on dumps of Wikipedia (Polish and German), indicating that the versatility of editors and the diversity of teams are positively correlated with the quality of articles,
exploratory analysis of relationship between editor’s gender and versatility,
more sophisticated statistical analysis of the studied datasets that includes a series of experiments with various machine learning prediction algorithms (logistic regression, decision trees) that verify whether and how accurately it is possible to predict the quality of articles based on some characteristics of their editors with special focus on diversity,
analogous series of experiments concerning teams of editors, applying logistic regression and random forests,
additional analyses utilising importance measures that further support the thesis that diversity is the most important factor in the presented prediction models,
additional analysis in the form of various graphs concerning the performance of the prediction models (Lift and ROC curves) that further support the previous findings.

This article is a substantial extension of a conference paper (Baraniak et al. 2016), in which parts of the first two of the above contributions were presented in preliminary form.

Our experimental results seem to confirm the hypothesis that the diversity of single editors and teams is positively related to the quality of their work, and that diversity is usually more important than some seemingly more obvious attributes such as the size or productivity of the team.

1.3 Related work

A general comparison of the quality of classic and open-collaboration encyclopaedias, in particular Britannica vs Wikipedia, is given in Giles (2005), where it is observed that the quality of Wikipedia (in terms of the number of errors) is not much lower than that of Britannica, a somewhat surprising result.

The problem of how the number of editors and the method of coordinating their work influence article quality is studied in Kittur and Kraut (2008). Two coordination methods are considered: explicit and implicit. In the former, the work is planned and coordinated by explicit communication between all the editors, while in the latter most of the work is organised and done by a small subset of the editor team. The presented results demonstrate that adding more editors can improve article quality only if the applied coordination method is appropriate. In particular, the results indicate that implicit coordination helps more in larger teams.

The interplay between social influence and social preference based on similarity between editors, in the context of open collaboration in Wikipedia, is studied in Crandall et al. (2008). The results indicate that both phenomena play an important role in explaining open collaboration patterns.

Wilkinson and Huberman (2007) report that high-quality articles are those that are intensively edited and have a high number of editors compared to other articles of similar age. Our work shows that diversity is no less important in this context.

The important role of diversity was noticed early, not only in complex systems but also in other fields such as Operations Research or Information Retrieval (Goffman 1964). One of the earliest successful applications of a diversity-aware approach was reported in Carbonell and Goldstein (1998) in the context of text summarisation. Recently, diversity awareness has gained increasing interest in other information-related areas where the actual information need of a user is unknown and/or the user query is ambiguous, so that introducing a controlled level of diversity into the results increases their quality. Examples range from databases (Vee et al. 2008) to Web search (Agrawal et al. 2009) and the quite novel problem of graphical entity summarisation in semantic knowledge graphs (Sydow et al. 2013). A recent work (Strzezek et al. 2015) demonstrates that a controlled level of population diversity increases the performance of a genetic algorithm on some hard optimisation problems.

The concept of diversity has also attracted interest in open collaboration research, e.g. in Aggarwal (2014). From the open collaboration point of view, diversity can be considered from many perspectives, for example as team diversity vs homogeneity, or as a single editor’s versatility (called “integrity” in that work) vs specialisation (called “modularity” in that work).

The positive role of team diversity was studied in Chen et al. (2010), where the productivity and diversity of teams are defined differently, and it is suggested that other variables may also influence article quality.

In our work we use different definitions and measures of diversity, since we quantify it using the concept of entropy and standard deviation, as explained in Section 2. Most importantly, in contrast to our work, the work mentioned above studies the influence of diversity on the amount of accomplished work and on withdrawal behaviour, rather than on the work quality considered here.

In contrast to our work, most previous works focus on the diversity of editor teams in terms of categories such as culture, ethnicity, age, etc. López and Butler (2013) study how content diversity influences on-line public spaces in the context of local communities. A recent example, with a special emphasis on ad-hoc “swift” teams whose members have had very little previous interaction with each other, is Aggarwal (2014). Vasilescu et al. (2015) studied the relationship between gender diversity and work outcome.

A recent article (Ren et al. 2015) studies how tenure diversity and interest variety affect group productivity and member withdrawal, and how the two types of diversity evolve over time. The results of this work seem to indicate the importance of interest and experience diversity in online collaboration, but they do not directly address how diversity affects the quality of the resulting articles, which is the topic of this article.

2 Measures of diversity

In this article, in order to objectively measure how diversity affects work quality, we introduce and apply some measures of diversity.

The first diversity measure that we propose for editors, versatility, is based on information entropy (Shannon 1948), which is commonly used in various domains as a natural measure of diversity. Here it is used to model the interest diversity of a single actor in a cooperative network. This measure assumes that a set of topical categories is available in the collaborative work model. The versatility measure is described in Section 2.1.

In our experiments concerning teams of editors, we also use other measures of diversity based on standard deviation. Standard deviation is a statistical concept that measures how much an attribute varies around its mean value, and it can therefore also be considered a natural choice for a diversity measure. We briefly recall the concept of standard deviation in Section 2.2.

2.1 Versatility (measure of interest diversity)

In this section we explain the model of interest diversity that we apply in our approach. We use Wikipedia terminology to illustrate the concepts, however, our model can be adapted to other, similar open-collaboration cooperative work environments.

Let $X$ denote a group of Wikipedia editors. Editors participate in editing Wikipedia articles. Each article can be mapped to one or more categories from a pre-defined set of categories $C=\{c_{1},\dots,c_{k}\}$ that represent topics.

In our model, each editor $x\in X$ is characterised by his/her editing activity, i.e., all editing actions done by $x$. We assume that the interests of an editor $x$ can be represented by the amount of work that $x$ committed to articles in particular categories.

Let $t(x)$ denote the total amount of textual content (in bytes) that $x$ contributed to all co-edited articles (up to the moment of the analysis), and let $t_{i}(x)$ denote the total amount of textual content that editor $x$ contributed to the articles belonging to a specific category $c_{i}$.^{Footnote 1}

Now, let us introduce the notation $p_{i}(x)=t_{i}(x)/t(x)$ and interpret it as representing $x$’s interest in category $c_{i}$. Henceforth, we use the shorter notation $p_{i}$ for $p_{i}(x)$ whenever $x$ is clear from the context.
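The quantities $t(x)$, $t_{i}(x)$ and $p_{i}(x)$ can be computed with a few lines of code. The Python sketch below is illustrative only and is not the authors’ implementation: the function name `interest_profile`, the category names and the input format (a list of hypothetical `(category, n_bytes)` pairs, one per contribution) are assumptions. It also assumes that each contribution has already been attributed to exactly one category; how contributions to articles in several categories are handled is not specified here.

```python
from collections import defaultdict

def interest_profile(edits, categories):
    """Return the vector (p_1(x), ..., p_k(x)) for a single editor x.

    edits: iterable of (category, n_bytes) pairs, one per contribution of x.
    Assumption: each contribution is attributed to exactly one category.
    Contributions to categories missing from `categories` still count in t(x).
    """
    t_i = defaultdict(int)          # t_i(x): bytes contributed per category
    for category, n_bytes in edits:
        t_i[category] += n_bytes
    t = sum(t_i.values())           # t(x): total bytes contributed by x
    if t == 0:
        raise ValueError("the editor has no contributions")
    return [t_i[c] / t for c in categories]

categories = [f"c{i}" for i in range(1, 9)]   # hypothetical names for c_1..c_8
edits = [("c2", 5000), ("c5", 2000), ("c2", 3000)]
print(interest_profile(edits, categories))
# [0.0, 0.8, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0]
```

With these hypothetical contributions (8 kB in $c_{2}$ and 2 kB in $c_{5}$) the result matches the example in Section 2.1.2.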

2.1.1 Editor’s interest profile

Finally, we define the interest profile of the editor $x$, denoted as $ip(x)$, as the interest distribution vector over the set of all categories:

$$ ip(x)=(p_{1}(x),\dots,p_{k}(x)) \tag{1} $$

Notice that according to the definition the interest profile represents a valid distribution vector i.e., its coordinates sum up to 1.

2.1.2 Example

Assume that the set of categories $C$ consists of 8 categories, $\{c_{i}\}_{1\leq i\leq 8}$, and that editor $x$ has contributed $t(x)=10$ kB of text in total, out of which $t_{2}(x)=8$ kB has been contributed to articles in category $c_{2}$, $t_{5}(x)=2$ kB to articles in category $c_{5}$, and nothing to articles that were not assigned to $c_{2}$ or $c_{5}$. Thus, $x$’s interest in $c_{2}$ is $p_{2}(x)={t_{2}(x)}/{t(x)}=\frac {4}{5}$, the interest in $c_{5}$ is $p_{5}(x)={t_{5}(x)}/{t(x)}=\frac {1}{5}$, and the interest is equal to 0 for all other categories. The interest profile of this user is:

$$ip(x)=(0,\frac{4}{5},0,0,\frac{1}{5},0,0,0).$$

2.1.3 Editor’s versatility measure

There are many possible ways of measuring diversity. Since the interest profile $ip(x)$ is modelled as a distribution vector over categories, we define the diversity of interests (or, equivalently, the versatility) of $x$, $V(x)$, as the entropy of the interest profile of $x$:

$$ V(x)=H((p_{1},p_{2},\dots,p_{k}))=\sum\limits_{1\leq i \leq k}-p_{i}\log_{2}(p_{i}) \tag{2} $$

The value of entropy ranges from 0, which represents extreme specialisation (i.e. total devotion to a single category), to $\log_{2}(k)$, which represents extreme diversity (i.e. active and equal interest in all possible categories). Categories with $p_{i}=0$ contribute nothing to the sum, following the standard convention $0\log_{2}0=0$.

Information entropy has several elegant and natural mathematical properties (Shannon 1948) and is a commonly used measure of diversity in various applications concerning information sciences.

2.1.4 Example, continued

The versatility of user x from Section 2.1.2 has the following value:

$$V(x)=-p_{2}\log_{2}(p_{2})-p_{5}\log_{2}(p_{5})\approx 0.8\times 0.32 + 0.2\times 2.32 = 0.256 + 0.464 = 0.72$$

Now assume that another user $x^{\prime}$ has contributed equally to the first four categories, i.e. the user’s interest profile is $ip(x^{\prime })=(\frac {1}{4},\frac {1}{4},\frac {1}{4},\frac {1}{4},0,0,0,0)$. The versatility of this editor is:

$$V(x^{\prime})=H(ip(x^{\prime}))=-4\times 0.25 \times \log_{2}(0.25)=2$$

Notice that the versatility of $x^{\prime}$ is higher than that of $x$, in line with intuition, since $x^{\prime}$ has similar interest in four different categories and $x$ only in two (mostly in one). In other words, $x^{\prime}$ is more versatile while $x$ is more specialised. For $k$ categories, the maximum versatility is $\log_{2}(k)$, attained by an editor who is equally interested in all categories.

The datasets studied experimentally later in this article consider 8 and 12 categories, respectively, so the maximum versatility (entropy) in these cases is $\log_{2}(8)=3$ and $\log_{2}(12)\approx 3.585$, respectively.
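The versatility values in the examples above can be checked numerically. The following Python sketch implements Eq. (2) directly and skips zero coordinates, which corresponds to the convention $0\log_{2}0=0$; the function name `versatility` is a hypothetical choice for illustration, not part of the original work.

```python
import math

def versatility(profile):
    """Base-2 entropy of an interest profile, as in Eq. (2); zero entries contribute 0."""
    return sum(-p * math.log2(p) for p in profile if p > 0)

x = [0, 4 / 5, 0, 0, 1 / 5, 0, 0, 0]                # ip(x) from Section 2.1.2
x_prime = [1 / 4, 1 / 4, 1 / 4, 1 / 4, 0, 0, 0, 0]  # ip(x') from Section 2.1.4

print(round(versatility(x), 2))               # 0.72
print(versatility(x_prime))                   # 2.0
print(math.log2(8), round(math.log2(12), 3))  # 3.0 3.585
```

Filtering on `p > 0` avoids calling `math.log2(0)`, which would raise a `ValueError`.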

2.2 Standard deviation

In this paper we also use some measures of diversity based on standard deviation. The standard deviation of a numerical attribute $X$ taking $n$ values $X_{1},\dots,X_{n}$ is defined as

$$\text{sd}(X):=\sqrt{\frac{1}{n-1}\sum\limits_{i=1}^{n}(X_{i}-\text{avg}(X))^{2}}, $$

where $\text {avg}(X)=\frac {1}{n}{\sum }_{i=1}^{n}X_{i}$ is the arithmetic mean of attribute $X$ (the factor $\frac{1}{n-1}$ makes this the sample standard deviation, defined for $n\geq 2$). The standard deviation sd(X) measures how much (on average) an attribute varies around its arithmetic mean. Thus it can be seen as a natural measure of the variability or dispersion of a numerical attribute. In our experiments, we use the standard deviations of editors’ contributions measured in bytes and the standard deviations of the period lengths between the first and the last contributions (tenure, which may represent the experience of an editor).
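A minimal Python sketch of the standard deviation formula, applied to a hypothetical team of three editors. The team data, the variable names and the choice of measuring tenure in days are assumptions for illustration only. The hand-written `sd` uses the same $\frac{1}{n-1}$ factor as the formula, so it agrees with `statistics.stdev` from the Python standard library.

```python
import statistics
from datetime import date

def sd(values):
    """Sample standard deviation with the 1/(n-1) factor, as in the formula above."""
    n = len(values)
    if n < 2:
        raise ValueError("sd needs at least two values")
    mean = sum(values) / n  # avg(X)
    return (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5

# Hypothetical team of three editors (illustrative numbers, not from the datasets).
contrib_bytes = [1200, 4500, 300]  # bytes contributed by each member
first_last = [(date(2010, 1, 1), date(2015, 1, 1)),    # (first, last) contribution
              (date(2014, 6, 1), date(2015, 3, 1)),
              (date(2012, 3, 15), date(2013, 3, 15))]
tenure_days = [(last - first).days for first, last in first_last]

print(tenure_days)  # [1826, 273, 365]
print(round(sd(contrib_bytes), 1), round(statistics.stdev(contrib_bytes), 1))
print(round(sd(tenure_days), 1))
```

A team with identical members would get a standard deviation of 0 for both attributes, i.e. no diversity.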

3 Data

To verify our hypothesis, i.e. to study the relationship between diversity and work quality in collaborative environments, we apply experimental statistical analysis to real data on collaborative work.

We decided to focus on one of the most popular environments of open collaborative work – Wikipedia, since it is quite large, rich in attributes, publicly available and, in addition, provides means of measuring quality of the work.

3.1 Data mining approach to the problem

We prepared datasets and preprocessed them to compute several attributes for editors and teams of editors. We also utilised information available in Wikipedia to attach a quality label to each article that is treated as the decision attribute in our analyses.

We first ran basic statistical tools for a preliminary analysis of the relationship between quality and diversity, as well as other attributes.

Next, for a deeper analysis, we applied more sophisticated statistical machine learning tools such as logistic regression, decision trees and random forests. The methods are described in more detail in Section 4.

With such models it is possible to measure objectively, in various ways, how strongly any attribute is correlated with the decision attribute (quality, in our case). In particular, in some experiments we split our preprocessed data into training and test sets, built prediction models on the training sets and used the models to predict work quality on the test sets from the studied attributes.

The higher the prediction performance, the stronger the statistical relationship between the attributes (including diversity) and the decision attribute (quality). In addition, we applied other statistical tools to measure objectively how strongly diversity (and other attributes) affects the quality of work.
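The train/test protocol described above can be illustrated with a self-contained Python sketch. It is not the authors’ experimental setup: the data are synthetic (one hypothetical feature standing in for versatility and a noisy binary quality label), the 140/60 split and the learning-rate and epoch settings are arbitrary assumptions, and the logistic regression is a plain gradient-descent implementation rather than the tools described in Section 4.

```python
import math
import random

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic-regression weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)  # the last weight is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, xi):
    """Predict the binary label (1 = high quality) for one feature vector."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
    return 1 if z > 0 else 0

# Synthetic, hypothetical data: one feature in [0, 3] and a noisy binary label
# that tends to be 1 for larger feature values.
rng = random.Random(0)
data = []
for _ in range(200):
    v = rng.uniform(0, 3)
    data.append(([v], 1 if v + rng.gauss(0, 0.3) > 1.5 else 0))
train, test = data[:140], data[140:]  # samples are generated in random order

# Standardise the feature using training-set statistics only.
mu = sum(x[0] for x, _ in train) / len(train)
sigma = (sum((x[0] - mu) ** 2 for x, _ in train) / len(train)) ** 0.5

def scale(x):
    return [(x[0] - mu) / sigma]

w = fit_logistic([scale(x) for x, _ in train], [label for _, label in train])
accuracy = sum(predict(w, scale(x)) == label for x, label in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The key point of the protocol is that the model and the standardisation statistics are computed on the training part only, so the test accuracy estimates how well the attributes predict quality on unseen articles.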

3.2 Datasets

The activity of editors and their teams on Wikipedia is recorded and stored in Wikipedia dumps, which are publicly and easily available. The latest dumps are available at https://dumps.wikimedia.org/.

To run our experimental study, we prepared two separate datasets by processing dumps of the Polish and German Wikipedia from March and September 2015, respectively. We refer to these two datasets as wiki-pl and wiki-de, respectively.^{Footnote 2}

We ran all the experiments presented in this article on two different language versions of Wikipedia to increase the reliability of the results.

Since the results (presented later on in this article) on both datasets wiki-pl and wiki-de are generally compatible, we assume that the choice of these particular language versions of Wikipedia does not significantly affect our general findings presented in this article.

We collected data about editors of articles, articles and editions of articles made by authors.

By edition we mean any contribution of an editor to an article that changes the article’s content (for example, adding content by inserting a new paragraph or modifying an existing paragraph).
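Per-edition amounts of text can be extracted from the XML dumps in a streaming fashion. The Python sketch below uses `xml.etree.ElementTree.iterparse` and yields one `(title, editor, added_bytes)` tuple per revision. It rests on assumptions that are not taken from the article: revisions of a page appear in chronological order, the editor is identified by `<username>` (or `<ip>` for anonymous edits), and the size of an edition is approximated by the growth of the article text in UTF-8 bytes relative to the previous revision (floored at zero), which ignores removed or replaced text. The function names `editions` and `local_name` and the sample page are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]

def editions(xml_file):
    """Yield (title, editor, added_bytes) for every revision in a MediaWiki XML dump."""
    title, prev_size = None, 0
    for _, elem in ET.iterparse(xml_file, events=("end",)):
        tag = local_name(elem.tag)
        if tag == "title":
            title, prev_size = elem.text, 0  # a new page starts
        elif tag == "revision":
            editor, text = None, ""
            for child in elem.iter():
                child_tag = local_name(child.tag)
                if child_tag in ("username", "ip"):
                    editor = child.text
                elif child_tag == "text":
                    text = child.text or ""  # empty or deleted text has no content
            size = len(text.encode("utf-8"))
            added = max(0, size - prev_size)  # crude proxy: net growth in bytes
            prev_size = size
            elem.clear()  # free memory held by the processed revision
            yield title, editor, added
        elif tag == "page":
            elem.clear()

SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Kraków</title>
    <revision>
      <contributor><username>Ala</username></contributor>
      <text>Miasto</text>
    </revision>
    <revision>
      <contributor><ip>192.0.2.1</ip></contributor>
      <text>Miasto łał</text>
    </revision>
  </page>
</mediawiki>""".encode("utf-8")

for edition in editions(io.BytesIO(SAMPLE)):
    print(edition)
# ('Kraków', 'Ala', 6)
# ('Kraków', '192.0.2.1', 6)
```

Counting UTF-8 bytes rather than characters matters for Polish and German text: in the sample, “Miasto łał” has 10 characters but 12 bytes.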

The datasets used in our article are summarised in Table 1.

Table 1 Summary of Datasets wiki-pl and wiki-de, “edition” is any contribution of an editor to an article that results in the change of the article’s content
