Diversity of editors and teams versus quality of cooperative work: experiments on Wikipedia
Abstract
We study whether and how the diversity of editors and teams affects the quality of work in a virtual cooperative work environment, using Wikipedia as an example. We propose a measure of the interest diversity of an editor and several measures of team diversity in terms of members’ interests and experience. Statistical and machine learning methods are used to investigate the dependency between diversity and work quality. The presented experimental results confirm our hypothesis that the interest diversity of single editors and the diversity of teams are positively related to the quality of their work. Interestingly, some of our experiments also indicate that diversity may be more important than attributes such as the productivity of an editor or the size or experience of the team. Our experimental results demonstrate that it is possible to predict work quality based on diversity, which is an additional statistical signal that diversity is correlated with work quality.
Keywords
Diversity of interest, Team diversity, Wikipedia, Article quality, Open collaboration, Machine learning
1 Introduction
Common access to the Internet has made virtual open-collaboration environments an important platform for massive collaborative work. A good example is Wikipedia, where editors work together on preparing articles. However, the quality of such work varies significantly between articles, editors and teams of editors working together on articles. It is therefore important to study which characteristics of an editor or team influence the quality of the outcome of such collaborative work. For example, it is interesting to study whether an editor who has diverse interests (i.e. is “versatile”) tends to create better Wikipedia articles. It is even more interesting to ask whether teams that are diverse in terms of the interests or experience of their members tend to produce better articles.
In this article we study whether and how the interest diversity of editors and the interest and experience diversity of editor teams affect the quality of work in a virtual cooperative work environment, using Wikipedia as an example. In the future, such studies can help to develop and improve tools supporting the open-collaboration team-building process.
Diversity has proved to play an important role in multiple fields of information sciences and applications, such as text summarisation, web search (Agrawal et al. 2009), databases (Vee et al. 2008), recommender systems and semantic entity summarisation (Sydow et al. 2013). Recent research also indicates that population diversity plays a positive role in evolutionary algorithms (Strzezek et al. 2015).
The hypothesis studied in this article is that the diversity of editors and teams is a factor that positively affects the quality of work in virtual cooperative environments.
To verify this hypothesis experimentally we statistically analyse data from the Polish and German Wikipedia.
We introduce several quantitative measures of the diversity of a member of an open-collaboration environment or of a whole team thereof. One of the proposed measures is based on the information-theoretic concept of entropy, whereas the other measures are based on the statistical standard deviation.
In order to study how these measures influence work quality we use statistical and machine learning techniques, which are very effective tools for investigating such dependencies. We demonstrate on Wikipedia data that the interest diversity of an editor seems to be correlated with the quality of the articles they co-edit. We also extend the concept of interest diversity to whole teams of authors and study how it impacts work quality compared to their productivity and experience. In the case of teams the reported experimental findings are similar: a team’s diversity is correlated with quality.
We also demonstrate that it is possible to use statistical machine learning tools to predict the quality of Wikipedia articles using attributes that model the level of editors’ diversity (together with some other attributes), which can be interpreted as an additional statistical signal that diversity positively affects work quality in Wikipedia.
1.1 Motivation
Team diversity is one of the fundamental issues in social and organisational studies and has been broadly researched (e.g. Parnas 1972; Sanchez and Mahoney 1996; Langlois and Garzarelli 2008). It has been broadly theorised and tested on virtual communities. One of the most pressing questions concerns team coherence vs efficiency. There are two competing theories describing efficient team organisation: modularity and integrity (Parnas 1972; Sanchez and Mahoney 1996). The first was introduced by David Parnas, who suggested that co-dependence between “components” or “modules” (in our context this concept corresponds to an article on Wikipedia) should be eliminated by limiting the communication induced by the modules (Parnas 1972). In this approach participation in a module does not require knowledge about the whole system or other modules, e.g., Wikipedia users can co-author articles about social science without knowing anything about life sciences or mathematics. It leads to higher specialization and less diversity in individual performance. A modular approach enables more flexibility and decentralized management (Sanchez and Mahoney 1996).
In the integral mode the team members have diverse knowledge and skills. We aim to study whether the modular/specialised or the integral collaboration pattern is more successful in creating high-quality Wikipedia articles.
1.2 Contributions

the concept of editor’s “versatility” (interest diversity) based on information entropy and various measures of team diversity based on editor’s versatility and statistical standard deviation of selected attributes,

exploratory analysis of two datasets based on dumps of Wikipedia (Polish and German), which indicate that versatility of editors and diversity of teams is positively correlated with quality of articles,

exploratory analysis of relationship between editor’s gender and versatility,

more sophisticated statistical analysis of the studied datasets that includes a series of experiments with various machine learning prediction algorithms (logistic regression, decision trees) that verify whether and how accurately it is possible to predict the quality of articles based on some characteristics of their editors with special focus on diversity,

analogous series of experiments concerning teams of editors, applying logistic regression and random forests,

additional analyses utilising importance measures that further support the thesis that diversity is the most important factor in the presented prediction models,

additional analysis in the form of various graphs concerning the performance of the prediction models (Lift and ROC curves) that further support the previous findings.
This article is a substantial extension of a conference paper (Baraniak et al. 2016), where parts of the first two of the above contributions were presented in preliminary form.
Our experimental results seem to confirm the hypothesis that the diversity of single editors and teams is positively related to the quality of their work, and that diversity is usually more important than some seemingly more obvious attributes such as the size or productivity of the team.
1.3 Related work
A general comparison of the quality of classic and open-collaboration encyclopaedias, in particular Britannica vs Wikipedia, is discussed in Giles (2005), where it is observed that the quality of Wikipedia (in terms of the number of errors) is not much lower than that of Britannica, which is a somewhat surprising result.
The problem of how the number of editors and the method of coordinating their work influence article quality is studied in Kittur and Kraut (2008). Two coordination methods are considered: explicit and implicit. In the former, the work is planned and coordinated through explicit communication among all the editors, while in the latter most of the work is organised and done by a small subset of the editor team. The presented results demonstrate that adding more editors can improve article quality only if an appropriate coordination method is applied. In particular, the results indicate that implicit coordination helps more in larger teams.
The interplay between the phenomena of social influence and social preference based on similarity between editors in the context of open collaboration in Wikipedia is studied in Crandall et al. (2008). The results indicate that both phenomena play an important role in explaining open collaboration patterns.
In Wilkinson and Huberman (2007) it is reported that high-quality articles are those that are intensively edited and have a high number of editors compared to other articles of similar age. Our work shows that diversity is no less important in this context.
The important role of diversity was noticed early not only in complex systems but also in other fields such as Operations Research or Information Retrieval (Goffman 1964). One of the earliest successful applications of a diversity-aware approach was reported in Carbonell and Goldstein (1998) in the context of text summarisation. Recently, diversity-awareness has gained increasing interest also in other information-related areas where the actual information need of a user is unknown and/or the user query is ambiguous, so that a controlled level of diversity introduced to the results increases their quality. Examples range from databases (Vee et al. 2008) to Web search (Agrawal et al. 2009) and to the quite novel problem of graphical entity summarisation in semantic knowledge graphs (Sydow et al. 2013). A recent work (Strzezek et al. 2015) demonstrates that a controlled level of population diversity increases the performance of a genetic algorithm for some hard optimisation problems.
The concept of diversity has also attracted interest in the domain of open collaboration research, e.g. in Aggarwal (2014). From the open collaboration point of view, diversity can be considered from many perspectives, for example as team diversity vs homogeneity or as a single editor’s versatility (called “integrity” in that work) vs specialisation (called “modularity” in that work).
The positive role of team diversity was studied in Chen et al. (2010), where the productivity and diversity of teams are defined in a different sense and it is suggested that other variables may influence the quality of an article.
In our work we use different definitions of diversity and its measures, since we quantify it using the concepts of entropy and standard deviation, as will be explained in Section 2. Most importantly, in contrast to our work, the mentioned work studies the influence of diversity on the amount of accomplished work and on withdrawal behaviour rather than on the work quality that is considered here.
In contrast to our work, most previous works focus on the diversity of editor teams in terms of categories such as culture, ethnicity, age, etc. López and Butler (2013) study how content diversity influences online public spaces in the context of local communities. A recent example, with a special emphasis on ad hoc “swift” teams whose members have had very little previous interaction with each other, is Aggarwal (2014). Vasilescu et al. (2015) studied the relationship between gender diversity and work outcome.
A recent article (Ren et al. 2015) studies how tenure diversity and interest variety affect group productivity and member withdrawal, and how the two types of diversity evolve over time. The results of that work seem to indicate the importance of interest and experience diversity in online collaboration but do not directly address the issue of how they impact the quality of the resulting articles, which is the topic of this article.
2 Measures of diversity
In this article, in order to objectively measure how diversity affects work quality, we introduce and apply some measures of diversity.
The first diversity measure that we propose for editors, versatility, is based on information entropy (Shannon 1948), which is commonly used in various domains as a natural measure of diversity. Here it is used to model the interest diversity of a single actor of a cooperative network. For this measure we assume that some topical categories are available in the collaborative work model. The versatility measure is described in Section 2.1.
In our experiments concerning teams of editors we also use some other measures of diversity that are based on standard deviation. It is a statistical concept that measures how much an attribute varies around its mean value and can also be considered a natural choice for a diversity measure. We use standard deviation in our experiments concerning teams of actors of a collaborative network. We briefly recall the concept of standard deviation in Section 2.2.
2.1 Versatility (measure of interest diversity)
In this section we explain the model of interest diversity that we apply in our approach. We use Wikipedia terminology to illustrate the concepts; however, our model can be adapted to other, similar open-collaboration cooperative work environments.
Let X denote a group of Wikipedia editors. Editors participate in editing Wikipedia articles. Each article can be mapped to one or more categories from a predefined set of categories \(C = \{c_1, \ldots, c_k\}\) that represent topics.
Each editor x ∈ X in our model is characterised by his/her editing activity, i.e. all editing actions done by x. We assume that the interests of an editor x can be represented by the amount of work that x committed to articles in particular categories.
Let \(t(x)\) denote the total amount of textual content (in bytes) that x contributed to all co-edited articles (up to the moment of the analysis) and let \(t_i(x)\) denote the total amount of textual content that editor x contributed to the articles belonging to a specific category \(c_i\).^{1}
Now, let us introduce the following notation: \(p_i(x) = t_i(x)/t(x)\), interpreted as representing x’s interest in category \(c_i\). Henceforth, we will use the shorter notation \(p_i\) for \(p_i(x)\) whenever x is understood from the context.
2.1.1 Editor’s interest profile
The interest profile of an editor x is the vector \((p_1(x), \ldots, p_k(x))\). Notice that, by definition, the interest profile is a valid distribution vector, i.e. its coordinates are non-negative and sum up to 1.
2.1.2 Example
2.1.3 Editor’s versatility measure
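The versatility of an editor x is the entropy of the interest profile of x, i.e. \(-\sum_{i=1}^{k} p_i \log_2 p_i\), where terms with \(p_i = 0\) are taken to be 0. The value of entropy ranges from 0, which represents extreme specialisation (i.e. total devotion to a single category), to \(\log_2(k)\), which represents extreme diversity (i.e. active and equal interest in all possible categories).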
Information entropy has several elegant and natural mathematical properties (Shannon 1948) and is a commonly used measure of diversity in various applications concerning information sciences.
2.1.4 Example, continued
Notice that the versatility measure of \(x'\) is higher than that of x, which agrees with intuition, since \(x'\) has similar interest in four different categories and x only in two (mostly in one). In other words, \(x'\) is more versatile while x is more specialised. The maximum versatility for n categories is \(\log_2(n)\), attained by an editor who is equally interested in all categories.
The datasets that are experimentally studied later in this article use 8 categories (wikide) and 12 categories (wikipl), so that the maximum versatility (entropy) in these cases is \(\log_2(8) = 3\) and \(\log_2(12) \approx 3.585\), respectively.
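The computation of versatility can be illustrated with a minimal Python sketch. It assumes that the per-category byte counts \(t_i(x)\) of an editor are already available as a dictionary; the function names and the two example editors are hypothetical and are not taken from the datasets.

```python
import math


def interest_profile(bytes_by_category):
    """Return p_i = t_i / t for every category, given bytes contributed per category."""
    total = sum(bytes_by_category.values())
    if total <= 0:
        raise ValueError("editor has no contributions")
    return {cat: t / total for cat, t in bytes_by_category.items()}


def versatility(bytes_by_category):
    """Shannon entropy (base 2) of the interest profile; terms with p_i = 0 count as 0."""
    profile = interest_profile(bytes_by_category)
    return -sum(p * math.log2(p) for p in profile.values() if p > 0)


def max_versatility(num_categories):
    """Upper bound log2(k), reached when all k categories receive equal shares."""
    return math.log2(num_categories)


# Hypothetical editors with made-up byte counts.
specialist = {"History": 9000, "Geography": 1000}
generalist = {"History": 2500, "Geography": 2500, "Sport": 2500, "Religion": 2500}
```

With these made-up inputs the generalist, who spreads work equally over four categories, obtains a higher versatility than the specialist, and no editor can exceed `max_versatility(k)` for k categories.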
2.2 Standard deviation
3 Data
To verify our hypothesis, i.e. to study the relationship between diversity and work quality in collaborative environments, we apply experimental statistical analysis to real data concerning collaborative work.
We decided to focus on one of the most popular environments of open collaborative work – Wikipedia, since it is quite large, rich in attributes, publicly available and, in addition, provides means of measuring quality of the work.
3.1 Data mining approach to the problem
We prepared datasets and preprocessed them to compute several attributes for editors and teams of editors. We also utilised information available in Wikipedia to attach a quality label to each article that is treated as the decision attribute in our analyses.
We first ran statistical tools for a preliminary analysis of the relationship between diversity, other attributes and quality.
Next, for a deeper analysis, we additionally applied some more sophisticated statistical machine learning tools such as logistic regression, decision trees and random forests. The methods are described in more detail in Section 4.
In such models it is possible to objectively measure in various ways how strongly any attribute is correlated with the decision attribute (quality in our case). In particular, in some experiments we split our preprocessed data into training and test sets, built prediction models on the training sets and used the models to predict work quality based on the studied attributes.
The higher the performance of such prediction, the stronger the statistical relationship between the attributes (including diversity) and the decision attribute (quality). In addition we applied some other statistical tools to objectively measure how strongly diversity (and other attributes) affects the quality of work.
3.2 Datasets
The activity of editors and their teams on Wikipedia is recorded and stored in Wikipedia dumps that are publicly and easily available. Wikipedia shares the latest dumps under the following URL address: https://dumps.wikimedia.org/.
To run our experimental study, we prepared two separate datasets by processing dumps of the Polish and German Wikipedia from March and September of 2015, respectively. We will refer to these two datasets as wikipl and wikide, respectively.^{2}
We ran all the experiments presented in this article on two different language versions of Wikipedia for greater reliability of the results.
Since the results (presented later in this article) on both datasets wikipl and wikide are generally compatible, we assume that the choice of these particular language versions of Wikipedia does not significantly affect our general findings.
We collected data about editors of articles, articles and editions of articles made by authors.
By edition we mean any contribution of an editor to an article that results in a change of the article’s content (for example: adding content by inserting a new paragraph, modifying an existing paragraph, etc.).
Summary of datasets wikipl and wikide (“edition” is any contribution of an editor to an article that results in the change of the article’s content)

| | wikipl dataset | wikide dataset |
| --- | --- | --- |
| Editors | 126,406 | 555,355 |
| Articles | 947,080 | 1,422,940 |
| Editions | 16,084,290 | 61,266,990 |
3.3 Means of measuring the quality of Wikipedia articles

GOOD article (G): “well-written, comprehensive, well-researched, neutral, stable, illustrated”

FEATURED article (F): (in addition to the above) “length and style guidelines including a lead, appropriate structure and consistent citation”
Analysed quality groups of editors

| Editor quality class | Definition |
| --- | --- |
| N | (normal) edited no good nor featured article |
| \(G \cup F\) | (good or featured) edited at least one good or one featured article |
| G (denotes: \(G \setminus F\)) | (good) edited at least one good article and no featured article |
| F (denotes: \(F \setminus G\)) | (featured) edited at least one featured article and no good article |
| \(G \cap F\) | (good and featured) edited at least one good and one featured article |
Note
As we define the class denoted as G as the class of editors who edited at least one good article and no featured article, and analogously the class F, the more obvious notation for these classes would actually be G∖F and F∖G, respectively. However, we use G and F to simplify the notation. Notice that this simplification implies that the classes denoted as G, F and G ∩ F are mutually exclusive and that they split the G ∪ F class into three different subclasses.
It is natural to observe that the introduced editor quality classes form a partial-order “hierarchy” among the editors. In such an interpretation G ∩ F represents the highest-quality editors and N the lowest.
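The class definitions above can be written down as a small Python sketch. It assumes that each editor carries two boolean flags (contributed to at least one good article, contributed to at least one featured article); the function names are hypothetical.

```python
def editor_quality_class(edited_good, edited_featured):
    """Map the two per-editor flags to one of the mutually exclusive classes."""
    if edited_good and edited_featured:
        return "G∩F"  # at least one good and at least one featured article
    if edited_good:
        return "G"    # shorthand for G \ F: good but no featured article
    if edited_featured:
        return "F"    # shorthand for F \ G: featured but no good article
    return "N"        # neither good nor featured


def in_good_or_featured(edited_good, edited_featured):
    """Membership in the union class G∪F, i.e. any class other than N."""
    return editor_quality_class(edited_good, edited_featured) != "N"
```

Because the three classes G, F and G∩F are mutually exclusive, their sizes add up to the size of G∪F, which can serve as a simple consistency check on the class counts.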
Numbers of articles and editors in the quality classes for the wikipl and wikide datasets

| Number of | wikipl | wikide |
| --- | --- | --- |
| Normal articles | 944,585 | 1,417,318 |
| Good articles | 1,889 | 3,424 |
| Featured articles | 606 | 2,198 |
| Editors of normal articles (N) | 124,673 | 479,908 |
| Editors of good articles and no featured articles (G) | 4,534 | 34,063 |
| Editors of featured articles and no good articles (F) | 2,272 | 17,797 |
| Editors of good or featured articles \((G \cup F)\) | 9,939 | 75,447 |
| Editors of good and featured articles \((G \cap F)\) | 3,133 | 23,587 |
3.4 Topical categories of articles
Our definition of versatility (topical diversity) of an editor presented in Section 2.1 assumes the existence of topical categories.
Wikipedia main content categories

| Dataset | Main content categories |
| --- | --- |
| wikipl dataset | Humanities and Social Sciences; Natural and Physical Sciences; Art & Culture; Philosophy; Geography; History; Economy; Biographies; Religion; Society; Technology; Poland |
| wikide dataset | Art & Culture; Geography; History; Knowledge; Religion; Society; Sport; Technology |
Wikipedia articles are usually not directly tagged with any of these high-level categories. Only the most specific categories are assigned to articles by the Wikipedia community. These are subcategories of more general categories, which together form a directed graph. The nodes of this graph are categories, and there is a directed arc from one node to another if and only if the former category is a subcategory of the latter. Starting from any node representing a lowest-level category directly assigned to a particular article, we employed the standard BFS (breadth-first search; Cormen et al. 2001) graph search algorithm to assign top-level categories to this article. More precisely, the article was assigned all top-level categories reachable by BFS from its lowest-level categories.
If the article was mapped to more than one category, the contribution size was split equally among them. Articles that couldn’t be classified were excluded from the dataset, as well as users whose production consisted of such articles exclusively. Also, only editions of the pages in the primary namespace were taken into account (that is “proper” articles and not, for example, discussion pages), because only these pages are evaluated with regard to their quality.
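The mapping of articles to top-level categories and the equal split of a contribution can be sketched as follows. This is only an illustration under stated assumptions: the category graph is given as a dictionary from a category to its more general categories, and the toy graph, the function names and the byte count are hypothetical.

```python
from collections import deque


def top_level_categories(assigned, parents, top_level):
    """Breadth-first search from the directly assigned categories towards more
    general ones; returns every top-level category that is reachable."""
    seen = set(assigned)
    queue = deque(assigned)
    found = set()
    while queue:
        category = queue.popleft()
        if category in top_level:
            found.add(category)
        for parent in parents.get(category, ()):
            if parent not in seen:  # the visited set also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return found


def split_contribution(size_bytes, categories):
    """Split a contribution equally among the top-level categories of an article."""
    if not categories:
        return {}  # unclassifiable article: excluded from the analysis
    share = size_bytes / len(categories)
    return {category: share for category in sorted(categories)}


# Hypothetical toy graph: category -> list of more general categories.
toy_parents = {
    "Medieval battles": ["Battles", "Middle Ages"],
    "Battles": ["History"],
    "Middle Ages": ["History", "Society"],
}
toy_top_level = {"History", "Society", "Geography"}
toy_result = top_level_categories(["Medieval battles"], toy_parents, toy_top_level)
toy_shares = split_contribution(1000, toy_result)
```

In this sketch the search continues past a top-level category, so that every reachable top-level category is collected; stopping at the first one found would be a different design choice.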
3.5 Attributes of an editor
Datasets for editors

| Components of the dataset | Description |
| --- | --- |
| Basic articles categories | Article id, article basic categories |
| Category graph | Category, more general categories of category |
| Main categories | Article id, main categories of article |
| Authors contributions | Contributor id, article id, size of contribution |
| Author versatility | Contributor id, contribution of author to main categories, versatility, the total size of edition made by author to all articles, flag if author contributes to good articles, flag if author contributes to featured articles |
| Authors gender | Contributor id, flag if author is woman, man or no information |
Additionally, we gathered data about the editors’ gender that is available in our datasets. Not all editors share this information on their Wikipedia profiles, but enough information was available to perform some basic analysis.
The sampling frame (observation interval) was restricted to contributors who made at least one edition during the Wikipedia project lifetime.
3.6 Additional data preparation for experiments with teams
In Section 6 we will present a series of experiments that will concern whole teams of editors.
In our model we define a team, associated with an article, as the group of all editors who contributed to this article. Our definition of a team includes every editor who made any change within a particular article, such as text addition, deletion or some minor corrections. One editor may contribute to many articles but one team, according to our definition, creates only one article.
Datasets for teams

| Components of the dataset | Description |
| --- | --- |
| Editors edition | Contributor id; article id; size of edition (bytes) made by an editor to an article; the total size of edition made by editor to all articles |
| Tenure of contributions | Contributor id; article id; the number of days spent on article; the number of days on Wikipedia |
| Diversity of interest | Article id; mean contribution of team members to main categories; versatility of team; the quality of article |
4 Statistical machine learning tools
In this section we describe machine learning models used in our experiments.
4.1 Logistic regression
4.2 Decision tree
Decision tree (Breiman et al. 1984) is an example of a nonlinear classification model. In each node of the tree the data is split into two subsets according to the outcome of a test. The splits are chosen so as to increase the homogeneity of the class distribution in the resulting subsets, i.e. to decrease their impurity. The most popular impurity measures are entropy and the Gini index. The paths from the root to the leaves represent classification rules. The final decision is made based on the majority class in the given leaf. The optimal size of the tree can be determined using e.g. the cost-complexity criterion. Figures 3 and 4 show the trees built based on our data.
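How a single split is selected can be illustrated with a short Python sketch that computes the impurity of a node and the impurity decrease obtained by thresholding one numeric attribute. It is a simplified illustration and not the rpart implementation; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy (base 2) of the class distribution in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, impurity=gini):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


def best_threshold(values, labels, impurity=gini):
    """Try midpoints between consecutive distinct values of one attribute and
    return (threshold, decrease) for the split with the largest decrease."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        decrease = impurity_decrease(labels, left, right, impurity)
        if decrease > best[1]:
            best = (threshold, decrease)
    return best


# Hypothetical toy data: versatility of four editors and their class labels.
toy_versatility = [0.5, 1.0, 2.5, 3.0]
toy_labels = [0, 0, 1, 1]
toy_split = best_threshold(toy_versatility, toy_labels)
```

A full tree learner would repeat this search over all attributes and apply it recursively to the resulting subsets.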
4.3 Random forest
Random Forest (Liaw and Wiener 2002) consists of many single decision trees. Each tree is built on a bootstrap sample (a sample drawn with replacement from the original data). In addition, Random Forest uses a modified tree learning algorithm that selects, at each candidate split in the learning process, a random subset of the features. Random Forests correct for the tendency of single decision trees to overfit their training data. The final classification rule is based on majority voting of the trees. Random Forest can also be used to assess the importance of the attributes. The two basic measures are described in Section 6.6.
5 Experimental results for editors
In this section we report a series of experiments whose object of study is a single editor. More precisely, we experimentally study whether and how strongly the versatility of an editor is correlated with the quality of the articles they co-edit.
In these experiments we measure the level of interest diversity of an editor with the versatility measure defined in Section 2.1.
The order of the experiments is as follows. In Section 5.1 we present a preliminary exploratory data analysis. We complete the exploratory analysis in Section 5.2 where we compare versatility of women and men to see whether the gender has any relationship with diversity of interest and quality.
Next, we present a deeper analysis of the problem using prediction models. We describe the experimental setup, including the split into training and testing sets, in Section 5.3. We apply the logistic regression model to explain quality in Section 5.4. Next, Section 5.5 introduces prediction performance metrics such as precision, recall and F-measure that are used in the remaining experiments with prediction models such as logistic regression and trees. The prediction results are presented in Section 5.6. The analysis is completed in Section 5.7 with additional graphs presenting Lift and ROC curves, which give a deeper understanding of the prediction experiments.
A short summary of the experimental results concerning editors is given in Section 5.8.
5.1 Preliminary exploratory analysis of the data
Table 7 Median of versatility and productivity of editors vs. quality for the wikipl and wikide datasets

| Quality | wikipl versatility | wikipl productivity | wikide versatility | wikide productivity |
| --- | --- | --- | --- | --- |
| \(G \cap F\) | 3.1720 | 159300 | 2.351 | 46080 |
| \(G \cup F\) | 3.011 | 2992 | 2.064 | 1502 |
| F | 3.000 | 2322 | 2.053 | 1283 |
| G | 3.016 | 3347 | 2.070 | 1629 |
| N | 2.807 | 237 | 1.891 | 264 |
We also computed several other attributes of editors and preliminarily examined them against quality in order to compare them with versatility. One of the attributes that naturally comes to mind when analysing quality is the productivity of an editor (total amount of work committed).
Indeed, our analysis confirmed that productivity is another editor attribute that seems to be related to work quality. In the productivity columns of Table 7 one can see that median productivity discriminates the quality classes even more strongly than versatility.
As we will see in the next experiments, this seeming superiority is misleading, since versatility explains quality better than productivity when more sophisticated statistical tools are applied.
Nonetheless, we selected productivity as the main “competitor” for versatility in the next experiments.
5.2 Exploratory analysis concerning the gender of editors
Table 8 Editors’ versatility vs gender (no observable relationship); values given to three significant digits

| Dataset | Quality | Number of women | Number of men | Versatility of women | Versatility of men |
| --- | --- | --- | --- | --- | --- |
| wikipl | \(G \cap F\) | 173 | 398 | 3.25 | 3.25 |
| wikipl | \(G \cup F\) | 246 | 569 | 3.18 | 3.20 |
| wikipl | F | 20 | 47 | 3.01 | 3.02 |
| wikipl | G | 53 | 124 | 3.09 | 3.06 |
| wikipl | N | 181 | 414 | 2.87 | 2.91 |
| wikide | \(G \cap F\) | 553 | 1,030 | 2.51 | 2.41 |
| wikide | \(G \cup F\) | 643 | 1,320 | 2.46 | 2.44 |
| wikide | F | 34 | 80 | 2.17 | 2.14 |
| wikide | G | 56 | 211 | 2.07 | 2.18 |
| wikide | N | 195 | 529 | 1.84 | 2.00 |
A comparison of editor versatility across all quality classes for both genders is presented in the last two columns of Table 8 and indicates that there is no observable relationship between gender and versatility in any class. The versatility of women and men is roughly the same in each quality group and in both examined datasets. We therefore excluded the gender factor from further experiments in this article.
5.3 Quality-prediction experimental setup
The remaining experiments aim to study how accurately it is possible to predict the quality group of an editor based on his/her versatility and productivity. Applying prediction models in this way makes possible a deeper analysis of the relationship between the examined attributes and quality. In general, higher prediction performance in such models may be interpreted as a statistical signal of dependence. In addition, various statistics concerning the prediction models, such as p-values, z-values and estimated coefficients, give more precise information about the relationship between the attributes than was available in the simple exploratory analysis.
Class distributions for wikipl and wikide datasets

| | wikipl, C = 1 | wikipl, C = 0 | wikide, C = 1 | wikide, C = 0 |
| --- | --- | --- | --- | --- |
| Count | 9,939 | 134,612 | 75,447 | 555,355 |
| Share | 6.87% | 93.12% | 11.96% | 88.03% |
We use two classification models, which are among the most popular ones in the machine learning community: logistic regression (Hosmer et al. 2013) and decision trees (Breiman et al. 1984). These two classifiers represent different groups of methods: the former is an example of a linear classifier, as the hyperplane separating the classes is a linear function of the attributes. The latter is a nonlinear classifier. We use the implementations available in R (R Core Team 2013): the glm{stats} function for logistic regression and the rpart{rpart} function for the tree (Therneau et al. 2015) (CART trees). Since the training data is unbalanced, we assign larger weights to observations from the rare class when fitting a model.
To assess the predictive power of the considered methods, we randomly split our data into training (50% of observations) and testing (50% of observations) sets. The training data is used to build the models (i.e. to fit the logistic regression and to build the decision tree), whereas the testing data is used to check the prediction accuracy.
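The setup described above can be sketched in Python as follows. The random split mirrors the 50/50 division into training and testing sets; the weighting scheme is an assumption made only for illustration (weights inversely proportional to class frequency), since the exact weights used when fitting the models are not stated here. Function names and the seed are hypothetical.

```python
import random
from collections import Counter


def split_half(rows, seed=0):
    """Randomly split the observations into training and testing halves."""
    rng = random.Random(seed)  # fixed seed so that the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def balanced_class_weights(labels):
    """Assumed scheme: weight n / (k * n_c) for an observation of class c, so
    that every class contributes the same total weight to the fit."""
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    return [n / (k * counts[label]) for label in labels]
```

Under this assumed scheme an observation from the rare class C = 1 receives a proportionally larger weight than one from the frequent class C = 0.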
5.4 Explaining quality with logistic regression
Logistic regression model predicting the quality group of editors on the wikipl dataset. The interaction (product) of the variables is also included in the model
Variable  Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  −5.35e+000  1.11e−001  −48.115  <2e−16 ***
versatility  9.32e−001  3.82e−002  24.384  <2e−16 ***
productivity  −5.96e−006  2.74e−006  −2.174  0.0297 *
versatility × productivity (interaction)  6.4e−006  9.18e−007  6.971  3.15e−012 ***
Logistic regression model predicting the quality group of editors on the wikide dataset. The interaction (product) of the variables is also included in the model
Variable  Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  −3.539e+00  2.183e−02  −162.110  <2e−16 ***
versatility  7.879e−01  1.098e−02  71.767  <2e−16 ***
productivity  3.214e−06  5.829e−07  5.514  3.52e−08 ***
versatility × productivity (interaction)  1.213e−05  3.317e−07  36.581  <2e−16 ***
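To make the regression tables easier to read, the sketch below recomputes the z values as estimate divided by standard error and evaluates the fitted logit for two hypothetical editors. The coefficients are the ones reported for wikipl; the input values for versatility and productivity are invented for illustration, and because the model was fitted with class weights, the resulting probabilities should be read as relative scores rather than calibrated frequencies.

```python
import math

# (estimate, standard error, reported z value) from the wikipl editor model.
WIKIPL = {
    "(Intercept)": (-5.35e+000, 1.11e-001, -48.115),
    "versatility": (9.32e-001, 3.82e-002, 24.384),
    "productivity": (-5.96e-006, 2.74e-006, -2.174),
    "versatility:productivity": (6.4e-006, 9.18e-007, 6.971),
}


def wald_z(estimate, std_error):
    """Wald statistic: the estimate divided by its standard error."""
    return estimate / std_error


def predicted_probability(coefs, versatility, productivity):
    """P(C = 1) under a logistic model with an interaction term."""
    eta = (
        coefs["(Intercept)"][0]
        + coefs["versatility"][0] * versatility
        + coefs["productivity"][0] * productivity
        + coefs["versatility:productivity"][0] * versatility * productivity
    )
    return 1.0 / (1.0 + math.exp(-eta))


for name, (estimate, std_error, reported_z) in WIKIPL.items():
    print(f"{name}: z = {wald_z(estimate, std_error):.3f} (reported {reported_z})")

# Hypothetical editors with equal productivity but different versatility.
print(predicted_probability(WIKIPL, versatility=1.0, productivity=1000.0))
print(predicted_probability(WIKIPL, versatility=3.0, productivity=1000.0))
```

The recomputed z values differ from the reported ones only in the last digits, because the table shows rounded estimates and standard errors.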
5.5 Prediction performance measures
In this section we recall some basic machine learning concepts that we use to analyse the results of the prediction experiments presented in the following sections.
Precision measures how many observations are correctly predicted as C = 1 among those predicted as C = 1. Recall indicates how many observations are correctly predicted as C = 1 among all observations with label C = 1. In addition, we calculate the F-measure, which is the harmonic mean of Precision and Recall. The higher the measures, the better the performance of the considered model. For highly unbalanced classes (such as in our data) it is hard to build a model that optimises both Precision and Recall. Notice also that for highly unbalanced classes the simple accuracy rate is misleading, since a high value is easily achieved by always predicting the majority class. This is why the F-measure, which aggregates both Precision and Recall, is usually applied in such situations. These indices are commonly used in machine learning and information retrieval.
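The following minimal sketch computes these measures from confusion-matrix counts and illustrates why accuracy alone is misleading for unbalanced classes. The function names are illustrative; the class counts are those of the full wikipl editor dataset, and the precision and recall passed to `f_measure` are the values reported for logistic regression on wikipl.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(tp, fp, fn, tn):
    """Precision, recall, F-measure and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure(precision, recall), accuracy


# Always predicting the majority class C = 0 on the wikipl class counts:
# no positive is ever found, yet accuracy is about 93%.
print(evaluate(tp=0, fp=0, fn=9_939, tn=134_612))

# F-measure implied by the precision and recall reported for
# logistic regression on wikipl (87.73% and 17.72%).
print(round(100 * f_measure(0.8773, 0.1772), 2))
```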
5.6 Prediction results for logistic regression and tree model
Evaluation measures on testing data for editors on wikipl and wikide datasets
Measure  Logistic regression (wikipl)  Logistic regression (wikide)  Tree model (wikipl)  Tree model (wikide)
Precision  87.73%  86.85%  74.50%  75.36%
Recall  17.72%  17.91%  29.56%  26.04%
Accuracy  93.40%  88.53%  93.73%  88.84%
F-measure  29.48%  29.70%  42.33%  38.70%
Importantly, the results for the wikipl and wikide datasets are quite similar, which again supports the observation that the choice of a particular language version of Wikipedia does not affect the analysis.
In short, the presented prediction performance can be viewed as quite high given the strong imbalance between the class cardinalities. This can be interpreted as a signal of positive dependence between versatility and quality.
5.7 Lift and ROC curves
To complete the prediction-based analyses, we present additional information about the prediction models obtained in our experiments.
This information takes the graphical form of Lift curves and ROC curves of the prediction models (Brown and Davis 2006). A lift curve shows the precision (on the y-axis) with respect to the percentage of observations highest rated by the given model (on the x-axis). The precision in a lift curve is calculated for the rule that assigns class C = 1 to the observations highest rated by the given model (i.e. those with the highest posterior probabilities).
In general, the larger the area under the ROC curve, the better the prediction model, with the (ideal) maximum being 100% of the area of the unit square.
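The area under the ROC curve has a simple probabilistic reading: it equals the probability that a randomly chosen observation with C = 1 receives a higher score than a randomly chosen observation with C = 0. The sketch below computes it directly from this definition on invented toy scores; the quadratic pairwise loop is chosen for clarity and would be too slow for datasets of the size used in the experiments.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 1/2)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy scores (hypothetical): a perfect ranking and an uninformative one.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
print(roc_auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))
```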
Figures 5 and 7 show Lift curves for the editors of the wikipl and wikide datasets. Figures 6 and 8 show the corresponding ROC curves. To further explain the Lift curve graphs, observe that Fig. 5 indicates that when we assign class C = 1 to, for example, the 10% of observations highest rated by our models, we achieve a precision of about 40%. Similarly, when we assign class C = 1 to, say, the 40% of observations highest rated by our models, we achieve a precision of about 15%, etc.
Interestingly, both classification models give similar results. The results are promising, as for both datasets we achieve performance that is clearly above the baseline (random assignment).
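A single point of a lift curve, as read off in the example above, can be computed as follows. The scores and labels are invented toy data and the function name `precision_at_top` is an assumption; the sketch only shows the rule "assign C = 1 to the top-rated fraction and measure precision".

```python
def precision_at_top(scores, labels, fraction):
    """Precision when C = 1 is assigned to the top-rated fraction of observations."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))  # size of the top-rated group
    return sum(label for _, label in ranked[:k]) / k


# Ten toy observations, three of which truly belong to class C = 1.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for fraction in (0.2, 0.4, 1.0):
    print(fraction, precision_at_top(scores, labels, fraction))
```

At a fraction of 1.0 every observation is labelled C = 1, so the precision equals the share of the positive class; this is the level a random assignment would reach at any fraction.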
5.8 Summary of experimental results for editors
This Section summarizes the experiments concerning single editors. We used two statistical models to verify how much diversity (versatility) influences the quality group of Wikipedia editors. In addition, we tested whether productivity is correlated with the quality group of an editor. It turns out that versatility is an important feature in both models. In particular, versatility is the most significant variable according to the logistic model, and it is also useful in the decision tree model. The output of the logistic model indicates that versatility is positively correlated with quality. Moreover, we tested the prediction performance of the two models. The values of the applied evaluation measures (Precision, Recall, F-measure) are very promising: they are much larger than for the baseline (random assignment of observations to the quality classes). This is also confirmed by the ROC and Lift curves. Both statistical models give comparable results, and the conclusions are similar for both languages. The most important remark is that there is a strong dependence between versatility and work quality for editors.
6 Experimental results for teams
The experimental results presented in Section 5 indicate that the versatility of editors is positively related to the quality of their work. In this Section we extend the study to whole teams of editors and introduce many more attributes, including some new diversity-related ones.
We simply define the team assigned to an article as a group of editors who contributed to this article.
6.1 Attributes of teams
In this section we introduce and compute several attributes of editors and teams that will be used in our statistical analyses.
In particular, we consider the tenure of an editor, either in Wikipedia as a whole or in a given article, measured as the number of days between the editor’s first and last contribution.
Attributes of teams
Name  Description
Team size  \(n = |T|\), where \(T\) is a team, i.e. the set of editors who work on a given article.
Team versatility  The versatility of a team \(T\) is defined as the entropy of the team interest profile \(tip(T)\), computed as follows. First, for each editor \(x\in T\), we compute the individual interest profile \(ip(x) = (p_{1}(x),\dots ,p_{i}(x),\dots ,p_{k}(x))\) as defined in Section 2.1.1. Then, based on the individual interest profiles, for each topical category \(c_{i}\), \(i\in \{1,\dots ,k\}\), we compute the average (over team members) team interest in this category as \(tp_{i}(T)=\frac{1}{n}\sum_{x\in T}p_{i}(x)\), which gives the team interest profile \(tip(T)=(tp_{1}(T),\dots ,tp_{i}(T),\dots ,tp_{k}(T))\). Versatility is the entropy of this vector.
Mean productivity in the article  \(MP(a)=\frac{1}{n}\sum_{i=1}^{n}P_{i}(a)\) is the mean amount of editors’ contributions to the article \(a\), where \(P_{i}(a)=\sum_{e\in E_{i}(a)}\bigl(newSize(e)-oldSize(e)\bigr)\) is the total contribution of the \(i\)-th editor to the article \(a\), \(E_{i}(a)\) is the set of edits made by editor \(i\) in the article \(a\), and \(newSize(e)\), \(oldSize(e)\) are the sizes of the article after and before the edit \(e\), respectively.
Mean total productivity  \(MTP=\frac{1}{n}\sum_{i=1}^{n}TP_{i}\) is the mean amount of editors’ contributions to all articles in Wikipedia. The contribution is the sum of sizes (in bytes) of all edits made by team members to all articles in Wikipedia, where \(TP_{i}=\sum_{e\in E_{i}}\bigl(newSize(e)-oldSize(e)\bigr)\) is the total contribution of the \(i\)-th editor to all Wikipedia articles and \(E_{i}\) is the set of all Wikipedia edits made by this editor.
Mean tenure in article  \(MT(a)=\frac{1}{n}\sum_{i=1}^{n}T_{i}(a)\) is the mean number of days spent on the article \(a\) by the team members, where \(T_{i}(a) = Dl_{i}(a) - Df_{i}(a)\) is the number of days between the dates of the first, \(Df_{i}(a)\), and the last, \(Dl_{i}(a)\), contribution of the \(i\)-th editor to the article \(a\).
Mean tenure in Wikipedia  \(MTW=\frac{1}{n}\sum_{i=1}^{n}TW_{i}\) is the mean number of days spent on Wikipedia, where \(TW_{i} = DWl_{i} - DWf_{i}\) is the number of days between the dates of the first, \(DWf_{i}\), and the last, \(DWl_{i}\), contribution of the \(i\)-th editor to any Wikipedia article.
sd of productivity in article  \(SP(a):=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\bigl(P_{i}(a)-MP(a)\bigr)^{2}}\), the standard deviation of the \(P_{i}(a)\) variable defined above.
sd of total productivity  \(STP:=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(TP_{i}-MTP)^{2}}\), the standard deviation of the \(TP_{i}\) variable defined above.
sd of tenure in article  \(ST(a):=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\bigl(T_{i}(a)-MT(a)\bigr)^{2}}\), the standard deviation of the \(T_{i}(a)\) variable defined above.
sd of tenure in Wikipedia  \(STW:=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(TW_{i}-MTW)^{2}}\), the standard deviation of the \(TW_{i}\) variable defined above.
Length  \(L(a)\) is the size of the article \(a\) after the last recorded edit.
Age  \(AG(a) = D_{d} - D_{c}(a)\) is the number of days between the date \(D_{c}(a)\) on which the article \(a\) was created and the date \(D_{d}\) on which the dump was created.
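The team attributes can be made concrete with a small sketch that computes team versatility and the mean and standard deviation of a per-editor quantity for a toy team. The interest profiles and contribution sizes are invented; the logarithm base is an assumption, since it is fixed in Section 2.1.1 rather than here, and returning 0.0 for a one-person team (where the \(n-1\) denominator vanishes) is likewise an assumption.

```python
import math
import statistics


def entropy(profile, base=2.0):
    """Shannon entropy of a probability vector (log base is an assumption)."""
    return -sum(p * math.log(p, base) for p in profile if p > 0)


def team_interest_profile(profiles):
    """Average the members' interest profiles category by category."""
    n = len(profiles)
    k = len(profiles[0])
    return [sum(profile[i] for profile in profiles) / n for i in range(k)]


def team_versatility(profiles, base=2.0):
    """Entropy of the team interest profile tip(T)."""
    return entropy(team_interest_profile(profiles), base)


def mean_and_sd(values):
    """Mean and sample standard deviation (n - 1 in the denominator)."""
    mean = statistics.mean(values)
    # A one-person team has no spread to estimate; 0.0 is an assumed convention.
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, sd


# Two hypothetical editors, each interested in a single, different category.
profiles = [[1.0, 0.0], [0.0, 1.0]]
print(team_versatility(profiles))

# Hypothetical contributions P_i(a), in bytes, of the same two editors.
print(mean_and_sd([100, 300]))
```

Each editor in this toy team has zero individual versatility, yet the team profile is spread evenly over both categories, so the team versatility is maximal for two categories.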
In this part we use the same division into quality classes as before; however, no article in either of our datasets is marked as both good and featured. It is therefore irrelevant to consider the (G ∩ F) quality class in the context of a team assigned to an article. In the logistic regression and prediction experiments we treat the class G ∪ F as the “high quality” label (C = 1) and normal (N) as the “normal quality” label (C = 0), as for single editors.
The order of the coming sections and experiments concerning teams is generally analogous to the one concerning editors with some necessary adaptations.
6.2 Preliminary exploratory data analysis for teams
Medians of team features by article quality class for the wikipl dataset
Quality  Versatility  Mean productivity in article  Mean total productivity  sd productivity in article  sd total productivity  Length 
G\(\cup \)F  3.26e+000  1.80e+003  4.52e+006  6.84e+003  5.35e+006  3.19e+004 
F  3.26e+000  2.93e+003  4.31e+006  9.62e+003  5.42e+006  5.38e+004 
G  3.26e+000  1.73e+003  4.58e+006  6.10e+003  5.33e+006  2.70e+004 
N  3.53e+000  4.99e+002  5.88e+006  7.96e+002  5.96e+006  2.41e+003 
Quality  Team size  Mean tenure in article  Mean tenure in Wikipedia  sd tenure in article  sd tenure in Wikipedia  Age 
G\(\cup \)F  2.00e+001  1.25e+002  1.81e+003  3.56e+002  8.46e+002  2.59e+003 
F  3.30e+001  1.44e+002  1.85e+003  4.11e+002  9.02e+002  3.13e+003 
G  1.70e+001  1.20e+002  1.80e+003  3.37e+002  8.20e+002  2.43e+003 
N  4.00e+000  7.71e+000  1.81e+003  4.39e+001  8.15e+002  2.31e+003 
Medians of team features by article quality class for the wikide dataset
Quality  Versatility  Mean productivity in article  Mean total productivity  sd productivity in article  sd total productivity  Length 
G\(\cup \)F  2.65e+000  1.16e+003  5.94e+006  6.05e+003  1.31e+007  4.28e+004 
F  2.65e+000  1.44e+003  6.12e+006  8.09e+003  1.37e+007  5.58e+004 
G  2.65e+000  9.98e+002  5.82e+006  4.98e+003  1.27e+007  3.58e+004 
N  2.62e+000  4.07e+002  6.16e+006  9.10e+002  9.20e+006  3.64e+003 
Quality  Team size  Mean tenure in article  Mean tenure in Wikipedia  sd tenure in article  sd tenure in Wikipedia  Age 
G\(\cup \)F  7.45e+001  1.02e+002  2.09e+003  3.33e+002  1.05e+003  3.74e+003 
F  8.60e+001  1.01e+002  2.11e+003  3.30e+002  1.05e+003  3.83e+003 
G  6.60e+001  1.03e+002  2.08e+003  3.36e+002  1.04e+003  3.67e+003 
N  9.00e+000  4.38e+001  2.08e+003  1.33e+002  9.94e+002  2.19e+003 
More precisely, Tables 14 and 15 demonstrate that versatility, mean total productivity, standard deviation of total productivity, mean tenure in Wikipedia and standard deviation of tenure in Wikipedia differ little across the four quality groups. Versatility differs only slightly between higher-quality and normal articles. Total productivity of editors in teams seems to have no significant relationship with the quality of articles. The most considerable differences are observed for the following attributes: mean productivity in article, standard deviation of productivity in article, team size, mean tenure in article and standard deviation of tenure in article. It seems that productivity and tenure, and their diversity, have a stronger relationship with quality when measured in the article than in the whole of Wikipedia: what matters is not how much work the editors did across all articles, but their productivity in the particular article. These results might indicate that “new” and “old” editors in an article, by exchanging their experience, create articles of better quality.
6.3 Logistic regression analysis
In this Section we fit a logistic regression model to the data using all attributes described in Table 13, with membership in G ∪ F as the target attribute.
Logistic regression model for teams on the wikipl dataset
Variable  Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  −1.071e+01  8.254e−01  −12.980  <2e−16 ***
Versatility  1.730e+00  2.565e−01  6.743  1.55e−11 ***
Mean productivity in article  −2.252e−04  2.461e−05  −9.153  <2e−16 ***
Mean total productivity  8.505e−09  1.446e−08  0.588  0.556
Size of team  −2.176e−03  1.169e−03  −1.861  0.0627 .
Mean tenure in article  −1.492e−02  8.297e−04  −17.989  <2e−16 ***
Mean tenure in Wikipedia  1.116e−04  9.325e−05  1.196  0.232
sd productivity in article  5.824e−05  5.636e−06  10.334  <2e−16 ***
sd total productivity  −9.579e−08  1.482e−08  −6.465  1.01e−10 ***
sd tenure in article  8.797e−03  3.633e−04  24.215  <2e−16 ***
sd tenure in Wikipedia  −5.259e−04  1.291e−04  −4.074  4.63e−05 ***
Length  5.202e−05  1.375e−06  37.823  <2e−16 ***
Age  −4.449e−04  4.221e−05  −10.540  <2e−16 ***
Logistic regression model for teams on the wikide dataset
Variable  Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  −1.167e+01  6.628e−01  −17.614  <2e−16 ***
Versatility  4.612e−01  2.315e−01  1.992  0.04632 *
Mean productivity in article  −1.950e−04  1.927e−05  −10.120  <2e−16 ***
Mean total productivity  −1.869e−07  1.323e−08  −14.126  <2e−16 ***
Size of team  2.379e−03  2.719e−04  8.750  <2e−16 ***
Mean tenure in article  −1.741e−02  8.874e−04  −19.620  <2e−16 ***
Mean tenure in Wikipedia  1.499e−03  9.026e−05  16.602  <2e−16 ***
sd productivity in article  3.170e−05  3.262e−06  9.718  <2e−16 ***
sd total productivity  7.595e−08  4.947e−09  15.353  <2e−16 ***
sd tenure in article  7.421e−03  3.126e−04  23.737  <2e−16 ***
sd tenure in Wikipedia  −4.687e−04  1.470e−04  −3.188  0.00143 **
Length  3.939e−05  7.311e−07  53.884  <2e−16 ***
Age  5.340e−04  3.112e−05  17.162  <2e−16 ***
The observed p-values demonstrate that almost all variables are statistically significant (assuming a significance level of 0.05). The exceptions are mean total productivity, size of team and mean tenure in Wikipedia for the wikipl dataset; for the wikide dataset all variables are significant at this level.
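The p-values in the two regression tables follow from the z values under the usual normal approximation for Wald tests. The sketch below, which assumes that this standard two-sided test is what the R summary reports, recomputes three of them from the tabulated z values using only the standard library.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z values taken from the team-level regression tables.
examples = {
    "Versatility (wikide)": 1.992,
    "Size of team (wikipl)": -1.861,
    "Mean total productivity (wikipl)": 0.588,
}

for name, z in examples.items():
    p = two_sided_p(z)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: p = {p:.4f} ({verdict} at the 0.05 level)")
```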
We highlight some of the more interesting observations in the tables using bold print. Interestingly, in both datasets versatility has the largest positive coefficient of influence on quality; however, it is statistically less significant than most of the other attributes. On the other hand, out of the three statistically strongest attributes (highest z-values), the majority (two) are diversity-related attributes (standard deviations of productivity in article and of tenure or total productivity, depending on the dataset). In both datasets the remaining statistically strong (high z-value) attribute is team size, which is intuitively plausible (a large team likely improves the article).
6.4 Experiments with quality prediction for teams
In this Section we present experiments with prediction models concerning teams.
We used the aggregated data from the previous Section, split into training (50% of observations) and testing (50% of observations) datasets, and built logistic regression and Random Forest models (Liaw and Wiener 2002). As in the case of editors, the response variable takes two values: C = 0, which represents normal-quality articles, and C = 1, which represents higher-quality articles (G ∪ F). In other words, we want to predict the probability that an article produced by a team belongs to G ∪ F rather than being a normal-quality article.
As in the experiments for editors, we would like to verify how accurately the quality of an article can be predicted from the features describing its team. Because the number of features is larger than in the experiments with editors, instead of a single classification tree we applied the Random Forest model, whose performance is usually superior to that of a single decision tree, and used the implementation available in the R package randomForest.
Evaluation measures on testing data for teams on wikipl and wikide datasets
Measure  Logistic regression (wikipl)  Logistic regression (wikide)  Random Forest (wikipl)  Random Forest (wikide)
Precision  25.34%  38.21%  66.91%  58.93%
Recall  4.46%  7.65%  7.41%  20.27%
Accuracy  99.71%  99.57%  99.74%  99.25%
F-measure  7.58%  12.75%  13.34%  30.17%
6.5 Lift and ROC curves
Figures 9 and 11 show Lift curves for teams of the wikipl and wikide datasets. Figures 10 and 12 show the corresponding ROC curves. The results are significantly above the baseline. Note that Random Forest outperforms logistic regression for both datasets. The ROC curves indicate that the Random Forest model performs better here than the models in the experiments with single editors.
6.6 Importance of diversity measures in quality prediction
To additionally verify our hypothesis for teams, we assess the relevance of variables using the variable importance measures available in the Random Forest model (Breiman 2001). The first measure (Imp1) is based on prediction error. Namely, for each tree, the prediction error on the out-of-bag portion of the data (the data not used to build that tree) is computed. Then the same is done after permuting the values of the given attribute (this makes the attribute irrelevant). The differences between the two errors are then averaged over all trees and normalized by the standard deviation of the differences. The second measure (Imp2) pertains to the average decrease of node impurity. Algorithms for constructing decision trees usually work top-down, choosing at each step the attribute that best splits the set of observations. The quality of a split is measured by the decrease of node impurity, i.e. the difference between the class impurity in the parent node and in the child nodes. The class impurity is measured using entropy or the Gini impurity measure. A large decrease indicates that the attribute is relevant. The average decrease is taken over all splitting nodes and over all trees used to construct the ensemble classifier. Generally, the higher the value of an importance measure, the stronger the relationship with the predicted attribute (article quality).
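Both ideas can be illustrated on toy data. The sketch below is a simplification and not the randomForest implementation: the permutation measure is computed for one fixed classifier on one dataset instead of per tree on out-of-bag data and without normalization, and the impurity measure is shown for a single binary split. All names and data are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a node with binary (0/1) labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 2.0 * p1 * (1.0 - p1)


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


def permutation_importance(predict, rows, labels, column, seed=0):
    """Increase in error rate after shuffling the values of one attribute."""
    def error_rate(data):
        return sum(predict(row) != y for row, y in zip(data, labels)) / len(labels)

    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    permuted = [row[:column] + (v,) + row[column + 1:] for row, v in zip(rows, values)]
    return error_rate(permuted) - error_rate(rows)


# A split that separates the classes perfectly, and one that does not help.
print(impurity_decrease([1, 1, 0, 0], [1, 1], [0, 0]))
print(impurity_decrease([1, 1, 0, 0], [1, 0], [1, 0]))

# A toy classifier that uses only the first attribute of each row.
rows = [(0.1, 5), (0.9, 3), (0.2, 7), (0.8, 1)]
labels = [0, 1, 0, 1]


def predict(row):
    return int(row[0] > 0.5)


print(permutation_importance(predict, rows, labels, column=0))
print(permutation_importance(predict, rows, labels, column=1))
```

Shuffling the second attribute cannot change the error, because the toy classifier ignores it; this is the sense in which permuting an attribute "makes it irrelevant".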
Random Forest importance for the wikipl dataset. Imp1 is based on the differences in prediction errors. Imp2 is based on the average decrease of node impurity (see the details in the text)
Attribute  Imp1  Imp2
Versatility  3.77e+001  1.13e+002 
Mean productivity in article  1.52e+001  1.01e+002 
Mean total productivity  3.61e+001  9.55e+001 
Size of team  1.77e+001  8.02e+001 
Mean tenure in article  5.07e+000  6.15e+001 
Mean tenure in Wikipedia  2.27e+001  7.55e+001 
sd productivity in art.  1.07e+001  1.23e+002 
sd total productivity  4.89e+001  1.02e+002 
sd tenure in article  4.77e+000  6.67e+001 
sd tenure in Wikipedia  2.72e+001  8.85e+001 
Length  7.89e+000  1.53e+002 
Age  2.42e+001  8.46e+001 
Random Forest importance for the wikide dataset
Attribute  Imp1  Imp2
Versatility  2.58e+001  5.02e+001 
Mean productivity in article  1.68e+001  6.10e+001 
Mean total productivity  1.59e+001  3.55e+001 
Size of team  1.37e+001  5.62e+001 
Mean tenure in article  8.74e+000  3.80e+001 
Mean tenure in Wikipedia  3.35e+001  7.43e+001 
sd productivity in art.  1.21e+001  9.14e+001 
sd total productivity  1.74e+001  3.72e+001 
sd tenure in article  8.20e+000  3.56e+001 
sd tenure in Wikipedia  1.17e+001  3.51e+001 
Length  1.59e+001  1.34e+002 
Age  1.09e+001  4.23e+001 
All of the “winning” attributes in this analysis are diversity-related attributes. Interestingly, for both datasets the winners are the same: versatility for the Imp1 measure and “standard deviation of productivity in article” for the second importance measure.
For both datasets, diversity-related attributes such as versatility and the standard deviations are among the most significant variables according to either of the importance measures (Imp1, Imp2).
This result is especially significant, since we consider 10 attributes, including such seemingly “strong” ones as “size of team” or tenure of editors. Diversity-based attributes turn out to be superior to them in this experiment.
6.7 Summary of experimental results for teams
This Section summarizes the experiments concerning teams of editors. Here our aim was to verify how different properties of teams (see Table 13), including diversity measures, influence the quality of articles. In this case we use two statistical models: logistic regression and Random Forest (a more sophisticated ensemble of decision trees, suited to a larger number of attributes). The evaluation measures (Precision, Recall, F-measure) are again very promising: they are much larger than for the baseline (random assignment of articles to the quality classes). Random Forest significantly outperforms logistic regression (this is clearly seen on the ROC curves). As the performance of Random Forest was superior, we also calculated attribute importance measures based on Random Forest to check which attributes are useful for the prediction of quality. It turns out that versatility is the most significant attribute according to the first measure. The experiments clearly indicate that diversity-related attributes of teams are strongly connected with the quality of the articles.
7 Conclusions and future work
In this article we applied statistical analysis to verify our hypothesis that diversity of editors and teams plays an important role in work quality in an open-collaboration environment, using Wikipedia as an example.
A series of experiments on two datasets, ranging from basic exploratory analyses to more advanced techniques including machine learning prediction models, support our hypothesis.
We reported many statistical signals that diversity seems to play an important positive role in high-quality cooperative work in Wikipedia.
Interestingly, some of the reported experiments indicated that the considered diversity-related attributes, such as interest diversity (versatility) or experience diversity in teams (st. dev. of tenure or st. dev. of productivity in team), are more strongly connected with the quality of work than such “obvious” attributes as the average experience of the team members or even the size of the team.
These findings give interesting insights into the study of virtual open-collaboration communities and, we hope, may motivate further work aimed at a deeper analysis of the role of diversity in this context.
Another possible outcome of the study presented in this article would be to provide foundations for developing an intelligent decision-support system that suggests how to build a successful virtual team in an open-collaboration environment in order to produce high-quality outcomes. In particular, it would be interesting to study in future work whether a controlled level of diversity intentionally introduced into a team improves the quality of its work.
We hope that this work will serve as one of the steps towards achieving such goals in the future.
Acknowledgments
The work was partially supported by the Polish National Science Centre grant 2012/05/B/ST6/03364.
The study is co-financed by the European Union under the European Social Fund. Project PO KL “Information technologies: Research and their interdisciplinary applications”, Agreement UDAPOKL.04.01.0100051/1000.
We would like to thank J.Szejda and D.Czerniawska for their contributions to the early stage of the work that eventually resulted in this article.
References
 Aggarwal, A.K. (2014). Decision making in diverse swift teams: an exploratory study. In 47th Hawaii international conference on system sciences, HICSS 2014 (pp. 278–288). Waikoloa.
 Agrawal, R., Gollapudi, S., Halverson, A., & Ieong, S. (2009). Diversifying search results. In Proceedings of the 2nd ACM international conference on web search and data mining, WSDM ’09 (pp. 5–14). New York: ACM.
 Baraniak, K., Sydow, M., Szejda, J., & Czerniawska, D. (2016). Studying the role of diversity in open collaboration network: experiments on wikipedia. In Advances of network science (Proceedings of the NetSciX 2016 conference), Lecture Notes in Computer Science, Chap 8, Vol. 9564. Springer.
 Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
 Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Monterey: Wadsworth and Brooks.
 Brown, C.D., & Davis, H.T. (2006). Receiver operating characteristics curves and related decision measures: a tutorial. Chemometrics and Intelligent Laboratory Systems, 80(1), 24–38.
 Carbonell, J., & Goldstein, J. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’98 (pp. 335–336). New York: ACM.
 Chen, J., Ren, Y., & Riedl, J. (2010). The effects of diversity on group productivity and member withdrawal in online volunteer groups. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 821–830). ACM.
 Cormen, T.H., Stein, C., Rivest, R.L., & Leiserson, C.E. (2001). Introduction to algorithms, 2nd edn. McGraw-Hill Higher Education.
 Crandall, D., Cosley, D., Huttenlocher, D., Kleinberg, J., & Suri, S. (2008). Feedback effects between similarity and social influence in online communities. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’08 (pp. 160–168). New York: ACM.
 Giles, J. (2005). Internet encyclopaedias go head to head. Nature, 438, 900–901.
 Goffman, W. (1964). A searching procedure for information retrieval. Information Storage and Retrieval, 2(2), 73–78.
 Hosmer, D.W., Lemeshow, S., & Sturdivant, R.X. (2013). Applied logistic regression. New York: Wiley.
 Kittur, A., & Kraut, R.E. (2008). Harnessing the wisdom of crowds in wikipedia: Quality through coordination. In Proceedings of the 2008 ACM conference on computer supported cooperative work, CSCW ’08 (pp. 37–46). New York: ACM.
 Langlois, R.N., & Garzarelli, G. (2008). Of hackers and hairdressers: modularity and the organizational economics of open-source collaboration. Industry and Innovation, 15(2), 125–143.
 Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2, 18–22.
 López, C.A., & Butler, B.S. (2013). Consequences of content diversity for online public spaces for local communities. In Proceedings of the 2013 conference on Computer supported cooperative work (pp. 673–682). ACM.
 Parnas, D.L. (1972). On the criteria to be used in decomposing systems into modules. Communications of the ACM, 15(12), 1053–1058.
 R Core Team (2013). R: A Language and Environment for Statistical Computing. Technical report, R Foundation for Statistical Computing.
 Ren, Y., Chen, J., & Riedl, J. (2015). The impact and evolution of group diversity in online open collaboration. Management Science.
 Sanchez, R., & Mahoney, J.T. (1996). Modularity, flexibility, and knowledge management in product and organization design. Strategic Management Journal, 17(S2), 63–76.
 Shannon, C.E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423.
 Strzezek, A., Trammer, L., & Sydow, M. (2015). Divergene: experiments on controlling population diversity in genetic algorithm with a dispersion operator. In Proceedings of the 2015 federated conference on computer science and information systems, annals of computer science and information systems (Vol. 5, pp. 155–162).
 Sydow, M., Pikuła, M., & Schenkel, R. (2013). The notion of diversity in graphical entity summarisation on semantic knowledge graphs. Journal of Intelligent Information Systems, 41(2), 109–149.
 Therneau, T., Atkinson, B., & Ripley, B. (2015). rpart: Recursive Partitioning and Regression Trees. R package version 4.1-10.
 Vasilescu, B., Posnett, D., Ray, B., van den Brand, M.G., Serebrenik, A., Devanbu, P., & Filkov, V. (2015). Gender and tenure diversity in GitHub teams. CHI. ACM.
 Vee, E., Srivastava, U., Shanmugasundaram, J., Bhat, P., & Yahia, S.A. (2008). Efficient computation of diverse query results. In IEEE 24th international conference on data engineering, 2008. ICDE 2008 (pp. 228–236). IEEE.
 Wilkinson, D.M., & Huberman, B.A. (2007). Cooperation and quality in wikipedia. In Proceedings of the 2007 international symposium on wikis, WikiSym ’07 (pp. 157–164). New York: ACM.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.