Value Production in a Collaborative Environment
We review some recent endeavors and add some new results to characterize and understand underlying mechanisms in Wikipedia (WP), the paradigmatic example of collaborative value production. We analyzed the statistics of editorial activity in different languages and observed typical circadian and weekly patterns, which enabled us to estimate the geographical origins of contributions to WPs in languages spoken in several time zones. Using a recently introduced measure we showed that the editorial activities have intrinsic dependencies in the burstiness of events. A comparison of the English and Simple English WPs revealed important aspects of language complexity and showed how peer cooperation solved the task of enhancing readability. One of our focus issues was characterizing the conflicts or edit wars in WPs, which helped us to automatically filter out controversial pages. When studying the temporal evolution of the controversiality of such pages we identified typical patterns and classified conflicts accordingly. Our quantitative analysis provides the basis for modeling conflicts and their resolution in collaborative environments and contributes to the understanding of this issue, which becomes increasingly important with the development of information communication technology.
Keywords: Peer-production, User-generated content, Wikipedia, Social dynamics, Burstiness, Human dynamics, Conflict, Language complexity, Opinion dynamics
Wikipedia (WP) is a truly amazing product of the 21st century. It is a free online encyclopedia1 edited by volunteers, which has achieved enormous success within a short period of time: This encyclopedia, which practically anyone can contribute to, has a reliability comparable to that of the highly professional Encyclopedia Britannica  and has by now become the number one general work of reference in everyday practice. The main question related to Wikipedia is: How can an encyclopedia be reliable if anyone can edit it? The bon mot of Wikipedians, namely that “It works only in practice. In theory, it can never work.”, is not a satisfactory answer.
The literature about WP is overwhelming. Without seeking completeness, Okoli et al.  tracked more than 2000 related articles. However, there are rather comprehensive reviews, e.g., [42, 65] and an overview of the visibility of WP in scholarly publications . In addition, there are also online platforms to collect and index WP-related academic literature; among them are “WikiLit” 2 , “AcaWiki” 3 and “WikiPapers”.4 A monthly review of the most recent scholarly studies on WP is also available at “Wikimedia research Newsletter”.5
The first Wikipedia studies were mostly on its size and growth, showing an initial exponential growth [4, 99], which was later reported to be saturating by other authors . Another main line of WP research is focused on vandalism detection [3, 71, 86, 102, 106]. Assessing user reputation [2, 40] and investigating article quality [37, 38, 41, 48, 59, 88, 105, 107] are two other important topics. To understand the management system of WP, there have been interesting studies on user authority, adminship, governance and promotion strategy [1, 22, 56, 57, 83], in addition to analysis of WP policies and bureaucracy . A considerable number of WP policies concern what should and should not be in WP. Consequently, there are studies on topical coverage and notability of entries [31, 36, 93]. Seeing WP as a network of articles, various researchers offer analysis and models for topology and growth of the Wikigraph [12, 14, 72, 115, 116], whereas some others used WP to build up knowledge taxonomies and semantic structures [15, 50, 63, 70, 85, 87, 89, 113]. Masucci et al. showed that semantic space has a scale-free structure by analyzing information extracted from WP . More to the sociological side, Restivo and van de Rijt studied the effect of social awards on users’ activity  and Lam et al. explored the gender imbalance among WP editors . Massa presented an algorithm to extract the social network of editors , and Danescu-Niculescu-Mizil et al. have studied talk page conversations to observe the relation between language coordination and social power of editors . Clearly, scholarly studies in the field of peer-production go beyond WP and for instance Roth et al. studied dynamics of communities of wiki-based projects in the whole “WikiSphere” . Finally, in a rather different approach, Mestyán et al. have made use of WP edit and page view data to predict the box office revenue of movies .
Our motivation to study Wikipedia comes from the need to understand the laws of modern collaborative value production. This is of great importance as in our increasingly complex world the role of information communication technology (ICT) mediated peer collaboration is expected to become more and more important in the future. Due to its relation to ICT, the methods of “Computational Social Science” (CSS)  are adequate tools for investigating such collaborations. CSS is a truly multidisciplinary endeavor with considerable contributions from physicists (see, e.g., ). The main difference between traditional social science and CSS is that the latter is data driven: it uses the digital footprints we leave behind in almost all our activities in the digital era .
Collaboration has always been fundamental to most human achievements. Modern information communication technology opens up entirely new ways of cooperation, where partners can interact remotely with unprecedented speed, exchanging extremely large amounts of information. Tim Berners-Lee originally developed the World Wide Web  at CERN in order to create an appropriate platform for huge collaborations, which are ubiquitous in high energy physics. Another important example is that of free software development as defined by the Free Software Foundation.6 Nowadays all major scientific projects from the Human Genome7 to the Hubble Space Telescope8 rely heavily on ICT mediated collaboration, but even on a smaller scale we often use the Concurrent Versions System,9 wiki  and related environments to increase efficiency. WP is a paradigmatic example of a collaborative environment with the additional advantage that all the changes and interactions are well documented and publicly available, which makes it particularly suitable for scientific studies.
Many questions arise when studying WP from our point of view. What are the characteristic features of editorial activities? How are they related to other examples of human dynamics, which have been intensively studied in CSS [8, 46]? What is the mechanism behind the emergence of an article? How can the complexity of the product of the cooperation, namely that of the articles be characterized? How do conflicts emerge and get resolved? In the following we will present analysis of WP data in order to contribute to the clarification of these questions.
To accustom the unfamiliar readers to the terminology and work-flow of Wikipedia, in the next section we briefly review main tools and objects in the Wikimedia platform. Familiar readers are encouraged to skip this section. In Sect. 3, we explain different methods and sources for collecting WP data, and in Sect. 4, a summary of our recent [91, 92, 94, 108, 109, 110] and some new results is provided and compared to the related reports by other authors. We close the paper with a conclusion (Sect. 5).
2 How Wikipedia Works
WP has more than 280 language editions at the moment. Main concepts and structures are similar in all language editions, with small variations due to local modifications by the editors’ community of the specific edition. Later we will deal with several WPs; however, unless explicitly specified otherwise, the English WP is meant.
In describing the structure of WP, there are two main elements to name: (i) Articles and (ii) WP editors, also called “Wikipedians”. The rest is all about the internal and inter-element connections and interactions of the members of these two groups, which we name “Accessories” in this paper.
Article statistics for 10 largest Wikipedias. First and second columns are indicating the Language and the Symbol of the Wikipedia editions. In the following columns number of Articles (divided by 1000), Average Length of the articles in characters, Average number of Edits per Article, number of editors with at least one edit, divided by the number of articles, and number of Featured articles are reported
In general, articles can be edited by any Internet user. However, protections against vandalism are applied to some articles, prohibiting certain classes of editors from editing them. Access to more complex actions, e.g. creating a new article, changing the title, or deleting an article, is also subject to the hierarchical structure of editors (described in the next section).
“Featured articles are considered to be the best articles Wikipedia has to offer, as determined by Wikipedia’s editors.”10 Articles are tagged as featured based on the community decision on their accuracy, neutrality, completeness and style. In English WP there are more than 3,500 featured articles (see Table 1).
Lists of Controversial Articles
There are also lists of articles with severe editorial disagreements in their history, see, e.g., “List of Controversial Articles”,11 and List of “Lamest Edit Wars”.12 However, the accuracy and coverage of those lists are questionable. There is neither a clear definition nor a systematic algorithm to determine which articles should be listed.
In principle any person with access to the Internet can be a Wikipedia editor. Editors are recognized by the system based on the IP addresses through which they are connected, or by the user-name they choose upon registration. As long as editors edit via their user-names, in general no personal information about them is revealed, unless they disclose it voluntarily. There are semi-annual surveys run by the Wikimedia Foundation to provide some demographic information about the community of WP editors.13 However, since participation in the survey is completely voluntary, the reliability and coverage of this information are questionable. Therefore, personal information about the editors is the least known aspect of the WP community.
Editor statistics for 10 largest Wikipedias. First and second columns are indicating the Language and the Symbol of the Wikipedia editions. In the following columns, number of Registered users, users who have actually Contributed (at least one edit), Administrators, Bureaucrats, and the editors who are banned forever, are reported
Page statistics for 10 largest Wikipedias. First and second columns are indicating the Language and the Symbol of the Wikipedia editions. In the following columns, number of All pages, Articles, Article Talk pages, User Pages, User Talk pages, and Categories, and in the last column sum of number of Wikipedia guidelines, projects, polls and Help pages are reported. All the numbers are divided by 1000
Policies, Guidelines, Essays and Instructions
“Wikipedia’s policies and guidelines are pages that serve to document the good practices that are accepted in the Wikipedia community.”14 These policies are, however, subject to change and improvement by the community of editors and may slightly differ among different language editions.
“User pages are for communication and collaboration.”15 They could be used to provide personal information of the editor or less encyclopedic content related to the editor. However, as they are part of the encyclopedia project, their content should not violate the main guidelines.
Article Talk Pages
The purpose of a Wikipedia talk page is to provide space for “editors to discuss changes to its associated article or project page”.16 Talk pages are the main channels for social interactions between editors, and are supposed to be the main place to resolve disagreements and editorial conflicts.
User Talk Pages
User talk pages are designed for more general communications addressed directly to individual editors. User talk pages are usually less technical than article talk pages and conversations are more personal.
Common Discussion Pages
Apart from article and user talk pages, there are other discussion pages related to specific projects, polls, and more collective activities. There are also different communication channels for Wikipedians outside of WP, e.g., IRC channels and Wikimedia mailing lists; for an overview see .
Categories are intended to group together pages on similar subjects.17 Categories are a feature of the MediaWiki platform. The latter allows articles to be grouped and provides the facility for the readers to navigate through the related articles. The process of article categorization is carried out by editors, and its accuracy is at the same level as that of other content of WP.
3 Methods and Data
Beyond the usual statistical methods to study Wikipedia, there are numerous open source software packages for different analysis tasks. Among them is “WikiTrust”18 , to measure article quality and assign a reputation to it. WikiXray19  is another package for doing in-depth statistical analysis on different parameters, e.g. size of WPs, size of articles, number of contributors to each article, etc. However, since all WP data is publicly available, developing home-made packages to analyze this data is a common approach.
Every single action of Wikipedia editors is tracked and recorded. This includes all edits on articles, posts on talk pages, page deletions or creations, changes in page titles, uploading multimedia files, etc. Apart from the practical advantages of this complete archiving, it is also extremely valuable from a scientific point of view. WP is one of the few human societies for which the history of all actions of its members is recorded and accessible.20
There are two convenient ways to access live data of Wikipedia: (i) the “Wikimedia Toolserver”21 databases, which contain a replica of all Wikimedia wiki databases, and (ii) the “MediaWiki web service API”.22 For statistical analysis of contributions, Toolserver database tables are among the best sources of information.
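As an illustration of the second access route, the following minimal sketch builds a request URL for the revision history of a single page through the MediaWiki web service API. The helper name build_revisions_query and the chosen page title are ours and purely illustrative; the query parameters (action, prop, titles, rvprop, rvlimit, format) are standard parameters of that API. The sketch only constructs the URL and does not send any request.

```python
from urllib.parse import urlencode

# Public API endpoint of the English Wikipedia
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_revisions_query(title, limit=50):
    """Return a MediaWiki API URL listing recent revisions of one page.

    Hypothetical helper for illustration; it only assembles the URL.
    """
    params = {
        "action": "query",              # generic query module
        "prop": "revisions",            # ask for the revision history
        "titles": title,                # page whose history is requested
        "rvprop": "ids|timestamp|user", # revision id, time stamp, editor
        "rvlimit": limit,               # number of revisions per request
        "format": "json",               # machine-readable output
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = build_revisions_query("Safavid dynasty", limit=10)
print(url)
```

Fetching the URL with any HTTP client returns the time stamps and user names needed for the activity statistics discussed later; for long histories the request has to be repeated using the continuation token returned by the API.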
Wikipedia also offers archived copies of its content in different formats,23 e.g., XML and HTML and different types, e.g., snapshots of full history of articles or a collection of latest version of all articles. Generally for historical text analysis of articles, the most reliable source would be these static copies.
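To give an impression of how such static copies can be processed, the sketch below extracts (title, time stamp, editor) triples from a small XML fragment. The fragment SAMPLE_DUMP is a hand-written, strongly simplified stand-in for a real history dump (real dumps carry an XML namespace and many more fields), and the function names are illustrative assumptions rather than part of any existing package.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written stand-in for a full-history XML dump
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Example</title>
    <revision>
      <timestamp>2012-01-01T10:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor>
    </revision>
    <revision>
      <timestamp>2012-01-02T21:30:00Z</timestamp>
      <contributor><ip>192.0.2.1</ip></contributor>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_edits(xml_text):
    """Return a list of (title, timestamp, editor) for every revision."""
    edits = []
    for page in ET.fromstring(xml_text):
        if local_name(page.tag) != "page":
            continue
        title = None
        for child in page:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                timestamp, editor = None, None
                for field in child:
                    fname = local_name(field.tag)
                    if fname == "timestamp":
                        timestamp = field.text
                    elif fname == "contributor":
                        # Registered editors carry <username>, others <ip>
                        for who in field:
                            if local_name(who.tag) in ("username", "ip"):
                                editor = who.text
                edits.append((title, timestamp, editor))
    return edits


edits = extract_edits(SAMPLE_DUMP)
print(edits)
```

For real dumps, which do not fit into memory, an incremental parser such as xml.etree.ElementTree.iterparse would be the more appropriate choice; the extraction logic stays the same.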
“Semantic Wikipedia”, as a general concept would be a combination of Semantic Web and WP data to provide structured data sets through query services. There are various projects providing access to Semantic WP. Examples are “DBpedia” 24 , “Semantic MediaWiki” 25 , and “Wikipedia XML corpus” 26 . For a list of Semantic WP projects see http://en.wikipedia.org/wiki/Wikipedia:Semantic_Wikipedia.
4 Results and Discussion
4.1 Editorial Habits
Similarly to any other large human society, the community of Wikipedia editors is very inhomogeneous. Editors vary in age, gender, nationality, education, occupation, religion, interest, etc.
4.1.1 Edits Statistics
4.1.2 Time of Editing
Since all edits are recorded along with a timestamp, it is very convenient to perform temporal analysis on editorial activities at different time scales.
Ung and Dalle also reported a power law distribution of the inter-edit time intervals and interpreted their observation as an outcome of editors’ focus on a few specific tasks (articles). They measured the slope for different classes of users and showed that more/less skewed distributions correspond to more focused/dispersed editors .
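The inter-edit intervals discussed above are straightforward to compute from the recorded time stamps. The sketch below also evaluates one common scalar summary of their heterogeneity, the burstiness coefficient B = (σ − μ)/(σ + μ) of Goh and Barabási, where μ and σ are the mean and standard deviation of the intervals. Using this particular coefficient is our assumption for illustration; it is not necessarily the measure used in the works cited here. The function names are hypothetical.

```python
from datetime import datetime
from statistics import mean, pstdev

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # time stamp format used by MediaWiki


def inter_edit_intervals(timestamps):
    """Return the waiting times (in seconds) between consecutive edits."""
    times = sorted(datetime.strptime(t, TIME_FORMAT) for t in timestamps)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def burstiness(intervals):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu).

    B is -1 for perfectly regular activity, about 0 for a Poisson
    process and approaches +1 for extremely bursty activity.
    """
    mu = mean(intervals)
    sigma = pstdev(intervals)
    if sigma + mu == 0:
        return 0.0  # degenerate case: all edits at the same instant
    return (sigma - mu) / (sigma + mu)


regular = inter_edit_intervals([
    "2012-01-01T10:00:00Z",
    "2012-01-01T11:00:00Z",
    "2012-01-01T12:00:00Z",
    "2012-01-01T13:00:00Z",
])
print(burstiness(regular))               # perfectly regular editing
print(burstiness([1.0, 1.0, 1.0, 1000])) # a burst followed by a long pause
```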
As mentioned above, the activity patterns of individual editors are quite heterogeneous in time. However, if we consider the whole editorial pool of a language edition of Wikipedia, we can define an average activity level for all editors, which also has its own large scale characteristics. In , it is shown that WP is mostly edited between 1 pm and 11 pm, almost in a universal manner for all language editions. This is in accord with the results in [45, 75]. Deviations from this universality originate from cultural differences and working habits, such that language editions with more editors from countries with longer working hours are edited even later in the evening and around midnight. In addition, for more global language editions, the activity curves are flattened, due to contributions from different time zones (see 4.1.3).
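A circadian activity curve of the kind described above can be obtained by binning edit time stamps by the local hour of the day. The following minimal sketch assumes UTC time stamps and a single fixed time zone offset for the whole edition, which is a simplification for languages spoken in several time zones; the function name and the sample data are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def hourly_profile(timestamps, utc_offset_hours=0):
    """Return the fraction of edits falling into each local hour (0-23)."""
    counts = Counter()
    for t in timestamps:
        local = datetime.strptime(t, TIME_FORMAT) + timedelta(hours=utc_offset_hours)
        counts[local.hour] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 24
    return [counts[h] / total for h in range(24)]


# Toy data: four edits, shifted to a UTC+1 time zone
stamps = [
    "2012-01-01T12:00:00Z",
    "2012-01-01T12:30:00Z",
    "2012-01-01T20:00:00Z",
    "2012-01-02T03:00:00Z",
]
profile = hourly_profile(stamps, utc_offset_hours=1)
afternoon_evening = sum(profile[13:23])  # share of edits from 1 pm to 11 pm
print(afternoon_evening)
```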
Weekly patterns are quite universal within one WP, and different WPs can be classified into different categories based on the activity pattern of their editors . For example, German, English, Spanish, and Italian WPs are mostly edited during the working days, in contrast to Japanese, Korean, and Chinese WPs, which are mostly edited on weekends. Our findings are in accord with  but in contrast with . However, the latter work studied a sample of only four languages and a shorter monitoring time, and we believe that these limitations led to the conclusion that editorial activity in WP “while showing a clear diurnal pattern, do not have a clear weekday-weekend pattern.”
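A simple way to quantify such a weekday-weekend preference is the share of edits made on Saturdays and Sundays, to be compared with the value 2/7 expected for activity that is uniform over the week. The sketch below is only an illustration under the assumption that the time stamps have already been converted to the local time of the editors; the function name is hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
UNIFORM_WEEKEND_SHARE = 2 / 7  # expectation for week-independent activity


def weekend_share(timestamps):
    """Return the fraction of edits made on Saturday or Sunday."""
    days = [datetime.strptime(t, TIME_FORMAT).weekday() for t in timestamps]
    weekend = sum(1 for d in days if d >= 5)  # Monday is 0, Sunday is 6
    return weekend / len(days)


# Toy data: Sunday, Monday, Saturday, Wednesday
share = weekend_share([
    "2012-01-01T10:00:00Z",
    "2012-01-02T10:00:00Z",
    "2012-01-07T10:00:00Z",
    "2012-01-04T10:00:00Z",
])
print(share > UNIFORM_WEEKEND_SHARE)  # True means a weekend preference
```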
4.1.3 Edits Origin
As mentioned earlier in Sect. 2.2, personal information of editors is rarely available. That includes their nationality and place of residence. However, to understand many aspects of the social characteristics of the editor communities, as well as conflicts and potential biases in content, such information could be crucial. To obtain exact data on the location of editors, the analysis must be restricted to unregistered users, whose edits are recorded along with IP addresses. Such edits typically make up between 5 % and 10 % of the total community contributions in different language editions and clearly do not represent the whole community. Moreover, a considerable part of such editors are atypical (vandals, single act editors). Nevertheless, Hardy et al. followed the edits of 2.8 million such editors and geolocated them and the edited articles. By counting the number of edits as a function of distance between editor and article, an exponentially decaying distance dependence was obtained . Cohen has investigated the contribution of unregistered editors to English WP and concluded that most unregistered edits are from large cities and metropolitan areas . However, normalization by the population of the regions appears to be missing, although it is essential for such a conclusion.
4.1.4 Characterization of Edits
As mentioned above, each editor has her own unique characteristics and editorial personality. However, similar patterns can be observed by considering types of edits. In a novel approach, Wattenberg et al. established a visualization method to illustrate different editorial actions, e.g. adding, spelling correction, reverting, etc. in a time sequence. In the next step, based on the patterns of activity, they could distinguish different kinds of patterns, namely systematic, reactive, and mixed activity patterns .
Kittur et al. have classified editors based on their number of edits and also specifically followed admins’ contributions from the inception of WP . They concluded that in the beginning of the WP history, a large share of the contributions came from “elite users”; however, this gradually changed, such that after 2004 average users overtook the elites. By counting the number of added and removed words for different editors, they suggest that elite users on average add more words per edit than normal editors.
4.1.5 Linguistic Features
The content of Wikipedia is generated collaboratively by a large number of editors, without any professional or external supervision. That makes the resulting written language of WP articles a unique multilingual corpus of natural languages. A single sentence in WP might be written, edited and polished by various editors many times; therefore any personalization bias is eliminated on large scales. Moreover, the fully recorded history of articles gives the opportunity to follow the short time scale evolution of language and characterize the gradual changes of written language in the digital era. Finally, since WP is huge and available in many different languages, statistical approaches can be taken in a proper way.
From a practical perspective, Tyers and Pienaar used WP to extract pairs of corresponding words in different languages . Serrano et al. used the WP corpus along with two others to build up statistical models concerning fundamental concepts of patterns of word appearance in the text and vocabulary size . Gabrilovich and Markovitch introduced a method to calculate semantic relatedness of text fragments by extracting a “high-dimensional space of concepts” from WP . In a recent paper  Kornai argued that the maturity of WP and the activity on it are important indicators for the chances of survival of a language in the digital age.
We found out that the overall F of English WP is high with F=15.8±0.4 compared to other standard English corpora, for instance British National Corpus27 with F=12.1±0.5 . However, readability is not homogeneous among articles in different topics. We observed that articles on more sophisticated topics or concepts, especially in science and philosophy are less readable than, e.g., biographical articles.
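The definition of the readability index F is not reproduced in this section. As a hedged illustration only, the sketch below assumes that F is of the Gunning fog type, F = 0.4 × (words per sentence + 100 × fraction of complex words), where a word counts as complex if it has three or more syllables. Both this assumption and the crude vowel-group syllable counter are ours; a careful measurement would need proper sentence splitting and syllabification.

```python
import re


def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def gunning_fog(text):
    """Gunning-fog-type index; higher values mean less readable text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    complex_percentage = 100.0 * len(complex_words) / len(words)
    return 0.4 * (words_per_sentence + complex_percentage)


simple_text = "The cat sat. The dog ran."
hard_text = "Encyclopedic collaboration necessitates considerable coordination."
print(gunning_fog(simple_text), gunning_fog(hard_text))
```

With such a function, samples of articles from different topics or from different corpora can be compared on the same scale, which is the kind of comparison reported above.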
An interesting language edition of WP is “Simple Wikipedia”, which is meant to be a proper reference for readers with weaker knowledge of English, e.g., children, language learners or non-native speakers. Editors of Simple WP are explicitly requested to use a simpler language, limited vocabulary, less complex words, shorter sentences, and easy structures.28
In a recent work , Simple is examined by measuring the Flesch reading score  and it is found that Simple is not simple enough compared to other English texts, however with a positive trend in time towards more simplicity. There have also been attempts to use Simple WP for establishing text simplification algorithms [18, 64, 112], however with the assumption that Simple WP is really simple. The comparison of Simple and English WPs makes it possible to study the ability of the editors to fulfill a preset task (namely to enhance readability) and, at the same time, it also sheds more light on the concept of language complexity in general. We measured the readability index for a sample extracted from Simple WP . We found it to be 10.8±0.2, i.e., indeed much lower than for the English WP but just as large as for a corpus made of Wall Street Journal29 articles.
To further analyze the language complexity of Simple WP, we computed statistics at the word level and, surprisingly, observed that the vocabulary richness of Simple is comparable to that of the main English WP. Moreover, by examining two fundamental laws of linguistics, namely Zipf’s law  and Herdan-Heaps’ law [34, 35], we again confirmed that the vocabulary richness of Simple and Main English WP are not significantly different , although the directives explicitly suggest self-restriction in this respect for Simple editors. Detailed analysis of longer units (n-grams) of words shows that the language of Simple is indeed less complex than that of Main, but this is due to more frequent use of predefined language blocks, e.g., chains of 4 or 5 words in Simple. Sentences are also shorter in Simple than in Main. One can conclude that Simple editors solved fairly well the task of writing more readable texts than those in the main English WP, not by slavishly following the directives but mostly by reducing the variation of language compounds.
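The word-level statistics mentioned above rest on three simple counts: the rank-frequency table (Zipf’s law), the growth of the vocabulary with text length (Herdan-Heaps’ law), and the frequencies of n-grams. The following minimal sketch computes them for a list of tokens; tokenization by whitespace and all function names are illustrative assumptions, not the procedure of the original analysis.

```python
from collections import Counter


def zipf_table(tokens):
    """Return (rank, word, frequency) triples by decreasing frequency."""
    ranked = Counter(tokens).most_common()
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]


def heaps_curve(tokens):
    """Return the vocabulary size V(N) after the first N tokens."""
    seen, curve = set(), []
    for token in tokens:
        seen.add(token)
        curve.append(len(seen))
    return curve


def ngram_counts(tokens, n):
    """Count all chains of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


tokens = "the cat and the dog and the bird".split()
print(zipf_table(tokens)[:2])
print(heaps_curve(tokens))
print(ngram_counts(tokens, 2).most_common(1))
```

Plotting frequency against rank and V(N) against N on doubly logarithmic axes for two corpora gives the kind of comparison of vocabulary richness discussed above, while a more peaked n-gram distribution signals heavier reuse of fixed word chains.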
4.2 Conflicts and Edit Wars
In the process of creating a common product by various agents, the occurrence of controversies due to different opinions is unavoidable. WP is no exception in this sense. WP editorial wars and disputes are known and studied phenomena [5, 11, 49, 91, 92, 100, 110]. Editorial wars can be evoked both by internal and external causes. For example, life events of celebrities  or natural disasters  could drive flows of editors to an article, leading to tensions and disagreements. Apic et al. showed that disputes in WP correspond to real world geopolitical instabilities in many cases . To study editorial wars in detail, the first step is to establish an algorithm to locate and rank the debated articles among the relatively large number  of peacefully written ones. There are different proposals for this goal in the literature [11, 49, 92, 100]. In the following section we briefly describe our previously established method  for locating and ranking editorial wars.
4.2.1 Identification of Controversial Articles
Clearly, as time goes on, more mutual reverts can happen in the history of the article. This makes M a dynamic, monotonically increasing variable. Having calculated M for all articles, we are able to find and rank the most disputed ones and investigate them in detail. We carried out a detailed comparative study of possible single measures and found that M is in most cases as good as its alternatives, if not better, with the additional advantage of being applicable to different languages. The superiority of our single parameter measure was reinforced by a recent independent investigation .
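The exact definition of M is not reproduced in this section. Purely as an illustration of how a measure based on mutual reverts can be computed, the sketch below assumes one plausible form: every pair of editors who have reverted each other contributes the smaller of their two edit counts on the article, the single heaviest pair is dropped, and the sum is multiplied by the number of editors involved in mutual reverts. This form, the input format, and the function name are assumptions on our part and should be checked against the original definition before any use.

```python
def controversy_measure(reverts, edit_counts):
    """Mutual-revert based controversy score for one article (sketch).

    reverts     -- iterable of (reverter, reverted) editor pairs
    edit_counts -- dict mapping editor -> number of edits on the article
    """
    directed = set(reverts)
    # A pair is 'mutual' if each of the two editors reverted the other
    mutual = {
        frozenset((a, b))
        for a, b in directed
        if a != b and (b, a) in directed
    }
    if not mutual:
        return 0
    # Weight of a pair: edit count of its less active member
    weights = sorted(min(edit_counts[e] for e in pair) for pair in mutual)
    editors = set().union(*mutual)
    # Assumption: the heaviest pair is dropped so that a single
    # personal feud does not dominate the score
    return len(editors) * sum(weights[:-1])


reverts = [("A", "B"), ("B", "A"), ("C", "A"), ("A", "C"), ("D", "A")]
edit_counts = {"A": 50, "B": 10, "C": 5, "D": 2}
print(controversy_measure(reverts, edit_counts))
```

Evaluating such a score on the revert history truncated at successive edits yields a non-decreasing curve, in line with the monotonic behavior noted above.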
Based on the calculated controversy measure for articles in different languages, the first conclusion is that, although there are severe editorial wars on some articles, most of the articles in different languages evolve rather peacefully. However, the truly disputed articles consume a considerable amount of editorial resources. Interesting patterns are observed by comparing the debated titles in different language editions . For instance, issues related to politics and religion are commonly among the most disputed articles in many language WPs, whereas some categories of topics only become controversial in specific languages. Science and philosophy in French and soccer clubs in Spanish WPs are examples of locally debated topics. There are even articles at the top of the controversy list in one WP that are not covered in other language editions or do not have a separate article there. Examples are detailed articles on “Baha’i Faith” related topics in Persian WP. Finally, and surprisingly, in the Hebrew WP sport is debated as much as religion and politics.
4.2.2 Temporal Features
The understanding of the emergence of conflicts, their escalation and resolution is important for maintaining WP and may give hints in general for techniques of conflict management. The controversiality measure M enables the temporal analysis of editorial wars on short and long time scales.
Intuitively, more popular articles are subject to more collisions of opinions and edit wars. However, the correlation between the average time between edits and the measure M is very weak (C=−0.03).
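A correlation coefficient of this kind can be computed from two lists of per-article values, e.g. the average time between edits and M. The text does not state which coefficient C denotes; the sketch below assumes the ordinary Pearson coefficient, and the toy numbers are invented solely to exercise the function.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy values, not data from the study
print(pearson([1, 2, 3], [2, 4, 6]))  # perfectly correlated
print(pearson([1, 2, 3], [3, 2, 1]))  # perfectly anti-correlated
```

Since quantities such as M span many orders of magnitude, a rank-based coefficient or a correlation of logarithms may be a more robust choice in practice.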
Burstiness and Conflict
4.2.3 Talk Pages, Conflict and Coordination
As mentioned in Sect. 2.3, talk pages are channels to resolve editorial disagreements in a more civilized manner than overriding each other’s edits, and “talk before you type” is considered the ideal mechanism of coordination in WP . In a novel approach, Hautasaari and Ishida investigated the role of talk pages in the coordination of translation of articles from English to Finnish, French, and Japanese . They conclude that most of the debates in this field are about naming issues and not much about the content. Schneider et al. performed a very detailed manual analysis of talk pages from 100 articles and a quantitative analysis of talk pages from 5000 articles . Their results for the category of controversial articles suggest “significant variance between discussion threads (different sub-topics in the talk page of a certain article) on their talk pages”, such that the distribution of the length of single threads is quite heterogeneous. Many threads are rather short, with few comments, and a few of them become extremely long with numerous comments. This is in accordance with the results of , where a preferential attachment model to explain the discussion cascades in the talk pages was presented.
We measured the length of the talk pages for all articles. The correlation between talk page length and M for the English WP is much more significant (C=0.54) than that with the edit frequency. It indicates that most of the debates are reflected in talk pages simultaneously with edit wars directly on the article. This is partially supported in , where a method to detect “peaks” in talk pages is presented and it is shown that larger peaks mostly co-occur with peaks in editorial activities at a distance of 2 days. However, there are substantial differences between WPs in different languages in the usage of talk pages. In general, less developed WPs use talk pages less, but even rather mature WPs, like the German one, do not fight out controversies on the talk pages. (For a collection of visualizations and other related materials to edit wars, see http://wwm.phy.bme.hu/.)
In contrast to the revert network of editors, which can be constructed rather straightforwardly, the creation of talk page networks needs more sophisticated algorithms. Laniado et al. constructed three types of talk networks by considering (i) direct replies between users in article discussion pages, (ii) direct replies in user talk pages, and (iii) personal messages posted on the talk page of another user . The conclusion of this study suggests the presence of disassortativity in outgoing links and assortativity in incoming links.
Our case study of the talk page of “Safavid dynasty” showed that most of the comments are exchanged between a few editors, who are actively editing the article. In addition, the occurrence of clusters is very rare, such that most of the conversations are between pairs of editors and not bigger groups of them. This is in accord with the analysis of mutual reverts, which shows that only a few editors are responsible for a large amount of edit wars . By closely inspecting those few user-names that are very active in controversial articles, we could recognize many of them from our list of “Bad Editors” introduced in Sect. 4.1.1.
Language Complexity and Sentiment
Laniado et al. studied the emotional aspects of talk page discussions by measuring sentiment of the comments and found “replies are on average more positive than the comments they reply to, and editors having similar emotional styles are more likely to interact with each other.” Moreover, they found that editors with more social power, i.e. admins, talk more positively and interestingly this is also the case for female editors .
We measured the readability of talk pages based on Eq. (1) and compared it to the readability of articles for two samples of controversial and peaceful articles. In both cases there is a significant reduction in readability, going from articles to corresponding talk pages . However, the reduction is much more significant for the controversial articles. This can be explained by previous sociological theories on the effect of destructive conflict on complexity reduction of language . In simple words, when people talk with more temper, they use less sophisticated language.
4.2.4 Leader-Follower Behavior in Conflict
The community of editors is structured, though it is not easy to unfold its patterns. When studying the talk pages of highly edited articles, it becomes clear that editorial behavior is influenced not only by the content but also by personal relationships . There are dominant editors and others, who only follow them. Such relationships largely influence the emerging editor network. The easiest way to detect related behavior is to concentrate on leader-follower pairs. These are pairs of editors (say, A and B), who often act in a specific order, i.e., A always precedes B within a reasonable time, e.g., 1 day. As we are interested in the difference between peaceful and conflict articles, we concentrated on the leader-follower phenomenon in reverts . We defined the following process as an event: A reverts C and (within one day) B reverts C, where C is fixed only for this specific event. Confining our interest to reverts considerably restricted the statistics; however, significant differences between the two groups of articles could be observed here.
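The event definition above translates directly into a counting procedure. The sketch below takes a list of time-stamped reverts and counts, for every ordered pair (A, B), how often B reverted the same editor C within one day after A did. The input format, the function name, and the toy data are illustrative assumptions; filtering of IP addresses, bots, or repeated reverts is not included.

```python
from collections import Counter
from datetime import datetime, timedelta


def leader_follower_counts(reverts, window=timedelta(days=1)):
    """Count leader-follower events among reverts.

    reverts -- iterable of (time, reverter, reverted) tuples
    Returns a Counter mapping (leader, follower) -> number of events.
    """
    ordered = sorted(reverts)  # chronological order
    counts = Counter()
    for i, (t_a, a, c) in enumerate(ordered):
        for t_b, b, c_b in ordered[i + 1:]:
            if t_b - t_a > window:
                break  # later reverts are outside the time window
            if c_b == c and b != a:
                counts[(a, b)] += 1  # A reverted C, then B reverted C
    return counts


t0 = datetime(2012, 1, 1, 12, 0)
toy_reverts = [
    (t0, "A", "C"),
    (t0 + timedelta(hours=6), "B", "C"),
    (t0 + timedelta(days=3), "A", "D"),
    (t0 + timedelta(days=3, hours=2), "B", "D"),
    (t0 + timedelta(days=10), "B", "C"),
]
events = leader_follower_counts(toy_reverts)
print(events)
```

Comparing the resulting counts with those obtained from randomized versions of the same revert sequence indicates whether a given pair acts in a fixed order more often than expected by chance.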
We took two different edit history samples of WP. The first sample consisted of all reverts of the 837 articles with M value above 105 (conflict articles), ordered by time. In order to avoid the effects of vandalism, we excluded reverter-reverted editor pairs in which at least one member was an IP address or a bot. Moreover, to gain a better focus on leader-follower relationships rather than on effects produced solely by editorial wars between two editors, we also excluded repeated reverts where the reverter-reverted pair was the same and no other reverts happened between these two reverts. This sample consisted of 303397 reverts. As the second sample we took 12470 articles with M value under 500 (peaceful articles), where the number of reverts was approximately the same. We also created randomized versions of these samples.
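The two exclusion rules can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from the source: reverts are given as `(time, reverter, reverted)` tuples, anonymous editors are recognized by a user name that parses as an IP address, bots are recognized by a purely hypothetical "name ends in bot" heuristic, and "no other reverts in between" is judged after the IP and bot reverts have been removed.

```python
import ipaddress
from typing import List, Tuple

Revert = Tuple[float, str, str]  # (time, reverter, reverted); assumed layout


def is_ip(name: str) -> bool:
    """True if the user name is an IPv4 or IPv6 address (anonymous editor)."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def is_bot(name: str) -> bool:
    """Hypothetical heuristic; a real study would use a list of bot accounts."""
    return name.lower().endswith("bot")


def clean_reverts(reverts: List[Revert]) -> List[Revert]:
    """Drop IP/bot reverts and immediate repeats of the same editor pair."""
    kept: List[Revert] = []
    previous_pair = None
    for time, reverter, reverted in sorted(reverts):
        if any(is_ip(n) or is_bot(n) for n in (reverter, reverted)):
            continue  # rule 1: at least one member is an IP address or a bot
        pair = (reverter, reverted)
        if pair == previous_pair:
            continue  # rule 2: same pair again with no other revert between
        kept.append((time, reverter, reverted))
        previous_pair = pair
    return kept


raw = [
    (1, "Ann", "Carl"),
    (2, "Ann", "Carl"),        # immediate repeat of the same pair: dropped
    (3, "Bea", "Carl"),
    (4, "Ann", "Carl"),        # same pair, but another revert intervened: kept
    (5, "192.0.2.7", "Ann"),   # anonymous editor: dropped
    (6, "FixerBot", "Bea"),    # bot by the assumed heuristic: dropped
]
cleaned = clean_reverts(raw)
```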
4.2.5 War Scenarios
The characterization of the temporal evolution of conflicts is crucial for their typology and understanding. Our measure M is particularly suitable for such a study. We investigated controversial articles of the English WP from this point of view. Instead of real time we use the number of edits as the control parameter. In this way we eliminate several sources of temporal inhomogeneity, such as the maturing of WP as a whole, differences in the sizes of the articles, and external events motivating editors to focus on an article [73, 74]. Three typical scenarios can be distinguished.
(i) Consensus after war, Fig. 10(a): After a smooth initial increase of M, an intense period of war appears. Once the conflict is resolved, the article reaches consensus and further edits mostly polish the text and improve the quality of presentation. This is the scenario for most of the disputed articles in the English WP.
(ii) Stepwise conflicts, Fig. 10(b): After the first cycle of conflict and resolution, the consensus state might be disturbed, mainly for one of two reasons: the occurrence of an external event that generates new controversy, or the arrival of new editors who are not satisfied with the previously agreed content of the article. Therefore, further conflict-resolution cycles may appear in the overall history of the article.
(iii) Never-ending war, Fig. 10(c): If new editors arrive, or external events related to the topic of the article occur, on a time scale considerably shorter than the typical time needed to reach consensus, not even a temporary equilibrium can be achieved and the increase of M becomes permanent. This is the case for highly popular and live-object articles. The number of such articles in the English WP does not exceed a few hundred (out of a total of some millions of articles).
4.2.6 Agent-Based Modeling
Motivated by the empirical results on editorial wars in WP, we aimed at providing a minimalistic agent-based model capturing the main features of the wars. The model belongs to the class of bounded confidence models of opinion dynamics introduced by Deffuant et al. It consists of two types of elements: N_e editors and one article. In each Monte Carlo step, two editors interact if their scalar opinions x_i ∈ [0,1], i = 1…N_e, are closer to each other than a threshold value ϵ_T, in which case they both adopt the arithmetic mean of their opinions. An editor edits the article if she finds it in a state A ∈ [0,1] that differs from her opinion x_i by more than ϵ_A; otherwise she revises her own opinion, which moves closer to the article state by an amount controlled by a parameter μ_A. In addition, editors can be replaced by new ones in each step at a constant rate p_new.
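A minimal simulation sketch of such a model is given below. It is not the authors' implementation, and several details are assumptions made only to obtain runnable code: one Monte Carlo step is taken to be as many elementary updates as there are editors, an edit is assumed to move the article state toward the editor's opinion by the same fraction that controls opinion revision, and the replacement rate is applied as a per-editor, per-step probability of drawing a fresh uniform opinion. The thresholds and rates named in the text appear as `eps_t`, `eps_a`, `mu_a` and `p_new`; all default values are arbitrary.

```python
import random
from typing import List, Tuple


def simulate(n_editors: int = 50, eps_t: float = 0.2, eps_a: float = 0.1,
             mu_a: float = 0.3, p_new: float = 0.0, steps: int = 200,
             seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return (history of article state A, final editor opinions)."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n_editors)]  # editor opinions in [0, 1]
    a = rng.random()                              # article state in [0, 1]
    history = [a]
    for _ in range(steps):
        for _ in range(n_editors):  # assumed length of one Monte Carlo step
            # Editor-editor interaction: bounded confidence with threshold eps_t.
            i, j = rng.sample(range(n_editors), 2)
            if abs(x[i] - x[j]) < eps_t:
                x[i] = x[j] = 0.5 * (x[i] + x[j])
            # Editor-article interaction for a randomly chosen editor.
            k = rng.randrange(n_editors)
            if abs(a - x[k]) > eps_a:
                # Assumption: an edit pulls the article toward the editor.
                a += mu_a * (x[k] - a)
            else:
                # The editor revises her opinion toward the article.
                x[k] += mu_a * (a - x[k])
        # Assumption: each editor is replaced with probability p_new per step.
        for i in range(n_editors):
            if rng.random() < p_new:
                x[i] = rng.random()
        history.append(a)
    return history, x


history, opinions = simulate()
```

Because every update is a convex combination of values in [0, 1] (for `mu_a` between 0 and 1), both the article state and the opinions stay in that interval. Plotting `history` for different values of `mu_a` and `p_new` is one way to explore the regimes discussed in the text.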
Fixed Editorial Pool
To evaluate the outcome of the model, p_new is initially set to 0, which leads to consensus in the whole parameter space, meaning that after a sufficiently long time A becomes constant. However, the relaxation time to consensus depends strongly on the parameter set. There are three different scenarios for approaching the consensus state: (i) For small values of μ_A, the system needs an astronomically long time to reach the final state, although A always stays very close to the system average of x_i. (ii) Intermediate values of μ_A put the system into an oscillatory phase, in which A fluctuates strongly between two extreme values, but ends up at one of them in a relatively shorter time. (iii) Large values of μ_A lead to exaggerated fluctuations of A, but with fast convergence of the extremist editors and a shorter relaxation time than in the previous cases.
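One simple way to read a relaxation time off a simulated trajectory is sketched below. The function name `relaxation_time`, the tolerance `tol`, and the representation of the trajectory as a list with one article state per Monte Carlo step are assumptions made for illustration; the text does not specify how the relaxation time was measured.

```python
from typing import Sequence


def relaxation_time(history: Sequence[float], tol: float = 1e-6) -> int:
    """First index from which A stays within `tol` of its final value.

    A result equal to len(history) - 1 means that no plateau was observed
    within the recorded window, so the trajectory should be extended.
    """
    final = history[-1]
    t = len(history) - 1
    # Walk backwards while the preceding value is still on the final plateau.
    while t > 0 and abs(history[t - 1] - final) <= tol:
        t -= 1
    return t


trajectory = [0.2, 0.5, 0.4, 0.4, 0.4]
t_relax = relaxation_time(trajectory)
```

A finite trajectory can only suggest, not prove, that consensus has been reached; in practice one would require the plateau to last for many steps before accepting the value.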
Dynamic Editorial Pool
In this paper we surveyed recent work on WP and extended it by some new results. Our studies covered multilingual aspects and focused on the mechanisms and consequences of collaborative value production. The analysis of daily and weekly patterns of the editorial activity made it possible to identify the contributions from different parts of the world to such globally edited WPs as the English, the Spanish or the Arabic, as well as to point out cultural differences in editing habits. The “wisdom of the crowd” seems to cope better with some tasks than pre-designed directives, as the case of Simple WP demonstrates. Our main focus was to characterize and understand how conflicts emerge and eventually get resolved. While most WP articles are edited in a peaceful, constructive atmosphere, some of the most popular articles are rather controversial. In order to be able to study the conflict pages systematically, we developed a simple measure to identify them automatically. We have found interesting differences between peaceful and conflict pages in their dynamics, as the edit activity of the latter is a long-range correlated process, in contrast to that of average (peaceful) pages. The language of the talk pages of conflict articles is reduced in complexity more strongly than that of regular articles, and the leader-follower behavior is more intensive. The temporal evolution of the measure M enabled us to distinguish between different types of conflicts (single conflict with resolution, multiple conflicts, permanent war). Finally, we showed that simple multi-agent modeling based on opinion dynamics can reproduce some of our findings.
We would like to thank our collaborators: Gerardo Iñiguez, Kimmo Kaski, András Kornai, András Rung, Maxi San Miguel, Róbert Sumi, János Török. We gratefully acknowledge discussions, advice and help with the data from Farzaneh Kaveh, Santo Fortunato, Márton Mestyán, Andrzej Nowak, Hoda Sepehri Rad, Attila Zséder, Gábor Recski, Peter Reuvern, and Katarzyna Samson.
- 1. Aaltonen, A., Lanzara, G.F.: Governing complex social production in the Internet: the emergence of a collective capability in Wikipedia (2011). In Decade in Internet Time Symposium
- 2. Adler, B.T., de Alfaro, L.: A content-driven reputation system for the Wikipedia. Tech. Rep. ucsc-crl-06-18, School of Engineering, University of California, Santa Cruz (2006)
- 3. Adler, B.T., de Alfaro, L., Mola-Velasco, S., Rosso, P., West, A.: Wikipedia vandalism detection: combining natural language, metadata, and reputation features. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. Lecture Notes in Computer Science, vol. 6609, pp. 277–288. Springer, Berlin (2011)
- 4. Almeida, R.B., Mozafari, B., Cho, J.: On the evolution of Wikipedia. In: Proceedings of the International Conference on Weblogs and Social Media, ICWSM’07 (2007)
- 6. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: Dbpedia: a nucleus for a web of open data. In: Aberer, K., Choi, K.S., Noy, N., Allemang, D., Lee, K.I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) The Semantic Web. Lecture Notes in Computer Science, vol. 4825, pp. 722–735. Springer, Berlin (2007)
- 7. Ayers, P., Priedhorsky, R.: Wikilit: collecting the wiki and Wikipedia literature. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration, WikiSym ’11, pp. 229–230. ACM, New York (2011)
- 9. Besten, M.D., Dalle, J.: Keep it simple: a companion for simple Wikipedia? Ind. Innov. 15(2), 169–178 (2008)
- 10. Bohannon, J.: Tracking people’s electronic footprints. Science 314(5801), 914–916 (2006)
- 11. Brandes, U., Lerner, J.: Visual analysis of controversy in user-generated encyclopedias. Inf. Vis. 7(1), 34–48 (2008)
- 12. Buriol, L.S., Castillo, C., Donato, D., Leonardi, S., Millozzi, S.: Temporal analysis of the wikigraph. In: Proc. of Web Intelligence, Hong Kong, pp. 45–51 (2006)
- 13. Butler, B., Joyce, E., Pike, J.: Don’t look now, but we’ve created a bureaucracy: the nature and roles of policies and rules in Wikipedia. In: Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, CHI ’08, pp. 1101–1110. ACM, New York (2008)
- 16. Chakrabarti, B.K., Chakraborti, A., Chatterjee, A. (eds.): Econophysics and Sociophysics: Trends and Perspectives. Wiley-VCH, Berlin (2006)
- 18. Coster, W., Kauchak, D.: Simple English Wikipedia: a new text simplification task. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, HLT ’11, pp. 665–669. Association for Computational Linguistics, Stroudsburg (2011)
- 20. Deffuant, G., Neau, D., Amblard, F., Weisbuch, G.: Mixing beliefs among interacting agents. Adv. Complex Syst. 3(4), 87–98 (2000)
- 21. Denoyer, L., Gallinari, P.: The Wikipedia XML Corpus. ACM SIGIR forum. In: European Academy of Management Annual Conference 2010, Rome, Italy, vol. 40(1) (2006)
- 22. Derthick, K., Tsao, P., Kriplean, T., Borning, A., Zachry, M., McDonald, D.: Collaborative sensemaking during admin permission granting in Wikipedia. In: Ozok, A., Zaphiris, P. (eds.) Online Communities and Social Computing. Lecture Notes in Computer Science, vol. 6778, pp. 100–109. Springer, Berlin (2011)
- 23. Felipe, O.: Wikipedia: a quantitative analysis. Ph.D. thesis, University Rey Juan Carlos, Madrid, Spain (2009)
- 24. Flesch, R.: How to Write Plain English. Harper & Row, New York (1979)
- 28. Gómez, V., Kappen, H.J., Kaltenbrunner, A.: Modeling the structure and evolution of discussion cascades. In: Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, HT ’11, pp. 181–190. ACM, New York (2011)
- 29. Gunning, R.: The Technique of Clear Writing. McGraw-Hill, New York (1952)
- 31. Halavais, A., Lackaff, D.: An analysis of topical coverage of Wikipedia. J. Comput.-Mediat. Commun. 13(2), 429–440 (2008)
- 32. Hardy, D., Frew, J., Goodchild, M.F.: Volunteered geographic information production as a spatial process. Int. J. Geogr. Inf. Sci. 26(7), 1191–1212 (2012)
- 33. Hautasaari, A., Ishida, T.: Analysis of discussion contributions in translated Wikipedia articles. In: Proceedings of the 4th International Conference on Intercultural Collaboration, ICIC ’12, pp. 57–66. ACM, New York (2012)
- 36. Holloway, T., Bozicevic, M., Börner, K.: Analyzing and visualizing the semantic coverage of Wikipedia and its authors. Complexity 12(3), 30–40 (2007)
- 37. Hu, M., Lim, E.P., Sun, A., Lauw, H.W., Vuong, B.Q.: Measuring article quality in Wikipedia: models and evaluation. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, CIKM ’07, pp. 243–252. ACM, New York (2007)
- 38. Javanmardi, S., Lopes, C.: Statistical measure of quality in Wikipedia. In: Proceedings of the First Workshop on Social Media Analytics, SOMA ’10, pp. 132–138. ACM, New York (2010)
- 39. Javanmardi, S., Ganjisaffar, Y., Lopes, C., Baldi, P.: User contribution and trust in Wikipedia. In: 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, 2009. CollaborateCom 2009, pp. 1–6 (2009)
- 41. Jones, J.: Patterns of revision in online writing. Writ. Commun. 25(2), 262–289 (2008)
- 42. Jullien, N.: What we know about Wikipedia: a review of the literature analyzing the project(s) (2012). Available at SSRN http://ssrn.com/abstract=2053597
- 43. Kaltenbrunner, A., Laniado, D.: There is no deadline—time evolution of Wikipedia discussions. In: Proceedings of the 8th International Symposium on Wikis and Open Collaboration, WikiSym’12, Linz (2012)
- 44. Kämpf, M., Tismer, S., Kantelhardt, J.W., Muchnik, L.: Fluctuations in Wikipedia access-rate and edit-event data. Phys. A, Stat. Mech. Appl. 391(23), 6101–6111 (2012)
- 45. Karkulahti, O., Kangasharju, J.: Surveying Wikipedia activity: collaboration, commercialism, and culture. In: 2012 International Conference on Information Networking (ICOIN), pp. 384–389 (2012)
- 46. Karsai, M., Kaski, K., Barabási, A.L., Kertész, J.: Universal features of correlated bursty behaviour. Sci. Rep. 2, 397 (2012)
- 47. Keegan, B., Gergle, D., Contractor, N.: Hot off the wiki: dynamics, practices, and structures in Wikipedia’s coverage of the Tōhoku catastrophes. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration, WikiSym ’11, pp. 105–113. ACM, New York (2011)
- 48. Kittur, A., Kraut, R.E.: Harnessing the wisdom of crowds in Wikipedia: quality through coordination. In: Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, CSCW ’08, pp. 37–46. ACM, New York (2008)
- 49. Kittur, A., Pendleton, B.A., Suh, B., Mytkowicz, T.: Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie. In: CHI ’07: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2007)
- 50. Kittur, A., Chi, E.H., Suh, B.: What’s in Wikipedia?: mapping topics and conflict using socially annotated category structure. In: Proceedings of the 27th International Conference on Human Factors in Computing Systems, CHI ’09, pp. 1509–1512. ACM, New York (2009)
- 51. Kornai, A.: Language death in the digital age (2012, to be published)
- 52. Lam, S.T.K., Uduwage, A., Dong, Z., Sen, S., Musicant, D.R., Terveen, L., Riedl, J.: Wp:clubhouse?: an exploration of Wikipedia’s gender imbalance. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration, WikiSym ’11, pp. 1–10. ACM, New York (2011)
- 53. Laniado, D., Tasso, R., Volkovich, Y., Kaltenbrunner, A.: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages. In: 5th International AAAI Conference on Weblogs and Social Media, ICWSM 2011, pp. 177–184 (2011)
- 54. Laniado, D., Castillo, C., Kaltenbrunner, A., Fuster Morell, M.: Emotions and dialogue in a peer-production community: the case of Wikipedia. In: Proceedings of the 8th International Symposium on Wikis and Open Collaboration, WikiSym’12, Linz (2012)
- 55. Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., Van Alstyne, M.: Computational social science. Science 323(5915), 721–723 (2009)
- 56. Lee, J.B., Cabunducan, G., Cabarle, F.G.C., Castillo, R., Malinao, J.A.: Uncovering the social dynamics of online elections. J. Univers. Comput. Sci. 18(4), 487–505 (2012)
- 57. Leskovec, J., Huttenlocher, D., Kleinberg, J.: Governance in social media: a case study of the Wikipedia promotion process. In: Proceedings of the International Conference on Weblogs and Social Media, ICWSM’10 (2010)
- 58. Leuf, B., Cunningham, W.: The Wiki Way: Quick Collaboration on the Web. Addison-Wesley, Longman, Boston (2001)
- 59. Luyt, B., Aaron, T.C.H., Thian, L.H., Hong, C.K.: Improving Wikipedia’s accuracy: is edit age a solution? J. Am. Soc. Inf. Sci. Technol. 59(2), 318–330 (2008)
- 60. Massa, P.: Social networks of Wikipedia. In: Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, HT ’11, pp. 221–230. ACM, New York (2011)
- 62. Mestyán, M., Yasseri, T., Kertész, J.: Early prediction of movie box office success based on Wikipedia activity big data (2012, submitted). Preprint arXiv:1211.0970
- 64. Napoles, C., Dredze, M.: Learning simple Wikipedia: a cogitation in ascertaining abecedarian language. In: Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics and Writing, CL&W ’10, pp. 42–50. Association for Computational Linguistics, Stroudsburg (2010)
- 65. Nielsen, F.A.: Wikipedia research and tools: review and comments (2011). Available at http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/6012/pdf/imm6012.pdf
- 66. Okoli, C., Mehdi, M., Mesgari, M., Nielsen, F.A., Lanamäki, A.: The people’s encyclopedia under the gaze of the sages: a systematic review of scholarly research on Wikipedia (2012). Available at SSRN http://ssrn.com/abstract=2021326
- 67. Ortega, F., Gonzalez-Barahona, J., Robles, G.: On the inequality of contributions to Wikipedia. In: Proceedings of the 41st Annual Hawaii International Conference on System Sciences, p. 304 (2008)
- 68. Park, T.K.: The visibility of Wikipedia in scholarly publications. First Monday 16(8) (2011)
- 69. Pentzold, C., Seidenglanz, S.: Foucault@wiki: first steps towards a conceptual framework for the analysis of wiki discourses. In: Proceedings of the 2006 International Symposium on Wikis, WikiSym ’06, pp. 59–68. ACM, New York (2006)
- 71. Potthast, M., Stein, B., Gerling, R.: Automatic vandalism detection in Wikipedia. In: Proceedings of the IR Research, 30th European Conference on Advances in Information Retrieval, ECIR’08, pp. 663–668. Springer, Berlin (2008)
- 72. Ratkiewicz, J., Flammini, A., Menczer, F.: Traffic in social media I: Paths through information networks. In: 2010 IEEE Second International Conference on Social Computing (SocialCom), pp. 452–458 (2010)
- 74. Ratkiewicz, J., Menczer, F., Fortunato, S., Flammini, A., Vespignani, A.: Traffic in social media II: Modeling bursty popularity. In: 2010 IEEE Second International Conference on Social Computing (SocialCom), pp. 393–400 (2010)
- 75. Reinoso, A.J., Gonzalez-Barahona, J.M., Muñoz-Mansilla, R., Herraiz, I.: Temporal characterization of the requests to Wikipedia. In: Proceedings of the 5th International Workshop on New Challenges in Distributed Information Filtering and Retrieval (DART 2011), vol. 771 (2011)
- 77. Rivest, R.L.: The md5 message-digest algorithm. Internet Request for Comments. RFC 1321 (1992)
- 78. Roth, C., Taraborelli, D., Gilbert, N.: Measuring wiki viability: an empirical assessment of the social dynamics of a large sample of wikis. In: Proceedings of the 4th International Symposium on Wikis, WikiSym ’08, pp. 27:1–27:5. ACM, New York (2008)
- 79. Rung, A., Yasseri, T., Kornai, A., Kertész, J.: Editorial relations in controversial Wikipedia articles (2012, to be published)
- 80. Samson, K., Nowak, A.: Linguistic signs of destructive and constructive processes in conflict. In: IACM 23rd Annual Conference Paper (2010)
- 81. Schneider, J., Passant, A., Breslin, J.: A qualitative and quantitative analysis of how Wikipedia talk pages are used. In: Proceedings of the WebSci10: Extending the Frontiers of Society, April 26–27th, 2010, Raleigh, NC: US, pp. 1–7 (2010)
- 82. Sepehri Rad, H., Barbosa, D.: Identifying controversial articles in Wikipedia: a comparative study. In: Proceedings of the 8th International Symposium on Wikis and Open Collaboration, WikiSym’12, Linz (2012)
- 83. Sepehri Rad, H., Makazhanov, A., Rafiei, D., Barbosa, D.: Leveraging editor collaboration patterns in Wikipedia. In: Proceedings of the 23rd ACM Conference on Hypertext and Social Media, HT ’12, pp. 13–22. ACM, New York (2012)
- 85. Silva, F., Viana, M., Travençolo, B., Costa, L. da F.: Investigating relationships within and between category networks in Wikipedia. J. Informetr. 5(3), 431–438 (2011)
- 86. Smets, K., Goethals, B., Verdonk, B.: Automatic vandalism detection in Wikipedia: towards a machine learning approach. In: AAAI Workshop Wikipedia and Artificial Intelligence: An Evolving Synergy, WikiAI08, pp. 43–48. AAAI Press, Menlo Park (2008)
- 87. Strube, M., Ponzetto, S.P.: Wikirelate! computing semantic relatedness using Wikipedia. In: Proceedings of the 21st National Conference on Artificial Intelligence, vol. 2, pp. 1419–1424. AAAI Press, Menlo Park (2006)
- 88. Stvilia, B., Twidale, M.B., Smith, L.C., Gasser, L.: Information quality work organization in Wikipedia. J. Am. Soc. Inf. Sci. Technol. 59(6), 983–1001 (2008)
- 89. Suchecki, K., Salah, A., Gao, C., Scharnhorst, A.: Evolution of Wikipedia’s category structure. Adv. Complex Syst. 15(supp01), 1250068 (2012)
- 90. Suh, B., Convertino, G., Chi, E.H., Pirolli, P.: The singularity is not near: slowing growth of Wikipedia. In: Proceedings of the 5th International Symposium on Wikis and Open Collaboration, WikiSym ’09, pp. 8:1–8:10. ACM, New York (2009)
- 91. Sumi, R., Yasseri, T., Rung, A., Kornai, A., Kertész, J.: Characterization and prediction of Wikipedia edit wars. In: Proceedings of the ACM WebSci’11, Koblenz, Germany, pp. 1–3 (2011)
- 92. Sumi, R., Yasseri, T., Rung, A., Kornai, A., Kertész, J.: Edit wars in Wikipedia. In: 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT), and 2011 IEEE Third International Conference on Social Computing (SocialCom), pp. 724–727 (2011)
- 93. Taraborelli, D., Ciampaglia, G.: Beyond notability. Collective deliberation on content inclusion in Wikipedia. In: 2010 Fourth IEEE International Conference on Self-adaptive and Self-organizing Systems Workshop (SASOW), pp. 122–125 (2010)
- 95. Tyers, F., Pienaar, J.: Extracting bilingual word pairs from Wikipedia. In: Proceedings of the SALTMIL Workshop at Language Resources and Evaluation Conference, LREC’08 (2008)
- 96. Ung, H.M., Dalle, J.M.: Characterizing online communities with their “signals”. In: European Academy of Management Annual Conference 2010, Rome, Italy (2010)
- 97. Viegas, F.B., Wattenberg, M., Kriss, J., van Ham, F.: Talk before you type: coordination in Wikipedia. In: 40th Annual Hawaii International Conference on System Sciences, 2007. HICSS 2007, p. 78 (2007)
- 98. Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., Studer, R.: Semantic Wikipedia. In: Proceedings of the 15th International Conference on World Wide Web, WWW ’06, pp. 585–594. ACM, New York (2006)
- 99. Voss, J.: Measuring Wikipedia. In: International Conference of the International Society for Scientometrics and Informetrics: 10th, Stockholm (Sweden), 24–28 July 2005 (2005)
- 100. Vuong, B.Q., Lim, E.P., Sun, A., Le, M.T., Lauw, H.W., Chang, K.: On ranking controversies in Wikipedia: models and evaluation. In: Proceedings of the International Conference on Web Search and Web Data Mining, WSDM ’08, pp. 171–182. ACM, New York (2008)
- 101. Wattenberg, M., Viégas, F., Hollenbach, K.: Visualizing activity on Wikipedia with chromograms. In: Baranauskas, C., Palanque, P., Abascal, J., Barbosa, S. (eds.) Human-Computer Interaction—INTERACT 2007. Lecture Notes in Computer Science, vol. 4663, pp. 272–287. Springer, Berlin (2007)
- 102. West, A.G., Kannan, S., Lee, I.: Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata. In: Proceedings of the Third European Workshop on System Security, pp. 22–28. ACM, New York (2010)
- 103. Wikipedia: World wide web—Wikipedia, the free encyclopedia (2012). URL http://en.wikipedia.org/w/index.php?title=World_Wide_Web&oldid=508583126. Online. Accessed 22 August 2012
- 104. Wilkinson, D.M.: Strong regularities in online peer production. In: Proceedings of the 9th ACM Conference on Electronic Commerce, EC ’08, pp. 302–309. ACM, New York (2008)
- 105. Wilkinson, D.M., Huberman, B.A.: Assessing the value of cooperation in Wikipedia. First Monday 12(4) (2007)
- 106. Wu, Q., Irani, D., Pu, C., Ramaswamy, L.: Elusive vandalism detection in Wikipedia: a text stability-based approach. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM ’10, pp. 1797–1800. ACM, New York (2010)
- 107. Wu, G., Harrigan, M., Cunningham, P.: Characterizing Wikipedia pages using edit network motif profiles. In: Proceedings of the 3rd International Workshop on Search and Mining User-Generated Contents, SMUC ’11, pp. 45–52. ACM, New York (2011)
- 111. Yasseri, T., Spoerri, A., Graham, M., Kertész, J.: The most controversial topics in Wikipedia: a multilingual analysis (2013, in preparation)
- 112. Yatskar, M., Pang, B., Danescu-Niculescu-Mizil, C., Lee, L.: For the sake of simplicity: unsupervised extraction of lexical simplifications from Wikipedia. In: Human Language Technologies 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pp. 365–368. Association for Computational Linguistics, Stroudsburg (2010)
- 113. Zesch, T., Müller, C., Gurevych, I.: Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In: Proceedings of the Conference on Language Resources and Evaluation, LREC (2008)
- 114. Zipf, G.K.: The Psycho-Biology of Language: An Introduction to Dynamic Philology. MIT Press, Cambridge (1935)