Redundancy and Collaboration in Wikibooks

  • Ilaria Liccardi
  • Olivier Chapuis
  • Ching-Man Au Yeung
  • Wendy Mackay
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6946)


This paper investigates how Wikibooks authors collaborate to create high-quality books. We combined Information Retrieval and statistical techniques to examine the complete multi-year lifecycle of over 50 high-quality Wikibooks. We found that: (1) the presence of redundant material is negatively correlated with collaboration mechanisms; (2) for most books, over 50% of the content is written by a small core of authors; and (3) use of collaborative tools (predicted pages and talk pages) is significantly correlated with patterns of redundancy. Non-redundant books are well-planned from the beginning and require fewer talk pages to reach high-quality status. Initially redundant books begin with high redundancy, which drops as soon as authors use coordination tools to restructure the content. Suddenly redundant books display sudden bursts of redundancy that must be resolved, requiring significantly more discussion to reach high-quality status. These findings suggest that providing core authors with effective tools for visualizing and removing redundant material may increase writing speed and improve the book’s ultimate quality.


Keywords: Collaborative writing, text redundancy, coordination mechanisms



Copyright information

© IFIP International Federation for Information Processing 2011

Authors and Affiliations

  • Ilaria Liccardi (2, 1)
  • Olivier Chapuis (1, 2)
  • Ching-Man Au Yeung (3)
  • Wendy Mackay (2, 1)
  1. Univ. Paris-Sud & CNRS, Orsay, France
  2. INRIA, Orsay, France
  3. NTT Communication Science Laboratories, Kyoto, Japan
