Wikipedia Revision Graph Extraction Based on N-Gram Cover

  • Jianmin Wu
  • Mizuho Iwaihara
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7419)


Over the past decade, mass collaboration systems have emerged and thrived on the World-Wide Web, generating large amounts of user content. Wikipedia, one such system, allows users to add and edit articles in an encyclopedic knowledge base, and a large number of revisions have been contributed. Wikipedia maintains a linear, timestamped record of the edit history of each article, which holds valuable information on how the article has evolved. However, meaningful features of revision evolution, such as branching and reverts, are implicit in this record and need to be reconstructed. Moreover, the existence of merges from multiple ancestors indicates that the edit history should be modeled as a directed acyclic graph. To address these issues, we propose a revision graph extraction method based on n-gram cover that effectively finds branches and reverts. We evaluate the accuracy of our method by comparing its output with manually constructed revision graphs.
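To make the idea of n-gram cover concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not spell out. It assumes word trigrams, defines the cover of a revision by a candidate as the fraction of the revision's n-grams that also occur in the candidate, and greedily picks one parent per revision. The names `ngrams`, `cover`, and `pick_parents` and the toy `history` are hypothetical. Because it picks a single parent, the sketch yields a tree and does not model merges from multiple ancestors.

```python
from typing import List, Optional, Set, Tuple


def ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in text (assumption: whitespace tokens)."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def cover(child: str, candidate: str, n: int = 3) -> float:
    """Fraction of the child's n-grams that also occur in the candidate."""
    child_grams = ngrams(child, n)
    if not child_grams:
        return 0.0
    return len(child_grams & ngrams(candidate, n)) / len(child_grams)


def pick_parents(revisions: List[str], n: int = 3) -> List[Optional[int]]:
    """For each revision, choose the earlier revision with the highest cover.

    Ties go to the most recent candidate. The first revision has no parent.
    """
    parents: List[Optional[int]] = [None]
    for i in range(1, len(revisions)):
        best_j, best_score = i - 1, -1.0
        for j in range(i - 1, -1, -1):  # newest candidate first
            score = cover(revisions[i], revisions[j], n)
            if score > best_score:
                best_j, best_score = j, score
        parents.append(best_j)
    return parents


# Toy edit history in timestamp order (invented for illustration).
history = [
    "the quick brown fox jumps over the lazy dog",  # r0
    "the quick red fox jumps over the lazy dog",    # r1: edits r0
    "buy cheap pills now",                          # r2: vandalism
    "the quick red fox jumps over the lazy dog",    # r3: reverts to r1
    "the quick brown fox jumps over the lazy cat",  # r4: branches from r0
]
print(pick_parents(history))  # → [None, 0, 1, 1, 0]
```

In this toy history the revert shows up as r3 being fully covered by r1 rather than by its immediate predecessor r2, and the branch shows up as r4 attaching to r0 even though three revisions lie between them in the linear record.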


Wikipedia revision graph Mass collaboration 





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jianmin Wu (1)
  • Mizuho Iwaihara (1)
  1. Graduate School of Information, Production and Systems, Waseda University, Fukuoka, Japan
