Assessing the Quality of Thai Wikipedia Articles Using Concept and Statistical Features

  • Kanchana Saengthongpattana
  • Nuanwan Soonthornphisaj
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 275)

Abstract

The quality of Thai Wikipedia articles is currently evaluated manually by users. Because the number of articles grows every day, an automatic evaluation method is needed. Components of Wikipedia articles such as headers, pictures, references, and links are useful indicators of article quality. However, readers also need content that is complete enough to cover all of the concepts relevant to the article, so this work investigates concept features. The aim of this research is to classify Thai Wikipedia articles into two classes, high-quality and low-quality. Three article domains (Biography, Animal, and Place) are tested with decision tree and Naïve Bayes classifiers. We found that Naïve Bayes achieves a higher TP rate than the decision tree in every domain. Moreover, we found that the concept feature plays an important role in the quality classification of Thai Wikipedia articles.
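The TP rate used to compare the two classifiers is the fraction of articles that truly belong to a class and are also predicted to be in that class. A minimal sketch of that computation follows, assuming class labels are plain strings; the function name `tp_rate` and the sample labels are hypothetical and are not taken from the paper.

```python
def tp_rate(actual, predicted, positive="high"):
    """True-positive rate (recall) for the class named by `positive`."""
    pairs = list(zip(actual, predicted))
    # Articles of the positive class that the classifier labelled correctly.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    # Articles of the positive class that the classifier missed.
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no actual instances of the positive class")
    return tp / (tp + fn)


# Made-up labels for illustration only (not data from the paper).
actual = ["high", "high", "high", "low", "low", "high"]
predicted = ["high", "low", "high", "low", "high", "high"]

rate = tp_rate(actual, predicted)
print(f"TP rate for the high-quality class: {rate:.2f}")
```

Calling `tp_rate(actual, predicted, positive="low")` gives the corresponding rate for the low-quality class, so the same helper can report both classes for each classifier and domain.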

Keywords

Quality of Thai Wikipedia articles; Naïve Bayes; Decision tree; Concept feature; Statistical feature

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Kanchana Saengthongpattana (1)
  • Nuanwan Soonthornphisaj (1)
  1. Department of Computer Science, Faculty of Science, Kasetsart University, Bangkok, Thailand
