Assessing Web Article Quality by Harnessing Collective Intelligence

  • Jingyu Han
  • Xueping Chen
  • Kejia Chen
  • Dawei Jiang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7238)


Existing approaches assess the quality of web articles mainly on the basis of syntax; little work has addressed how to quantify quality on the basis of semantics. In this paper we propose a novel Semantic Quality Assessment (SQA) approach that automatically determines data quality in terms of the two most important quality dimensions, namely accuracy and completeness. First, an alternative context for the source article is built by collecting alternative web articles. Second, each alternative article is transformed into a semantic corpus, and dimension baselines are synthesized from these semantic corpora. Finally, each quality dimension of the source article is determined by comparing its semantic corpus with the corresponding dimension baseline. Our approach is a promising way to assess web article quality by exploiting available collective knowledge. Experiments show that our approach performs well.
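
The three steps above can be illustrated with a minimal Python sketch. It is not the authors' SQA implementation: the abstract does not say how semantic corpora or baselines are computed, so this sketch assumes a semantic corpus is simply a set of lowercase word tokens, that a baseline keeps the terms found in at least a chosen fraction of the alternative articles, and that accuracy and completeness are overlap ratios. The names `semantic_corpus`, `build_baseline`, `assess` and the parameter `min_support` are hypothetical.

```python
import re
from collections import Counter


def semantic_corpus(text):
    """Toy stand-in for a semantic corpus: the set of lowercase word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_baseline(alternative_texts, min_support=0.5):
    """Keep terms that occur in at least min_support of the alternative articles.

    Assumption: agreement among alternative articles approximates collective
    knowledge; the paper may synthesize its baselines differently.
    """
    counts = Counter()
    for text in alternative_texts:
        counts.update(semantic_corpus(text))  # each article counts a term once
    threshold = min_support * len(alternative_texts)
    return {term for term, n in counts.items() if n >= threshold}


def assess(source_text, alternative_texts, min_support=0.5):
    """Return (accuracy, completeness) of the source against the baseline.

    Assumed definitions:
      accuracy     = share of source terms supported by the baseline
      completeness = share of baseline terms covered by the source
    """
    source = semantic_corpus(source_text)
    baseline = build_baseline(alternative_texts, min_support)
    shared = source & baseline
    accuracy = len(shared) / len(source) if source else 0.0
    completeness = len(shared) / len(baseline) if baseline else 0.0
    return accuracy, completeness
```

Under these assumptions, a source article that repeats what most alternative articles say scores high on accuracy, and one that covers most of the commonly stated terms scores high on completeness. A closer approximation of the paper would likely replace the word-token sets with noun phrases or topic-model representations, as the keywords suggest.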


Keywords: Noun Phrase; Latent Dirichlet Allocation; Latent Semantic Analysis; Quality Class; Accuracy Baseline
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jingyu Han (1)
  • Xueping Chen (1)
  • Kejia Chen (1)
  • Dawei Jiang (2)
  1. School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R. China
  2. School of Computing, National University of Singapore, Singapore
