Web Article Quality Assessment in Multi-dimensional Space

  • Jingyu Han
  • Xiong Fu
  • Kejia Chen
  • Chuandong Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6897)

Abstract

Nowadays, user-generated content (UGC) such as Wikipedia is emerging on the web at an explosive rate, but its quality varies dramatically. How to rate an article’s quality effectively is a focus of both the research and industry communities. Since each quality class shows specific characteristics on different quality dimensions, we propose to learn the web quality corpus by taking these dimensions into consideration. Each article is regarded as an aggregation of sections, and each section’s quality is modelled using a Dynamic Bayesian Network (DBN) with reference to accuracy, completeness and consistency. Each quality class is represented by three dimension corpora, namely an accuracy corpus, a completeness corpus and a consistency corpus. Finally, we propose two schemes to compute the quality ranking. Experiments show that our approach performs well.
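The structure described in the abstract (articles as aggregations of sections, sections scored on three dimensions, one corpus per dimension for each quality class) can be illustrated with a minimal sketch. This is not the authors' method: the DBN that produces the section scores is not reproduced here, and a plain average with a nearest-profile rule is swapped in as a stand-in. The names `Section`, `article_profile`, `class_profile` and `rate`, and the assumption that scores lie in [0, 1], are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

DIMENSIONS = ("accuracy", "completeness", "consistency")


@dataclass
class Section:
    # Hypothetical per-dimension scores in [0, 1]. The paper obtains section
    # quality from a DBN; here the scores are simply given as inputs.
    accuracy: float
    completeness: float
    consistency: float


def article_profile(sections):
    """Aggregate an article: average each dimension over its sections."""
    return {d: mean(getattr(s, d) for s in sections) for d in DIMENSIONS}


def class_profile(articles):
    """Summarise one quality class: mean score per dimension over its articles.

    This stands in for the three dimension corpora of a class (assumption).
    """
    profiles = [article_profile(a) for a in articles]
    return {d: mean(p[d] for p in profiles) for d in DIMENSIONS}


def rate(article, class_profiles):
    """Return the class label whose profile is closest in squared distance."""
    p = article_profile(article)

    def distance(label):
        c = class_profiles[label]
        return sum((p[d] - c[d]) ** 2 for d in DIMENSIONS)

    return min(class_profiles, key=distance)
```

A usage example under the same assumptions: build one profile per class from labelled articles, for instance `profiles = {"high": class_profile(high_articles), "low": class_profile(low_articles)}`, then call `rate(new_article, profiles)` to get the label of the nearest class. The two ranking schemes proposed in the paper are not specified in the abstract, so the nearest-profile rule here is only a placeholder.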

Keywords

Quality Dimension; Beta Distribution; Quality Class; Dimension Factor; Dynamic Bayesian Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jingyu Han (1)
  • Xiong Fu (1)
  • Kejia Chen (1)
  • Chuandong Wang (1)
  1. School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, China
