Reliability Prediction of Webpages in the Medical Domain

  • Parikshit Sondhi
  • V. G. Vinod Vydiswaran
  • ChengXiang Zhai
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7224)


In this paper, we study how to automatically predict the reliability of web pages in the medical domain. Assessing the reliability of online medical information is especially critical because it may influence vulnerable patients seeking help online. Unfortunately, no automated systems are currently available that can classify a medical webpage as reliable, and manual assessment cannot scale to the large number of medical pages on the Web. We propose a supervised learning approach to automatically predict the reliability of medical webpages. We developed a gold standard dataset using the standard reliability criteria defined by the Health On the Net Foundation and systematically experimented with different link-based and content-based feature sets. Our experiments show promising results, with prediction accuracies of over 80%. We also show that the proposed prediction method is useful in applications such as reliability-based re-ranking and automatic website accreditation.
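To make the idea of reliability-based re-ranking concrete, here is a minimal sketch that is not taken from the paper. It assumes each search result already carries a relevance score and a predicted reliability score, both in [0, 1], and orders results by a linear interpolation of the two. The function name `rerank_by_reliability`, the weight `alpha`, the example URLs, and all scores are hypothetical. The classifier that would produce the reliability scores is not shown.

```python
from typing import List, Tuple

# (url, relevance, reliability); both scores are assumed to lie in [0, 1].
Result = Tuple[str, float, float]


def rerank_by_reliability(results: List[Result], alpha: float = 0.5) -> List[Result]:
    """Sort results by alpha * relevance + (1 - alpha) * reliability, highest first.

    alpha = 1.0 keeps the pure relevance ordering; alpha = 0.0 ranks by
    predicted reliability alone. The linear combination is an illustrative
    assumption, not the scoring function used in the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    def combined(result: Result) -> float:
        _, relevance, reliability = result
        return alpha * relevance + (1.0 - alpha) * reliability

    return sorted(results, key=combined, reverse=True)


# Hypothetical search results with made-up scores.
pages = [
    ("a.example", 0.9, 0.2),  # highly relevant, predicted unreliable
    ("b.example", 0.7, 0.9),  # relevant and predicted reliable
    ("c.example", 0.5, 0.5),
]
ranked = rerank_by_reliability(pages, alpha=0.5)
print([url for url, _, _ in ranked])
```

With equal weighting, the page predicted to be reliable moves ahead of the more relevant but less reliable one. In practice `alpha` would need tuning against a ranking metric such as mean average precision.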


Reliability Prediction; Mean Average Precision; Medical Domain; Spam Detection; Weighted Accuracy





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Parikshit Sondhi (1)
  • V. G. Vinod Vydiswaran (1)
  • ChengXiang Zhai (1)
  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA
