Quality Evaluation of Search Results by Typicality and Speciality of Terms Extracted from Wikipedia

  • Makoto Nakatani
  • Adam Jatowt
  • Hiroaki Ohshima
  • Katsumi Tanaka
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5463)


In Web search, users often find it difficult to judge which result page to choose and which pages provide high-quality, credible content. For example, some results may describe the query topic from a narrow or biased viewpoint, or they may contain only shallow information. While many factors influence how users perceive the quality of search results, we propose two important aspects that determine their usefulness: “topic coverage” and “topic detailedness”. Topic coverage is the extent to which a page covers typical topics related to the query terms. Topic detailedness, in contrast, measures how many special topics are discussed in a Web page. We propose a method for discovering typical topic terms and special topic terms for a search query by using information obtained from the structural features of Wikipedia, the free encyclopedia. Moreover, we propose an application that calculates the topic coverage and topic detailedness of Web search results by using the terms extracted from Wikipedia.
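
The two measures can be illustrated with a small sketch. The abstract does not give the scoring formulas, so everything below is an assumption made for illustration: coverage is taken as the weighted share of typical terms that appear on a page, and detailedness as the summed weight of the special terms that appear. The function names, the term weights, and the hand-written term lists (which stand in for terms that would be extracted from Wikipedia) are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_term(tokens, term):
    """Return True if the (possibly multi-word) term occurs as a contiguous token run."""
    term_tokens = tokenize(term)
    n = len(term_tokens)
    if n == 0:
        return False
    return any(tokens[i:i + n] == term_tokens for i in range(len(tokens) - n + 1))


def topic_coverage(page_text, typical_terms):
    """Assumed definition: weighted share of typical terms mentioned on the page (0.0 to 1.0)."""
    tokens = tokenize(page_text)
    total = sum(typical_terms.values())
    if total == 0:
        return 0.0
    found = sum(w for t, w in typical_terms.items() if contains_term(tokens, t))
    return found / total


def topic_detailedness(page_text, special_terms):
    """Assumed definition: summed weight of the special terms mentioned on the page."""
    tokens = tokenize(page_text)
    return sum(w for t, w in special_terms.items() if contains_term(tokens, t))


# Hypothetical term lists for the query "coffee"; the weights are made up.
typical = {"bean": 1.0, "caffeine": 1.0, "espresso": 1.0, "roasting": 1.0}
special = {"chlorogenic acid": 2.0, "maillard reaction": 1.5}

page = ("Espresso is brewed from a dark roasted bean. "
        "The Maillard reaction shapes its flavour.")

coverage = topic_coverage(page, typical)
detailedness = topic_detailedness(page, special)
print(coverage, detailedness)
```

Matching here is exact on whole tokens, so "roasted" does not count as a mention of "roasting"; a real system would probably need stemming or some other normalization, which this sketch leaves out.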


Search results quality; Wikipedia mining; Term extraction; Term typicality; Term speciality





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Makoto Nakatani (1)
  • Adam Jatowt (1)
  • Hiroaki Ohshima (1)
  • Katsumi Tanaka (1)

  1. Department of Social Informatics, Graduate School of Informatics, Kyoto University, Kyoto, Japan
