Clustering of Search Engine Keywords Using Access Logs

  • Shingo Otsuka
  • Masaru Kitsuregawa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4080)


It has become possible for users to obtain many kinds of information simply by entering search keywords that represent a topic of interest. However, users cannot always come up with appropriate search keywords. In this paper, we study methods of clustering search keywords using Web access logs (called panel logs): URL histories collected from Japanese users (called panels) who were selected without statistical bias, in a manner similar to TV audience rating surveys. Existing systems extract related search keywords based on the set of URLs that users view after entering their original search keywords. In contrast, we propose two novel methods of clustering search keywords: one is based on Web communities (sets of similar Web pages); the other is based on the sets of nouns obtained by morphological analysis of Web pages. According to our evaluation results, the proposed methods extract more related search keywords than the URL-based method.
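To illustrate why the choice of feature space matters, the following is a minimal sketch, not the authors' implementation. It assumes each search keyword is represented by a set of feature IDs (URLs, Web community IDs, or nouns) and groups keywords whose sets overlap. The Jaccard measure, the threshold, the single-link grouping, and all names and sample data are assumptions made for illustration; the abstract does not specify the similarity measure or the clustering procedure.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_keywords(features, threshold=0.3):
    """Group keywords whose feature sets overlap at or above `threshold`.

    `features` maps each search keyword to a set of feature IDs. The IDs may
    be URLs, Web community IDs, or nouns; the grouping logic is identical.
    Single-link grouping via union-find is an assumption of this sketch.
    """
    parent = {k: k for k in features}

    def find(k):
        # Follow parent links to the cluster representative (path halving).
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for k in features:
        clusters.setdefault(find(k), set()).add(k)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical data: two related keywords lead to different URLs,
# but those URLs belong to the same Web community.
url_space = {
    "cheap flights": {"u1", "u2"},
    "airline tickets": {"u3", "u4"},
    "laptop review": {"u5"},
}
community_space = {
    "cheap flights": {"c_travel"},
    "airline tickets": {"c_travel"},
    "laptop review": {"c_pc"},
}

print(cluster_keywords(url_space))
# [['airline tickets'], ['cheap flights'], ['laptop review']]
print(cluster_keywords(community_space))
# [['airline tickets', 'cheap flights'], ['laptop review']]
```

In this toy data the URL sets of the two travel keywords do not intersect, so a URL-based comparison keeps them apart, while mapping the same pages to a coarser feature such as a community ID lets them fall into one cluster.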


Feature Space; Search Keyword; Complete Bipartite Graph; Community Space; Matching Factor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Shingo Otsuka (1)
  • Masaru Kitsuregawa (1)
  1. Institute of Industrial Science, The University of Tokyo, Meguro-ku, Tokyo, Japan
