SAHN with SEP/COP and SPADE, to Build a General Web Navigation Adaptation System Using Server Log Information

  • Olatz Arbelaitz
  • Ibai Gurrutxaga
  • Aizea Lojo
  • Javier Muguerza
  • Iñigo Perona
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7023)

Abstract

During the last decades, the information on the web has increased drastically, but larger quantities of data do not provide added value for web visitors; there is a need for easier access to the required information and for adaptation to their preferences or needs. Using machine learning techniques to build user models makes it possible to take their real preferences into account. In this work we present the design of a complete system, based on the collaborative filtering approach, that identifies interesting links for users while they are navigating and makes access to those links easier. Starting from web navigation logs and adding a generalization procedure to the preprocessing step, we use agglomerative hierarchical clustering (SAHN) combined with SEP/COP, a novel methodology to obtain the best partition from a hierarchy, to group users with similar navigation behavior or interests. We then use SPADE as a sequential pattern discovery technique to obtain the most probable transactions for the users belonging to each group, so that the navigation of future users can be adapted according to those profiles. The experiments show that the designed system performs efficiently on a web-accessible database and is even able to tackle the cold start or 0-day problem.
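
The pipeline described in the abstract (group sessions by hierarchical clustering, then mine sequential patterns per group) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy sessions, the function names, the use of edit distance between page sequences, and average linkage are all hypothetical choices. A fixed number of clusters stands in for SEP/COP, which selects the partition from the hierarchy automatically, and a simple count of ordered page pairs stands in for SPADE.

```python
from collections import Counter
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two page sequences (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def sahn_average_linkage(sessions, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Stops at a fixed n_clusters; this replaces SEP/COP, which is not
    reproduced here.
    """
    n = len(sessions)
    dist = {(i, j): edit_distance(sessions[i], sessions[j])
            for i, j in combinations(range(n), 2)}

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def frequent_pairs(sessions, min_support):
    """Ordered page pairs (x before y) found in at least min_support of sessions.

    A simplified stand-in for SPADE, limited to sequences of length two.
    """
    counts = Counter()
    for s in sessions:
        counts.update({(s[i], s[j])
                       for i in range(len(s)) for j in range(i + 1, len(s))})
    threshold = min_support * len(sessions)
    return {pair for pair, c in counts.items() if c >= threshold}


# Hypothetical user sessions, each an ordered list of visited pages.
sessions = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["home", "news", "sports", "scores"],
    ["shop", "cart", "checkout"],
    ["shop", "cart", "payment"],
]

clusters = sahn_average_linkage(sessions, n_clusters=2)
profiles = [frequent_pairs([sessions[i] for i in c], min_support=1.0)
            for c in clusters]
```

In this sketch each entry of `profiles` holds the page pairs shared by every session in a group; a new visitor whose first clicks match a group could then be offered the later page of each pair as a shortcut.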

Keywords

User Session; Good Partition; Cluster Validity Index; Navigation Pattern; Navigation Sequence

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Olatz Arbelaitz (1)
  • Ibai Gurrutxaga (1)
  • Aizea Lojo (1)
  • Javier Muguerza (1)
  • Iñigo Perona (1)
  1. Dept. of Computer Architecture and Technology, University of the Basque Country, Donostia, Spain
