Journal of Intelligent Information Systems, Volume 47, Issue 1, pp 165–192

Improving the prediction of page access by using semantically enhanced clustering

  • Erman Sen
  • I. Hakki Toroslu
  • Pinar Karagoz


Many parameters may affect the navigation behaviour of web users. Predicting the next page that a user is likely to visit is important, since this information can be used for prefetching or for personalizing the page for that user. One successful method for determining the next web page is to construct behaviour models of users by clustering. The success of clustering is highly correlated with the similarity measure used to compare navigation sequences. This work proposes a new approach for determining the next web page by extending standard clustering with a content-based semantic similarity method. The semantics of web pages are represented as sets of concepts, and thus user sessions are modelled as sequences of sets. As a result, session similarity is defined as an alignment of two sequences of sets. The success of the proposed method is shown by applying it to real-life web log data.
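
The idea of scoring two sessions by aligning their sequences of concept sets can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard overlap used here is only a stand-in for the ontology-based concept set similarity, and the gap penalty of -0.5, the normalisation by the longer session, and the names `jaccard` and `session_similarity` are all assumptions made for illustration. The alignment itself is a standard global (Needleman-Wunsch style) dynamic program.

```python
from typing import FrozenSet, Sequence

ConceptSet = FrozenSet[str]


def jaccard(a: ConceptSet, b: ConceptSet) -> float:
    """Set overlap in [0, 1]. Assumed stand-in for a semantic (ontology-based) measure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def session_similarity(
    s: Sequence[ConceptSet],
    t: Sequence[ConceptSet],
    gap: float = -0.5,  # assumed penalty for skipping a page in either session
) -> float:
    """Global alignment score of two sessions, divided by the longer session length."""
    n, m = len(s), len(t)
    if n == 0 and m == 0:
        return 1.0

    # score[i][j] holds the best alignment score of s[:i] against t[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + jaccard(s[i - 1], t[j - 1]),  # align two pages
                score[i - 1][j] + gap,  # page of s left unmatched
                score[i][j - 1] + gap,  # page of t left unmatched
            )
    return score[n][m] / max(n, m)


if __name__ == "__main__":
    # Hypothetical sessions: each page is the set of concepts attached to it.
    first = [frozenset({"course", "exam"}), frozenset({"staff"})]
    second = [frozenset({"course"}), frozenset({"staff"})]
    print(session_similarity(first, second))
```

A pairwise score of this kind could serve as the similarity measure inside a clustering step; replacing `jaccard` with a measure computed over an ontology would bring the sketch closer to the semantic similarity the abstract describes.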


Ontology; Concept set similarity; Session similarity; Sequence alignment



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Computer Engineering Department, Middle East Technical University, Ankara, Turkey
