Abstract
In this paper we describe a practical approach to modeling the navigation patterns of visitors to unstructured websites. These patterns are derived from web logs enriched with three kinds of information: (1) the content type of visited pages, (2) the visitor type, and (3) the location of the visitor. We developed an intelligent Text Mining system, iTM, which supports the process of classifying web pages into a number of predefined categories. With the help of this system we were able to reduce the labeling effort by a factor of 10–20 without affecting the accuracy of the final result too much. Another feature of our approach is the use of a new technique for modeling navigation patterns: navigation trees. They provide a very informative graphical representation of the most frequent sequences of categories of visited pages.
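To make the idea of a tree of frequent category sequences concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not specify. It assumes each session has already been reduced to a list of page-category labels, counts how many sessions start with each prefix, and keeps the prefixes that reach a support threshold. The function names, the `min_support` parameter, and the category labels are all hypothetical.

```python
from collections import Counter


def build_navigation_tree(sessions, min_support=2):
    """Count every prefix of each session and keep the frequent ones.

    sessions: list of lists of page-category labels (hypothetical format).
    Returns a dict mapping each frequent prefix (a tuple) to the number
    of sessions that begin with it.
    """
    counts = Counter()
    for session in sessions:
        for end in range(1, len(session) + 1):
            counts[tuple(session[:end])] += 1
    # A prefix is never rarer than its extensions, so the kept prefixes
    # always form a well-formed tree (every kept node has a kept parent).
    return {prefix: n for prefix, n in counts.items() if n >= min_support}


def render(tree):
    """Return the tree as indented text lines, one per frequent prefix."""
    lines = []
    # Sorting tuples lexicographically lists each parent before its children.
    for prefix in sorted(tree):
        indent = "  " * (len(prefix) - 1)
        lines.append(f"{indent}{prefix[-1]} ({tree[prefix]})")
    return lines


# Illustrative sessions with made-up category labels.
sessions = [
    ["home", "products", "contact"],
    ["home", "products", "faq"],
    ["home", "news"],
    ["products", "contact"],
]
tree = build_navigation_tree(sessions, min_support=2)
print("\n".join(render(tree)))
```

In this sketch each node is labeled with a category and the number of sessions that follow the path from the root to that node. Only sequences starting at the first page of a session are counted, which is a simplifying assumption.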
© 2006 Springer-Verlag London Limited
About this paper
Cite this paper
Balog, K., Hofgesang, P., Kowalczyk, W. (2006). Modeling Navigation Patterns of Visitors of Unstructured Websites. In: Bramer, M., Coenen, F., Allen, T. (eds) Research and Development in Intelligent Systems XXII. SGAI 2005. Springer, London. https://doi.org/10.1007/978-1-84628-226-3_10
DOI: https://doi.org/10.1007/978-1-84628-226-3_10
Publisher Name: Springer, London
Print ISBN: 978-1-84628-225-6
Online ISBN: 978-1-84628-226-3