Abstract
We describe the web usage mining activities of an ongoing project, ClickWorld, that aims to extract models of the navigational behaviour of a web site's users. The models are inferred from the access logs of a web server by means of data and web mining techniques. The extracted knowledge is deployed to offer users a personalized and proactive view of the web services. We first describe the preprocessing steps needed to clean, select and prepare access log data for knowledge extraction. We then present two sets of experiments: the first tries to predict the sex of a user from the web pages visited, and the second tries to predict whether a user might be interested in visiting a given section of the site.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baglioni, M., Ferrara, U., Romei, A., Ruggieri, S., Turini, F. (2003). Preprocessing and Mining Web Log Data for Web Personalization. In: Cappelli, A., Turini, F. (eds) AI*IA 2003: Advances in Artificial Intelligence. AI*IA 2003. Lecture Notes in Computer Science, vol 2829. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39853-0_20
DOI: https://doi.org/10.1007/978-3-540-39853-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20119-9
Online ISBN: 978-3-540-39853-0
eBook Packages: Springer Book Archive