Interval Set Clustering of Web Users with Rough K-Means
 Pawan Lingras,
 Chad West
Abstract
Data collection and analysis in web mining face certain unique challenges. For a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than in conventional applications. The analytical techniques in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis. There is increasing interest in research on clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, clustering based on genetic algorithms may not be able to handle the large amounts of data typical of a web mining application. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm represents clusters as interval or rough sets. The paper also describes the design of an experiment, including data collection and the clustering process. The experiment is used to create interval set representations of clusters of web visitors.
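The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a K-means variant might represent each cluster as an interval set, that is, a lower bound (objects that definitely belong) and an upper bound (objects that possibly belong). The function name `rough_kmeans`, the ratio test controlled by `threshold`, and the weight `w_lower` are illustrative assumptions and are not taken from the paper.

```python
import math


def _mean(pts):
    # Coordinate-wise mean of a non-empty list of points.
    return [sum(col) / len(pts) for col in zip(*pts)]


def _update(old, lower, boundary, w_lower):
    # Weighted centroid: lower-bound members count more than boundary members.
    if lower and boundary:
        lo, bd = _mean(lower), _mean(boundary)
        return [w_lower * a + (1.0 - w_lower) * b for a, b in zip(lo, bd)]
    if lower:
        return _mean(lower)
    if boundary:
        return _mean(boundary)
    return list(old)  # empty cluster: keep the previous centroid


def rough_kmeans(points, centroids, threshold=1.3, w_lower=0.7, max_iter=50):
    """Return (centroids, lower, upper); all parameter defaults are assumptions."""
    k = len(centroids)
    centroids = [list(c) for c in centroids]
    lower, boundary = [], []
    for _ in range(max_iter):
        lower = [[] for _ in range(k)]
        boundary = [[] for _ in range(k)]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            nearest = min(range(k), key=dists.__getitem__)
            # Other clusters whose centroid is almost as close as the nearest one.
            close = [j for j in range(k)
                     if j != nearest and dists[j] <= threshold * dists[nearest]]
            if close:
                # Ambiguous object: upper bound of several clusters, lower bound of none.
                for j in [nearest] + close:
                    boundary[j].append(p)
            else:
                lower[nearest].append(p)
        updated = [_update(centroids[j], lower[j], boundary[j], w_lower)
                   for j in range(k)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    # The upper bound of a cluster contains its lower bound plus its boundary objects.
    upper = [lower[j] + boundary[j] for j in range(k)]
    return centroids, lower, upper
```

In this sketch an object that is clearly nearest to one centroid lands in that cluster's lower bound, while an object roughly equidistant from several centroids appears only in their upper bounds, which gives clusters without crisp boundaries.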
 Title
 Interval Set Clustering of Web Users with Rough K-Means
 Journal

Journal of Intelligent Information Systems
Volume 23, Issue 1, pp 5-16
 Cover Date
 2004-07-01
 DOI
 10.1023/B:JIIS.0000029668.88665.1a
 Print ISSN
 0925-9902
 Online ISSN
 1573-7675
 Publisher
 Kluwer Academic Publishers
 Keywords

 clustering
 interval sets
 K-means algorithm
 rough sets
 unsupervised learning
 web mining
 Authors

 Pawan Lingras ^{(1)}
 Chad West ^{(1)}
 Author Affiliations

 1. Saint Mary's University, Halifax, Nova Scotia, B3H 3C3, Canada