Abstract
The efficiency of many data-driven e-commerce systems may be compromised by an abundance of data. In this chapter, we discuss how knowledge discovery and data mining techniques can improve the scalability of such systems. In particular, we focus on improving scalability via dimensionality reduction and on improving the information view experienced by each user. To address these issues, we cover several common data mining problems, including feature selection, clustering, classification, and association rule discovery, and present several scalable methods and algorithms for each of these problems. Numerous examples are included to illustrate the key ideas.
This work was supported by the National Science Foundation under grant DMI-0075575.
Copyright information
© 2003 Springer Science+Business Media New York
About this chapter
Cite this chapter
Olafsson, S. (2003). Improving Scalability of E-Commerce Systems with Knowledge Discovery. In: Prabhu, V., Kumara, S., Kamath, M. (eds) Scalable Enterprise Systems. Integrated Series in Information Systems, vol 3. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0389-7_6
DOI: https://doi.org/10.1007/978-1-4615-0389-7_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5052-1
Online ISBN: 978-1-4615-0389-7
eBook Packages: Springer Book Archive