Improving Scalability of E-Commerce Systems with Knowledge Discovery

Olafsson, Sigurdur

doi:10.1007/978-1-4615-0389-7_6

Part of the book series: Integrated Series in Information Systems (ISIS, volume 3)

Abstract

The efficiency of many data-driven e-commerce systems may be compromised by an abundance of data. In this chapter, we discuss how knowledge discovery and data mining techniques can improve the scalability of such systems. In particular, we focus on improving scalability via dimensionality reduction and on improving the information view experienced by each user. To address these issues, we cover several common data mining problems, including feature selection, clustering, classification, and association rule discovery, and present several scalable methods and algorithms that address each of those problems. Numerous examples are included to illustrate the key ideas.

This work was supported by the National Science Foundation under grant DMI-0075575.

Author information

Authors and Affiliations

Iowa State University, USA
Sigurdur Olafsson

Editor information

Editors and Affiliations

Marcus Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA
Vittal Prabhu & Soundar Kumara
School of Industrial Engineering and Management, Oklahoma State University, Stillwater, Oklahoma, USA
Manjunath Kamath

About this chapter

Cite this chapter

Olafsson, S. (2003). Improving Scalability of E-Commerce Systems with Knowledge Discovery. In: Prabhu, V., Kumara, S., Kamath, M. (eds) Scalable Enterprise Systems. Integrated Series in Information Systems, vol 3. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0389-7_6

DOI: https://doi.org/10.1007/978-1-4615-0389-7_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5052-1
Online ISBN: 978-1-4615-0389-7
eBook Packages: Springer Book Archive
