Improving Scalability of E-Commerce Systems with Knowledge Discovery

  • Chapter
Scalable Enterprise Systems

Part of the book series: Integrated Series in Information Systems (ISIS, volume 3)

Abstract

The efficiency of many data-driven e-commerce systems may be compromised by an abundance of data. In this chapter, we discuss how knowledge discovery and data mining techniques can improve the scalability of such systems. In particular, we focus on improving scalability through dimensionality reduction and on improving the information view experienced by each user. To address these issues, we cover several common data mining problems, including feature selection, clustering, classification, and association rule discovery, and present scalable methods and algorithms for each. Numerous examples are included to illustrate the key ideas.

This work was supported by the National Science Foundation under grant DMI-0075575.



Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Olafsson, S. (2003). Improving Scalability of E-Commerce Systems with Knowledge Discovery. In: Prabhu, V., Kumara, S., Kamath, M. (eds) Scalable Enterprise Systems. Integrated Series in Information Systems, vol 3. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0389-7_6

  • DOI: https://doi.org/10.1007/978-1-4615-0389-7_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5052-1

  • Online ISBN: 978-1-4615-0389-7

  • eBook Packages: Springer Book Archive
