Data Mining as an Automated Service
An automated data mining service offers an outsourced, cost-effective analysis option for clients who want to leverage their data resources for decision support and operational improvement. Under the service model, the client typically provides the service with data and other information likely to aid the analysis (e.g., domain knowledge). In return, the service provides analysis results to the client. We describe the required processes, issues, and challenges in automating the data mining and analysis process when the high-level goals are: (1) to provide the client with a high-quality, pertinent analysis result; and (2) to automate the data mining service, minimizing the amount of human analyst effort required and the cost of delivering the service. We argue that by focusing on client problems within market sectors, both of these goals may be realized.