Data Mining as an Automated Service

  • P. S. Bradley
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2637)

Abstract

An automated data mining service offers an outsourced, cost-effective analysis option for clients who want to leverage their data resources for decision support and operational improvement. In the service model, the client typically provides the service with data and other information likely to aid the analysis (e.g. domain knowledge). In return, the service provides analysis results to the client. We describe the required processes, issues, and challenges in automating the data mining and analysis process when the high-level goals are: (1) to provide the client with a high-quality, pertinent analysis result; and (2) to automate the data mining service, minimizing both the human analyst effort required and the cost of delivering the service. We argue that by focusing on client problems within market sectors, both of these goals may be realized.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • P. S. Bradley (1)
  1. Bradley Data Consulting, USA