Abstract

Knowledge discovery from large data repositories is recognized as a key research issue in the fields of databases, machine learning, and statistics, as well as an important opportunity for innovation in business. Various applications, such as data warehousing and online services on the Internet, use different data mining techniques to better understand customers’ behavior and thus improve the quality of the services they provide and gain a business advantage.

Copyright information

© 2003 Springer-Verlag London

About this chapter

Cite this chapter

Vazirgiannis, M., Halkidi, M., Gunopulos, D. (2003). Data Mining Process. In: Uncertainty Handling and Quality Assessment in Data Mining. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-0031-7_2

  • DOI: https://doi.org/10.1007/978-1-4471-0031-7_2

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-1119-1

  • Online ISBN: 978-1-4471-0031-7

  • eBook Packages: Springer Book Archive
