Abstract
Knowledge discovery from large data repositories is recognized as a key research issue in the fields of databases, machine learning, and statistics, as well as an important opportunity for innovation in business. Various applications, such as data warehousing and online services via the Internet, apply data mining techniques to better understand customers’ behavior and thus improve the quality of the services they provide, gaining a business advantage.
Copyright information
© 2003 Springer-Verlag London
About this chapter
Cite this chapter
Vazirgiannis, M., Halkidi, M., Gunopulos, D. (2003). Data Mining Process. In: Uncertainty Handling and Quality Assessment in Data Mining. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-0031-7_2
DOI: https://doi.org/10.1007/978-1-4471-0031-7_2
Publisher Name: Springer, London
Print ISBN: 978-1-4471-1119-1
Online ISBN: 978-1-4471-0031-7
eBook Packages: Springer Book Archive