Summary
The Data Mining process encompasses many specific techniques and algorithms that can be used to analyze data and derive knowledge from it. An important problem concerning the results of this process is the development of efficient indicators for assessing their quality. This quality assessment problem is a cornerstone issue of the whole process because:
i) The analyzed data may hide interesting patterns that Data Mining methods are called upon to reveal. Given the size of the data, the need to evaluate the validity of the extracted patterns automatically is stronger than ever.
ii) A number of algorithms and techniques have been proposed that, under different assumptions, can lead to different results.
iii) The number of patterns generated during the Data Mining process is very large, but only a few of these patterns are likely to be of any interest to the domain expert who is analyzing the data.
In this chapter we introduce the main concepts and quality criteria in Data Mining. We also present an overview of approaches that have been proposed in the literature for evaluating Data Mining results.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Halkidi, M., Vazirgiannis, M. (2009). Quality Assessment Approaches in Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_31
DOI: https://doi.org/10.1007/978-0-387-09823-4_31
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09822-7
Online ISBN: 978-0-387-09823-4
eBook Packages: Computer Science, Computer Science (R0)