Quality Assessment Approaches in Data Mining

Summary

The Data Mining process encompasses many specific techniques and algorithms that can be used to analyze data and derive knowledge from it. An important problem concerning the results of this process is the development of efficient indicators for assessing their quality. This quality assessment problem is a cornerstone issue of the whole process for three reasons:

i) The analyzed data may hide interesting patterns that Data Mining methods are called upon to reveal. Given the size of the data, the need to evaluate the validity of the extracted patterns automatically is stronger than ever.

ii) Many algorithms and techniques have been proposed, and under different assumptions they can lead to different results.

iii) The number of patterns generated during the Data Mining process is very large, but only a few of them are likely to be of interest to the domain expert analyzing the data.

In this chapter we introduce the main concepts and quality criteria in Data Mining. We also present an overview of approaches proposed in the literature for evaluating Data Mining results.

Author information

Corresponding author

Correspondence to Maria Halkidi.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Halkidi, M., Vazirgiannis, M. (2009). Quality Assessment Approaches in Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_31

  • DOI: https://doi.org/10.1007/978-0-387-09823-4_31

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-09822-7

  • Online ISBN: 978-0-387-09823-4

  • eBook Packages: Computer Science, Computer Science (R0)
