Quality Assessment Approaches in Data Mining

Halkidi, Maria; Vazirgiannis, Michalis

doi:10.1007/978-0-387-09823-4_31

Chapter
First Online: 01 January 2010

Summary

The Data Mining process encompasses many techniques and algorithms that can be used to analyze data and derive knowledge from it. An important problem concerning the results of this process is the development of efficient indicators for assessing the quality of the analysis. This quality assessment problem is a cornerstone of the whole process because:

i) The analyzed data may hide interesting patterns that the Data Mining methods are called to reveal. Due to the size of the data, the requirement for automatically evaluating the validity of the extracted patterns is stronger than ever.

ii) A number of algorithms and techniques have been proposed which, under different assumptions, can lead to different results.

iii) The number of patterns generated during the Data Mining process is very large, but only a few of these patterns are likely to be of any interest to the domain expert who is analyzing the data.

In this chapter we introduce the main concepts and quality criteria in Data Mining. We also present an overview of approaches that have been proposed in the literature for evaluating Data Mining results.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of California at Riverside, Riverside, USA
Maria Halkidi
Department of Informatics, Athens University of Economics and Business, Athens, Greece
Maria Halkidi & Michalis Vazirgiannis

Corresponding author

Correspondence to Maria Halkidi.

Editor information

Editors and Affiliations

Dept. Industrial Engineering, Tel Aviv University, Ramat Aviv, 69978, Israel
Oded Maimon
Dept. Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
Lior Rokach

About this chapter

Cite this chapter

Halkidi, M., Vazirgiannis, M. (2009). Quality Assessment Approaches in Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_31

DOI: https://doi.org/10.1007/978-0-387-09823-4_31
Published: 07 July 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09822-7
Online ISBN: 978-0-387-09823-4
eBook Packages: Computer Science, Computer Science (R0)
