A New Efficient and Unbiased Approach for Clustering Quality Evaluation

Lamirel, Jean-Charles; Cuxac, Pascal; Mall, Raghvendra; Safi, Ghada

doi:10.1007/978-3-642-28320-8_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7104)

Abstract

Traditional quality indexes (Inertia, DB, …) are known to be method-dependent and, in several cases, fail to estimate clustering quality properly, notably for complex data such as textual data. We therefore propose an alternative approach to clustering quality evaluation based on unsupervised measures of Recall, Precision and F-measure that exploit the descriptors of the data associated with the obtained clusters. Two categories of index are proposed: Macro and Micro indexes. This paper also focuses on the construction of a new cumulative Micro precision index that makes it possible to evaluate the overall quality of a clustering result while clearly distinguishing between homogeneous and heterogeneous, or degenerate, results. The behavior of the classical indexes is compared experimentally with that of our new approach on a polythematic dataset of bibliographical references drawn from the PASCAL database.
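
The abstract does not give the formulas, so the following is only a minimal sketch of how descriptor-based, unsupervised Recall and Precision could be averaged in a Macro and a Micro fashion. Everything in it is an assumption made for illustration: the function name `unsupervised_scores`, the rule that a descriptor is attached to the cluster holding the most documents that carry it, and the per-descriptor definitions of recall and precision. It does not reproduce the paper's exact indexes, and the cumulative Micro precision index is not covered.

```python
from collections import defaultdict


def unsupervised_scores(clusters):
    """clusters maps a cluster id to a list of documents (each a set of descriptors)."""
    count = {c: defaultdict(int) for c in clusters}  # documents per (cluster, descriptor)
    total = defaultdict(int)                         # documents per descriptor overall
    for c, docs in clusters.items():
        for doc in docs:
            for p in doc:
                count[c][p] += 1
                total[p] += 1

    # ASSUMPTION: a descriptor is attached to the cluster with the most documents
    # carrying it (ties go to the first cluster in dict order).
    owned = {c: [] for c in clusters}
    for p in total:
        best = max(clusters, key=lambda c: count[c].get(p, 0))
        owned[best].append(p)

    per_cluster = []  # (mean recall, mean precision) for each non-empty cluster
    pairs = []        # (recall, precision) for every attached descriptor
    for c, props in owned.items():
        if not props:
            continue  # clusters without any attached descriptor are skipped
        # ASSUMPTION: recall = share of the descriptor's documents inside the cluster,
        # precision = share of the cluster's documents carrying the descriptor.
        rec = [count[c][p] / total[p] for p in props]
        prec = [count[c][p] / len(clusters[c]) for p in props]
        per_cluster.append((sum(rec) / len(rec), sum(prec) / len(prec)))
        pairs.extend(zip(rec, prec))

    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0

    def f_measure(r, p):
        return 2 * r * p / (r + p) if r + p else 0.0

    macro_r = mean(r for r, _ in per_cluster)  # average of per-cluster averages
    macro_p = mean(p for _, p in per_cluster)
    micro_r = mean(r for r, _ in pairs)        # average over all descriptors directly
    micro_p = mean(p for _, p in pairs)
    return {
        "macro_recall": macro_r, "macro_precision": macro_p,
        "macro_f": f_measure(macro_r, macro_p),
        "micro_recall": micro_r, "micro_precision": micro_p,
        "micro_f": f_measure(micro_r, micro_p),
    }


if __name__ == "__main__":
    toy = {"A": [{"x", "y"}, {"x"}], "B": [{"z"}, {"z", "x"}]}  # hypothetical data
    for name, value in unsupervised_scores(toy).items():
        print(f"{name}: {value:.4f}")
```

In this sketch the Macro values weight every cluster equally, whereas the Micro values weight every attached descriptor equally, so a large gap between the two would hint at clusters of very uneven quality.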

Author information

Authors and Affiliations

LORIA, Campus Scientifique, BP 239, Vandœuvre-lès-Nancy, France
Jean-Charles Lamirel
INIST-CNRS, 2 allée du Parc de Brabois, 54500, Vandœuvre-lès-Nancy, France
Pascal Cuxac
Center of Data Engineering, IIIT Hyderabad, NBH-61, Hyderabad, Andhra Pradesh, India
Raghvendra Mall
Department of Mathematics, Faculty of Science, Aleppo University, Aleppo, Syria
Ghada Safi

Editor information

Editors and Affiliations

Faculty of Engineering and Information Technology, University of Technology Sydney, Broadway, PO Box 123, NSW 2007, Sydney, Australia
Longbing Cao
Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences, 518055, Shenzhen, China
Joshua Zhexue Huang & Jun Luo
The University of Melbourne, VIC 3010, Melbourne, Australia
James Bailey
The University of Auckland, Auckland, New Zealand
Yun Sing Koh

Cite this paper

Lamirel, JC., Cuxac, P., Mall, R., Safi, G. (2012). A New Efficient and Unbiased Approach for Clustering Quality Evaluation. In: Cao, L., Huang, J.Z., Bailey, J., Koh, Y.S., Luo, J. (eds) New Frontiers in Applied Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 7104. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28320-8_18

DOI: https://doi.org/10.1007/978-3-642-28320-8_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28319-2
Online ISBN: 978-3-642-28320-8
eBook Packages: Computer Science, Computer Science (R0)