Abstract
Semi-supervised approaches have proven effective in clustering tasks. They allow user input, which improves the quality of the clustering. However, user intervention is generally limited to Boolean constraints, expressed as must-link and cannot-link constraints between pairs of objects. This paper investigates the problem of satisfying ranked constraints when performing hierarchical clustering. We introduce \(\mathcal{SHACUN}\), a new method for handling cases in which some constraints are more important than others and must be enforced first. Experiments on real log files, used to group decision makers in a data warehouse, confirm the soundness of our approach.
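To make the idea of ranked pairwise constraints concrete, the following is a minimal sketch and not the \(\mathcal{SHACUN}\) algorithm itself, whose details are not given in the abstract. It assumes 1-D points, single-linkage agglomerative merging, and a hypothetical constraint format `(rank, kind, i, j)` where rank 1 is the most important. Constraints are accepted in rank order, and a lower-ranked constraint is dropped when it contradicts one already accepted. The function name `ranked_constrained_clustering` is illustrative only.

```python
from itertools import combinations


def ranked_constrained_clustering(points, constraints, k):
    """Illustrative sketch: agglomerative clustering under ranked constraints.

    points      -- list of 1-D coordinates (assumption made for brevity)
    constraints -- list of (rank, kind, i, j), kind is "must" or "cannot";
                   rank 1 is enforced first (hypothetical format)
    k           -- target number of clusters
    Returns (clusters, dropped), where dropped lists the constraints that
    contradicted a higher-ranked one.
    """
    clusters = [{i} for i in range(len(points))]
    cannot = []   # accepted cannot-link pairs
    dropped = []  # constraints rejected because of a higher-ranked one

    def cluster_of(i):
        return next(c for c in clusters if i in c)

    def violates(a, b):
        # True if merging clusters a and b would join a cannot-link pair.
        return any((p in a and q in b) or (p in b and q in a)
                   for p, q in cannot)

    # Step 1: accept constraints from most to least important.
    for rank, kind, i, j in sorted(constraints, key=lambda c: c[0]):
        a, b = cluster_of(i), cluster_of(j)
        if kind == "must":
            if a is b:
                continue  # already satisfied
            if violates(a, b):
                dropped.append((rank, kind, i, j))
            else:
                clusters.remove(b)
                a |= b
        else:  # "cannot"
            if a is b:
                dropped.append((rank, kind, i, j))
            else:
                cannot.append((i, j))

    def dist(a, b):
        # Single linkage: distance between the closest members.
        return min(abs(points[p] - points[q]) for p in a for q in b)

    # Step 2: merge the closest pair of clusters that breaks no cannot-link.
    while len(clusters) > k:
        candidates = [
            (dist(a, b), ia, ib)
            for (ia, a), (ib, b) in combinations(enumerate(clusters), 2)
            if not violates(a, b)
        ]
        if not candidates:
            break  # no admissible merge remains
        _, ia, ib = min(candidates)
        clusters[ia] |= clusters[ib]
        del clusters[ib]  # ib > ia, so index ia is unaffected

    return sorted(sorted(c) for c in clusters), dropped


points = [0.0, 1.0, 2.0, 10.0, 11.0]
constraints = [
    (1, "cannot", 1, 2),  # most important
    (2, "must", 0, 1),
    (3, "must", 1, 2),    # contradicts rank 1, so it is dropped
]
clusters, dropped = ranked_constrained_clustering(points, constraints, k=2)
print(clusters, dropped)
```

In this toy run the rank-1 cannot-link keeps points 1 and 2 apart, so point 2 ends up with the distant points rather than with its nearest neighbours, and the rank-3 must-link is reported as dropped.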
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ben Ahmed, E., Nabli, A., Gargouri, F. (2012). \(\mathcal{SHACUN}\): Semi-supervised Hierarchical Active Clustering Based on Ranking Constraints. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2012. Lecture Notes in Computer Science, vol 7377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31488-9_16
DOI: https://doi.org/10.1007/978-3-642-31488-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31487-2
Online ISBN: 978-3-642-31488-9
eBook Packages: Computer Science, Computer Science (R0)