A Novel Hierarchical Clustering Scheme Based on Q-Criterion

Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 162)

Abstract

The most important step in hierarchical clustering is finding the pair of clusters with the highest degree of similarity to merge. Neighbor joining, an evolutionary tree reconstruction algorithm widely used in computational biology, defines a similarity metric based on the Q-criterion. Extensive empirical testing and theoretical studies have shown that the Q-criterion is linear in the distances, permutation equivariant, and consistent. Motivated by Neighbor joining, this paper proposes a Q-criterion-based hierarchical clustering algorithm named HACNJ. The main contribution of HACNJ is that it introduces the Q-criterion into clustering for the first time. An experiment on the Iris dataset verifies that HACNJ is effective.
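As a rough illustration of the Q-criterion, the sketch below computes, for a distance matrix over n clusters, Q(i, j) = (n - 2) d(i, j) - R(i) - R(j), where R(i) is the sum of row i. It then drives a simple agglomerative loop that always merges the pair with the smallest Q, which is the pair neighbor joining treats as most similar. This is not the authors' HACNJ. The functions `q_criterion` and `q_agglomerate` are hypothetical. The rule for updating distances after a merge is assumed to be the standard neighbor-joining reduction d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2, and HACNJ may use a different one.

```python
from itertools import combinations


def q_criterion(d):
    """Q-value for every pair (i, j), i < j, of a symmetric distance matrix d.

    Q(i, j) = (n - 2) * d[i][j] - R(i) - R(j), where R(i) = sum_k d[i][k].
    The pair with the smallest Q is the one neighbor joining would join.
    """
    n = len(d)
    r = [sum(row) for row in d]  # row sums R(i)
    return {(i, j): (n - 2) * d[i][j] - r[i] - r[j]
            for i, j in combinations(range(n), 2)}


def q_agglomerate(d, n_clusters=1):
    """Repeatedly merge the pair with the smallest Q until n_clusters remain.

    Hypothetical sketch, not the paper's HACNJ: the distance from the merged
    cluster u = i + j to every other cluster k is assumed to follow the NJ
    reduction d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2.
    Returns (merges, clusters); clusters hold original point indices.
    """
    d = [[float(x) for x in row] for row in d]
    clusters = [[i] for i in range(len(d))]
    merges = []
    while len(clusters) > max(n_clusters, 1):
        q = q_criterion(d)
        # min() keeps the first minimal pair, so ties go to the lowest (i, j).
        i, j = min(q, key=q.get)
        merges.append((clusters[i], clusters[j]))
        keep = [k for k in range(len(d)) if k not in (i, j)]
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        # Rebuild the matrix: surviving clusters first, merged cluster last.
        d = [[d[a][b] for b in keep] + [new_row[pos]]
             for pos, a in enumerate(keep)] + [new_row + [0.0]]
        clusters = [clusters[k] for k in keep] + [clusters[i] + clusters[j]]
    return merges, clusters
```

For the 5-point additive matrix `D = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]`, `q_criterion(D)[(0, 1)]` is -50, the minimum, so points 0 and 1 are merged first. `q_agglomerate(D, n_clusters=2)` returns the clusters `[[2, 0, 1], [3, 4]]`. Ties, such as the one in the second step here, are broken by the first pair in index order, so a different tie-breaking rule can change the hierarchy. Because Q is a linear function of the distances and depends only on row sums and pairwise entries, relabeling the points permutes the Q-values accordingly, which matches the properties listed in the abstract.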

Keywords

hierarchical clustering, agglomerative clustering, similarity metrics, Neighbor Joining, Q-criterion

Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  1. Dept of Computer Science and Technology, Civil Aviation University of China, Tianjin, China