Cluster Computing, Volume 22, Supplement 3, pp 7287–7311

Euclidean space based hierarchical clusterers combinations: an application to software clustering

  • Rashid Naseem
  • Mustafa Mat Deris
  • Onaiza Maqbool
  • Sara Shahzad

Abstract

Hierarchical clustering groups similar entities on the basis of a similarity (or distance) measure and produces a tree-like structure called a dendrogram. A dendrogram represents clusters in a nested manner: at each step an entity either forms a new cluster or merges into an existing one. Because hierarchical clustering has many applications, researchers have worked to improve it. One approach that has received attention combines clustering results: different hierarchical clustering algorithms produce different dendrograms, and their combination has produced more promising results than individual hierarchical clustering. This paper proposes a hierarchical clustering combination (HCC) approach that uses the different types of structural features present in a dendrogram. First, the dendrograms are represented in a 4+N dimensional Euclidean space (4+NDES), where 4 is the number of extracted features, which can be extended by a further N; this representation yields vector matrices. 4+NDES is a structural representation of the dendrogram that contains both the relative and the absolute features of the entities in the dendrogram. The vector matrices are then aggregated, and the distance between each pair of vectors is calculated using the Euclidean distance measure. The final hierarchy is obtained using a recovery tool, such as an individual hierarchical clustering algorithm. 4+NDES-HCC utilizes the structural content of the dendrogram and has the flexibility to handle an increasing number of features. The proposed approach is tested on software clustering, which plays an important role in the maintenance of software systems. The experimental results and a comparative analysis with existing approaches reveal the effectiveness of HCC for software clustering.
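
The pipeline described above (represent each dendrogram as a vector matrix, aggregate the matrices, compute Euclidean distances between the vectors, recover a final hierarchy) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the abstract does not name the four extracted features, so each entity is described by its row of cophenetic merge heights as a stand-in; the base clusterers are taken to be single and complete linkage; aggregation is an element-wise sum; and complete linkage serves as the recovery tool. The function names agglomerate, euclidean_matrix and combine are hypothetical.

```python
import math
from itertools import combinations


def agglomerate(dist, linkage):
    """Naive agglomerative clustering on a symmetric distance matrix.

    linkage is the built-in min (single link) or max (complete link).
    Returns (merges, coph): the list of merges and the cophenetic matrix,
    where coph[i][j] is the height at which i and j first share a cluster.
    """
    n = len(dist)
    clusters = [frozenset([i]) for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        height, a, b = min(
            ((linkage(dist[i][j] for i in a for j in b), a, b)
             for a, b in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        for i in a:
            for j in b:
                coph[i][j] = coph[j][i] = height
        merges.append((sorted(a), sorted(b), height))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges, coph


def euclidean_matrix(vectors):
    """Pairwise Euclidean distances between the row vectors."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def combine(dist, linkages=(min, max)):
    """Combine the dendrograms of several base clusterers (assumed scheme)."""
    n = len(dist)
    total = [[0.0] * n for _ in range(n)]
    for linkage in linkages:
        # Stand-in descriptor: one cophenetic row per entity, per dendrogram.
        _, coph = agglomerate(dist, linkage)
        for i in range(n):
            for j in range(n):
                total[i][j] += coph[i][j]  # aggregation by summation
    # Recover the final hierarchy from distances between aggregated vectors.
    merges, _ = agglomerate(euclidean_matrix(total), max)
    return merges


# Toy input: four entities placed on a line at 0, 1, 5 and 6.
points = [0, 1, 5, 6]
dist = [[abs(p - q) for q in points] for p in points]
for left, right, height in combine(dist):
    print(left, right, round(height, 3))
```

On this toy input both base dendrograms pair entities 0 and 1 and entities 2 and 3 first, so the combined hierarchy does the same before joining the two pairs. A faithful implementation would replace the cophenetic rows with the 4+N features defined in the paper.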

Keywords

  • Hierarchical clusterers combinations
  • Euclidean space
  • Software clustering

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Department of Computer Science, City University of Science and Information Technology, Peshawar, Pakistan
  2. Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
  3. Department of Computer Science, Quaid-I-Azam University, Islamabad, Pakistan
  4. Department of Computer Science, University of Peshawar, Peshawar, Pakistan
