Automated Software Engineering

Volume 24, Issue 3, pp 543–572

Reconstructing and evolving software architectures using a coordinated clustering framework

  • Sheikh Motahar Naim
  • Kostadin Damevski
  • M. Shahriar Hossain


During a long maintenance period, software projects experience architectural erosion and drift, which makes maintenance tasks harder for software engineers unfamiliar with the code base. This paper presents a framework that assists software engineers in recovering a software project’s architecture from its source code. The architectural recovery process is iterative: it combines clustering based on contextual and structural information in the code base with incremental developer feedback. The process converges when the developer is satisfied with the proposed decomposition of the software and, as an additional benefit, the framework becomes tuned to aid future evolution of the project. The paper provides both analytic and empirical evaluations of the obtained results; the experimental results show that the framework performs reasonably better than alternative conventional methods. The proposed framework uses a novel compartmentalization technique, Coordinated Clustering of Heterogeneous Datasets (CCHD), which relies on contextual and structural information in the code base but, unlike most previous approaches, does not require specific weights for each information type. This allows it to adapt to different project types and domains.


Keywords: Software architecture; Coordinated clustering; Heterogeneous data clustering; Architecture recovery



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Department of Computer Science, University of Texas, El Paso, USA
  2. Department of Computer Science, Virginia Commonwealth University, Richmond, USA
