Abstract
Software architectures have become highly heterogeneous and difficult to maintain due to software evolution and continuous change. Therefore, a software system usually must be restructured in terms of modules containing relatively dependent components to address the system complexity. However, it is challenging to remodularize systems automatically to improve their maintainability. In this paper, we present a new mathematical programming model for the software remodularization problem. In contrast to previous research, a novel metric based on the principle of complexity balance is introduced to address the issue of over-cohesiveness. In addition, a hybrid genetic algorithm (HGA) is designed to automatically determine high-quality remodularization solutions. In the proposed HGA, a heuristic based on edge contraction and vectorization techniques first generates feature-rich solutions, which are then implanted as seeds into the initial population. Finally, a customized genetic algorithm (GA) is employed to improve the solution quality. Two sets of test problems are employed to evaluate the performance of the HGA. The first set includes sixteen real-world instances and the second set contains 900 large-scale simulated data sets. The proposed HGA is compared with two widely adopted algorithms, i.e., the multi-start hill-climbing algorithm (HCA) and the genetic algorithm with group number encoding (GNE). Experimental and statistical results demonstrate that in most cases, the HGA obtains better-quality solutions than HCA and GNE.
References
Abreu, F. B. E., & Goulão, M. (2001). Coupling and cohesion as modularization drivers: are we being overpersuaded? Paper presented at the European Conference on Software Maintenance and Reengineering.
Abdi, H. (2007). The Bonferonni and Šidák Corrections for Multiple Comparisons. In N. Salkind (Ed.), Encyclopedia of Measurement and Statistics. Thousand Oaks: Sage.
Adamov, R., & Richter, L. (1990). A proposal for measuring the structural complexity of programs. Journal of Systems and Software, 12(1), 55–70.
Anbarasan, K., & Chitrakala, S. (2018). Clustering-Based Image Segmentation Using Local Maxima. International Journal of Intelligent Information Technologies, 14(1), 28–47.
Anquetil, N., & Laval, J. (2011). Legacy Software Restructuring: Analyzing a Concrete Case. Paper presented at the Euromicro Conference on Software Maintenance & Reengineering.
Arora, S., Lee, J., & Naor, A. (2008). Euclidean distortion and the sparsest cut. Journal of the American Mathematical Society, 21(1), 1–21.
Arora, S., Lee, J. R., & Naor, A. (2005). Euclidean distortion and the sparsest cut. Paper presented at the Thirty-Seventh ACM Symposium on Theory of Computing.
Barreto, A., Barros, M. D. O., & Werner, C. (2008). Staffing a software project: A constraint satisfaction and optimization-based approach. Computers & Operations Research, 35(10), 3073–3089.
Bass, L., Clements, P., & Kazman, R. (2003). Software Architecture in Practice 2nd Edition. World Scientific Publishing.
Bavota, G., Gethers, M., Oliveto, R., Poshyvanyk, D., & Lucia, A. D. (2014). Improving software modularization via automated analysis of latent topics and dependencies. ACM Transactions on Software Engineering and Methodology, 23(1), 305–306.
Bender, R., & Lange, S. (2001). Adjusting for multiple testing: when and how? Journal of Clinical Epidemiology, 54(4), 343–349.
Boschetti, M. A., Golfarelli, M., Rizzi, S., & Turricchia, E. (2014). A Lagrangian heuristic for sprint planning in agile software development. Computers & Operations Research, 43, 116–128.
Caserta, M., & Uribe, A. M. (2009). Tabu search-based metaheuristic algorithm for software system reliability problems. Computers & Operations Research, 36(3), 811–822.
Chidamber, S. R., & Kemerer, C. F. (2013). Towards a metrics suite for object oriented design. ACM SIGPLAN Notices, 26(11), 197–211.
Chowdhury, I., & Zulkernine, M. (2011). Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities. Journal of Systems Architecture, 57(3), 294–313.
Coppick, J. C., & Cheatham, T. J. (1992). Software Metrics for Object-Oriented Systems. Paper presented at the ACM Conference on Communications.
Cuadrado, F., Duenas, J. C., & Garcia-Carmona, R. (2012). An Autonomous Engine for Services Configuration and Deployment. IEEE Transactions on Software Engineering, 38(3), 520–536.
Dallal, J. A. (2013). Object-oriented class maintainability prediction using internal quality attributes. Information and Software Technology, 55(11), 2028–2048.
Darcy, D. P., Kemerer, C. F., Slaughter, S. A., & Tomayko, J. E. (2005). The Structural Complexity of Software: An Experimental Test. IEEE Transactions on Software Engineering, 31(11), 982–995.
Deep, K., Singh, K. P., Kansal, M. L., & Mohan, C. (2009). A real coded genetic algorithm for solving integer and mixed integer optimization problems. Applied Mathematics and Computation, 212(2), 505–518.
Deep, K., & Thakur, M. P. (2007). A new crossover operator for real coded genetic algorithms. Applied Mathematics and Computation, 188(1), 895–911.
Diaz, E., Tuya, J., Blanco, R., & Dolado, J. J. (2008). A tabu search algorithm for structural software testing. Computers & Operations Research, 35(10), 3052–3072.
Fatehi, K., Rezvani, M., Fateh, M., & Pajoohan, M. R. (2018). Subspace Clustering for High-Dimensional Data Using Cluster Structure Similarity. International Journal of Intelligent Information Technologies, 14(3), 38–55.
Fleischmann, M., Amirpur, M., Grupp, T., Benlian, A., & Hess, T. (2016). The role of software updates in Information Systems continuance: An experimental study from a user perspective. Decision Support Systems, 83, 83–96.
Gill, N. S. (2008). Dependency and interaction oriented complexity metrics of component-based systems. ACM SIGSOFT Software Engineering Notes, 33(2), 1–5.
Hadaytullah, Vathsavayi S., Räihä, O., & Kai, K. (2010). Tool Support for Software Architecture Design with Genetic Algorithms. Paper presented at the International Conference on Software Engineering Advances, ICSEA 2010, Nice.
Harman, M., Hierons, R. M., & Proctor, M. (2002). A New Representation And Crossover Operator For Search-based Optimization Of Software Modularization. Paper presented at the Genetic and Evolutionary Computation Conference.
Huang, Y. J., & Gao, J. H. (2010). Measure Method of Structural Complexity in Open Source Software. Computer Engineering, 36(10), 61–63.
Jiang, H., Chang, C. K., Zhu, D., & Cheng, S. (2007). A foundational study on the applicability of genetic algorithm to software engineering problems. Paper presented at the IEEE Congress on Evolutionary Computation, 2007. CEC 2007.
Köhler, V., Fampa, M., & Araújo, O. (2013). Mixed-Integer Linear Programming Formulations for the Software Clustering Problem. Computational Optimization and Applications, 55(1), 113–135.
Kumari, A. C., & Srinivas, K. (2016). Hyper-heuristic approach for multi-objective software module clustering. Journal of Systems and Software, 117, 384–401.
Ma, Y. T., He, K. Q., Li, B., Liu, J., & Zhou, X. Y. (2010). A Hybrid Set of Complexity Metrics for Large-Scale Object-Oriented Software Systems. Journal of Computer Science and Technology, 25(6), 1184–1201.
Mahdavi, K., Harman, M., & Hierons, R. M. (2003). A multiple hill climbing approach to software module clustering. Paper presented at the International Conference on Software Maintenance, ICSM 2003.
Mancoridis, S., Mitchell, B. S., Rorres, C., Chen, Y.-F., & Gansner, E. R. (1998a). Using Automatic Clustering to Produce High-Level System Organizations of Source Code. Paper presented at the International Workshop on Program Comprehension.
Mancoridis, S., Mitchell, B. S., Rorres, C., Chen, Y., & Gansner, E. R. (1998b). Using Automatic Clustering to Produce High-Level System Organizations of Source Code. International Workshop on Program Comprehension, 45–52.
Maqbool, O., & Babri, H. A. (2007). Hierarchical Clustering for Software Architecture Recovery. IEEE Transactions on Software Engineering, 33(11), 759–780.
Mitchell, B. S., & Mancoridis, S. (2002). Using Heuristic Search Techniques To Extract Design Abstractions From Source Code. Paper presented at the GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, New York.
Mitchell, B. S., & Mancoridis, S. (2006). On the Automatic Modularization of Software Systems Using the Bunch Tool. IEEE Transactions on Software Engineering, 32(3), 193–208.
Mitchell, B. S., & Mancoridis, S. (2008). On the evaluation of the Bunch search-based software modularization algorithm. Soft Computing, 12(1), 77–93.
Mkaouer, W., Kessentini, M., Shaout, A., Koligheu, P., Bechikh, S., Deb, K., & Ouni, A. (2014). Many-objective software remodularization using NSGA-III. ACM Transactions on Software Engineering and Methodology, 24(3), 1–45.
O'Brien, L., Stoermer, C., & Verhoef, C. (2002). Software Architecture Reconstruction: Practice Needs and Current Approaches. Software Architecture Reconstruction Practice Needs & Current Approaches.
Praditwong, K., Harman, M., & Yao, X. (2010). Software Module Clustering as a Multi-Objective Search Problem. IEEE Transactions on Software Engineering, 37(2), 264–282.
Ramasubbu, N., Kemerer, C. F., & Hong, J. (2012). Structural Complexity and Programmer Team Strategy: An Experimental Test. IEEE Transactions on Software Engineering, 38(5), 1054–1068.
Ren, S. (2017). Multi-criteria Decision-Making Method under a Single Valued Neutrosophic Environment. International Journal of Intelligent Information Technologies, 13(4), 23–37.
Ribeiro, R. A., Moreira, A., Den Broek, P. V., & Pimentel, A. (2011). Hybrid assessment method for software engineering decisions. Decision Support Systems, 51(1), 208–219.
Sarkar, S., Kak, A. C., & Rama, G. M. (2008). Metrics for Measuring the Quality of Modularization of Large-Scale Object-Oriented Software. IEEE Transactions on Software Engineering, 34(5), 700–720.
Schach, S. R., Jin, B. O., Yu, L., Heller, G. Z., & Offutt, J. (2003). Determining the Distribution of Maintenance Categories: Survey versus Measurement. Empirical Software Engineering, 8(4), 351–365.
Sundarraj, R. P., & Talluri, S. (2003). A multi-period optimization model for the procurement of component-based enterprise information technologies. European Journal of Operational Research, 146(2), 339–351.
Tan, J., Jiang, G., & Wang, Z. (2019). Evolutionary game model of information sharing behavior in supply chain network with agent-based simulation. International Journal of Intelligent Information Technologies, 15(2), 54–68.
Tang, J. F., Mu, L., Kwong, C. K., & Luo, X. (2011). An optimization model for software component selection under multiple applications development. European Journal of Operational Research, 212(2), 301–311.
Tseng, T. L., Liang, W. Y., Huang, C. C., & Chian, T. Y. (2005). Applying genetic algorithm for the development of the components-based embedded system. Computer Standards & Interfaces, 27(6), 621–635.
Vinoski, S. (2005). Old measures for new services. IEEE Internet Computing, 9(6), 72–74.
Westland, J. C. (2004). The cost behavior of software defects. Decision Support Systems, 37(2), 229–238.
Xiao, J., Ao, X., & Tang, Y. (2013). Solving software project scheduling problems with ant colony optimization. Computers & Operations Research, 40(1), 33–46.
Zaidan, A. A., Zaidan, B. B., Hussain, M., Haiqi, A., Kiah, M. L. M., & Abdulnabi, M. (2015). Multi-criteria analysis for OS-EMR software selection problem: A comparative study. Decision Support Systems, 78, 15–27.
Funding
This work was supported in part by a grant from NSFC under grant numbers 71871133 and 71831006. Dr. Sugumaran’s research has been supported by a 2019 School of Business Administration Spring/Summer Research Fellowship from Oakland University.
Ethics declarations
Conflict of Interest
Lifeng Mu declares that he has no conflict of interest. Vijayan Sugumaran declares that he has no conflict of interest. Fangyuan Wang declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1
Plot data for Fig. 5. The contour of the effect on solution quality.
Appendix 2
Plot data for Fig. 6. The tendency of time consumption.
Appendix 3. Parametric t-Test Results on Real-Case Experiment
Sixteen data sets derived from actual cases are used to compare the algorithms in this section. Full descriptions of the software systems and their component diagrams can be found through the URLs listed in Table 2.
The results are shown in Tables 5, 6 and 7. A parametric t-test is employed to analyze solution quality (Tables 5 and 6) and algorithm efficiency (Table 7). The values of FCB obtained by HGA and GNE on real cases are shown in Table 5 and the values of FCB obtained by HGA and HCA on real cases are shown in Table 6.
The more tests are performed, the more likely a type I error becomes. Consequently, an adjustment for multiple tests is employed to control the familywise error rate (FWER), which is the probability of falsely rejecting at least one true individual null hypothesis, i.e., of making one or more type I errors (Bender and Lange 2001).
The Šidák procedure is employed to correct the alpha level when performing multiple tests in order to protect against wrong conclusions (Abdi 2007). The Šidák equation is as follows:
α[PT] = 1 − (1 − α[PF])^{1/C}
In the above formula, C refers to the number of tests performed, α[PT] is the probability of making a Type I error in a single specific test, and α[PF] is the probability of making at least one Type I error in the whole family of tests. The formula shows that, in order to reach a given α[PF] level, the α[PT] level of each test must be adapted.
Because 16 independent tests are performed on the real cases in each group, C = 16. The family of t-tests is performed at a confidence level of 95%, so α[PF] = 0.05. This gives α[PT] = 1 − (1 − 0.05)^{1/16} = 0.0032, so the adjusted confidence level of each test is 99.68%, and the result for each test instance is judged at that level. Under this multiple-testing correction, a t-test p value below 0.0032 denotes a statistically significant difference.
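The per-test threshold can be checked numerically. The following is a minimal sketch of the Šidák correction as described above; the function name `sidak_per_test_alpha` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def sidak_per_test_alpha(family_alpha: float, num_tests: int) -> float:
    """Per-test alpha that keeps the familywise error rate at family_alpha.

    Implements alpha_PT = 1 - (1 - alpha_PF) ** (1 / C) for C independent tests.
    """
    if not 0.0 < family_alpha < 1.0:
        raise ValueError("family_alpha must lie strictly between 0 and 1")
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return 1.0 - (1.0 - family_alpha) ** (1.0 / num_tests)


# 16 real cases at a familywise alpha of 0.05
alpha_real_cases = sidak_per_test_alpha(0.05, 16)
print(f"per-test alpha: {alpha_real_cases:.4f}")               # 0.0032
print(f"per-test confidence: {1 - alpha_real_cases:.2%}")      # 99.68%
```

With a single test the correction leaves the threshold unchanged, which is a quick sanity check on the formula.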
As shown in Table 5, the proposed HGA obtains lower values of FCB than GNE in all 16 cases, and the results of 15 cases are significant at 99.68% confidence; only Case 4 is not significant, with a p value above 0.0032 and a Cohen’s d index of −0.511. The magnitude of the effect size is above 0.8 in all cases except Case 4, which indicates that HGA outperformed GNE in obtaining better solutions in most cases.
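For readers who want to reproduce the effect-size column, the sketch below computes Cohen's d from two samples of FCB values. It assumes the common pooled-standard-deviation definition; the paper does not state which variant was used, and the sample values are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d using the pooled sample standard deviation (an assumed variant).

    The sign follows mean(sample_a) - mean(sample_b), so d is negative when
    sample_a has the lower mean (lower FCB is better in this appendix).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    pooled_variance = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_variance)


# Hypothetical FCB values from repeated runs (not data from the paper)
hga_runs = [0.41, 0.43, 0.42, 0.40, 0.44]
gne_runs = [0.52, 0.49, 0.55, 0.51, 0.53]
print(cohens_d(hga_runs, gne_runs))  # negative: HGA has the lower mean FCB
```

A magnitude above 0.8 is conventionally read as a large effect, which is the threshold the text refers to.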
From Table 6 it can be observed that the proposed HGA obtained better values than HCA in all cases except Case 15. Among the remaining 15 cases, the differences in the means are statistically significant in all but three (Case 5, Case 6 and Case 8). Though the results in these three cases are not significant, HGA still achieves better values in comparison. Therefore, the median of solution quality attained by HGA is superior to that of HCA. Furthermore, the lower values of standard deviation point towards the capability of HGA in producing consistent results. This characterizes HGA as a superior algorithm in terms of robustness.
Because all three algorithms were set to stop when the average relative change in the best fitness function value over 200 generations was equal to 0, the time consumption in Tables 7 and 8 can be used to evaluate each algorithm’s convergence. In Tables 7 and 8, the higher standard deviations of GNE indicate that it sometimes converged quickly and sometimes did not. This means that GNE is less robust than the other two algorithms. In contrast, HCA and HGA have stable convergence speeds. It can also be seen that HCA converges fast in small cases, but its time consumption increases rapidly as the scale of the case increases. From Case 15 onward, it even required more time than HGA to converge. However, two cases (Case 15 and Case 16) are not sufficient to support this conclusion, and we examine it further through the large-scale simulation experiment discussed in the next section. The critical result is that HGA performs steadily in all 16 cases.
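The stopping rule mentioned above can be sketched as a small stall check. This is an illustration only: the exact definition of "average relative change" is not given in this appendix, so the formula below (mean of |current − previous| / |previous| over the last 200 generation-to-generation steps) and the name `has_stalled` are assumptions.

```python
def has_stalled(best_history, window=200, tolerance=0.0):
    """Return True when the best fitness has not moved over the last `window` steps.

    best_history holds the best fitness value recorded after each generation.
    The average relative change is an assumed definition, not the paper's.
    """
    if len(best_history) <= window:
        return False  # not enough generations observed yet
    recent = best_history[-(window + 1):]
    total_change = 0.0
    for previous, current in zip(recent, recent[1:]):
        scale = max(abs(previous), 1e-12)  # guard against division by zero
        total_change += abs(current - previous) / scale
    return total_change / window <= tolerance


# Typical use inside a search loop (the loop body itself is hypothetical):
history = [0.90, 0.70, 0.55] + [0.50] * 201
print(has_stalled(history))  # True: the last 200 steps show no change
```

With a tolerance of 0, a single improvement anywhere in the window resets the clock, so runtime under this rule reflects how long each algorithm keeps finding improvements.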
Appendix 4. Parametric t-Test Results on Large-Scale Simulated Experiment
The vertex number \( |G| \) varied from 100 to 1000 in steps of 100, and three different values of \( |E| \) were given as three different densities of interactions for each specific vertex number: a sparse graph with \( |E|_s = \left\lfloor 1.5 \times |G| \right\rfloor \), an extremely dense graph with \( |E|_l = \left\lfloor \frac{3}{4} \times \frac{|G| \times \left(|G| - 1\right)}{2} \right\rfloor \), and a medium-density graph with \( |E|_m = \left\lfloor \frac{|E|_s + |E|_l}{2} \right\rfloor \). Edge weights between vertices were simulated using U(1,10). As we discussed in Section 3.1.2, when \( |G| \) equals 100, the size of the solution space increases dramatically to 4.7585 × 10^{115}; the range from 100 to 1000 is thus appropriate for testing the performance of the proposed algorithm in an enormous solution space.
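The instance sizes described above can be reproduced with a few lines of code. The sketch below is illustrative and rests on several assumptions that this appendix does not confirm: the graph is treated as undirected, U(1,10) is read as a continuous uniform distribution, and the solution-space size is taken to be the number of set partitions of the vertices (the Bell number). All function names are hypothetical.

```python
import random


def edge_counts(num_vertices):
    """Return (sparse, medium, dense) edge counts for a given vertex count."""
    sparse = (3 * num_vertices) // 2                       # floor(1.5 * |G|)
    dense = (3 * num_vertices * (num_vertices - 1)) // 8   # floor(3/4 * |G|(|G|-1)/2)
    medium = (sparse + dense) // 2
    return sparse, medium, dense


def random_weighted_graph(num_vertices, num_edges, seed=None):
    """Sample distinct undirected edges with weights drawn from U(1, 10)."""
    max_edges = num_vertices * (num_vertices - 1) // 2
    if num_edges > max_edges:
        raise ValueError("too many edges for a simple graph")
    rng = random.Random(seed)
    edges = {}
    while len(edges) < num_edges:
        u, v = rng.sample(range(num_vertices), 2)  # two distinct vertices
        key = (min(u, v), max(u, v))
        if key not in edges:
            edges[key] = rng.uniform(1.0, 10.0)
    return edges


def bell_number(n):
    """Number of ways to partition n labelled items, via the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


print(edge_counts(700))                # (1050, 92268, 183487)
print(f"{bell_number(100):.4e}")       # 4.7585e+115
graph = random_weighted_graph(100, edge_counts(100)[0], seed=42)
print(len(graph))                      # 150
```

Under the partition assumption, the Bell number for 100 vertices agrees with the 4.7585 × 10^{115} figure quoted above, which suggests that exhaustive search is out of reach even at the smallest simulated size.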
The results are shown in Tables 9, 10, 11 and 12. Table 9 shows the quality of the solutions obtained by HGA and GNE on simulated data and Table 10 shows the quality of the solutions obtained by HGA and HCA. Tables 11 and 12 show the comparison of the average runtime the three algorithms consumed while addressing problems of different scales. The following conclusions can be drawn based on the results from Tables 9, 10, 11 and 12.
Because the multiple testing correction recalculates the probabilities obtained from a family of tests, the Šidák correction for this large-scale simulated experiment is computed as: α[PT] = 1 − (1 − α[PF])^{1/C} = 1 − (1 − 0.05)^{1/30} = 0.0017. So if the p value of a t-test in this experiment is below 0.0017, the result is significant. The confidence level of each test is adjusted to 99.83% instead of 95%.
Tables 9 and 10 show that HGA identified better solutions than GNE and HCA and maintained the smallest standard deviation of FCB of solutions in each instance. This means HGA can provide consistent and high-quality solutions in all instances, even when the scale of the problem (the number of nodes and the density of the MDG) increases sharply. The p value is below 0.0017 and Cohen’s d index is higher than 0.8 in all the test instances, indicating that the results are significant at the 99.83% confidence level.
Other interesting phenomena can also be observed in Tables 9 and 10. First, for a given |G|, GNE performs better in dense MDGs than in sparse MDGs. This demonstrates that the premature convergence problem of GNE is more conspicuous in sparse cases than in dense cases. Second, for a given |G|, the standard deviation of HCA increases with |E|. This indicates that HCA is less robust in dense cases than in sparse cases.
It can be observed in Tables 11 and 12 that the fastest algorithm is GNE, followed by HGA, with HCA placing last. Although GNE is the fastest, it achieves efficiency at the cost of premature convergence, especially when the MDG is sparse (as shown in Table 10). Comparing HGA with HCA, there are only five instances where HGA required more processor time than HCA, i.e., |G| = 700 and |E| = 183,487; |G| = 800 and |E| = 239,700; |G| = 900 and |E| = 303,412; |G| = 1000 and |E| = 188,862; and |G| = 1000 and |E| = 374,625. The p value and Cohen’s d index justify that the results are significant.
Tables 11 and 12 also reveal that the density of the MDG has no obvious impact on the runtime of GNE and HCA. By contrast, the density of the MDG does affect the runtime of HGA, especially as |G| increases. This is because generating high-quality seeds requires additional processor time. However, this does not affect the overall performance, and hence the proposed HGA can be considered a very practical algorithm. The primary reasons are as follows:

1) The remodularization of a software system, which only occurs once every several years, is a major strategic decision for enterprises. The top priority for decision-makers is therefore solution quality rather than quick results, even if a solution takes hours to obtain.
2) The rate of increase of time consumption is acceptable. As we can see in Tables 11 and 12, for a given |G|, the runtime of HGA almost always increases twofold when |E| increases from |E|_{m} to |E|_{l} (given the number of vertices, the density of |E|_{l} is three-quarters of the density of the complete graph).
3) In reality, most software architectures are sparse. According to Mitchell’s study, the average density for real systems is 17% (Mitchell and Mancoridis 2008).
Can GNE and HCA determine higher-quality solutions than HGA when given the same runtime? This question, together with the huge gap among the different algorithms’ runtimes in Table 10, makes another experiment necessary, in which the original stop criterion (the average relative change in the best fitness function value over a given number of generations) was replaced by a new criterion that specifies the runtime in seconds before stopping the algorithm. (The runtime of each algorithm was held constant at the maximum runtime of the three algorithms in the experiment.) The results of the experiment are shown in Tables 13 and 14. The following conclusions are made based on the results:
The FCBs of the solutions found by GNE are close to 0.75 in almost all instances and show no significant improvement in solution quality relative to the results in Tables 9 and 10. This means that GNE’s premature convergence cannot be solved effectively by extending the runtime. Another method usually used to avoid premature convergence is increasing the mutation ratio. However, too large a ratio damages the evolutionary mechanism of GNE and makes it almost a random search. Therefore, the results indicate that GNE is not a desirable approach to the problem.
Similar to GNE, the solution quality of HCA is not greatly improved even when we guarantee HCA the same runtime as HGA in every instance. The introduction of a simulated annealing mechanism does help HCA escape the local optimum to some degree in sparse cases (HCA can find better solutions than GNE in sparse cases); however, the improvement is greatly reduced by the explosive increase of the solution space and the extreme time consumption needed to move away from the local optimum. The solutions found by HCA are even worse than those found by GNE in medium-density and extremely dense cases. In this experiment, the proposed HGA outperforms GNE and HCA in all cases. In addition, the p values and Cohen’s d index in Tables 13 and 14 provide strong evidence that our proposed HGA is superior to GNE and HCA.
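The fixed-runtime stopping rule used in this second experiment can be sketched as a wrapper around any iterative search. This is a minimal illustration, not the authors' implementation: `run_with_time_budget`, the `step` callback, and the injectable `clock` parameter are hypothetical names, and the sketch assumes that lower FCB values are better, as in the comparisons above.

```python
import time


def run_with_time_budget(step, budget_seconds, clock=time.perf_counter):
    """Call step() repeatedly until the time budget is spent.

    step is assumed to run one iteration of a search and return the FCB value
    of the candidate it produced. Returns (best value seen, iterations run).
    """
    deadline = clock() + budget_seconds
    best = None
    iterations = 0
    while clock() < deadline:
        candidate = step()
        if best is None or candidate < best:  # lower FCB is better
            best = candidate
        iterations += 1
    return best, iterations


# Toy usage with a stand-in search step that replays fixed values
values = iter([0.80, 0.76, 0.75, 0.75, 0.75])
best, runs = run_with_time_budget(lambda: next(values, 0.75), budget_seconds=0.01)
print(best, runs >= 1)
```

Giving every algorithm the same budget separates solution quality from convergence speed: an algorithm that has converged prematurely gains little from the extra time, which is the behaviour reported for GNE and HCA.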
Appendix 5
Matlab code for identifying the edge to contract.
Mu, L., Sugumaran, V. & Wang, F. A Hybrid Genetic Algorithm for Software Architecture Re-Modularization. Inf Syst Front 22, 1133–1161 (2020). https://doi.org/10.1007/s10796-019-09906-0