Recovering High-Level Structure of Software Systems Using a Minimum Description Length Principle

  • Rudi Lutz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2464)


In [12], a system was described that finds good hierarchical decompositions of complex systems, represented as collections of nodes and links. It uses a genetic algorithm whose information-theoretic fitness function (representing complexity) is derived from a minimum description length principle. This paper describes the application of this approach to the problem of reverse engineering the high-level structure of software systems.
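To make the idea of a description-length fitness concrete, the following is a minimal sketch and not the encoding used in [12] or in this paper. It assumes a flat (non-hierarchical) decomposition and a hypothetical cost model: each node pays to name its module, and each link pays one flag bit plus the cost of naming its destination, which is cheaper when the destination lies in the same module. The function name `description_length` and the example graph are illustrative assumptions.

```python
import math

def description_length(n_nodes, edges, modules):
    """Toy cost in bits of describing a graph under a flat decomposition.

    Assumed encoding (hypothetical, for illustration only):
      - each node names its module: log2(number of modules) bits
      - each link carries a 1-bit local/non-local flag
      - a local link names its destination within the module
      - a non-local link names its destination among all nodes
    """
    module_of = {node: i for i, members in enumerate(modules) for node in members}
    k = len(modules)
    # Membership cost is zero when there is only one module to choose from.
    bits = n_nodes * math.log2(k) if k > 1 else 0.0
    for src, dst in edges:
        bits += 1.0  # flag: does the link stay inside the source's module?
        if module_of[src] == module_of[dst]:
            bits += math.log2(len(modules[module_of[src]]))
        else:
            bits += math.log2(n_nodes)
    return bits

# Two densely connected groups of four nodes joined by a single link.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]

good = [[0, 1, 2, 3], [4, 5, 6, 7]]       # matches the two groups
single = [[0, 1, 2, 3, 4, 5, 6, 7]]       # no decomposition at all
scrambled = [[0, 1, 4, 5], [2, 3, 6, 7]]  # cuts across both groups

costs = {name: description_length(8, edges, m)
         for name, m in [("good", good), ("single", single), ("scrambled", scrambled)]}
```

Under these assumptions the decomposition that follows the two groups costs 48 bits, the undivided graph 52 bits, and the scrambled decomposition 56 bits. A search procedure such as a genetic algorithm would treat this cost as the quantity to minimise over candidate decompositions.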


Keywords (added by machine, not by the authors): Genetic Algorithm, Destination Node, Turing Machine, Reverse Engineering, Dependency Graph




References

  1. Briand, L.C., Morasca, S., and Basili, V.R. (1996) Property-based software engineering measurement: Refining the additivity properties. IEEE Transactions on Software Engineering, 22(1):68–86.
  2. Collins, R. and Jefferson, D. (1991) Selection in massively parallel genetic algorithms. Proceedings of the Fourth International Conference on Genetic Algorithms, ICGA-91, Belew, R.K. and Booker, L.B. (eds.), Morgan Kaufmann.
  3. Doval, D., Mancoridis, S., and Mitchell, B.S. (1999) Automatic Clustering of Software Systems using a Genetic Algorithm. IEEE Proceedings of the 1999 International Conference on Software Tools and Engineering Practice (STEP’99).
  4. Glover, F. (1989) Tabu Search-Part I. ORSA Journal on Computing, 1(3):190–206.
  5. Goldberg, D.E. (1989) Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.
  6. Harman, M., Hierons, R., and Proctor, M. (2002) A New Representation and Crossover Operator for Search-Based Optimization of Software Modularization. Submitted to GECCO-2002.
  7. Holland, J.H. (1975) Adaptation in Natural and Artificial Systems. Now published by MIT Press.
  8. Hutchens, D. and Basili, V.R. (1985) System Structure Analysis: Clustering with Data Bindings. IEEE Transactions on Software Engineering, SE-11(8):749–757.
  9. Kirkpatrick, S., Gelatt Jr., C.D., and Vecchi, M.P. (1983) Optimization by Simulated Annealing. Science, 220(4598):671–680.
  10. Koza, J.R. (1992) Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press.
  11. Li, M. and Vitanyi, P. (1997) An Introduction to Kolmogorov Complexity Theory and Its Applications. Springer-Verlag.
  12. Lutz, R. (2001) Evolving Good Hierarchical Decompositions of Complex Systems. Journal of Systems Architecture, 47, pp. 613–634.
  13. Mancoridis, S., Mitchell, B.S., Rorres, C., Chen, Y., and Gansner, E.R. (1998) Using automatic clustering to produce high-level system organizations of source code. In International Workshop on Program Comprehension (IWPC’98), IEEE Computer Society Press, Los Alamitos, California, USA, pp. 45–53.
  14. McIlhagga, M., Husbands, P., and Ives, R. (1996) A comparison of simulated annealing, dispatching rules and a coevolutionary distributed genetic algorithm as optimization techniques for various integrated manufacturing planning problems. In Proceedings of PPSN IV, Volume I. LNCS 1141, pp. 604–613, Springer-Verlag.
  15. Mitchell, M. (1996) An Introduction to Genetic Algorithms. MIT Press.
  16. Mitchell, T.M. (1997) Machine Learning. McGraw-Hill.
  17. Rissanen, J. (1978) Modelling by the shortest data description. Automatica-J.IFAC, 14, pp. 465–471.
  18. Shannon, C.E. (1948) The mathematical theory of communications. Bell System Technical Journal, 27:379–423, 623–656.
  19. Thornton, C.J. and du Boulay, B. (1992) Artificial Intelligence Through Search. Intellect, Oxford, England.
  20. Wiggerts, T. (1997) Using clustering algorithms in legacy systems remodularisation. In Proc. Working Conference on Reverse Engineering (WCRE’97).
  21. Wood, J.A. (1998) Improving Software Designs via the Minimum Description Length Principle. Ph.D. Thesis, University of Sussex (available from

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Rudi Lutz (1)
  1. School of Cognitive and Computing Sciences, University of Sussex, Sussex
