Soft Computing

Volume 12, Issue 1, pp 51–66

Search-based inference of dialect grammars

  • Massimiliano Di Penta
  • Pierpaolo Lombardi
  • Kunal Taneja
  • Luigi Troiano


Building parsers is essential to the development of many tools, from software maintenance tools to business-specific programmable environments with a command-line interface. Although grammars are available for many programming languages, they are often of little practical use because of the wide diffusion of dialects and variants that standard grammars do not cover. Writing a grammar by hand is feasible, but it is tedious and error-prone, and it requires skills that are not always available. Grammar inference is a promising but challenging way to obtain suitable grammars from program examples. However, inference from scratch poses serious scalability issues and tends to produce grammars that are correct but meaningless, and therefore hard to understand and to use for building tools. This paper describes an approach, based on genetic algorithms, for evolving existing grammars towards target (dialect) grammars, inferring the changes from examples written in the dialect. Results from experiments on inferring C dialect rules show that the algorithm is able to evolve the grammar successfully. Inspections indicated that the changes made automatically to the grammar during its evolution preserved its meaningfulness and were comparable to what a developer could have done by hand.


Grammar inference, Genetic algorithms, Source code analysis





Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Massimiliano Di Penta (1)
  • Pierpaolo Lombardi (1)
  • Kunal Taneja (2)
  • Luigi Troiano (1)

  1. Research Centre on Software Technology (RCOST), University of Sannio, Benevento, Italy
  2. North Carolina State University, Raleigh, USA
