Inferring Grammar Rules of Programming Language Dialects

  • Alpana Dubey
  • Pankaj Jalote
  • Sanjeev Kumar Aggarwal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4201)


In this paper we address the problem of grammatical inference in the programming language domain. The grammar of a programming language is an important asset because many software engineering tools are built on it. Sometimes the grammar of a language is not available and has to be inferred from source code, especially in the case of programming language dialects. We propose an approach for inferring the grammar of a programming language when an incomplete grammar and a set of correct programs are given as input. The approach infers a set of grammar rules whose addition makes the initial grammar complete. A grammar is complete if it parses all the input programs successfully. We also propose a rule evaluation order, i.e. an order in which the rules are evaluated for correctness. A set of rules is correct if its addition makes the grammar complete. Experiments show that the proposed rule evaluation order improves the process of grammar inference.
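The definitions of completeness and rule correctness given above can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a toy token-level grammar, a hand-written list of candidate rules, and a plain Earley recognizer standing in for the parser. The names `recognizes`, `extend`, `is_complete`, and `infer_rules` are hypothetical, and the candidates are simply tried in the order supplied, which only stands in for the rule evaluation order the paper proposes.

```python
from itertools import combinations


def recognizes(grammar, start, tokens):
    """Earley recognizer. Assumes no rule has an empty right-hand side."""
    n = len(tokens)
    # chart[i] holds items (lhs, rhs, dot, origin) valid after i tokens.
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar.get(start, []):
        chart[0].add((start, rhs, 0, 0))
    for i in range(n + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predict: expand the nonterminal after the dot.
                    new_items = [(sym, prod, 0, i) for prod in grammar[sym]]
                elif i < n and tokens[i] == sym:
                    # Scan: the terminal matches the next input token.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
                    new_items = []
                else:
                    new_items = []
            else:
                # Complete: advance every item that was waiting for lhs.
                new_items = [
                    (l2, r2, d2 + 1, o2)
                    for (l2, r2, d2, o2) in list(chart[origin])
                    if d2 < len(r2) and r2[d2] == lhs
                ]
            for item in new_items:
                if item not in chart[i]:
                    chart[i].add(item)
                    worklist.append(item)
    return any(
        l == start and d == len(r) and o == 0 for (l, r, d, o) in chart[n]
    )


def extend(grammar, rules):
    """Return a copy of grammar with the given (lhs, rhs) rules added."""
    new = {lhs: list(rhss) for lhs, rhss in grammar.items()}
    for lhs, rhs in rules:
        new.setdefault(lhs, []).append(rhs)
    return new


def is_complete(grammar, start, programs):
    """A grammar is complete if it accepts every input program."""
    return all(recognizes(grammar, start, p) for p in programs)


def infer_rules(grammar, start, programs, candidates, max_rules=2):
    """Return the first candidate rule set whose addition completes grammar."""
    for k in range(1, max_rules + 1):
        for rules in combinations(candidates, k):
            if is_complete(extend(grammar, rules), start, programs):
                return list(rules)
    return None


# Toy incomplete grammar: the hypothetical dialect adds an `unless` statement.
grammar = {
    "stmt": [("id", "=", "expr", ";"), ("if", "(", "expr", ")", "stmt")],
    "expr": [("id",), ("num",)],
}
programs = [
    ["id", "=", "num", ";"],
    ["unless", "(", "id", ")", "id", "=", "id", ";"],
]
candidates = [
    ("expr", ("unless",)),
    ("stmt", ("unless", "(", "expr", ")", "stmt")),
]
found = infer_rules(grammar, "stmt", programs, candidates)
```

In this sketch the first candidate is rejected because adding it still leaves the second program unparsed, while the second candidate is a correct rule set of size one. The exhaustive search over subsets is only workable for tiny candidate lists; it is used here to keep the definitions visible, not as a practical search strategy.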


Programming language grammars; Dialects; Minimum Description Length Principle





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alpana Dubey (1)
  • Pankaj Jalote (1)
  • Sanjeev Kumar Aggarwal (1)
  1. Dept of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India
