Software Quality Journal, Volume 19, Issue 2, pp 333–378

Recovering grammar relationships for the Java Language Specification



Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.
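The core idea of transforming one grammar until it is structurally equal to another can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy grammars, the dictionary representation, and the two operators `rename` and `add_alternative` are hypothetical stand-ins, not the operator suite or the JLS grammars described in the article.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Assumed representation: a grammar maps each nonterminal to a set of
# alternatives, and each alternative is a tuple of symbols.
Alternative = Tuple[str, ...]
Grammar = Dict[str, FrozenSet[Alternative]]


def rename(g: Grammar, old: str, new: str) -> Grammar:
    """Hypothetical primitive operator: rename a nonterminal everywhere."""
    if new in g:
        raise ValueError(f"nonterminal {new!r} already exists")

    def swap(symbol: str) -> str:
        return new if symbol == old else symbol

    return {
        swap(nt): frozenset(tuple(swap(s) for s in alt) for alt in alts)
        for nt, alts in g.items()
    }


def add_alternative(g: Grammar, nt: str, alt: Alternative) -> Grammar:
    """Hypothetical primitive operator: give `nt` one more alternative."""
    out = dict(g)
    out[nt] = g[nt] | {alt}
    return out


def converge(g: Grammar, chain: List[Tuple[Callable, ...]]) -> Grammar:
    """Apply a chain of (operator, *arguments) steps in order."""
    for op, *args in chain:
        g = op(g, *args)
    return g


# Two toy grammars that differ nominally (a name) and structurally
# (one missing alternative). They are invented for this sketch.
source: Grammar = {
    "Statement": frozenset({("Block",), ("if", "Expression", "Statement")}),
}
target: Grammar = {
    "Stmt": frozenset({
        ("Block",),
        ("if", "Expression", "Stmt"),
        ("while", "Expression", "Stmt"),
    }),
}

# An asymmetric step: only `source` is transformed, toward `target`.
chain = [
    (rename, "Statement", "Stmt"),
    (add_alternative, "Stmt", ("while", "Expression", "Stmt")),
]

converged = converge(source, chain)
print(converged == target)  # structural equality check
print(len(chain))           # chain length as a crude size measure
```

In this sketch the chain itself is the record of how the two grammars relate: the rename step captures a purely nominal difference, while the added alternative captures a structural one.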


Keywords: Grammar convergence; Grammar transformation; Grammar recovery; Grammar extraction; Language documentation



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Software Languages Team, The University of Koblenz-Landau, Koblenz, Germany
