Towards a Scalable Framework for Context-Free Language Reachability

  • Nicholas HollingumEmail author
  • Bernhard Scholz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9031)


Context-Free Language Reachability (CFL-R) is a search problem to identify paths in an input labelled graph that form sentences in a given context-free language. CFL-R provides a fundamental formulation for many applications, including shape analysis, data and control flow analysis, program slicing, specification-inferencing and points-to analysis. Unfortunately, generic algorithms for CFL-R scale poorly with large instances, leading research to focus on ad-hoc optimisations for specific applications. Hence, there is the need for scalable algorithms which solve arbitrary CFL-R instances.

In this work, we present a generic algorithm for CFL-R with improved scalability, performance and/or generality over the state-of-the-art solvers. The algorithm adapts Datalog’s semi-naïve evaluation strategy for eliminating redundant computations. Our solver uses the quadtree data-structure, which reduces memory overheads, speeds up runtime, and eliminates the restriction to normalised input grammars. The resulting solver has up to 3.5x speed-up and 60% memory reduction over a state-of-the-art CFL-R solver based on dynamic programming.


program analysis context-free language reachability semi-naïve evaluation quad-trees matrix multiplication 


  1. 1.
    Abdali, S.K., Wise, D.S.: Experiments with quadtree representation of matrices. In: Gianni, P. (ed.) ISSAC 1988. LNCS, vol. 358, pp. 96–108. Springer, Heidelberg (1989)CrossRefGoogle Scholar
  2. 2.
    Abiteboul, S., Hull, R., Vianu, V. (eds.): Foundations of Databases: The Logical Level, 1st edn. Addison-Wesley Longman Publishing Co., Inc., MA (1995)zbMATHGoogle Scholar
  3. 3.
    Bastani, O., Anand, S., Aiken, A.: Specification inference using context-free language reachability. In: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 553–566. ACM (2015)Google Scholar
  4. 4.
    Bravenboer, M., Smaragdakis, Y.: Strictly declarative specification of sophisticated points-to analyses. In: Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA 2009, pp. 243–262. ACM, New York (2009)Google Scholar
  5. 5.
    Chaudhuri, S.: Subcubic algorithms for recursive state machines. In: Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2008, pp. 159–169. ACM, New York (2008)Google Scholar
  6. 6.
    Cocke, J.: Programming languages and their compilers. Courant Institute Math. Sci., New York, USA (1970)Google Scholar
  7. 7.
    Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation 9(3), 251–280 (1990); Computational algebraic complexity editorialGoogle Scholar
  8. 8.
    Dolev, D., Even, S., Karp, R.M.: On the security of ping-pong protocols. Information and Control 55(1-3), 57–68 (1982)CrossRefzbMATHMathSciNetGoogle Scholar
  9. 9.
    Gustavson, F.G.: Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Trans. Math. Softw. 4(3), 250–269 (1978)CrossRefzbMATHMathSciNetGoogle Scholar
  10. 10.
    Hardekopf, B., Lin, C.: The ant and the grasshopper: Fast and accurate pointer analysis for millions of lines of code. SIGPLAN Not. 42(6), 290–299 (2007)CrossRefGoogle Scholar
  11. 11.
    Heintze, N., McAllester, D.: On the cubic bottleneck in subtyping and flow analysis. In: Proceedings of the 12th Annual IEEE Symposium on Logic in Computer Science, LICS 1997, pp. 342–351. IEEE (1997)Google Scholar
  12. 12.
    Hollingum, N.: Source code for worklist and semi-naive cfl-r algorithms (October 2014),
  13. 13.
    Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Languages and Computation, International edn. Pearson Education International Inc., Upper Saddle River (2003)Google Scholar
  14. 14.
    Kasami, T.: An efficient recognition and syntax-analysis algorithm for context-free languages. Technical report, DTIC Document (1965)Google Scholar
  15. 15.
    Kodumal, J., Aiken, A.: The set constraint/cfl reachability connection in practice. In: Proceedings of the ACM SIGPLAN, Conference on Programming Language Design and Implementation, PLDI 2004, pp. 207–218. ACM, New York (2004)Google Scholar
  16. 16.
    Lange, M., Leiß, H.: To cnf or not to cnf? an efficient yet presentable version of the cyk algorithm. Informatica Didactica 8, 2008–2010 (2009)Google Scholar
  17. 17.
    Lu, Y., Shang, L., Xie, X., Xue, J.: An incremental points-to analysis with cfl-reachability. In: Jhala, R., De Bosschere, K. (eds.) Compiler Construction. LNCS, vol. 7791, pp. 61–81. Springer, Heidelberg (2013)CrossRefGoogle Scholar
  18. 18.
    Melski, D., Reps, T.: Interconvertibility of a class of set constraints and context-free-language reachability. Theoretical Computer Science 248(1–2), 29–98 (2000)CrossRefzbMATHMathSciNetGoogle Scholar
  19. 19.
    Mendez-Lojo, M., Burtscher, M., Pingali, K.: A gpu implementation of inclusion-based points-to analysis. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2012, pp. 107–116. ACM, New York (2012)Google Scholar
  20. 20.
    Okhotin, A.: Fast parsing for boolean grammars: A generalization of valiants algorithm. In: Gao, Y., Lu, H., Seki, S., Yu, S. (eds.) DLT 2010. LNCS, vol. 6224, pp. 340–351. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  21. 21.
    Reps, T.: On the sequential nature of interprocedural program-analysis problems. Acta Informatica 33(5), 739–757 (1996)CrossRefzbMATHMathSciNetGoogle Scholar
  22. 22.
    Reps, T.: Program analysis via graph reachability. Information and Software Technology 40(11-12), 701–726 (1998)CrossRefGoogle Scholar
  23. 23.
    Reps, T., Horwitz, S., Sagiv, M.: Precise interprocedural dataflow analysis via graph reachability. In: Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 1995, pp. 49–61. ACM, New York (1995)Google Scholar
  24. 24.
    Sagiv, M., Reps, T., Horwitz, S.: Precise interprocedural dataflow analysis with applications to constant propagation. In: Mosses, P.D., Nielsen, M., Schwartzbach, M. (eds.) TAPSOFT 1995. LNCS, vol. 915, pp. 651–665. Springer, Heidelberg (1995)Google Scholar
  25. 25.
    Sridharan, M., Gopan, D., Shan, L., Bodík, R.: Demand-driven points-to analysis for java. In: Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA 2005, pp. 59–76. ACM, New York (2005)Google Scholar
  26. 26.
    Valiant, L.G.: General context-free recognition in less than cubic time. Journal of Computer and System Sciences 10(2), 308–315 (1975)CrossRefzbMATHMathSciNetGoogle Scholar
  27. 27.
    Vardoulakis, D., Shivers, O.: Cfa2: A context-free approach to control-flow analysis. In: Gordon, A.D. (ed.) ESOP 2010. LNCS, vol. 6012, pp. 570–589. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  28. 28.
    Yan, D., Xu, G., Rountev, A.: Demand-driven context-sensitive alias analysis for java. In: Proceedings of the 2011 International Symposium on Software Testing and Analysis, ISSTA 2011, pp. 155–165. ACM, New York (2011)Google Scholar
  29. 29.
    Yannakakis, M.: Graph-theoretic methods in database theory. In: Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS 1990, pp. 230–242. ACM, New York (1990)CrossRefGoogle Scholar
  30. 30.
    Younger, D.H.: Recognition and parsing of context-free languages in time n 3. Information and control 10(2), 189–208 (1967)CrossRefzbMATHGoogle Scholar
  31. 31.
    Yuan, H., Eugster, P.: An efficient algorithm for solving the dyck-cfl reachability problem on trees. In: Castagna, G. (ed.) ESOP 2009. LNCS, vol. 5502, pp. 175–189. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  32. 32.
    Yuster, R., Zwick, U.: Fast sparse matrix multiplication. ACM Trans. Algorithms 1(1), 2–13 (2005)CrossRefMathSciNetGoogle Scholar
  33. 33.
    Zhang, Q., Lyu, M.R., Yuan, H., Su, Z.: Fast algorithms for dyck-cfl-reachability with applications to alias analysis. In: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2013, pp. 435–446. ACM, New York (2013)CrossRefGoogle Scholar
  34. 34.
    Zhang, Q., Xiao, X., Zhang, C., Yuan, H., Su, Z.: Efficient subcubic alias analysis for c. In: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2014, pp. 829–845. ACM, New York (2014)CrossRefGoogle Scholar
  35. 35.
    Zheng, X., Rugina, R.: Demand-driven alias analysis for c. SIGPLAN Not. 43(1), 197–208 (2008)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. 1.The University of SydneySydneyAustralia

Personalised recommendations