Abstract
We consider interprocedural data-flow analysis as formalized by the standard IFDS framework, which can express many widely-used static analyses such as reaching definitions, live variables, and null-pointer. We focus on the well-studied on-demand setting in which queries arrive one-by-one in a stream and each query should be answered as fast as possible. While the classical IFDS algorithm provides a polynomial-time solution for this problem, it is not scalable in practice. More specifically, it will either require a quadratic-time preprocessing phase or takes linear time per query, both of which are untenable for modern huge codebases with hundreds of thousands of lines. Previous works have already shown that parameterizing the problem by the treewidth of the program’s control-flow graph is promising and can lead to significant gains in efficiency. Unfortunately, these results were only applicable to the limited special case of same-context queries.
In this work, we obtain significant speedups for the general case of on-demand IFDS with queries that are not necessarily same-context. This is achieved by exploiting a new graph sparsity parameter, namely the treedepth of the program’s call graph. Our approach is the first to exploit the sparsity of control-flow graphs and call graphs at the same time and parameterize by both the treewidth and the treedepth. We obtain an algorithm with a linear preprocessing phase that can answer each query in constant time wrt the size of the input. Finally, our experimental results demonstrate that our approach significantly outperforms the classical IFDS and its on-demand variant.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Instead of single data facts \(d_1\) and \(d_2\), we can also use a set of data facts at each of \(\ell _1\) and \(\ell _2,\) but as we will see in Sect. 2, this does not affect the generality.
- 2.
This algorithm uses the Word-RAM model of computation. The division by \(\lg n\) is obtained by encoding \(\lg n\) bits in one word.
- 3.
The bags do not have to be disjoint.
- 4.
The name partial order tree is not standard in this context, but we use it throughout this work since it provides a good intuition about the nature of T. Usually, the term “treedepth decomposition” is used instead.
- 5.
For generating each query, we randomly and uniformly picked two points in the exploded supergraph. Note that none of our queries are same-context. Even when the two points of the query are in the same function, we are asking for reachability using interprocedurally valid paths that are not necessarily same-context.
References
T.J. Watson libraries for analysis, with frontends for Java, Android, and JavaScript, and many common static program analyses. https://github.com/wala/WALA
Ahmadi, A., Daliri, M., Goharshady, A.K., Pavlogiannis, A.: Efficient approximations for cache-conscious data placement. In: PLDI, pp. 857–871 (2022)
Aiswarya, C.: How treewidth helps in verification. ACM SIGLOG News 9(1), 6–21 (2022)
Arzt, S., et al.: FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. In: PLDI, pp. 259–269 (2014)
Asadi, A., Chatterjee, K., Goharshady, A.K., Mohammadi, K., Pavlogiannis, A.: Faster algorithms for quantitative analysis of MCs and MDPs with small treewidth. In: Hung, D.V., Sokolsky, O. (eds.) ATVA 2020. LNCS, vol. 12302, pp. 253–270. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59152-6_14
Babich, W.A., Jazayeri, M.: The method of attributes for data flow analysis: Part II. Demand analysis. Acta Informatica 10, 265–272 (1978)
Bebenita, M., et al.: SPUR: a trace-based JIT compiler for CIL. In: OOPSLA, pp. 708–725 (2010)
Blackburn, S.M., et al.: The DaCapo benchmarks: Java benchmarking development and analysis. In: OOPSLA, pp. 169–190 (2006)
Bodden, E.: Inter-procedural data-flow analysis with IFDS/IDE and soot. In: SOAP, pp. 3–8 (2012)
Bodden, E., Tolêdo, T., Ribeiro, M., Brabrand, C., Borba, P., Mezini, M.: SPLLIFT: statically analyzing software product lines in minutes instead of years. In: PLDI, pp. 355–364 (2013)
Bodlaender, H.L.: Dynamic programming on graphs with bounded treewidth. In: ICALP, pp. 105–118 (1988)
Bodlaender, H.L.: A linear time algorithm for finding tree-decompositions of small treewidth. In: STOC, pp. 226–234 (1993)
Bodlaender, H.L.: A tourist guide through treewidth. Acta Cybern. 11(1–2), 1–21 (1993)
Bodlaender, H.L., et al.: Rankings of graphs. SIAM J. Discret. Math. 11(1), 168–181 (1998)
Bodlaender, H.L., Hagerup, T.: Parallel algorithms with optimal speedup for bounded treewidth. SIAM J. Comput. 27(6), 1725–1746 (1998)
Burgstaller, B., Blieberger, J., Scholz, B.: On the tree width of Ada programs. In: Llamosí, A., Strohmeier, A. (eds.) Ada-Europe 2004. LNCS, vol. 3063, pp. 78–90. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24841-5_6
Callahan, D., Cooper, K.D., Kennedy, K., Torczon, L.: Interprocedural constant propagation. In: CC, pp. 152–161 (1986)
Chang, W., Streiff, B., Lin, C.: Efficient and extensible security enforcement using dynamic data flow analysis. In: CCS, pp. 39–50 (2008)
Chatterjee, K., Goharshady, A.K., Goharshady, E.K.: The treewidth of smart contracts. In: SAC, pp. 400–408 (2019)
Chatterjee, K., Goharshady, A.K., Goyal, P., Ibsen-Jensen, R., Pavlogiannis, A.: Faster algorithms for dynamic algebraic queries in basic RSMs with constant treewidth. TOPLAS 41(4), 23:1–23:46 (2019)
Chatterjee, K., Goharshady, A.K., Ibsen-Jensen, R., Pavlogiannis, A.: Algorithms for algebraic path properties in concurrent systems of constant treewidth components. In: POPL, pp. 733–747 (2016)
Chatterjee, K., Goharshady, A.K., Ibsen-Jensen, R., Pavlogiannis, A.: Optimal and perfectly parallel algorithms for on-demand data-flow analysis. In: ESOP, pp. 112–140 (2020)
Chatterjee, K., Goharshady, A.K., Okati, N., Pavlogiannis, A.: Efficient parameterized algorithms for data packing. In: POPL, pp. 53:1–53:28 (2019)
Chatterjee, K., Goharshady, A.K., Pavlogiannis, A.: JTDec: a tool for tree decompositions in soot. In: D’Souza, D., Narayan Kumar, K. (eds.) ATVA 2017. LNCS, vol. 10482, pp. 59–66. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68167-2_4
Chatterjee, K., Ibsen-Jensen, R., Goharshady, A.K., Pavlogiannis, A.: Algorithms for algebraic path properties in concurrent systems of constant treewidth components. TOPLAS 40(3), 9:1–9:43 (2018)
Chatterjee, K., Ibsen-Jensen, R., Pavlogiannis, A.: Faster algorithms for quantitative verification in constant treewidth graphs. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 140–157. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_9
Chatterjee, K., Ibsen-Jensen, R., Pavlogiannis, A.: Quantitative verification on product graphs of small treewidth. In: FSTTCS, pp. 42:1–42:23 (2021)
Chen, T., Lin, J., Dai, X., Hsu, W.-C., Yew, P.-C.: Data dependence profiling for speculative optimizations. In: Duesterwald, E. (ed.) CC 2004. LNCS, vol. 2985, pp. 57–72. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24723-4_5
Chow, A.L., Rudmik, A.: The design of a data flow analyzer. In: CC, pp. 106–113 (1982)
Collard, J.F., Knoop, J.: A comparative study of reaching-definitions analyses (1998)
Dangel, A., Fournier, C., et al.: PMD Eclipse plugin. https://github.com/pmd/pmd-eclipse-plugin
Das, A., Lal, A.: Precise null pointer analysis through global value numbering. In: D’Souza, D., Narayan Kumar, K. (eds.) ATVA 2017. LNCS, vol. 10482, pp. 25–41. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68167-2_2
Dell, H., Komusiewicz, C., Talmon, N., Weller, M.: The PACE 2017 parameterized algorithms and computational experiments challenge: the second iteration. In: IPEC, pp. 30:1–30:12 (2018)
Duesterwald, E., Gupta, R., Soffa, M.L.: Demand-driven computation of interprocedural data flow. In: POPL, pp. 37–48 (1995)
Eclipse Foundation: Eclipse documentation, Java development user guide. http://help.eclipse.org/2022-06/index.jsp?topic=/org.eclipse.jdt.doc.user/reference/preferences/java/compiler/ref-preferences-errors-warnings.htm
Ferrara, A., Pan, G., Vardi, M.Y.: Treewidth in verification: local vs. global. In: Sutcliffe, G., Voronkov, A. (eds.) LPAR 2005. LNCS (LNAI), vol. 3835, pp. 489–503. Springer, Heidelberg (2005). https://doi.org/10.1007/11591191_34
Flückiger, O., Scherer, G., Yee, M., Goel, A., Ahmed, A., Vitek, J.: Correctness of speculative optimizations with dynamic deoptimization. In: POPL, pp. 49:1–49:28 (2018)
Goharshady, A.K.: Parameterized and algebro-geometric advances in static program analysis. Ph.D. thesis, Institute of Science and Technology Austria, Klosterneuburg, Austria (2020)
Goharshady, A.K., Hooshmandasl, M.R., Meybodi, M.A.: [1, 2]-sets and [1, 2]-total sets in trees with algorithms. Discret. Appl. Math. 198, 136–146 (2016)
Goharshady, A.K., Mohammadi, F.: An efficient algorithm for computing network reliability in small treewidth. Reliab. Eng. Syst. Saf. 193, 106665 (2020)
Gould, C., Su, Z., Devanbu, P.T.: JDBC checker: a static analysis tool for SQL/JDBC applications. In: ICSE, pp. 697–698 (2004)
Grove, D., Torczon, L.: Interprocedural constant propagation: a study of jump function implementations. In: PLDI, pp. 90–99 (1993)
Gupta, R., Benson, D., Fang, J.Z.: Path profile guided partial dead code elimination using predication. In: PACT, pp. 102–113 (1997)
Gustedt, J., Mæhle, O.A., Telle, J.A.: The treewidth of Java programs. In: Mount, D.M., Stein, C. (eds.) ALENEX 2002. LNCS, vol. 2409, pp. 86–97. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45643-0_7
Horwitz, S., Reps, T.W., Sagiv, S.: Demand interprocedural dataflow analysis. In: FSE, pp. 104–115 (1995)
Khedker, U., Sanyal, A., Sathe, B.: Data Flow Analysis: Theory and Practice. CRC Press, Boca Raton (2017)
Kildall, G.A.: A unified approach to global program optimization. In: POPL, pp. 194–206 (1973)
Kildall, G.A.: Global Expression Optimization During Compilation. University of Washington (1972)
Knoop, J., Steffen, B.: Efficient and optimal bit-vector data flow analyses: a uniform interprocedural framework. Institut für Informatik und Praktische Mathematik Kiel, Bericht (1993)
Knoop, J., Steffen, B., Vollmer, J.: Parallelism for free: efficient and optimal bitvector analyses for parallel programs. TOPLAS 18(3), 268–299 (1996)
Kowalik, Ł., Mucha, M., Nadara, W., Pilipczuk, M., Sorge, M., Wygocki, P.: The PACE 2020 parameterized algorithms and computational experiments challenge: treedepth. In: IPEC, pp. 37:1–37:18 (2020)
Kurdahi, F.J., Parker, A.C.: REAL: a program for register allocation. In: DAC, pp. 210–215 (1987)
Lin, J., et al.: A compiler framework for speculative optimizations. TACO (3), 247–271 (2004)
Meybodi, M.A., Goharshady, A.K., Hooshmandasl, M.R., Shakiba, A.: Optimal mining: maximizing Bitcoin miners’ revenues from transaction fees. In: Blockchain, pp. 266–273. IEEE (2022)
Meyer, B.: Ending null pointer crashes. Commun. ACM 60(5), 8–9 (2017)
Nadara, W., Pilipczuk, M., Smulewicz, M.: Computing treedepth in polynomial space and linear FPT time. CoRR abs/2205.02656 (2022)
Naeem, N.A., Lhoták, O., Rodriguez, J.: Practical extensions to the IFDS algorithm. In: Gupta, R. (ed.) CC 2010. LNCS, vol. 6011, pp. 124–144. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11970-5_8
Nanda, M.G., Sinha, S.: Accurate interprocedural null-dereference analysis for Java. In: ICSE, pp. 133–143. IEEE (2009)
Nešetřil, J., De Mendez, P.O.: Sparsity: Graphs, Structures, and Algorithms. Springer, Cham (2012)
Nesetril, J., de Mendez, P.O.: Tree-depth, subgraph coloring and homomorphism bounds. Eur. J. Comb. 27(6), 1022–1041 (2006)
Nguyen, T.V.N., Irigoin, F., Ancourt, C., Coelho, F.: Automatic detection of uninitialized variables. In: Hedin, G. (ed.) CC 2003. LNCS, vol. 2622, pp. 217–231. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36579-6_16
Obdržálek, J.: Fast mu-calculus model checking when tree-width is bounded. In: Hunt, W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 80–92. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45069-6_7
Pessoa, T., Monteiro, M.P., Bryton, S., et al.: An eclipse plugin to support code smells detection. arXiv preprint arXiv:1204.6492 (2012)
Pothen, A.: The complexity of optimal elimination trees. Technical report (1988)
Rapoport, M., Lhoták, O., Tip, F.: Precise data flow analysis in the presence of correlated method calls. In: Blazy, S., Jensen, T. (eds.) SAS 2015. LNCS, vol. 9291, pp. 54–71. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48288-9_4
Reps, T.: Undecidability of context-sensitive data-dependence analysis. TOPLAS 22(1), 162–186 (2000)
Reps, T.W.: Demand interprocedural program analysis using logic databases. In: Ramakrishnan, R. (ed.) Applications of Logic Databases. SECS, pp. 163–196. Springer, Boston (1993). https://doi.org/10.1007/978-1-4615-2207-2_8
Reps, T.W.: Program analysis via graph reachability. Inf. Softw. Technol. 40(11–12), 701–726 (1998)
Reps, T.W., Horwitz, S., Sagiv, S.: Precise interprocedural dataflow analysis via graph reachability. In: POPL, pp. 49–61 (1995)
Robertson, N., Seymour, P.D.: Graph minors. III. Planar tree-width. J. Comb. Theory Ser. B 36(1), 49–64 (1984)
Robertson, N., Seymour, P.D.: Graph minors. II. Algorithmic aspects of tree-width. J. Algorithms 7(3), 309–322 (1986)
Rountev, A., Kagan, S., Marlowe, T.: Interprocedural dataflow analysis in the presence of large libraries. In: Mycroft, A., Zeller, A. (eds.) CC 2006. LNCS, vol. 3923, pp. 2–16. Springer, Heidelberg (2006). https://doi.org/10.1007/11688839_2
Sagiv, S., Reps, T.W., Horwitz, S.: Precise interprocedural dataflow analysis with applications to constant propagation. Theor. Comput. Sci. 167, 131–170 (1996)
Shang, L., Xie, X., Xue, J.: On-demand dynamic summary-based points-to analysis. In: CGO, pp. 264–274 (2012)
Späth, J., Ali, K., Bodden, E.: Context-, flow-, and field-sensitive data-flow analysis using synchronized pushdown systems. In: POPL, pp. 48:1–48:29 (2019)
Sridharan, M., Bodík, R.: Refinement-based context-sensitive points-to analysis for Java. In: PLDI, pp. 387–400 (2006)
Sridharan, M., Gopan, D., Shan, L., Bodík, R.: Demand-driven points-to analysis for Java. In: OOPSLA, pp. 59–76 (2005)
Thorup, M.: All structured programs have small tree-width and good register allocation. Inf. Comput. 142(2), 159–181 (1998)
Vallée-Rai, R., Co, P., Gagnon, E., Hendren, L.J., Lam, P., Sundaresan, V.: Soot - a Java bytecode optimization framework. In: CASCON, p. 13. IBM (1999)
Xu, G., Rountev, A., Sridharan, M.: Scaling CFL-reachability-based points-to analysis using context-sensitive must-not-alias analysis. In: Drossopoulou, S. (ed.) ECOOP 2009. LNCS, vol. 5653, pp. 98–122. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03013-0_6
Yan, D., Xu, G., Rountev, A.: Demand-driven context-sensitive alias analysis for Java. In: ISSTA, pp. 155–165 (2011)
Zheng, X., Rugina, R.: Demand-driven alias analysis for C. In: POPL, pp. 197–208 (2008)
Acknowledgments
The research was partially supported by the Hong Kong Research Grants Council ECS Project Number 26208122, the HKUST-Kaisa Joint Research Institute Project Grant HKJRI3A-055 and the HKUST Startup Grant R9272. Author names are ordered alphabetically.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Goharshady, A.K., Zaher, A.K. (2023). Efficient Interprocedural Data-Flow Analysis Using Treedepth and Treewidth. In: Dragoi, C., Emmi, M., Wang, J. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2023. Lecture Notes in Computer Science, vol 13881. Springer, Cham. https://doi.org/10.1007/978-3-031-24950-1_9
Download citation
DOI: https://doi.org/10.1007/978-3-031-24950-1_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24949-5
Online ISBN: 978-3-031-24950-1
eBook Packages: Computer ScienceComputer Science (R0)