Skip to main content

Staged Points-to Analysis for Large Code Bases

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS,volume 9031)

Abstract

Bug checker tools for Java require fine-grained heap abstractions including object-sensitive call graphs, field information for objects, and points-to sets for program variables to find bugs in source codes. However, heap abstractions coined commonly as points-to analysis, have high runtime-complexity especially when the points-to analysis is context- sensitive, and, hence, state-of-the-art points-to analyses do not scale for large code bases.

In this paper, we introduce a new points-to framework that facilitates the computation of context-sensitive points-to analysis for large code bases. The framework is demand-driven, i.e., a client queries the points-to information for some program variables. The novelty of our approach is a pre-analysis technique that is a combination of staged points-to analyses with program slicing and program compaction. We implemented the proposed points-to framework in Datalog for a proprietary bug checker that could identify security vulnerabilities in the OpenJDKTM library which has approximately 1.3 million variables and 500,000 allocation-sites. For the clients that we have chosen, our technique is able to eliminate about 73% of all variables and about 95% of allocation-sites. Thus our points-to framework scales for code bases with millions of program variables and hundreds of thousands of methods.

Keywords

  • Program Variable
  • Alias Analysis
  • Allocation Site
  • Receiver Object
  • Initial Slice

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995)

    Google Scholar 

  2. Andersen, L.O.: Program Analysis and Specialization for the C Programming Language. Ph.D. thesis, DIKU, University of Copenhagen (Fall 1994)

    Google Scholar 

  3. Appel, A.W.: Modern Compiler Implementation in Java. Cambridge University Press (1998)

    Google Scholar 

  4. Ball, T., Rajamani, S.K.: The SLAM toolkit. In: Berry, G., Comon, H., Finkel, A. (eds.) CAV 2001. LNCS, vol. 2102, pp. 260–264. Springer, Heidelberg (2001)

    CrossRef  Google Scholar 

  5. Bessey, A., Block, K., Chelf, B., Chou, A., Fulton, B., Hallem, S., Henri-Gros, C., Kamsky, A., McPeak, S., Engler, D.: A few billion lines of code later – using static analysis to find bugs in the real world. Comm. ACM 53, 66–75 (2010)

    Google Scholar 

  6. Blackburn, S.M., Garner, R., Hoffmann, C., Khan, A.M., McKinley, K.S., Bentzur, R., Diwan, A., Feinberg, D., Frampton, D., Guyer, S.Z., Hirzel, M., Hosking, A., Jump, M., Lee, H., Moss, J.E.B., Phansalkar, A., Stefanovic, D., VanDrunen, T., von Dincklage, D., Wiedermann, B.: The DaCapo benchmarks: Java benchmarking development and analysis. In: OOPSLA 2006: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (2006)

    Google Scholar 

  7. Bravenboer, M., Smaragdakis, Y.: Strictly declarative specification of sophisticated points-to analyses. In: Proceeding of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA 2009, pp. 243–262. ACM (2009), http://doi.acm.org/10.1145/1640089.1640108

  8. Cifuentes, C., Keynes, N., Li, L., Hawes, N., Valdiviezo, M.: Transitioning Parfait into a development tool. IEEE Security and Privacy 10(3), 16–23 (2012)

    CrossRef  Google Scholar 

  9. Corporation, O.: Secure coding guidelines for java se (April 2014), http://www.oracle.com/technetwork/java/seccodeguide-139067.html

  10. Debray, S.K., Evans, W., Muth, R., De Sutter, B.: Compiler techniques for code compaction. ACM Transactions on Programming Languages and Systems 22(2), 378–415 (2000)

    CrossRef  Google Scholar 

  11. Feng, Y., Anand, S., Dillig, I., Aiken, A.: Apposcopy: Semantics-based detection of android malware through static analysis. In: International Symposium on Foundations of Software Engineering (2014) (to appear)

    Google Scholar 

  12. Gotsman, A., Berdine, J., Cook, B.: Interprocedural shape analysis with separated heap abstractions. In: Yi, K. (ed.) SAS 2006. LNCS, vol. 4134, pp. 240–260. Springer, Heidelberg (2006)

    CrossRef  Google Scholar 

  13. Green, T.J., Aref, M., Karvounarakis, G.: Logicblox, platform and language: A tutorial. In: Barceló, P., Pichler, R. (eds.) Datalog 2.0 2012. LNCS, vol. 7494, pp. 1–8. Springer, Heidelberg (2012)

    CrossRef  Google Scholar 

  14. Hind, M., Pioli, A.: Which pointer analysis should i use? In: Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), pp. 113–123. ACM (2000)

    Google Scholar 

  15. Lhoták, O., Hendren, L.J.: Context-sensitive points-to analysis: Is it worth it? In: Mycroft, A., Zeller, A. (eds.) CC 2006. LNCS, vol. 3923, pp. 47–64. Springer, Heidelberg (2006)

    CrossRef  Google Scholar 

  16. Lhoták, O., Hendren, L.J.: Evaluating the benefits of context-sensitive points-to analysis using a BDD-based implementation. ACM Transactions on Software Engineering Methodology 18(1) (2008)

    Google Scholar 

  17. Lu, Y., Shang, L., Xie, X., Xue, J.: An incremental points-to analysis with cfl-reachability. In: Jhala, R., De Bosschere, K. (eds.) Compiler Construction. LNCS, vol. 7791, pp. 61–81. Springer, Heidelberg (2013)

    CrossRef  Google Scholar 

  18. Milanova, A., Rountev, A., Ryder, B.G.: Parameterized object sensitivity for points-to analysis for Java. ACM Transaction on Software Engineering Methodolology 14(1), 1–41 (2005), http://doi.acm.org/10.1145/1044834.1044835

    CrossRef  Google Scholar 

  19. Octeau, D., McDaniel, P., Jha, S., Bartel, A., Bodden, E., Klein, J., Le Traon, Y.: Effective inter-component communication mapping in android with epicc: An essential step towards holistic security analysis. In: Proceedings of the 22nd USENIX Conference on Security (SEC), pp. 543–558. USENIX Association (2013), http://dl.acm.org/citation.cfm?id=2534766.2534813

  20. Oh, H., Lee, W., Heo, K., Yang, H., Yi, K.: Selective context-sensitivity guided by impact pre-analysis. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 475–484. ACM (2014)

    Google Scholar 

  21. Ryder, B.G.: Dimensions of precision in reference analysis of object-oriented programming languages. In: Hedin, G. (ed.) CC 2003. LNCS, vol. 2622, pp. 126–137. Springer, Heidelberg (2003)

    CrossRef  Google Scholar 

  22. Smaragdakis, Y., Balatsouras, G., Kastrinis, G.: Set-based pre-processing for points-to analysis. In: ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA), pp. 253–270 (2013)

    Google Scholar 

  23. Smaragdakis, Y., Bravenboer, M., Lhoták, O.: Pick your contexts well: understanding object-sensitivity. In: Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, pp. 17–30. ACM (2011), http://doi.acm.org/10.1145/1926385.1926390

  24. Sridharan, M., Bodík, R.: Refinement-based context-sensitive points-to analysis for Java. In: Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2006, pp. 387–400. ACM (2006), http://doi.acm.org/10.1145/1133981.1134027

  25. Sridharan, M., Gopan, D., Shan, L., Bodik, R.: Demand-driven points-to analysis for Java. In: Proceedings of the 20th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 59–76. ACM (2005), http://doi.acm.org/10.1145/1094811.1094817

  26. Tip, F., Palsberg, J.: Scalable propagation-based call graph construction algorithms. In: Rosson, M.B., Lea, D. (eds.) OOPSLA 2000, pp. 281–293. ACM (2000)

    Google Scholar 

  27. Yan, D., Xu, G., Rountev, A.: Demand-driven context-sensitive alias analysis for Java. In: Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA), pp. 155–165. ACM (2011), http://doi.acm.org/10.1145/2001420.2001440

  28. Zheng, X., Rugina, R.: Demand-driven alias analysis for C. In: Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2008, pp. 197–208 (2008), http://doi.acm.org/10.1145/1328438.1328464

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nicholas Allen .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Allen, N., Scholz, B., Krishnan, P. (2015). Staged Points-to Analysis for Large Code Bases. In: Franke, B. (eds) Compiler Construction. CC 2015. Lecture Notes in Computer Science(), vol 9031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46663-6_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-46663-6_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46662-9

  • Online ISBN: 978-3-662-46663-6

  • eBook Packages: Computer ScienceComputer Science (R0)