Applying Program Analysis to Code Retrieval

  • Joel Ossher
  • Cristina Lopes


Early code retrieval systems were primarily adaptations of standard text retrieval approaches, and so treated source code as either plain or structured text. While fairly successful, these approaches ignored much of the information that can be extracted from the source code. Recently, researchers have demonstrated a number of ways in which static program analysis can be used to augment text-based retrieval approaches. By taking advantage of the structural and semantic information embedded in source code, advanced code retrieval systems can provide a superior experience.

This chapter begins by describing how basic text-based code retrieval systems function. It then introduces a basic form of static program analysis that allows source code to be treated as structured text. Next, it describes link analysis, a more advanced program analysis technique that aids code retrieval systems in several ways, for example by enabling better estimates of result quality and the sharing of descriptive terms. The chapter concludes with a detailed description of a single static program analysis technique called dependency slicing, which code retrieval systems use to package search results as compilable units and so support reuse of the retrieved results.
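To make the idea of dependency slicing concrete, the following is a minimal sketch that assumes dependencies are already available as a graph mapping each code entity to the entities it uses. It is not the chapter's algorithm; it only illustrates the core step of collecting everything a search result transitively depends on, so that the result could be packaged with what it needs to compile. The function name `dependency_slice` and the entity names in the example graph are hypothetical.

```python
from collections import deque


def dependency_slice(dependencies, seeds):
    """Return the seeds plus every entity reachable from them.

    dependencies: dict mapping an entity name to the names it depends on
                  (an assumed, pre-extracted representation).
    seeds: the entities selected from a search result.
    """
    needed = set(seeds)
    queue = deque(seeds)
    while queue:
        entity = queue.popleft()
        # Entities with no recorded dependencies contribute nothing further.
        for dep in dependencies.get(entity, ()):
            if dep not in needed:
                needed.add(dep)
                queue.append(dep)
    return needed


# Hypothetical dependency graph for illustration only.
deps = {
    "StackImpl.push": ["StackImpl", "Node"],
    "StackImpl": ["Stack"],
    "Node": [],
    "Stack": [],
    "QueueImpl": ["Node"],
}

result = dependency_slice(deps, ["StackImpl.push"])
print(sorted(result))
```

In this sketch, `QueueImpl` is left out of the slice because nothing reachable from the selected method depends on it, which is the property that keeps a packaged result small.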


Keywords (added by machine, not by the authors): Source Code, Abstract Type, Term Extraction, Abstract Syntax Tree, Type Hierarchy



This material is based upon work supported by the National Science Foundation under Grant No. 1018374.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. University of California, Irvine, USA
