SNIFF: A Search Engine for Java Using Free-Form Queries

  • Shaunak Chatterjee
  • Sudeep Juvekar
  • Koushik Sen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5503)


Reuse of existing libraries simplifies software development efforts. However, these libraries are often complex, and reusing their APIs involves a steep learning curve. A programmer often uses a search engine such as Google to discover code snippets that use a library to perform a common task. A problem with search engines is that they return many pages that a programmer has to mine manually to discover the desired code. Recent research efforts have tried to address this problem by automating the generation of code snippets from user queries. However, these queries must include type information and therefore require the user to have partial knowledge of the APIs.

We propose a novel code search technique, called SNIFF, which retains the flexibility of performing code search in plain English, while obtaining a small set of relevant code snippets to perform the desired task. Our technique is based on the observation that the library methods that user code calls are often well documented. We use the documentation of these library methods to add plain-English meaning to otherwise undocumented user code. The annotated user code is then indexed for the purpose of free-form query search. Another novel contribution of our technique is that we take a type-based intersection of the candidate code snippets obtained from a query search to generate a set of small and highly relevant code snippets.
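The pipeline described above (annotate user code with library documentation, index it, answer a free-form query, then intersect the candidates) can be illustrated with a minimal sketch. This is not SNIFF's implementation: the documentation strings, the snippet table, and the function names `build_index`, `search`, and `common_calls` are all hypothetical, and a plain set intersection over qualified call names stands in for the type-based intersection the abstract mentions.

```python
import re

# Hypothetical library documentation: qualified method name -> description.
LIBRARY_DOCS = {
    "FileReader.<init>": "Creates a new FileReader to read from a file",
    "BufferedReader.<init>": "Creates a buffering character input stream",
    "BufferedReader.readLine": "Reads a line of text",
    "BufferedReader.close": "Closes the stream",
    "PrintStream.println": "Prints a string and terminates the line",
}

# Hypothetical user snippets, reduced to the library calls they make, in order.
SNIPPETS = {
    "s1": ["FileReader.<init>", "BufferedReader.<init>",
           "BufferedReader.readLine", "PrintStream.println"],
    "s2": ["FileReader.<init>", "BufferedReader.<init>",
           "BufferedReader.readLine", "BufferedReader.close"],
    "s3": ["PrintStream.println"],
}


def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_index(snippets, docs):
    """Annotate each snippet with the documentation words of the methods
    it calls, and build an inverted index: word -> snippet ids."""
    index = {}
    for snippet_id, calls in snippets.items():
        for call in calls:
            for word in tokenize(docs.get(call, "")):
                index.setdefault(word, set()).add(snippet_id)
    return index


def search(query, index):
    """Return the snippets whose annotations contain every query word."""
    hits = [index.get(word, set()) for word in tokenize(query)]
    return set.intersection(*hits) if hits else set()


def common_calls(candidates, snippets):
    """Assumed stand-in for the intersection step: keep only the calls
    shared by all candidates, in the order of the first candidate."""
    ids = sorted(candidates)
    if not ids:
        return []
    shared = set(snippets[ids[0]])
    for snippet_id in ids[1:]:
        shared &= set(snippets[snippet_id])
    return [call for call in snippets[ids[0]] if call in shared]
```

With these toy tables, the query "read line from file" matches snippets `s1` and `s2` but not `s3`, and intersecting the two candidates drops the calls that only one of them makes (`println`, `close`), leaving the shorter sequence of calls they share.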

We have implemented SNIFF for Java and have performed evaluations and user studies to demonstrate the utility of SNIFF. Our evaluations show that SNIFF performed better than most of the existing online search engines as well as related tools.


Keywords: Search Engine, User Query, Longest Common Subsequence, User Code
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Shaunak Chatterjee (1)
  • Sudeep Juvekar (1)
  • Koushik Sen (1)
  1. EECS Department, University of California, Berkeley, USA
