Novel and Applied Algorithms in a Search Engine for Java Code Snippets

  • Phitchayaphong Tantikul
  • C. Albert Thompson
  • Rosalva E. Gallardo-Valencia
  • Susan Elliott Sim


Programmers often look for a “snippet,” that is, a small piece of example code, to remind themselves of how to solve a problem or to quickly learn about a new resource. However, existing tools such as general-purpose search engines and code-specific search engines do not handle searches for snippets well. In this chapter, we present a prototype search engine designed to work with code snippets. Our approach uses the non-code text on a web page as metadata for the snippet to improve indexing and retrieval. We discuss some implementation issues that we encountered, which led to lessons learned for others who follow. These issues include: extracting snippets from web pages, selecting and indexing metadata, matching query terms with multiple metadata indexes, and identifying a text summary to be used in the presentation of results.
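The idea of treating the non-code text on a page as metadata, and of matching query terms against several metadata indexes, can be illustrated with a small in-memory sketch. This is a hypothetical illustration, not the authors' implementation: the two fields (`title` for the page title, `text` for surrounding prose), the tiny stop list, the per-field weights of 2.0 and 1.0, and the class and method names are all assumptions chosen for the example.

```java
import java.util.*;

/**
 * Hypothetical sketch: each snippet is indexed by the non-code text around it,
 * with one inverted index per metadata field. A query is matched against every
 * field, and matches are summed using assumed per-field weights.
 */
public class SnippetIndex {
    // Assumed, deliberately tiny stop list; a real system would use a fuller one.
    private static final Set<String> STOP_WORDS =
        Set.of("a", "an", "the", "to", "of", "in", "how", "for", "and");

    // Assumed weights: a term matched in the page title counts twice as much.
    private static final Map<String, Double> FIELD_WEIGHTS =
        Map.of("title", 2.0, "text", 1.0);

    // field name -> term -> ids of snippets whose field contains that term
    private final Map<String, Map<String, Set<Integer>>> indexes = new HashMap<>();
    private final List<String> snippets = new ArrayList<>();

    /** Lower-cases, splits on non-alphanumerics, and drops stop words. */
    static List<String> tokenize(String s) {
        List<String> terms = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) terms.add(t);
        }
        return terms;
    }

    /** Stores a snippet with its page title and surrounding text; returns its id. */
    public int add(String code, String pageTitle, String surroundingText) {
        int id = snippets.size();
        snippets.add(code);
        indexField("title", pageTitle, id);
        indexField("text", surroundingText, id);
        return id;
    }

    private void indexField(String field, String value, int id) {
        Map<String, Set<Integer>> idx = indexes.computeIfAbsent(field, f -> new HashMap<>());
        for (String term : tokenize(value)) {
            idx.computeIfAbsent(term, t -> new HashSet<>()).add(id);
        }
    }

    /** Returns snippet ids ordered by weighted (field, term) matches, ties by id. */
    public List<Integer> search(String query) {
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            for (Map.Entry<String, Map<String, Set<Integer>>> e : indexes.entrySet()) {
                double w = FIELD_WEIGHTS.getOrDefault(e.getKey(), 1.0);
                for (int id : e.getValue().getOrDefault(term, Set.of())) {
                    scores.merge(id, w, Double::sum);
                }
            }
        }
        List<Integer> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int c = Double.compare(scores.get(b), scores.get(a));
            return c != 0 ? c : Integer.compare(a, b);
        });
        return ids;
    }

    public String snippet(int id) { return snippets.get(id); }

    public static void main(String[] args) {
        SnippetIndex index = new SnippetIndex();
        index.add("new BufferedReader(new FileReader(path))",
                  "Reading a text file in Java",
                  "Wrap a FileReader in a BufferedReader to read lines.");
        index.add("Collections.sort(list)",
                  "Sorting a list",
                  "Use Collections.sort to sort a list of strings read from a file.");
        // Snippet 0 matches "file" in its title (weight 2) and "read" in its text.
        System.out.println(index.search("read file")); // prints [0, 1]
    }
}
```

Keeping one index per field, rather than merging all page text into one bag of words, is what lets a match in the page title be weighted differently from a match in the surrounding prose; the weights here are placeholders, and no stemming is applied, so "reading" and "read" are treated as different terms.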

This material is based upon work supported by the NSF under Grant No. IIS-0846034 and by the UCI Summer Undergraduate Research Program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Phitchayaphong Tantikul (1)
  • C. Albert Thompson (2)
  • Rosalva E. Gallardo-Valencia (3)
  • Susan Elliott Sim (4)

  1. University of California, Irvine, Irvine, USA
  2. University of British Columbia, Vancouver, Canada
  3. Intel Corporation, Santa Clara, USA
  4. Many Roads Studios, Toronto, Canada
