Novel and Applied Algorithms in a Search Engine for Java Code Snippets
Programmers often look for a “snippet,” that is, a small piece of example code, to remind themselves of how to solve a problem or to quickly learn about a new resource. However, existing tools, both general-purpose search engines and code-specific search engines, do not handle searches for snippets well. In this chapter, we present a prototype search engine designed to work with code snippets. Our approach uses the non-code text on a web page as metadata for the snippet to improve indexing and retrieval. We discuss several implementation issues that we encountered and the lessons they offer to others building similar tools. These issues include extracting snippets from web pages, selecting and indexing metadata, matching query terms against multiple metadata indexes, and identifying a text summary to be used in the presentation of results.
Keywords: Source Code, Search Engine, Stop Word, Text Segment, Page Title
This material is based upon work supported by the NSF under Grant No. IIS-0846034 and by the UCI Summer Undergraduate Research Program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.