Novel and Applied Algorithms in a Search Engine for Java Code Snippets

  • Chapter
Finding Source Code on the Web for Remix and Reuse

Abstract

Programmers often look for a “snippet,” that is, a small piece of example code, to remind themselves of how to solve a problem or to quickly learn about a new resource. However, existing tools, both general-purpose search engines and code-specific search engines, do not handle searches for snippets well. In this chapter, we present a prototype search engine designed to work with code snippets. Our approach uses the non-code text on a web page as metadata for the snippet to improve indexing and retrieval. We discuss implementation issues that we encountered and the lessons they offer to others building similar tools. These issues include extracting snippets from web pages, selecting and indexing metadata, matching query terms against multiple metadata indexes, and identifying a text summary to be used in the presentation of results.
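
To make the core idea concrete, the sketch below indexes a snippet under the words of the prose that surrounds it, so that a natural-language query can retrieve code that never contains the query terms itself. It is only an illustration of the general technique, not the chapter's implementation: the class name `SnippetIndex`, the tiny stop list, and the scoring rule (count of distinct matching query terms) are all assumptions made for this example.

```java
import java.util.*;

/** Toy inverted index: terms from the text around a snippet point to that snippet. */
public class SnippetIndex {
    // Small illustrative stop list (an assumption, not the list used in the chapter).
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "to", "of", "how", "in", "is", "this"));

    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> snippets = new ArrayList<>();

    /** Lowercases the text, splits on non-alphanumeric characters, drops stop words. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Stores the code and indexes it under the terms of its surrounding text. */
    public int add(String surroundingText, String code) {
        int id = snippets.size();
        snippets.add(code);
        for (String term : tokenize(surroundingText)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Returns snippet ids ranked by how many distinct query terms they match. */
    public List<Integer> search(String query) {
        Map<Integer, Integer> scores = new TreeMap<>();
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            for (int id : postings.getOrDefault(term, Collections.emptySet())) {
                scores.merge(id, 1, Integer::sum);
            }
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        // List.sort is stable, so ties keep ascending id order.
        ranked.sort((x, y) -> scores.get(y) - scores.get(x));
        return ranked;
    }

    public String snippet(int id) {
        return snippets.get(id);
    }

    public static void main(String[] args) {
        SnippetIndex index = new SnippetIndex();
        index.add("How to read a text file line by line in Java",
                  "BufferedReader r = new BufferedReader(new FileReader(path));");
        index.add("Sorting a list of strings", "Collections.sort(names);");
        for (int id : index.search("read file")) {
            System.out.println(index.snippet(id));
        }
    }
}
```

A real system would keep separate indexes for different kinds of metadata (for example, page title versus nearby paragraph) and combine their scores; this sketch collapses everything into a single index to stay short.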


Notes

  1. http://www.krugle.com/

  2. http://www.koders.com

  3. http://www.google.com/codesearch

  4. http://htmlparser.sourceforge.net

  5. http://www.cs.waikato.ac.nz/ml/weka/

  6. http://www.ranks.nl/resources/stopwords.html

  7. http://www.eclipse.org/articles/article.php?file=Article-JavaCodeManipulation_AST/index.html

Acknowledgements

This material is based upon work supported by the NSF under Grant No. IIS-0846034 and by the UCI Summer Undergraduate Research Program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

Author information

Corresponding author

Correspondence to Phitchayaphong Tantikul.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Tantikul, P., Thompson, C.A., Gallardo-Valencia, R.E., Sim, S.E. (2013). Novel and Applied Algorithms in a Search Engine for Java Code Snippets. In: Sim, S.E., Gallardo-Valencia, R.E. (eds) Finding Source Code on the Web for Remix and Reuse. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6596-6_14

  • DOI: https://doi.org/10.1007/978-1-4614-6596-6_14

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-6595-9

  • Online ISBN: 978-1-4614-6596-6

  • eBook Packages: Computer Science, Computer Science (R0)
