Frontiers of Computer Science, Volume 12, Issue 5, pp 923–938

Correlation-based software search by leveraging software term database

  • Zhixing Li
  • Gang Yin
  • Tao Wang
  • Yang Zhang
  • Yue Yu
  • Huaimin Wang
Research Article

Abstract

Internet-scale open source software (OSS) production across various communities generates abundant reusable resources for software developers. However, finding desired and mature software among a large number of candidates using keyword queries is a significant challenge, especially for newcomers, because current search services often fail to understand the semantics of user queries. In this paper, we construct a software term database (STDB) by analyzing tagging data in Stack Overflow and propose a correlation-based software search (CBSS) approach that performs correlation retrieval based on the term relevance obtained from the STDB. In addition, we design a novel ranking method to optimize the initial retrieval result. We explore four research questions, each in its own experiment, to evaluate the effectiveness of the STDB and investigate the performance of CBSS. The experimental results show that the proposed CBSS responds effectively to keyword-based software searches and significantly outperforms existing search services at finding mature software.

Keywords

software retrieval, software term database, open source software

Notes

Acknowledgements

The research was supported by the National Natural Science Foundation of China (Grant Nos. 61432020, 61303064, 61472430, 61502512) and National Grand R&D Plan (2016YFB1000805).

Supplementary material

11704_2017_6573_MOESM1_ESM.ppt (192 kb)
Supplementary material, approximately 195 KB.

Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Zhixing Li (1)
  • Gang Yin (1)
  • Tao Wang (1)
  • Yang Zhang (1)
  • Yue Yu (1)
  • Huaimin Wang (1)

  1. National Laboratory for Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha, China
