Advertisement

How do developers utilize source code from stack overflow?

  • Yuhao Wu
  • Shaowei Wang
  • Cor-Paul Bezemer
  • Katsuro Inoue
Article
  • 222 Downloads

Abstract

Technical question and answer Q&A platforms, such as Stack Overflow, provide a platform for users to ask and answer questions about a wide variety of programming topics. These platforms accumulate a large amount of knowledge, including hundreds of thousands lines of source code. Developers can benefit from the source code that is attached to the questions and answers on Q&A platforms by copying or learning from (parts of) it. By understanding how developers utilize source code from Q&A platforms, we can provide insights for researchers which can be used to improve next-generation Q&A platforms to help developers reuse source code fast and easily. In this paper, we first conduct an exploratory study on 289 files from 182 open-source projects, which contain source code that has an explicit reference to a Stack Overflow post. Our goal is to understand how developers utilize code from Q&A platforms and to reveal barriers that may make code reuse more difficult. In 31.5% of the studied files, developers needed to modify source code from Stack Overflow to make it work in their own projects. The degree of required modification varied from simply renaming variables to rewriting the whole algorithm. Developers sometimes chose to implement an algorithm from scratch based on the descriptions from Stack Overflow answers, even if there was an implementation readily available in the post. In 35.5% of the studied files, developers used Stack Overflow posts as an information source for later reference. To further understand the barriers of reusing code and to obtain suggestions for improving the code reuse process on Q&A platforms, we conducted a survey with 453 open-source developers who are also on Stack Overflow. We found that the top 3 barriers that make it difficult for developers to reuse code from Stack Overflow are: (1) too much code modification required to fit in their projects, (2) incomprehensive code, and (3) low code quality. We summarized and analyzed all survey responses and we identified that developers suggest improvements for future Q&A platforms along the following dimensions: code quality, information enhancement & management, data organization, license, and the human factor. For instance, developers suggest to improve the code quality by adding an integrated validator that can test source code online, and an outdated code detection mechanism. Our findings can be used as a roadmap for researchers and developers to improve code reuse.

Notes

References

  1. Abdalkareem R, Shihab E, Rilling J (2017) What do developers use the crowd for? a study using Stack Overflow. IEEE Soft 34(2):53–60CrossRefGoogle Scholar
  2. Ahasanuzzaman M, Asaduzzaman M, Roy CK, Schneider KA (2016) Mining duplicate questions in Stack Overflow. In: Proceedings of the 13th international conference on mining software repositories (MSR), pp 402–412Google Scholar
  3. Almeida DA, Murphy GC, Wilson G, Hoye M (2017) Do software developers understand open source licenses?. In: Proceedings of the 25th international conference on program comprehension (ICPC), pp 1–11. IEEEGoogle Scholar
  4. Alnusair A, Rawashdeh M, Hossain MA, Alhamid MF (2016) Utilizing semantic techniques for automatic code reuse in software repositories. In: Quality software through reuse and integration, pp 42–62. SpringerGoogle Scholar
  5. An L, Mlouki O, Khomh F, Antoniol G (2017) Stack Overflow: A code laundering platform?. In: Proceedings of the 24th IEEE international conference on software analysis, evolution, and reengineering (SANER), pp 283–293. IEEEGoogle Scholar
  6. Anderson A, Huttenlocher D, Kleinberg J, Leskovec J (2013) Steering user behavior with badges. In: Proceedings of the 22nd international conference on World Wide Web (WWW), pp 95–106. ACMGoogle Scholar
  7. Armaly A, McMillan C (2016) Pragmatic source code reuse via execution record and replay. J Soft Evolution Process 28(8):642–664CrossRefGoogle Scholar
  8. Atwood J (2009) Attribution required – Stack Overflow blog. https://stackoverflow.blog/2009/06/25/attribution-required/. (last visited: Aug 25, 2017)
  9. Azad S, Rigby PC, Guerrouj L (2017) Generating API call rules from version history and stack overflow posts. ACM Trans Softw Eng Methodol (TOSEM) 25(4):29CrossRefGoogle Scholar
  10. Bajracharya S, Ngo T, Linstead E, Dou Y, Rigor P, Baldi P, Lopes C (2006) Sourcerer: A search engine for open source code supporting structure-based search. In: Companion to the 21st ACM SIGPLAN symposium on object-oriented programming systems, languages, and applications (OOPSLA), pp 681–682. ACMGoogle Scholar
  11. Barzilay O (2011) Example embedding. In: Proceedings of the 10th SIGPLAN symposium on new ideas, new paradigms, and reflections on programming and software, Onward!, pp 137-144Google Scholar
  12. Bian J, Gao B, Liu T-Y (2014) Knowledge-powered deep learning for word embedding. Springer, Berlin, pp 132–148Google Scholar
  13. Cavusoglu H, Li Z, Huang K-W (2015) Can gamification motivate voluntary contributions?: The case of StackOverflow Q&A community. In: Proceedings of the 18th ACM conference companion on computer supported cooperative work & social computing, pp 171–174. ACMGoogle Scholar
  14. Chen C, Gao S, Xing Z (2016) Mining analogical libraries in Q&A discussions - incorporating relational and categorical knowledge into word embedding. In: IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), pp 338–348. IEEEGoogle Scholar
  15. Chen C, Xing Z, Wang X (2017) Unsupervised software-specific morphological forms inference from informal discussions. In: Proceedings of the 39th international conference on software engineering (ICSE), pp 450–461. IEEEGoogle Scholar
  16. Cottrell R, Walker RJ, Denzinger J (2008) Semi-automating small-scale source code reuse via structural correspondence. In: Proceedings of the 16th ACM SIGSOFT international symposium on foundations of software engineering (SIGSOFT), pp 214–225. ACMGoogle Scholar
  17. Feldthaus A, Møller A (2013) Semi-automatic rename refactoring for javascript. In: Proceedings of the 2013 ACM SIGPLAN international conference on object oriented programming systems languages & applications, vol 48, pp 323–338. ACMGoogle Scholar
  18. Galenson J, Reames P, Bodik R, Hartmann B, Sen K (2014) Codehint: Dynamic and interactive synthesis of code snippets. In: Proceedings of the 36th international conference on software engineering, ICSE, pp 653-663Google Scholar
  19. Gamma E, Helm R, Johnson R, Vlissides J (1995) Design patterns: Elements of reusable object-oriented software. Addison-Wesley Longman Publishing Co., Inc., BostonzbMATHGoogle Scholar
  20. Ganguly D, Roy D, Mitra M, Jones GJ (2015) Word embedding based generalized language model for information retrieval. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval (SIGIR), pp 795–798Google Scholar
  21. Gao Q, Zhang H, Wang J, Xiong Y, Zhang L, Mei H (2015) Fixing recurring crash bugs via analyzing Q&A sites. In: Proceedings of the 30th international conference on automated software engineering (ASE), pp 307–318Google Scholar
  22. Gharehyazie M, Ray B, Filkov V (2017) Some from here, some from there: Cross-project code reuse in github. In: Proceedings of the 14th international conference on mining software repositories, MSR ’17, pp 291–301Google Scholar
  23. Glaser B (2017) Discovery of grounded theory: Strategies for qualitative research. RoutledgeGoogle Scholar
  24. Gu X, Zhang H, Zhang D, Kim S (2016) Deep API learning. In: Proceedings of the 24th ACM SIGSOFT international symposium on foundations of software engineering (FSE), pp 631–642. ACMGoogle Scholar
  25. Gwet K et al (2002) Inter-rater reliability: dependency on trait prevalence and marginal homogeneity. Statistical Methods for Inter-Rater Reliability Assessment Series 2:1–9Google Scholar
  26. Hua L, Kim M, McKinley KS (2015) Does automated refactoring obviate systematic editing?. In: IEEE/ACM 37th IEEE international conference on software engineering (ICSE), vol 1, pp 392–402. IEEEGoogle Scholar
  27. Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014a) The promises and perils of mining GitHub. In: Proceedings of the 11th working conference on mining software repositories (MSR), pp 92–101. ACMGoogle Scholar
  28. Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014b) The promises and perils of mining GitHub. In: Proceedings of the 11th working conference on mining software repositories (MSR), pp 92–101Google Scholar
  29. Krumia (2014) Introduce an “obsolete answer” vote. https://meta.stackoverflow.com/questions/272651/introduce-an-obsolete-answer-vote,. (last visited: Aug 25)
  30. Lai S, Xu L, Liu K, Zhao J (2015) Recurrent convolutional neural networks for text classification. In: Proceedings of the 29th AAAI conference on artificial intelligence, pp 2267–2273. AAAI PressGoogle Scholar
  31. Liu P, Joty SR, Meng HM (2015) Fine-grained opinion mining with recurrent neural networks and word embeddings. In: Proceedings of the 2015 conference on empirical methods in natural language processing (EMNLP), pp 1433–1443. The Association for Computational LinguisticsGoogle Scholar
  32. Lv F, Zhang H, Lou J-G, Wang S, Zhang D, Zhao J (2015) CodeHow: Effective code search based on API understanding and extended boolean model. In: Proceedings of the 30th IEEE/ACM international conference on automated software engineering (ASE), pp 260–270. IEEEGoogle Scholar
  33. McMillan C, Grechanik M, Poshyvanyk D, Xie Q, Fu C (2011) Portfolio: Finding relevant functions and their usage. In: Proceedings of the 33rd international conference on software engineering (ICSE), pp 111–120Google Scholar
  34. Meng N, Kim M, McKinley KS (2011) Systematic editing: Generating program transformations from an example. In: Proceedings of the 32nd ACM SIGPLAN conference on programming language design and implementation (PLDI), pages 329–342Google Scholar
  35. Meng N, Kim M, McKinley KS (2013) Lase: locating and applying systematic edits by learning from examples. In: Proceedings of the 2013 international conference on software engineering, pp 502–511. IEEEGoogle Scholar
  36. Nguyen AT, Nguyen TT, Nguyen HA, Tamrawi A, Nguyen HV, Al-Kofahi J, Nguyen TN (2012) Graph-based pattern-oriented, context-sensitive source code completion. In: Proceedings of the 34th international conference on software engineering (ICSE), pp 69–79Google Scholar
  37. Ponzanelli L, Bacchelli A, Lanza M (2013) Leveraging crowd knowledge for software comprehension and development. In: Proceedings of the 17th european conference on software maintenance and reengineering (CSMR), pp 57–66. IEEEGoogle Scholar
  38. Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014a) Mining stackoverflow to turn the IDE into a self-confident programming prompter. In: Proceedings of the 11th working conference on mining software repositories, pp 102–111. ACMGoogle Scholar
  39. Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014b) Prompter: A self-confident recommender system. In: ICSME, pp 577–580Google Scholar
  40. Ponzanelli L, Mocci A, Bacchelli A, Lanza M (2014c) Understanding and classifying the quality of technical forum questions. In: Proceedings of the 14th international conference on quality software (QSIC), pp 343–352Google Scholar
  41. Raychev V, Vechev M, Yahav E (2014) Code completion with statistical language models. In: Proceedings of the 35th ACM SIGPLAN conference on programming language design and implementation (PLDI), pp 419–428Google Scholar
  42. Reja U, Manfreda KL, Hlebec V, Vehovar V (2003) Open-ended vs. close-ended questions in web questionnaires. Developments in Applied Statistics (Metodološ,ki zvezki) 19:159–77Google Scholar
  43. Rigby PC, Robillard MP (2013) Discovering essential code elements in informal documentation. In: Proceedings of the 2013 international conference on software engineering (ICSE), pp 832–841. IEEEGoogle Scholar
  44. Seaman CB (1999) Qualitative methods in empirical studies of software engineering. IEEE Trans Softw Eng (TSE) 25(4):557–572CrossRefGoogle Scholar
  45. Seaman CB, Shull F, Regardie M, Elbert D, Feldmann RL, Guo Y, Godfrey S (2008) Defect categorization: making use of a decade of widely varying historical data. In: Proceedings of the 2nd ACM-IEEE international symposium on Empirical software engineering and measurement, pp 149–157. ACMGoogle Scholar
  46. Searchcode (2016a) Searchcode - API. https://searchcode.com/api/. (last visited: Aug 25, 2017)
  47. Searchcode (2016b) Searchcode - Homepage. https://searchcode.com/. (last visited: Aug 25, 2017)
  48. Sillito J, Maurer F, Nasehi SM, Burns C (2012) What makes a good code example?: A study of programming Q&A in StackOverflow. In: Proceedings of the 2012 IEEE international conference on software maintenance (ICSM), pp 25–34Google Scholar
  49. Stack Exchange (2015) The MIT license — clarity on using code on stack overflow and stack exchange. https://meta.stackexchange.com/q/271080/337948,. (last visited: Aug 25, 2017)
  50. Stack Exchange (2017) All sites - Stack Exchange. https://stackexchange.com/sites,. (last visited: Aug 25, 2017)
  51. Stack Overflow (2014) Feedback requested: Runnable code snippets in questions and answers. https://meta.stackoverflow.com/questions/269753/feedback-requested-runnable-code-snippets-in-questions-and-answers. (last visited: Aug 25, 2017)
  52. Stack Overflow (2016) Stack Overflow developer survey results 2016. http://stackoverflow.com/research/developer-survey-2016,. (last visited: Aug 25, 2017)
  53. Stack Overflow (2017) Stack Overflow - Homepage. https://stackoverflow.com/,. (last visited: Aug 25, 2017)
  54. Treude C, Robillard MP (2016) Augmenting API documentation with insights from Stack Overflow. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 392–403. ACMGoogle Scholar
  55. Treude C, Robillard MP (2017) Understanding stack overflow code fragments. In: 2017 IEEE international conference on software maintenance and evolution, ICSME 2017, Shanghai, China, September 17-22, pp 509-513Google Scholar
  56. Treude C, Barzilay O, Storey M-A (2011) How do programmers ask and answer questions on the web? (NIER track). In: Proceedings of the 33rd international conference on software engineering (ICSE), pp 804–807Google Scholar
  57. Vasilescu B, Filkov V, Serebrenik A (2013) StackOverflow and GitHub: Associations between software development and crowdsourced knowledge. In: Proceedings of 2013 international conference on social computing (SocialCom), pp 188–195. IEEEGoogle Scholar
  58. Wang H, Lu Y, Zhai C (2010) Latent aspect rating analysis on review text data: A rating regression approach. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 783–792Google Scholar
  59. Wang S, Lo D, Jiang L (2014a) Active code search: Incorporating user feedback to improve code search relevance. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering (ASE), pp 677–682Google Scholar
  60. Wang S, Lo D, Vasilescu B, Serebrenik A (2014b) EnTagRec: An enhanced tag recommendation system for software information sites. In: Proceedings of the 2014 IEEE international conference on software maintenance and evolution (ICSME), pp 291–300Google Scholar
  61. Wang S, Lo D, Jiang L (2016a) Autoquery: automatic construction of dependency queries for code search. Autom Softw Eng 23(3):393–425CrossRefGoogle Scholar
  62. Wang S, Lo D, Vasilescu B, Serebrenik A (2017a) EnTagRec ++: An enhanced tag recommendation system for software information sites. Empirical Software EngineeringGoogle Scholar
  63. Wang S, Chen T.-H., Hassan AE (2017b) Understanding the factors for fast answers in technical Q&A websites, Empirical Software Engineering, pp 1–42Google Scholar
  64. Wang X, Pollock LL, Vijay-Shanker K (2014c) Automatic segmentation of method code into meaningful blocks: Design and evaluation. J Soft Evolution Process 26(1):27–49CrossRefGoogle Scholar
  65. Wang X, Pollock LL, Vijay-Shanker K (2017) Automatically generating natural language descriptions for object-related statement sequences. In: IEEE 24th international conference on software analysis, evolution and reengineering, SANER 2017, Klagenfurt, Austria, February 20-24, pp 205–216Google Scholar
  66. Wang Y, Feng Y, Martins R, Kaushik A, Dillig I, Reiss SP (2016b) Hunter: next-generation code reuse for java. In: Proceedings of the 24th ACM SIGSOFT international symposium on foundations of software engineering, pp 1028–1032. ACMGoogle Scholar
  67. Wang Z, Hamza W, Florian R (2017d) Bilateral multi-perspective matching for natural language sentences. arXiv:1702.03814
  68. Wong T-L, Lam W, Wong T-S (2008) An unsupervised framework for extracting and normalizing product attributes from multiple web sites. In: Proceedings of the 31st annual international acm sigir conference on research and development in information retrieval (SIGIR), pp 35–42Google Scholar
  69. Wu Y, Wang S, Bezemer C-P, Inoue K (2017) Online appendix of manuscript ”How Do Developers Utilize Source Code from Stack Overflow?”. https://zenodo.org/record/1116508
  70. Xia X, Bao L, Lo D, Kochhar PS, Hassan AE, Xing Z (2017) What do developers search for on the web? Empirical Software EngineeringGoogle Scholar
  71. Xin X, Lingfeng B, David L, Zhenchang X, Ahmed EH, Shanping L (2017) Measuring program comprehension: A large-scale field study with professionals. IEEE Trans Softw Eng (TSE) 99(26):1–1Google Scholar
  72. Yellin DM, Strom RE (1997) Protocol specifications and component adaptors. ACM Trans Program Lang Syst (TOPLAS) 19(2):292–333CrossRefGoogle Scholar
  73. Yin P, Neubig G (2017) A syntactic neural model for general-purpose code generation. arXiv:1704.01696
  74. Yu J, Zha Z-J, Wang M, Chua T-S (2011) Aspect ranking: Identifying important product aspects from online consumer reviews. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies - vol 1, pp 1496–1505Google Scholar
  75. Zagalsky A, German DM, Storey M-A, Teshima CG, Poo-Caamaño G (2017) How the R community creates and curates knowledge: an extended study of Stack Overflow and mailing lists. Empirical Software EngineeringGoogle Scholar
  76. Zhang WE, Sheng QZ, Lau JH, Abebe E (2017) Detecting duplicate posts in programming qa communities via latent semantics and association rules. In: Proceedings of the 26th international conference on World Wide Web (WWW), pp 1221–1229Google Scholar
  77. Zhang Y, Lo D, Xia X, Sun J-L (2015) Multi-factor duplicate question detection in Stack Overflow. J Comput Sci Technol 30(5):981–997CrossRefGoogle Scholar
  78. Zhao L, Li C (2009) Ontology based opinion mining for movie reviews. Springer, Berlin, pp 204–214Google Scholar
  79. Zhou P, Liu J, Yang Z, Zhou G (2017) Scalable tag recommendation for software information sites. In: Proceedings of the 24th international conference on software analysis, evolution and reengineering (SANER), pp 272–282. IEEEGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Graduate School of Information Science and TechnologyOsaka UniversitySuitaJapan
  2. 2.SAILQueen’s UniversityKingstonCanada

Personalised recommendations