An exploratory study on the repeatedly shared external links on Stack Overflow

Abstract

On Stack Overflow, users reuse 11,926,354 external links to share resources hosted outside the Stack Overflow website. These external links connect to existing programming-related knowledge and extend the crowdsourced knowledge on Stack Overflow. Some external links, which we call repeated external links, are shared multiple times. We observe that 82.5% of the link sharing activities (i.e., sharing links in any question, answer, or comment) on Stack Overflow share external resources, and that 57.0% of the occurrences of external links are occurrences of repeated external links. However, it is still unclear what types of external resources are repeatedly shared. To help users manage their knowledge, we investigate the characteristics of the repeated external links in knowledge sharing on Stack Overflow. We observe that external links pointing to text resources (hosted on documentation websites, tutorial websites, etc.) are repeatedly shared the most. We also observe that different users repeatedly share the same knowledge in the form of repeated external links, which increases the effort of maintaining that knowledge (e.g., updating invalid links in multiple posts). Repeated external links can bring risks to the software engineering process, as 1) the same users can repeatedly share external links for the purpose of promotion, and 2) external links can point to webpages with an overload of information, which makes it difficult for users to retrieve the relevant information. Our findings provide insights to Stack Overflow moderators and researchers. For example, we encourage Stack Overflow to centrally manage the commonly occurring knowledge that takes the form of repeated external links, in order to better maintain the crowdsourced knowledge on Stack Overflow.
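The notion of a repeated external link can be illustrated with a minimal sketch. It is not the study's actual pipeline: the URL regular expression, the set of hosts treated as internal, the helper names, and the sample posts are all assumptions made for illustration. The sketch counts how often each external link occurs and computes the share of external-link occurrences that belong to links shared more than once.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Simplified URL matcher (an assumption, not the paper's extraction method).
# It does not strip trailing punctuation such as a sentence-ending period.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")

# Assumption: hosts considered internal to Stack Overflow.
INTERNAL_HOSTS = {"stackoverflow.com", "www.stackoverflow.com"}


def extract_links(text):
    """Return all URLs found in a post or comment body."""
    return URL_PATTERN.findall(text)


def is_external(url):
    """A link is external if its host is not one of the internal hosts."""
    host = (urlsplit(url).hostname or "").lower()
    return host not in INTERNAL_HOSTS


def repeated_link_stats(posts):
    """Return (links shared more than once, their share of all occurrences)."""
    counts = Counter(
        url for post in posts for url in extract_links(post) if is_external(url)
    )
    repeated = {url: n for url, n in counts.items() if n > 1}
    total = sum(counts.values())
    share = sum(repeated.values()) / total if total else 0.0
    return repeated, share


# Hypothetical sample posts, for illustration only.
posts = [
    "See https://docs.oracle.com/javase/tutorial/ for details.",
    "The tutorial at https://docs.oracle.com/javase/tutorial/ covers this.",
    "Related: https://stackoverflow.com/q/2660914",
    "Try https://example.org/guide instead.",
]
repeated, share = repeated_link_stats(posts)
```

In this toy input, the documentation link occurs twice and the other external link once, so two of the three external-link occurrences belong to a repeated link. The internal Stack Overflow link is ignored.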

Notes

  1. https://meta.stackoverflow.com/q/358992/

  2. https://stackoverflow.com/help/how-to-ask

  3. https://stackoverflow.com/help/minimal-reproducible-example

  4. https://stackoverflow.com/help/formatting

  5. https://stackoverflow.com/editing-help#code

  6. https://stackoverflow.com/questions/28207373/

  7. https://github.com/jersey/jersey/blob/master/examples/https-clientserver-grizzly/src/main/java/org/glassfish/jersey/examples/httpsclientservergrizzly/SecurityFilter.java

  8. https://stackoverflow.com/editing-help

  9. https://meta.stackoverflow.com/q/252811/

  10. For example, we consider docs.oracle.com and www.oracle.com to be different websites because they have different full domains.

  11. https://zenodo.org/record/3255045#.XYWaMyh3iUk

  12. http://en.wikipedia.org/wiki/Internal_link

  13. https://en.wikipedia.org/wiki/Website

  14. https://meta.stackexchange.com/q/90342

  15. https://meta.stackexchange.com/questions/313790/i-stack-imgur-seems-to-be-down

  16. https://meta.stackoverflow.com/questions/341016/is-it-ok-to-re-upload-externally-hosted-images-on-stack-overflows-imgur

  17. https://stackoverflow.com/help/minimal-reproducible-example

  18. https://stackoverflow.com/editing-help#code

  19. https://meta.stackoverflow.com/questions/358992/ive-been-told-to-create-a-runnable-example-with-stack-snippets-how-do-i-do

  20. https://stackoverflow.com/q/28886508/

  21. http://bugs.python.org/issue22942

  22. https://stackoverflow.com/q/22021491/

  23. https://stackoverflow.com/help/how-to-ask

  24. https://www.google.com/#q=_IOWR_BAD+OR+_IOR_BAD+OR+_IOW_BAD&safe=off

  25. https://stackoverflow.com/q/22021491/22021641

  26. https://www.google.com/search?q=_IOR_BAD+lkml

  27. https://meta.stackexchange.com/q/176445

  28. https://stackoverflow.com/q/2660914

  29. http://goo.gl/b93ns

  30. http://www.youtube.com/watch?v=_CruQY55HOk

  31. https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

  32. https://help.archive.org/hc/en-us/articles/360004716091-Wayback-Machine-General-Information
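
Note 10 above distinguishes websites by their full domain. A minimal sketch of that grouping rule follows, assuming that "full domain" means the complete host name of the URL. The helper name `website_of` and the sample links are hypothetical and are used for illustration only.

```python
from collections import Counter
from urllib.parse import urlsplit


def website_of(url):
    """Return the full domain (complete host name) of a URL, lower-cased."""
    return (urlsplit(url).hostname or "").lower()


# Illustrative links only; they are not taken from the study's data set.
links = [
    "https://docs.oracle.com/javase/8/docs/api/",
    "https://www.oracle.com/java/",
    "https://docs.oracle.com/javase/tutorial/",
]

# Count links per website; docs.oracle.com and www.oracle.com stay separate.
per_website = Counter(website_of(u) for u in links)
```

Under this rule the two docs.oracle.com links are grouped together, while the www.oracle.com link is counted as a separate website.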

Acknowledgments

This research was partially supported by the National Science Foundation of China (No. U20A20173), Key Research and Development Program of Zhejiang Province (No. 2021C01014), and the National Research Foundation, Singapore under its Industry Alignment Fund – Prepositioning (IAF-PP) Funding Initiative. Any opinions, findings, and conclusions, or recommendations expressed in this material are those of the author(s) and do not reflect the views of Huawei and the National Research Foundation, Singapore.

Author information

Corresponding author

Correspondence to Xin Xia.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by: Emerson Murphy-Hill

About this article

Cite this article

Liu, J., Zhang, H., Xia, X. et al. An exploratory study on the repeatedly shared external links on Stack Overflow. Empir Software Eng 27, 19 (2022). https://doi.org/10.1007/s10664-021-10028-y

Keywords

  • Knowledge sharing
  • Stack Overflow
  • External link