
The reproducibility of programming-related issues in Stack Overflow questions

Empirical Software Engineering

Abstract

Software developers often look for solutions to their code-level problems on the Stack Overflow Q&A website. To receive help, developers frequently submit questions that contain sample code segments along with a description of the programming issue. Unfortunately, it is not always possible to reproduce the issues from the code segments they provide. Issues that are not easily reproducible may prevent questions from receiving prompt and appropriate solutions. We conducted an exploratory study on the reproducibility of issues discussed in 400 Java and 400 Python questions. We parsed, compiled, executed, and carefully examined the code segments from these questions to reproduce the reported programming issues, expending 300 person-hours of effort. The outcomes of our study are three-fold. First, we could reproduce the issues for approximately 68% of Java and 71% of Python code segments, whereas we were unable to reproduce approximately 22% of Java and 19% of Python issues. Of the reproducible issues, approximately 67% of the Java and 20% of the Python code segments required minor or major modifications before the issues could be reproduced. Second, we carefully investigated why programming issues could not be reproduced and provide evidence-based guidelines for writing effective code examples in Stack Overflow questions. Third, we investigated the correlation between the issue reproducibility status of questions and the corresponding answer meta-data, such as the presence of an accepted answer. According to our analysis, a question whose issue is reproducible has at least two times higher chance of receiving an accepted answer than one whose issue is irreproducible. Moreover, the median delay in receiving an accepted answer is twice as long when the reported issue cannot be reproduced. We also investigated confounding factors (e.g., user reputation) that, besides reproducibility, can affect whether questions receive answers, and found that such factors do not undermine the correlation between reproducibility status and answer meta-data.
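
The reproduction effort described above was manual; still, the first step of examining a code segment is simply attempting to run it. Below is a minimal, illustrative sketch (not the authors' tooling; the function name and example snippet are hypothetical) of how one might triage a standalone Python code segment taken from a question, assuming it has no external dependencies or required inputs: execute it in a separate interpreter process and report whether it runs as-is, fails, or hangs.

```python
# Illustrative sketch only (hypothetical helper, not the study's tooling):
# attempt to execute a self-contained Python snippet, such as one copied
# from a Stack Overflow question, and report a coarse triage result.
# Assumes the snippet needs no third-party packages or external inputs.
import os
import subprocess
import sys
import tempfile


def try_run_snippet(code: str, timeout_sec: int = 10) -> str:
    """Return 'runs as-is', 'fails: <last stderr line>', or 'times out'."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],  # run in a clean interpreter process
            capture_output=True,
            text=True,
            timeout=timeout_sec,
        )
    except subprocess.TimeoutExpired:
        return "times out (possibly non-terminating)"
    finally:
        os.unlink(path)  # clean up the temporary file
    if result.returncode == 0:
        return "runs as-is"
    # A non-zero exit may itself reproduce the asker's reported error,
    # so the error text still needs manual inspection.
    err_lines = result.stderr.strip().splitlines()
    return "fails: " + (err_lines[-1] if err_lines else "unknown error")


if __name__ == "__main__":
    snippet = "nums = [1, 2, 3]\nprint(nums[3])\n"  # hypothetical asker's snippet
    print(try_run_snippet(snippet))                  # e.g. "fails: IndexError: ..."
```

Running the segment in a subprocess isolates it from the checker's own state and lets a timeout stand in for non-terminating examples; deciding whether a failure actually matches the issue the asker reported, or whether minor or major modifications would make it reproducible, is exactly the manual judgment on which the study spent its 300 person-hours.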






Acknowledgements

This research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada First Research Excellence Fund (CFREF) grant coordinated by the Global Institute for Food Security (GIFS), and the Tenure-track Startup Fund of Dalhousie University.

Author information

Corresponding author

Correspondence to Saikat Mondal.

Ethics declarations

Conflict of Interest

The authors have no conflict of interest.

Additional information

Communicated by: Bram Adams

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mondal, S., Rahman, M.M., Roy, C.K. et al. The reproducibility of programming-related issues in Stack Overflow questions. Empir Software Eng 27, 62 (2022). https://doi.org/10.1007/s10664-021-10113-2

