The bug report duplication problem: an exploratory study

Software Quality Journal

Abstract

Duplicate bug report entries in bug trackers have a negative impact on software maintenance and evolution. This is due, among other factors, to the increased time spent on report analysis and validation, which in some cases takes more than 20 minutes. As a result, a considerable amount of time is lost to analyzing duplicate bug reports. To understand the factors that may cause bug report duplication and its impact on software development, this paper presents an exploratory study that analyzed bug tracking data from private and open source projects. The results show, for example, that all the projects investigated had duplicate bug reports and that a considerable amount of time was wasted because of this duplication. Furthermore, characteristics such as project lifetime, staff size, and the number of bug reports do not seem to be significant factors for duplication, while others, such as the submitters’ profile and the number of submitters, do seem to influence it.

Notes

  1. In some environments, the term bug report is replaced by change request, request for enhancements, issue, ticket, or just bug.

  2. Recife Center for Advanced Studies and Systems. http://www.cesar.org.br.

  3. Stop words are words that do not improve the searches in information retrieval systems. Examples of stop words are the, of, should, themselves, etc. A comprehensive list of common stop words can be found in: http://www.ranks.nl/resources/stopwords.html.

  4. http://www.r-project.org/.

  5. This number was calculated by multiplying the average number of bug reports per day by the average time spent searching for and analyzing bug reports: (231.5 bugs × 12.5 min)/60 ≈ 48 man-h.

  6. https://www.launchpad.net.

  7. http://www.mozilla.org/.

  8. A centroid, in this case, is a set of bug reports with high similarity among them.
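
The estimate in note 5 can be reproduced with a short calculation. The sketch below uses only the two averages quoted in that note; the function and variable names are illustrative and do not come from the paper.

```python
# Averages quoted in note 5 (names are illustrative, not from the paper).
reports_per_day = 231.5      # average number of bug reports per day
minutes_per_report = 12.5    # average search-and-analysis time, in minutes


def daily_effort_hours(reports: float, minutes: float) -> float:
    """Convert reports/day and minutes/report into man-hours per day."""
    return reports * minutes / 60


hours = daily_effort_hours(reports_per_day, minutes_per_report)
print(f"{hours:.1f} man-hours per day")
```

Rounding the result to the nearest whole number gives the 48 man-hours stated in the note.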

Acknowledgments

This work was partially supported by the National Institute of Science and Technology for Software Engineering (INES, http://www.ines.org.br), funded by CNPq and FACEPE under grants 573964/2008-4 and APQ-1037-1.03/08, and by CNPq grants 305968/2010-6, 559997/2010-8, and 474766/2010-1.

Author information

Corresponding author

Correspondence to Yguaratã Cerqueira Cavalcanti.

About this article

Cite this article

Cavalcanti, Y.C., da Mota Silveira Neto, P.A., Lucrédio, D. et al. The bug report duplication problem: an exploratory study. Software Qual J 21, 39–66 (2013). https://doi.org/10.1007/s11219-011-9164-5

  • DOI: https://doi.org/10.1007/s11219-011-9164-5
