
On the unreliability of bug severity data

Empirical Software Engineering

Abstract

Severity levels of bugs, e.g., critical and minor, are often used to prioritize development efforts. Prior research has proposed approaches to automatically assign a severity label to a bug report, and all of these approaches verify their accuracy against the human-assigned labels stored in software repositories. In doing so, they assume that such human-assigned data is reliable, i.e., that a perfect automated approach would assign the same severity label as the repository and thus achieve 100% accuracy. Looking at duplicate bug reports (i.e., reports referring to the same problem) from three open-source software systems (OpenOffice, Mozilla, and Eclipse), we find that around 51% of the duplicate bug reports have inconsistent human-assigned severity labels even though they refer to the same software problem. While our results directly show only that duplicate bug reports have unreliable severity labels, we believe they send warning signals about the reliability of the full bug severity data (i.e., including non-duplicate reports). Future research should explore whether our findings generalize to the full dataset, and should factor in the unreliable nature of the bug severity data. Given this unreliability, classical metrics for assessing the accuracy of models/learners should not be used to assess approaches that automatically assign severity labels. Hence, we propose a new approach to assess the performance of such models. Our new assessment approach shows that current automated approaches perform well, reaching 77-86% agreement with human-assigned severity labels.
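
As a minimal sketch of the consistency check described above, the snippet below computes the fraction of duplicate-report groups whose members carry different human-assigned severity labels. The grouping scheme and field names are hypothetical illustrations; the paper's exact counting (percentages over duplicate reports rather than groups) may differ.

    # Minimal sketch, assuming duplicate reports are already grouped by the
    # problem they describe; group ids and labels here are hypothetical.
    from collections import defaultdict

    def inconsistency_rate(reports):
        """reports: iterable of (duplicate_group_id, severity_label) pairs.
        Returns the fraction of groups with >= 2 reports whose reports
        do not all share the same severity label."""
        labels = defaultdict(list)
        for group_id, severity in reports:
            labels[group_id].append(severity)
        dup_groups = [v for v in labels.values() if len(v) >= 2]
        inconsistent = sum(1 for v in dup_groups if len(set(v)) > 1)
        return inconsistent / len(dup_groups) if dup_groups else 0.0

    # Two duplicate groups; the first has conflicting labels.
    sample = [(1, "critical"), (1, "minor"), (2, "major"), (2, "major")]
    print(inconsistency_rate(sample))  # 0.5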



Notes

  1. JIRA bug reports do not contain a severity field, only a priority field (which has the same meaning as the severity field in Bugzilla bug reports). Thus, when we emailed Apache developers who use JIRA, we replaced the term “severity” with “priority”.

  2. Stop words are words (like “a” and “the”) that do not carry much specific information; a small removal sketch follows these notes.

  3. tf-idf is commonly used to reflect the importance of a word to a document in a collection of documents. The tf-idf value increases in proportion to the number of times a word appears in the document, but is offset by the word's frequency across the collection of documents; see the sketch below.
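
To make Note 2 concrete, here is a minimal stop-word removal sketch; the word list below is a tiny illustrative sample, not the list used in the paper:

    # Illustrative stop-word list; real lists contain hundreds of words.
    STOP_WORDS = {"a", "an", "the", "is", "to", "of", "on"}

    def remove_stop_words(text):
        return [w for w in text.lower().split() if w not in STOP_WORDS]

    print(remove_stop_words("The crash is a blocker on startup"))
    # ['crash', 'blocker', 'startup']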
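
And here is one common variant of the tf-idf weighting from Note 3 (raw term frequency times the log of the inverse document frequency); real implementations differ in normalization and smoothing:

    import math

    def tf_idf(term, doc, docs):
        """tf-idf of `term` for document `doc` within the collection `docs`.
        Documents are lists of words (e.g., after stop-word removal)."""
        tf = doc.count(term)                    # occurrences of the term in doc
        df = sum(1 for d in docs if term in d)  # documents containing the term
        return tf * math.log(len(docs) / df) if df else 0.0

    docs = [["crash", "at", "startup"],
            ["crash", "crash", "loop"],
            ["ui", "glitch"]]
    print(tf_idf("crash", docs[1], docs))  # 2 * ln(3/2), roughly 0.81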


Author information

Correspondence to Yuan Tian.

Additional information

Communicated by: Andreas Zeller


About this article


Cite this article

Tian, Y., Ali, N., Lo, D. et al. On the unreliability of bug severity data. Empir Software Eng 21, 2298–2323 (2016). https://doi.org/10.1007/s10664-015-9409-1

