Abstract
Severity levels (e.g., critical and minor) of bug reports are often used to prioritize development efforts. Prior research has proposed approaches to automatically assign a severity label to a bug report, and has verified the accuracy of these approaches using the human-assigned severity labels stored in software repositories. However, all prior efforts assume that such human-assigned data is reliable, so a perfect automated approach should assign the same severity label as the one in the repository, achieving 100% accuracy. Looking at duplicate bug reports (i.e., reports referring to the same problem) from three open-source software systems (OpenOffice, Mozilla, and Eclipse), we find that around 51% of the duplicate bug reports have inconsistent human-assigned severity labels even though they refer to the same software problem. While our results only show that duplicate bug reports have unreliable severity labels, we believe that they send warning signals about the reliability of the full bug severity data (i.e., including non-duplicate reports). Future research efforts should explore whether our findings generalize to the full dataset, and should factor in the unreliable nature of the bug severity data. Given this unreliability, classical metrics for assessing the accuracy of models/learners should not be used to assess approaches for automatically assigning severity labels. Hence, we propose a new approach to assess the performance of such models. Our new assessment approach shows that current automated approaches perform well, reaching 77–86% agreement with human-assigned severity labels.
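For illustration, the following is a minimal sketch of how the inconsistency rate among duplicate reports could be computed. The group structure and labels below are hypothetical, not the paper's dataset; the exact extraction procedure is described in the body of the paper.

    # Hypothetical illustration: measure how often reports that were marked
    # as duplicates of the same underlying problem disagree on their
    # human-assigned severity label.
    duplicate_groups = [
        ["critical", "critical"],          # consistent group
        ["major", "minor"],                # inconsistent group
        ["normal", "normal", "critical"],  # inconsistent group
    ]

    # A group is inconsistent if it contains more than one distinct label.
    inconsistent = sum(1 for labels in duplicate_groups if len(set(labels)) > 1)
    rate = inconsistent / len(duplicate_groups)
    print(f"{rate:.0%} of duplicate groups have inconsistent severity labels")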
Notes
JIRA bug reports do not contain a severity field but only a priority field (which has the same meaning as the severity field in Bugzilla bug reports). Thus, when we sent emails to Apache developers who use JIRA, we replaced the term “severity” with “priority”.
Stop words are words (like “a” and “the”) that do not carry much specific information.
Tf-idf is commonly used to reflect the importance of a word to a document in a collection of documents. The tf-idf value increases in proportion to the number of times a word appears in the document, but is offset by the frequency of the word across the collection of documents.
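As a minimal sketch, tf-idf features can be extracted from bug report summaries as shown below. The use of scikit-learn and the example summaries are our illustrative choices, not necessarily the tooling used in the paper.

    # Minimal tf-idf sketch over toy bug report summaries.
    from sklearn.feature_extraction.text import TfidfVectorizer

    summaries = [
        "crash when opening large spreadsheet",
        "crash on startup after update",
        "typo in preferences dialog label",
    ]
    # stop_words="english" also removes stop words such as "a" and "the".
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(summaries)

    # Words appearing in many summaries (e.g., "crash") receive lower
    # weights than words that are distinctive to a single summary.
    print(dict(zip(vectorizer.get_feature_names_out(), tfidf.toarray()[0])))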
Additional information
Communicated by: Andreas Zeller
Cite this article
Tian, Y., Ali, N., Lo, D. et al. On the unreliability of bug severity data. Empir Software Eng 21, 2298–2323 (2016). https://doi.org/10.1007/s10664-015-9409-1