
Predicting software defect type using concept-based classification


Automatically predicting the defect type of a software defect from its description can significantly speed up and improve the software defect management process. A major challenge for current approaches to this task, which are based on supervised learning, is the need for labeled training data. Creating such data is expensive and effort-intensive, and requires domain-specific expertise. In this paper, we propose to circumvent this problem by carrying out concept-based classification (CBC) of software defect reports with the help of the Explicit Semantic Analysis (ESA) framework. We first create concept-based representations of a software defect report and of the defect types in the software defect classification scheme by projecting their textual descriptions into a concept space spanned by Wikipedia articles. Then, we compute the “semantic” similarity between these concept-based representations and assign the defect type that has the highest similarity with the defect report. The proposed approach achieves accuracy comparable to the state-of-the-art semi-supervised and active learning approach for this task without requiring labeled training data. The CBC approach has two additional advantages: (i) unlike the state-of-the-art approach, it does not need the source code used to fix a software defect, and (ii) it does not suffer from the class-imbalance problem faced by the supervised learning paradigm.
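The two steps described above (projecting text into a concept space, then assigning the defect type with the highest similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-article concept space, the defect-type descriptions, the TF-IDF weighting, and all function names are hypothetical stand-ins, whereas the actual approach uses Wikipedia articles as concepts and the ODC and IEEE defect type definitions.

```python
import math
import re
from collections import Counter

# Hypothetical three-article "concept space"; the real approach uses Wikipedia.
CONCEPTS = {
    "Memory management": "memory allocation pointer buffer leak heap free null",
    "User interface": "button window screen display layout font click menu",
    "Algorithm": "algorithm loop computation logic condition sort calculation",
}

# Hypothetical defect-type descriptions (not the ODC or IEEE wording).
DEFECT_TYPES = {
    "memory": "memory leak or null pointer in buffer allocation",
    "gui": "wrong display layout of a window or button on screen",
    "logic": "incorrect logic or computation in an algorithm",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(concepts):
    """Build an inverted index {term: {concept: tf-idf weight}}."""
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for name, counts in tf.items():
        for term, count in counts.items():
            idf = math.log(n / df[term]) + 1.0  # assumed smoothing choice
            index.setdefault(term, {})[name] = count * idf
    return index


def concept_vector(text, index):
    """Project a text into the concept space by summing term-to-concept weights."""
    vec = Counter()
    for term, count in Counter(tokenize(text)).items():
        for concept, weight in index.get(term, {}).items():
            vec[concept] += count * weight
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(report, defect_types, index):
    """Return the defect type most similar to the report, plus all scores."""
    report_vec = concept_vector(report, index)
    scores = {
        name: cosine(report_vec, concept_vector(description, index))
        for name, description in defect_types.items()
    }
    return max(scores, key=scores.get), scores


INDEX = build_index(CONCEPTS)
REPORT = "Clicking the menu button shows a blank window"
label, scores = classify(REPORT, DEFECT_TYPES, INDEX)
print(label)
```

No labeled defect reports are used anywhere in this sketch; the only inputs are the concept articles and the textual descriptions of the defect types, which is the property the abstract emphasizes.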




  1. Note that Table 1 and Table 8 contain only the introductory definition snippets from the classification schemes. Their detailed descriptions along with contextual information and examples are available in IBM (2013a, b) and IEEE (2009).

  2. The expert needs to refer to IBM (2013a, b) to get the detailed descriptions and understand the defect type classification scheme.

  4. Following the ESA terminology, we use “a concept” and “a Wikipedia article” interchangeably.

  6. Available from

  7. Notion of a stub-article in Wikipedia:

  9. Mahout, the machine learning library,

  10. Lucene, the search engine library

  11. OpenNLP, the natural language processing library


  1. Alenezi M, Magel K, Banitaan S (2013) Efficient bug triaging using text mining. Journal of Software 8(9):2185–2190

  2. Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory, COLT ’92, pp 144–152

  3. Bridge N, Miller C (1998) Orthogonal defect classification using defect data to improve software development. Software Quality 3(1):1–8

  4. Butcher M, Munro H, Kratschmer T (2002) Improving software testing via ODC: Three case studies. IBM Syst J 41(1):31–44

  5. Carrozza G, Pietrantuono R, Russo S (2015) Defect analysis in mission-critical software systems: a detailed investigation. Journal of Software: Evolution and Process 27(1):22–49

  6. Chawla NV, Japkowicz N, Kotcz A (2004) Editorial: Special issue on learning from imbalanced data sets. SIGKDD Explorations Newsletter 6(1):1–6. 10.1145/1007730.1007733

  7. Chillarege R (1996) Orthogonal defect classification. Handbook of Software Reliability Engineering, pp 359–399

  8. Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46

  9. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

  10. Čubranić D (2004) Automatic bug triage using text categorization. In: Proceedings of 16th international conference on software engineering & knowledge engineering (SEKE)

  11. Egozi O, Markovitch S, Gabrilovich E (2011) Concept-based information retrieval using explicit semantic analysis. ACM Trans. Inf. Syst. 29(2):8

  12. Ferschke O, Zesch T, Gurevych I (2011) Wikipedia revision toolkit: Efficiently accessing Wikipedia’s edit history. In: Proceedings of the ACL-HLT 2011 system demonstrations, association for computational linguistics, pp 97–102

  13. Gabrilovich E, Markovitch S (2007) Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: Proceedings of the 20th intl. Joint conf. on artificial intelligence (IJCAI), vol 7, pp 1606–1611

  14. Gabrilovich E, Markovitch S (2009) Wikipedia-based semantic interpretation for natural language processing. J Artif Intell Res 34:443–498

  15. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: ACM Sigmod record, vol 29. ACM, pp 1–12

  16. Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, Amsterdam

  17. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans on Knowledge and Data Engineering

  18. Herzig K, Just S, Zeller A (2013) It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: Proceedings of 35th international conference on software engineering, pp 392–401

  19. Huang L, Ng V, Persing I, Geng R, Bai X, Tian J (2011) AutoODC: Automated generation of orthogonal defect classifications. In: Proceedings of 26th IEEE/ACM international conference on automated software engineering (ASE)

  20. Huang L, Ng V, Persing I, Chen M, Li Z, Geng R, Tian J (2015) AutoODC: Automated generation of orthogonal defect classifications. Automated Software Engineering Journal 22(1):3–46

  21. IBM (2013a) Orthogonal defect classification version 5.2 extensions for defects in GUI, user documentation, build and national language support (NLS). (URL accessibility verified on 9th Nov., 2018)

  22. IBM (2013b) Orthogonal defect classification version 5.2 for software design and code. (URL accessibility verified on 9th Nov., 2018)

  23. IEEE (2009) IEEE standard 1044-2009 classification for software anomalies

  24. Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. Intelligent Data Analysis 6(5):429–449

  25. Jurafsky D, Martin JH (2000) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 1st edn. Prentice Hall PTR, Upper Saddle River

  26. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York

  27. Mellegård N, Staron M, Törner F (2012) A light-weight defect classification scheme for embedded automotive software and its initial evaluation. In: Proceedings of IEEE 23rd International Symp. on Software Reliability Engineering (ISSRE), pp 261–270

  28. Menzies T, Marcus A (2008) Automated severity assessment of software defect reports. In: IEEE international conference on software maintenance (ICSM), pp 346–355

  29. Panichella A, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13, pp 522–531

  30. Patil S (2017) Concept based classification of software defect reports. In: Proceedings of 14th international conference on mining software repositories (MSR), IEEE/ACM

  31. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830

  32. Robertson S, Zaragoza H, et al (2009) The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval 3(4):333–389

  33. Robertson SE, Walker S, Jones S, Hancock-Beaulieu MM, Gatford M, et al (1995) Okapi at TREC-3. NIST Special Publication Sp 109:109

  34. Runeson P, Alexandersson M, Nyholm O (2007) Detection of duplicate defect reports using natural language processing. In: Proceedings of 29th international conference on software engineering. IEEE Computer Society, pp 499–510

  35. Salton G, McGill MJ (1986) Introduction to modern information retrieval. McGraw-Hill Inc, New York

  36. Silva N, Vieira M (2014) Experience report: orthogonal classification of safety critical issues. In: 2014 IEEE 25th international symposium on software reliability engineering. IEEE, pp 156–166

  37. Student (1908) The probable error of a mean. Biometrika 6(1):1–25

  38. Thung F, Lo D, Jiang L (2012) Automatic defect categorization. In: Proceedings of 19th working conference on reverse engineering (WCRE). IEEE, pp 205–214

  39. Thung F, Le X-BD, Lo D (2015) Active semi-supervised defect categorization. In: Proceedings of IEEE 23rd international conference on program comprehension (ICPC), pp 60–70

  40. Vallespir D, Grazioli F, Herbert J (2009) A framework to evaluate defect taxonomies. In: Proceedings of XV Congreso Argentino de Ciencias de La Computación

  41. Wagner S (2008) Defect classification and defect types revisited. In: Proceedings of workshop on defects in large software systems. ACM, pp 39–40

  42. Wang X, Zhang L, Xie T, Anvik J, Sun J (2008) An approach to detecting duplicate bug reports using natural language and execution information. In: Proceedings of the 30th international conference on software engineering, pp 461–470

  43. Xia X, Lo D, Wang X, Zhou B (2014) Automatic defect categorization based on fault triggering conditions. In: Proceedings of 19th international conference on engineering of complex computer systems (ICECCS). IEEE, pp 39–48

  44. Xian Y, Lampert CH, Schiele B, Akata Z (2018) Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans on Pattern Analysis and Machine Intelligence

  45. Yang YY, Lee SC, Chung YA, Wu TE, Chen SA, Lin HT (2017) libact: Pool-based active learning in python. Tech. rep., National Taiwan University. Available as arXiv:1710.00379

  46. Zaki MJ, Meira W Jr (2014) Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press, Cambridge

  47. Zesch T, Müller C, Gurevych I (2008) Extracting lexical semantic knowledge from wikipedia and wiktionary. In: Proceedings of 6th International conference on language resources and evaluation (LREC), vol 8, pp 1646–1652

  48. Zhou Y, Tong Y, Gu R, Gall H (2016) Combining text mining and data mining for bug report classification. Journal of Software: Evolution and Process 28(3)


Author information

Correspondence to Sangameshwar Patil.

Additional information


A preliminary, work-in-progress version of this work was presented as a short paper – “Concept based Classification of Software Defect Reports”, Sangameshwar Patil, Mining Software Repositories (MSR), 2017. This article is a significantly extended version of the short paper with new results and analysis.

Communicated by: Tim Menzies



Appendix A: IEEE 1044-2009 Standard based Software Defect Type Classification Scheme

Table 8 The software defect type families based on the sample defect type classification scheme in Table A.1 (Annexure A) of IEEE 1044-2009 Standard (IEEE 2009)

Appendix B: Additional Figures for Experimental Results of RQ2

In this section, we provide additional figures summarizing the experimental results for RQ2, which analyzes how changing the number of concepts (N) used in the concept-based representation affects the coverage and accuracy of the concept-based classification (CBC) approach. These results are analyzed in Section 4.3.2.

B.1: RQ2 Results for Roundcube Dataset and IEEE-Based Classification Scheme

Fig. 6

Roundcube dataset and IEEE-based classification scheme: Effect of varying the number of concepts (N) in the concept-based representation on coverage

B.2: RQ2 Results for Roundcube Dataset and ODC-Based Classification Scheme

Fig. 7

Roundcube dataset and ODC-based classification scheme: Effect of varying the number of concepts (N) in the concept-based representation on coverage

B.3: RQ2 Results for Apache-Libs Dataset and IEEE-Based Classification Scheme

Fig. 8

Apache-Libs dataset and IEEE-based classification scheme: Effect of varying the number of concepts (N) in the concept-based representation on coverage

Appendix C: Dataset Annotation Details

The annotations for the Apache-Libs dataset by Thung et al. (2012) were done before IBM ODC version 5.2 and its extensions (IBM 2013a, b) were made available (12th Sept. 2013). The IBM ODC v5.2 extensions (IBM 2013a) introduce additional defect types, including a new National Language Support (NLS) defect type (i.e., “Problems encountered in the implementation of the product functions in languages other than English”). Thung et al. (2012) could not have considered these changes in the ODC scheme. To account for the changes in the defect type families due to the IBM ODC v5.2 extensions (IBM 2013a), as well as to improve the robustness of this dataset as a benchmark, we re-annotated the dataset. The annotations were done by a software professional with multi-year experience in software design, development, testing, and debugging.

Of the 500 defect type annotations in this dataset, 472 (94.4%) matched the original annotations by Thung et al. (2012) and 28 disagreed. The inter-annotator agreement with their original annotations, measured using Cohen’s kappa statistic (Cohen 1960), is 90.02%. Note that this is a very high level of inter-annotator agreement. The 28 annotations that differed from the original annotations were further reviewed and verified by another software professional with more than a decade of hands-on experience in the software development life cycle. This review led to changes in the annotations of 2 defect reports (out of the 28 defect reports with differing annotations). These two annotations were analyzed in discussions between the two annotators and the corrections were approved.
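For readers unfamiliar with the statistic, the following sketch shows how Cohen’s kappa corrects the raw matching rate for agreement expected by chance. The function name and the ten toy labels are hypothetical and chosen only for illustration. The raw matching rate of 472 out of 500 is taken from the text, but the kappa value reported there cannot be reproduced here because it depends on the per-class label distributions of both annotators, which are not listed in this appendix.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Raw matching rate from the text: 472 matching annotations out of 500.
matching_rate = 472 / 500

# Hypothetical toy annotations (not from the dataset): one disagreement in ten.
annotator_1 = ["x"] * 5 + ["y"] * 5
annotator_2 = ["x"] * 4 + ["y"] * 6
toy_kappa = cohens_kappa(annotator_1, annotator_2)
print(matching_rate, toy_kappa)
```

In the toy example the annotators match on 9 of 10 items, but half of that agreement is expected by chance given their label frequencies, so kappa is lower than the raw matching rate. The same effect explains why the kappa reported above is lower than the 94.4% matching rate.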

We make the annotated dataset available for research purposes as Supplementary Material along with the paper, as well as by email request. The high level of inter-annotator agreement (94.4% matching annotations and Cohen’s κ = 90.02%), together with the explanatory comments for the few differing annotations, makes this dataset a high-quality benchmark for the software defect type classification task. Table 5 shows the dataset statistics and the label distribution in the ground truth annotations. For the other combinations of datasets and classification schemes used in this paper, the annotation process was similar. Details of the inter-annotator agreement for those combinations are given in Section 4.1.


Cite this article

Patil, S., Ravindran, B. Predicting software defect type using concept-based classification. Empir Software Eng (2020).



  • Software defect classification
  • Software defect management
  • Natural language processing
  • Explicit semantic analysis
  • Orthogonal defect classification