Software Quality Journal, Volume 19, Issue 3, pp 515–536

An industrial case study of classifier ensembles for locating software defects

  • Ayşe Tosun Mısırlı
  • Ayşe Başar Bener
  • Burak Turhan


As the application layer in embedded systems comes to dominate the hardware, ensuring software quality becomes a real challenge. Software testing is the most time-consuming and costly project phase, particularly in the embedded software domain, and misclassifying safe code as defective increases project costs and lowers margins. In this research, we present a defect prediction model based on an ensemble of classifiers, developed in collaboration with an industrial partner from the embedded systems domain, and we apply our generic defect prediction models to data from embedded projects. Embedded systems resemble mission-critical software in that the goal is to catch as many defects as possible, so a predictor is expected to achieve a very high probability of detection (pd). On the other hand, most embedded systems in practice are commercial products, and companies want to lower costs to remain competitive in their markets by keeping false alarm (pf) rates as low as possible and improving precision. In our experiments, we used data collected from our industry partner as well as publicly available data. Our results reveal that an ensemble of classifiers significantly decreases pf, down to 15%, while increasing precision by 43%, keeping balance rates at 74%. A cost-benefit analysis of the proposed model shows that inspecting 23% of the code on the local datasets is enough to detect around 70% of the defects.
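The abstract's core mechanics can be illustrated concretely. A minimal sketch (not the paper's actual implementation; the classifiers, data, and vote counts here are hypothetical) of majority voting over several classifiers, together with the pd, pf, precision, and balance measures the abstract reports:

```python
# Sketch of a classifier ensemble via majority vote, plus the performance
# measures used above. Labels: 0 = safe module, 1 = defective module.
import math

def majority_vote(predictions):
    """Combine per-classifier label lists by majority vote across classifiers."""
    return [1 if 2 * sum(votes) > len(votes) else 0
            for votes in zip(*predictions)]

def rates(actual, predicted):
    """Compute pd (detection), pf (false alarm), precision, and balance."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    pd_ = tp / (tp + fn)    # probability of detection (recall)
    pf = fp / (fp + tn)     # probability of false alarm
    prec = tp / (tp + fp)   # precision
    # balance: normalized distance from the ideal point (pd = 1, pf = 0)
    bal = 1 - math.sqrt((1 - pd_) ** 2 + pf ** 2) / math.sqrt(2)
    return pd_, pf, prec, bal

# Toy data: three hypothetical classifiers predicting on five modules.
actual = [1, 1, 0, 0, 1]
preds = [
    [1, 0, 0, 1, 1],  # classifier A
    [1, 1, 0, 0, 0],  # classifier B
    [1, 1, 1, 0, 1],  # classifier C
]
ensemble = majority_vote(preds)
pd_, pf, prec, bal = rates(actual, ensemble)
```

In this toy run, the vote cancels each individual classifier's mistakes, which is the intuition behind combining classifiers to raise pd while holding pf down.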


Keywords: Defect prediction · Ensemble of classifiers · Static code attributes · Embedded software



This research is supported in part by the Turkish State Planning Organization (DPT) under project number 2007K120610.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Ayşe Tosun Mısırlı (1)
  • Ayşe Başar Bener (2)
  • Burak Turhan (3)

  1. Department of Computer Engineering, Boğaziçi University, Bebek, Istanbul, Turkey
  2. Ted Rogers School of Information Technology Management, Ryerson University, Toronto, Canada
  3. Department of Information Processing Science, University of Oulu, Oulu, Finland
