Empirical Software Engineering, Volume 17, Issue 1–2, pp 62–74

On the dataset shift problem in software engineering prediction models

Abstract

A core assumption of any prediction model is that the distribution of the test data does not differ from that of the training data. Prediction models used in software engineering are no exception. In reality, this assumption can be violated in many ways, resulting in inconsistent and non-transferable observations across different cases. The goal of this paper is to explain the phenomenon of conclusion instability through the concept of dataset shift, from the perspective of software effort and fault prediction. Different types of dataset shift are explained with examples from software engineering, and techniques for addressing the associated problems are discussed. While dataset shifts in the form of sample selection bias and imbalanced data are well known in software engineering research, understanding the other types is relevant for interpreting non-transferable results across different sites and studies. The software engineering community should be aware of, and account for, dataset shift related issues when evaluating the validity of research outcomes.
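
As an illustration of the kind of technique the paper discusses, the following is a minimal sketch, not taken from the paper, of correcting one form of dataset shift (covariate shift) by importance weighting: a domain classifier estimates how "test-like" each training instance is, and the resulting probability ratio is used as a sample weight when fitting the actual predictor. The synthetic data, feature counts, and scikit-learn usage are illustrative assumptions, not artifacts of the study.

    # Illustrative sketch (hypothetical data): covariate shift correction
    # by discriminative importance weighting.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Hypothetical training projects and test projects whose feature
    # distributions differ (e.g., within- vs. cross-company data).
    X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
    y_train = (X_train[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
    X_test = rng.normal(loc=1.0, scale=1.2, size=(100, 3))

    # 1) Domain classifier: distinguish training (label 0) from test (label 1).
    X_dom = np.vstack([X_train, X_test])
    y_dom = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    dom = LogisticRegression().fit(X_dom, y_dom)

    # 2) Importance weights w(x) ~ p_test(x) / p_train(x) from the
    #    domain classifier's probabilities for the training instances.
    p_test = dom.predict_proba(X_train)[:, 1]
    weights = p_test / np.clip(1.0 - p_test, 1e-6, None)

    # 3) Fit the actual predictor, up-weighting training instances that
    #    resemble the test distribution.
    model = LogisticRegression().fit(X_train, y_train, sample_weight=weights)
    print(model.predict(X_test)[:10])

The same weighting idea applies to effort estimation by passing the weights to a weighted regression model instead of a classifier.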

Keywords

Dataset shift, Prediction models, Effort estimation, Defect prediction

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

Department of Information Processing Science, University of Oulu, Oulu, Finland
