Software Quality Journal, Volume 18, Issue 1, pp 57–80

A new perspective on data homogeneity in software cost estimation: a study in the embedded systems domain

  • Ayşe Bakır
  • Burak Turhan
  • Ayşe B. Bener


Cost estimation and effort allocation are the key challenges for successful project planning and management in software development. Therefore, both industry and the research community have been working on various models and techniques to accurately predict the cost of projects. Recently, researchers have started debating whether the prediction performance depends on the structure of data rather than the models used. In this article, we focus on a new aspect of data homogeneity, “cross- versus within-application domain”, and investigate what kind of training data should be used for software cost estimation in the embedded systems domain. In addition, we try to find out the effect of training dataset size on the prediction performance. Based on our empirical results, we conclude that it is better to use cross-domain data for embedded software cost estimation and the optimum training data size depends on the method used.


Keywords: Application domain, Cost estimation, Data homogeneity, Embedded software, Machine learning



This research is supported in part by Boğaziçi University research fund under grant number BAP 06HA104 and by Tubitak EEEAG 108E014.



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Computer Engineering, Boğaziçi University, Bebek, Istanbul, Turkey
  2. Software Engineering Group, Institute for Information Technology, National Research Council of Canada, Ottawa, Canada
