A new perspective on data homogeneity in software cost estimation: a study in the embedded systems domain
Cost estimation and effort allocation are key challenges for successful project planning and management in software development. Both industry and the research community have therefore been working on various models and techniques to predict project cost accurately. Recently, researchers have started debating whether prediction performance depends on the structure of the data rather than on the models used. In this article, we focus on a new aspect of data homogeneity, “cross- versus within-application domain”, and investigate what kind of training data should be used for software cost estimation in the embedded systems domain. In addition, we examine the effect of training dataset size on prediction performance. Based on our empirical results, we conclude that it is better to use cross-domain data for embedded software cost estimation and that the optimal training data size depends on the method used.
Keywords: Application domain, Cost estimation, Data homogeneity, Embedded software, Machine learning
This research is supported in part by Boğaziçi University research fund under grant number BAP 06HA104 and by Tubitak EEEAG 108E014.