Software Effort Estimation Using Data Mining Techniques

  • Tirimula Rao Benala
  • Rajib Mall
  • P. Srikavya
  • M. Vani HariPriya
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 248)


This paper describes an empirical study of the quantitative aspects of applying data mining techniques to build models for software effort estimation. The techniques chosen are multiple linear regression, logistic regression, and CART. The empirical evaluation uses a three-fold cross-validation procedure on three benchmark datasets of software projects, namely Nasa93, Cocomo81, and Bailey Basili. We observed that: (1) CART is suitable for Nasa93 and Nasa93_5; (2) multiple linear regression is suitable for Nasa93_2, Cocomo81s, Cocomo81o, and Bailey Basili; (3) logistic regression is suitable for Nasa93_1, Cocomo81, and Cocomo81e. We conclude that data mining techniques support effort estimation well because they are objective and can be applied to any number of datasets.
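The three-fold cross-validation procedure mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy project data, the one-predictor least-squares model, the interleaved fold assignment, and the MMRE (mean magnitude of relative error) score are all hypothetical choices made here, since the abstract does not specify them.

```python
# Minimal sketch of k-fold cross-validation for an effort model.
# The data, the single-predictor model, and the MMRE metric are
# illustrative assumptions, not taken from the paper.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mmre(actual, predicted):
    """Mean magnitude of relative error over the held-out projects."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def k_fold_cv(xs, ys, k=3):
    """Return one MMRE per fold; fold i holds out every k-th project starting at index i."""
    pairs = list(zip(xs, ys))
    scores = []
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        test = [p for i, p in enumerate(pairs) if i % k == fold]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        preds = [a + b * x for x, _ in test]
        scores.append(mmre([y for _, y in test], preds))
    return scores

# Hypothetical toy projects: size in KLOC, effort in person-months.
sizes = [10, 20, 30, 40, 50, 60, 70, 80, 90]
efforts = [5 + 2 * s for s in sizes]  # deliberately exact linear relation

fold_scores = k_fold_cv(sizes, efforts, k=3)
mean_score = sum(fold_scores) / len(fold_scores)
print(fold_scores, mean_score)
```

Comparing techniques in this setup would amount to swapping `fit_line` for another model (for example a regression tree) and comparing the mean score per dataset; a lower MMRE indicates a better fit on the held-out projects.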


Software effort estimation, CART, Logistic regression





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Tirimula Rao Benala (1)
  • Rajib Mall (2)
  • P. Srikavya (3)
  • M. Vani HariPriya (3)

  1. Department of Information Technology, JNTUK, University College of Engineering, Vizianagaram, India
  2. Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India
  3. Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences, Visakhapatnam, India
