
A novel under sampling strategy for efficient software defect analysis of skewed distributed data


Software quality development is a continuous process that starts with identifying a reliable fault detection technique. How effectively such a technique can be implemented depends on the properties of the dataset: domain information, characteristics of the input data, complexity, and so on. Early detection of defective modules gives developers more time to allocate resources effectively and deliver quality software on time. Because software defect datasets are class imbalanced, existing techniques fail to identify all the defective modules. Misclassifying defective modules in software engineering datasets exposes software developers to unexpected losses. To classify class-imbalanced software datasets efficiently, we propose a novel under-sampling strategy that removes the less prominent instances from the majority subset. The experimental results confirm that the proposed approach predicts error-prone modules more accurately, using fewer and simpler rules.
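The general idea of under-sampling a majority subset can be sketched as follows. The abstract does not define how "less prominent" instances are chosen, so this sketch substitutes plain random under-sampling as a stand-in; it is not the authors' selection rule. The function name `random_undersample`, its parameters, and the toy data are hypothetical and exist only for illustration.

```python
import random
from collections import Counter


def random_undersample(rows, labels, majority_label, ratio=1.0, seed=0):
    """Keep every minority row and a random subset of majority rows.

    ASSUMPTION: majority rows are dropped at random. The paper removes
    "less prominent" majority instances, a criterion the abstract does
    not specify, so random selection is used here as a placeholder.
    At most ratio * (number of minority rows) majority rows are kept.
    """
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    minority_idx = [i for i, y in enumerate(labels) if y != majority_label]

    # Number of majority rows to retain after under-sampling.
    keep_n = min(len(majority_idx), int(ratio * len(minority_idx)))
    kept_majority = rng.sample(majority_idx, keep_n)

    # Preserve the original row order in the reduced dataset.
    keep = sorted(kept_majority + minority_idx)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy skewed dataset: 90 "clean" modules and 10 "defective" modules.
rows = [[i, i % 7] for i in range(100)]
labels = ["defective" if i % 10 == 0 else "clean" for i in range(100)]

new_rows, new_labels = random_undersample(rows, labels, "clean")
counts = Counter(new_labels)
print(counts)
```

With the default `ratio=1.0` the reduced set holds as many majority rows as minority rows. A strategy like the one the abstract describes would replace the `rng.sample` call with a ranking of majority instances by whatever prominence measure is chosen.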





Author information

Corresponding author

Correspondence to K. Nitalaksheswara Rao.


About this article


Cite this article

Rao, K.N., Reddy, C.S. A novel under sampling strategy for efficient software defect analysis of skewed distributed data. Evolving Systems 11, 119–131 (2020).
