The software quality development process is a continuous process that starts by identifying a reliable fault detection technique. How effectively a fault detection technique can be implemented depends on the properties of the dataset, such as domain information, the characteristics of the input data, and its complexity. Early detection of defective modules gives developers more time to allocate resources effectively and deliver quality software on time. The class-imbalanced nature of software defect datasets means that existing techniques do not identify all the defective modules. Misclassifying defective modules in software engineering datasets exposes software developers to unexpected losses. To classify class-imbalanced software datasets efficiently, we propose a novel under sampling strategy that removes the less prominent instances from the majority subset. The experimental results confirm that the proposed approach predicts error-prone modules more accurately, using fewer and simpler rules.
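The general idea of under sampling a majority subset can be sketched as follows. This is a minimal illustration only, not the authors' algorithm: the abstract does not say how "less prominent" instances are identified, so the sketch assumes a stand-in criterion (distance to the majority-class centroid, with the farthest instances dropped first). The function name `undersample_majority` and the `keep` parameter are hypothetical.

```python
import math


def undersample_majority(rows, labels, majority_label, keep):
    """Keep only the `keep` majority rows closest to the majority centroid.

    Assumption: distance to the centroid stands in for "prominence".
    This is NOT the criterion used in the paper, which the abstract
    does not specify. Minority rows are always kept.
    """
    majority = [r for r, y in zip(rows, labels) if y == majority_label]
    if not majority:
        return list(rows), list(labels)

    # Centroid of the majority class, one mean per feature.
    dims = len(majority[0])
    centroid = [sum(r[d] for r in majority) / len(majority) for d in range(dims)]

    # Rank majority indices by distance to the centroid, nearest first.
    maj_idx = [i for i, y in enumerate(labels) if y == majority_label]
    maj_idx.sort(key=lambda i: math.dist(rows[i], centroid))
    kept = set(maj_idx[:keep])

    # Rebuild the dataset in its original order.
    out_rows, out_labels = [], []
    for i, (r, y) in enumerate(zip(rows, labels)):
        if y != majority_label or i in kept:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels
```

In a defect dataset the majority label would typically be the non-defective class, so a call such as `undersample_majority(X, y, majority_label=0, keep=len_minority)` would shrink the non-defective subset toward the size of the defective one before a classifier is trained.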
Rao, K.N., Reddy, C.S. A novel under sampling strategy for efficient software defect analysis of skewed distributed data. Evolving Systems 11, 119–131 (2020). https://doi.org/10.1007/s12530-018-9261-9