Minority–Majority Mix mean Oversampling Technique: An Efficient Technique to Improve Classification of Imbalanced Data Sets

  • Sachin Patil
  • Shefali Sonavane
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1025)


Handling very large imbalanced data volumes poses considerable challenges and has opened new directions for efficient processing. The opportunities contained in these huge imbalanced data sets have made them a priority in recent research. Several applications that handle imbalanced Big Data sets depend on precise classification when determining unknown values from these data sets. Traditional classifiers are not able to address the imbalance of class distribution among the data samples. A class with fewer samples is difficult to learn, which leads to a notable drop in performance. Recent studies demonstrate that classifier-independent oversampling techniques handle the issues raised by imbalanced data sets more efficiently. This paper discusses in detail an enhanced oversampling technique, the Minority–Majority Mix mean Oversampling Technique (MMMmOT), that improves classification performance. Synthetic samples are generated with appropriate consideration of both majority and minority samples. The proposed technique is evaluated on data sets drawn mainly from the UCI repository, using Apache Hadoop. Furthermore, the effect of maintaining the imbalance ratio by selecting better oversampling instances from the generated pool is analyzed. Classification performance is measured using standard metrics such as F-Measure and area under the curve (AUC). The experimental results clearly show the superiority of the presented technique over traditional techniques.
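The abstract refers to the imbalance ratio, F-Measure, and synthetic sample generation without stating formulas. The following is a minimal Python sketch of those general notions only. The function names are hypothetical, and the interpolation shown is a generic SMOTE-style rule used for illustration; it is not the MMMmOT procedure, whose details are not given here.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def interpolate_oversample(minority, n_new, seed=0):
    """Generic SMOTE-style sketch (assumption, NOT the MMMmOT rule).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and its nearest minority neighbour.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbour = min(
            others,
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    labels = [0] * 8 + [1] * 2          # 8 majority, 2 minority samples
    print(imbalance_ratio(labels))      # 8 / 2
    y_pred = [0] * 8 + [1, 0]           # one minority sample missed
    print(f_measure(labels, y_pred))
    print(interpolate_oversample([[0.0, 0.0], [1.0, 1.0]], n_new=3))
```

A technique that also takes majority samples into account, as the abstract describes, would replace the neighbour selection and interpolation step; the two metric functions are independent of that choice.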


Oversampling, Safe-level based, Better learning, Safe-level centered synthetic sampling, Imbalanced data sets



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Rajarambapu Institute of Technology, Rajaramnagar, and Research Scholar, Walchand College of Engineering, Sangli, India
  2. Walchand College of Engineering, Sangli, India
