Knowledge and Information Systems, Volume 48, Issue 1, pp 201–228

Transfer learning for class imbalance problems with inadequate data

Regular Paper

Abstract

A fundamental problem in data mining is building robust classifiers in the presence of skewed data distributions. Class imbalance classifiers are trained specifically for such skewed datasets, but existing methods assume an ample supply of training examples as a prerequisite for constructing an effective classifier. When sufficient data are not available, the unequal distribution between classes makes it even harder to develop a representative classification algorithm. We provide a unified framework that exploits auxiliary data through a transfer learning mechanism while simultaneously building a classifier that is robust to class imbalance, targeting domains of interest with few training samples. Transfer learning methods use auxiliary data to augment learning when training examples are scarce; in this paper, we develop a method optimized to simultaneously augment the training data and induce balance into skewed datasets. Specifically, we propose a novel boosting-based instance-transfer classifier with a label-dependent update mechanism that compensates for class imbalance while incorporating samples from an auxiliary domain to improve classification. We provide theoretical and empirical validation of our method and apply it to healthcare and text classification applications.
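The core idea in the abstract — boosting over a pool of auxiliary and target instances, with misclassified auxiliary samples down-weighted Weighted-Majority style and misclassified target samples up-weighted by a label-dependent cost — can be sketched as follows. This is a minimal illustrative sketch in the spirit of TrAdaBoost, not the authors' exact algorithm: the function name `transfer_boost`, the fixed `minority_cost` factor, and the specific update rules are assumptions for illustration only.

```python
# Hypothetical sketch of a boosting-based instance-transfer classifier with a
# label-dependent (class-cost) update. Illustrative only; not the paper's
# exact algorithm.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def transfer_boost(X_aux, y_aux, X_tgt, y_tgt, n_rounds=10, minority_cost=2.0):
    """Train on auxiliary + target data; return a prediction function."""
    X = np.vstack([X_aux, X_tgt])
    y = np.concatenate([y_aux, y_tgt])
    n_a = len(y_aux)
    w = np.ones(len(y)) / len(y)
    # Weighted-majority-style rate for down-weighting auxiliary mistakes
    beta_aux = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_a) / n_rounds))
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        miss = (stump.predict(X) != y)
        # Weighted error measured on the *target* portion only
        eps = np.clip(np.sum(w[n_a:] * miss[n_a:]) / np.sum(w[n_a:]),
                      1e-10, 0.499)
        beta_t = eps / (1.0 - eps)
        # Label-dependent cost: errors on the minority class (label 1) count more
        cost = np.where(y == 1, minority_cost, 1.0)
        # Auxiliary instances: shrink weight when misclassified
        w[:n_a] *= beta_aux ** miss[:n_a]
        # Target instances: grow weight when misclassified, scaled by class cost
        w[n_a:] *= beta_t ** (-cost[n_a:] * miss[n_a:])
        w /= w.sum()
        learners.append(stump)
        alphas.append(np.log(1.0 / beta_t))
    def predict(Xq):
        # Weighted vote over the ensemble, mapping labels {0,1} to {-1,+1}
        votes = sum(a * (2 * clf.predict(Xq) - 1)
                    for a, clf in zip(alphas, learners))
        return (votes > 0).astype(int)
    return predict
```

The label-dependent exponent is what distinguishes this sketch from plain TrAdaBoost: a misclassified minority-class target sample gains weight faster than a majority-class one, so later rounds concentrate on the rare class.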

Keywords

Rare class · Transfer learning · Class imbalance · AdaBoost · Weighted majority algorithm · Healthcare informatics · Text mining


Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Honda Automobile Technology Research, Southfield, USA
  2. Department of Computer Science, Wayne State University, Detroit, USA
