Handling Imbalanced and Overlapping Classes in Smart Environments Prompting Dataset

  • Chapter
Data Mining for Service

Part of the book series: Studies in Big Data (SBD, volume 3)

Abstract

Supervised machine learning often encounters the imbalanced class distribution problem, in which one class is underrepresented compared to the other classes. Additionally, in many real-life problem domains, data with an imbalanced class distribution contain ambiguous regions of the data space where the prior probabilities of two or more classes are approximately equal. This problem, known as overlapping classes, makes the classification task difficult for learners. This chapter explores the intersection of the imbalanced class and overlapping classes problems, with smart environments as the application domain. In smart environments, delivering in-home interventions to residents, in the form of timely reminders or brief instructions that help ensure successful completion of daily activities, is an ideal scenario for studying this problem. As a solution, a novel clustering-based under-sampling (ClusBUS) technique is proposed. A density-based clustering technique, DBSCAN, is used to identify “interesting” clusters in the instance space, and under-sampling is then performed on these clusters on the basis of a threshold value for the degree of minority class dominance within each cluster.
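
The clustering-then-under-sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the abstract does not say exactly how under-sampling is applied inside a cluster, so the sketch assumes that majority-class instances are removed from every cluster whose minority fraction reaches the threshold. The function names, the toy data, the class labels `prompt` and `no-prompt`, and the values of `eps`, `min_pts`, and `threshold` are all hypothetical.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def cluster_undersample(points, classes, minority, eps, min_pts, threshold):
    """Return indices to keep after cluster-based under-sampling.

    Assumption (not stated in the abstract): in clusters whose minority
    fraction is at least `threshold`, all majority instances are dropped.
    """
    labels = dbscan(points, eps, min_pts)
    members = {}
    for idx, lab in enumerate(labels):
        if lab != NOISE:
            members.setdefault(lab, []).append(idx)
    drop = set()
    for idxs in members.values():
        minority_frac = sum(classes[i] == minority for i in idxs) / len(idxs)
        if minority_frac >= threshold:
            drop.update(i for i in idxs if classes[i] != minority)
    return [i for i in range(len(points)) if i not in drop]


# Hypothetical toy data: one pure majority cluster, one minority-dominated cluster.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
classes = ["no-prompt"] * 4 + ["prompt", "prompt", "prompt", "no-prompt"]

kept = cluster_undersample(points, classes, minority="prompt",
                           eps=0.3, min_pts=3, threshold=0.5)
print(kept)  # the majority point at index 7 is removed from the second cluster
```

In this toy run the second cluster is 75% minority, so its single majority instance is discarded while the pure majority cluster is left untouched. The result thins the majority class only in the region where the two classes overlap.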

Corresponding author

Correspondence to Barnan Das.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Das, B., Krishnan, N.C., Cook, D.J. (2014). Handling Imbalanced and Overlapping Classes in Smart Environments Prompting Dataset. In: Yada, K. (eds) Data Mining for Service. Studies in Big Data, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45252-9_12

  • DOI: https://doi.org/10.1007/978-3-642-45252-9_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45251-2

  • Online ISBN: 978-3-642-45252-9

  • eBook Packages: Engineering, Engineering (R0)
