Abstract
Supervised machine learning often encounters the problem of imbalanced class distribution, in which one class is under-represented compared to the others. Additionally, in many real-life problem domains, data with an imbalanced class distribution contain ambiguous regions of the data space where the prior probabilities of two or more classes are approximately equal. This problem, known as overlapping classes, makes the classification task difficult for learners. This chapter explores the intersection of the imbalanced class and overlapping classes problems, with smart environments as the application domain. In smart environments, delivering in-home interventions to residents, in the form of timely reminders or brief instructions that ensure successful completion of daily activities, is an ideal scenario for this problem. As a solution, a novel clustering-based under-sampling (ClusBUS) technique is proposed. A density-based clustering technique, DBSCAN, is used to identify “interesting” clusters in the instance space, on which under-sampling is performed on the basis of a threshold value for the degree of minority class dominance in the clusters.
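The clustering-then-under-sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes that "minority class dominance" means the fraction of minority-class instances in a cluster, and that under-sampling removes the majority-class instances from every cluster whose fraction meets the threshold. The function names, the parameters `eps`, `min_pts`, and `threshold`, and the class labels used in the usage note are all hypothetical.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one cluster id per point; -1 marks noise."""
    n = len(points)
    # Brute-force eps-neighborhoods (each point is its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbors[j])
    return labels


def cluster_undersample(points, classes, minority, eps, min_pts, threshold):
    """Return indices of the instances kept after under-sampling.

    Assumption (not confirmed by the abstract): majority-class instances
    are dropped from clusters whose minority fraction is >= threshold.
    Noise points and all other clusters are left untouched.
    """
    labels = dbscan(points, eps, min_pts)
    sizes, minority_counts = {}, {}
    for lab, cls in zip(labels, classes):
        if lab == -1:
            continue
        sizes[lab] = sizes.get(lab, 0) + 1
        if cls == minority:
            minority_counts[lab] = minority_counts.get(lab, 0) + 1
    interesting = {
        lab for lab in sizes
        if minority_counts.get(lab, 0) / sizes[lab] >= threshold
    }
    return [
        i for i, (lab, cls) in enumerate(zip(labels, classes))
        if not (lab in interesting and cls != minority)
    ]
```

As a usage example, take two tight groups of four points plus one outlier, where the first group holds three "prompt" (minority) instances and one "no-prompt" instance. With a threshold of 0.5, only that first cluster counts as interesting, so only its single majority instance is removed; the all-majority cluster and the noise point are kept.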
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Das, B., Krishnan, N.C., Cook, D.J. (2014). Handling Imbalanced and Overlapping Classes in Smart Environments Prompting Dataset. In: Yada, K. (eds) Data Mining for Service. Studies in Big Data, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45252-9_12
DOI: https://doi.org/10.1007/978-3-642-45252-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45251-2
Online ISBN: 978-3-642-45252-9
eBook Packages: Engineering, Engineering (R0)