User-adaptive models for activity and emotion recognition using deep transfer learning and data augmentation

  • Enrique Garcia-Ceja
  • Michael Riegler
  • Anders K. Kvernberg
  • Jim Torresen


Building predictive models for human-interactive systems is a challenging task. Every individual has unique characteristics and behaviors. Because of these between-user differences, a generic human–machine system will not perform equally well for every user. A system built specifically for a particular user will perform closer to the optimum, but it requires more training data from that user, which hinders its applicability in real-world scenarios. Collecting training data can be time-consuming and expensive. For example, in clinical applications it can take weeks or months until enough data has been collected to start training machine learning models. End users expect to receive quality feedback from a system as soon as possible, without having to rely on time-consuming calibration and training procedures. In this work, we build and test user-adaptive models (UAM), which are predictive models that adapt to each user's characteristics and behaviors with reduced training data. Our UAM are trained using deep transfer learning and data augmentation and were tested on two public datasets. The first is an activity recognition dataset from accelerometer data. The second is an emotion recognition dataset from speech recordings. Our results show that, with reduced training data, the UAM achieve a significant increase in recognition performance with respect to a general model. Furthermore, we show that individual characteristics such as gender can influence the models' performance.
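The general idea of adapting a shared model to one user with few labeled samples can be illustrated with a toy sketch. This is not the architecture or the augmentation procedure used in the paper: the one-layer tanh "base", the logistic output layer, the Gaussian jitter, and all names and values below (`jitter`, `fine_tune_head`, `base_weights`, `general_head`, the sample data) are assumptions made for illustration only. The sketch freezes the shared layer, enlarges the user's small dataset with noisy copies, and retrains only the output layer.

```python
import math
import random


def jitter(samples, sigma, copies, rng):
    """Return the original samples plus `copies` noisy variants of each one."""
    augmented = list(samples)
    for features, label in samples:
        for _ in range(copies):
            noisy = [x + rng.gauss(0.0, sigma) for x in features]
            augmented.append((noisy, label))
    return augmented


def hidden(base_weights, features):
    """Shared (frozen) feature layer: one tanh unit per weight row."""
    return [math.tanh(sum(w * x for w, x in zip(row, features)))
            for row in base_weights]


def predict(base_weights, head, features):
    """Probability of class 1 from the frozen base plus a logistic head."""
    z = sum(w * h for w, h in zip(head, hidden(base_weights, features)))
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(base_weights, head, samples, lr=0.5, epochs=200):
    """Adapt only the output layer to one user; the base layer is not updated."""
    head = list(head)  # work on a copy so the general model is preserved
    for _ in range(epochs):
        for features, label in samples:
            h = hidden(base_weights, features)
            error = predict(base_weights, head, features) - label
            # Gradient step on the logistic loss with respect to the head only.
            head = [w - lr * error * h_i for w, h_i in zip(head, h)]
    return head


# Hypothetical "general" model, standing in for one trained on many other users.
base_weights = [[1.0, 0.0], [0.0, 1.0]]
general_head = [1.0, 1.0]

# Two labeled samples from a new user whose behavior contradicts the general model.
user_samples = [([1.0, 1.0], 0), ([-1.0, -1.0], 1)]

rng = random.Random(0)
augmented = jitter(user_samples, sigma=0.05, copies=3, rng=rng)
user_head = fine_tune_head(base_weights, general_head, augmented)

print(len(augmented))
print(predict(base_weights, general_head, [1.0, 1.0]) > 0.5)
print(predict(base_weights, user_head, [1.0, 1.0]) < 0.5)
```

In this toy setting the general model misclassifies the new user's samples, while the adapted output layer classifies them correctly even though only two labeled samples were provided. Keeping the base layer fixed is one common way to limit overfitting when user data is scarce; whether it matches the layer-freezing strategy in the paper is not stated in the abstract.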


Keywords: Transfer learning, User adaptation, Personalized models, Deep learning, Emotion recognition, Activity recognition



This publication is part of the INTROducing Mental health through Adaptive Technology (INTROMAT) project, funded by the Norwegian Research Council (259293/o70) and as a part of the RCN Centres of Excellence scheme (Project No. 262762).



Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. SINTEF Digital, Oslo, Norway
  2. Simula Research Laboratory, Oslo, Norway
  3. Department of Informatics, University of Oslo, Oslo, Norway
  4. Department of Informatics and RITMO, University of Oslo, Oslo, Norway
  5. Kristiania University College, Oslo, Norway
