Abstract
Achieving safe collaboration between humans and robots in an industrial work cell requires effective communication. This can be achieved through a robot perception system developed using data-driven machine learning. A key challenge for human–robot communication is the scarcity of extensive, labelled datasets for training. Because human behaviour varies and environmental conditions affect the performance of perception models, models trained on standard, publicly available datasets fail to generalize well to domain- and application-specific scenarios. Model personalization, i.e., adapting such models to the individual humans involved in the task in the given environment, therefore leads to better model performance. A novel framework is presented that leverages robust modes of communication and gathers feedback from the human partner to auto-label data for the mode with the sparse dataset. The strength of the contribution lies in using incommensurable multimodal inputs to personalize models with user-specific data. The personalization through feedback-enabled human–robot communication (PF-HRCom) framework is implemented using facial expression recognition (FER) as a safety feature to ensure that the human partner is engaged in the collaborative task with the robot. Additionally, PF-HRCom has been applied to a real-time human–robot handover task with a robotic manipulator, whose perception module adapts to the user's facial expressions and personalizes the model using feedback. The framework is, however, applicable to other combinations of multimodal inputs in human–robot collaboration applications.
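To make the core idea concrete, the following is a minimal sketch of a feedback-driven personalization loop of the kind the abstract describes: a label proposed by a robust communication mode is confirmed by the human partner before being used to fine-tune the sparse-data model (here a FER classifier). It assumes a PyTorch classifier, and all names (personalize, confirm_fn, etc.) are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def personalize(fer_model, frames, robust_labels, confirm_fn, lr=1e-4):
    """Fine-tune a pretrained facial-expression classifier on user-specific
    frames auto-labelled via a robust communication mode (e.g. a verbal
    confirmation channel) and confirmed by the human partner."""
    optimizer = torch.optim.Adam(fer_model.parameters(), lr=lr)
    fer_model.train()
    for frame, label in zip(frames, robust_labels):
        # The robust mode proposes the label; the partner's feedback
        # confirms or rejects it before it is used for adaptation.
        if not confirm_fn(frame, label):
            continue  # discard samples the user rejects
        logits = fer_model(frame.unsqueeze(0))  # add batch dimension
        loss = F.cross_entropy(logits, torch.tensor([label]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return fer_model
```

The design point is that the labels come from a modality the system already trusts, gated by user feedback, so the sparse modality is personalized without any manual annotation effort.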
Data Availability
The data that support the findings of this study are available from the corresponding author upon request.
Funding
Research supported by the UBC Office of the Vice-President, Research and Innovation, in the form of seed funding to establish research on the digitalization of manufacturing, and by a Mitacs Globalink Research Internship award, 2020.
Author information
Contributions
DM developed the methodology and carried out the formal analysis, data curation, investigation, software development of the implemented framework and FER model, visualization, the FER application with the robot, and writing (original draft); JH was involved in the robot implementation, the writing of Sect. 7, and visualization; HV contributed to data curation and software development of the FER model; SB helped with software development of the VC classification model; HN performed funding acquisition, project administration, supervision, and writing (review and editing).
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethics approval
Not applicable
Consent to participate
Not applicable
Consent for publication
All authors have read and agreed to the published version of the manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (mp4 12215 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mukherjee, D., Hong, J., Vats, H. et al. Personalization of industrial human–robot communication through domain adaptation based on user feedback. User Model User-Adap Inter (2024). https://doi.org/10.1007/s11257-024-09394-1