Abstract
Human-computer interfaces and multimodal interaction are increasingly present in everyday life. Environments equipped with sensors can acquire and interpret a wide range of information, assisting humans in several application areas, such as behaviour understanding, event detection, and action recognition. In these areas, suitably processing this information is key to properly structuring multimodal data. In particular, heterogeneous devices and different acquisition times can be exploited to improve recognition results. On the basis of these assumptions, this paper proposes a multimodal system based on Allen's temporal logic combined with a prediction method. The main goal of the system is to correlate user events with system reactions. After post-processing the data coming from different acquisition devices (e.g., RGB images, depth maps, sounds, proximity sensors), the system manages the correlations between recognition/detection results and events in real time, thus creating an interactive environment for users. To increase recognition reliability, a predictive model is also associated with the method. The modularity of the system allows fully dynamic development and upgrades through customized modules. Finally, comparisons with similar systems are presented, underlining the high flexibility and robustness of the proposed event management method.
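The core idea in the abstract, relating the time intervals of detected events via Allen's temporal logic, can be illustrated with a minimal sketch. The `Interval` type and the function below are illustrative assumptions, not the paper's implementation; only the seven basic Allen relations are distinguished explicitly, with the six inverse relations recoverable by swapping the arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed time interval for a detected event, with start < end."""
    start: float
    end: float


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the basic Allen relation that holds from a to b.

    Covers the seven basic relations; the remaining six are the
    inverses, obtained by calling allen_relation(b, a).
    """
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start and a.end < b.end:
        return "starts"
    if a.start > b.start and a.end < b.end:
        return "during"
    if a.start > b.start and a.end == b.end:
        return "finishes"
    if a.start < b.start and b.start < a.end < b.end:
        return "overlaps"
    return "inverse"  # one of the six inverse relations; swap args to name it
```

For instance, a gesture detected over `Interval(0.0, 1.0)` and a sound over `Interval(1.0, 2.0)` stand in the "meets" relation, which a rule engine could use to trigger a combined system reaction.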
Acknowledgements
This work was supported in part by the MIUR under grant “Departments of Excellence 2018-2022” of the Department of Computer Science of Sapienza University.
Avola, D., Cinque, L., Del Bimbo, A. et al. MIFTel: a multimodal interactive framework based on temporal logic rules. Multimed Tools Appl 79, 13533–13558 (2020). https://doi.org/10.1007/s11042-019-08590-1