MIFTel: a multimodal interactive framework based on temporal logic rules


Abstract

Human-computer interfaces and multimodal interaction are increasingly used in everyday life. Environments equipped with sensors can acquire and interpret a wide range of information, thus assisting humans in several application areas, such as behaviour understanding, event detection, action recognition, and many others. In these areas, suitably processing this information is a key factor in properly structuring multimodal data. In particular, heterogeneous devices and different acquisition times can be exploited to improve recognition results. Based on these assumptions, this paper proposes a multimodal system that combines Allen’s temporal logic with a prediction method. The main goal of the system is to correlate user events with system reactions. After post-processing the data coming from different acquisition devices (e.g., RGB images, depth maps, sounds, proximity sensors), the system manages the correlations between recognition/detection results and events in real time, thus creating an interactive environment for users. To increase recognition reliability, a predictive model is also associated with the method. The modularity of the system allows fully dynamic development and upgrades with customized modules. Finally, comparisons with other similar systems are shown, underlining the high flexibility and robustness of the proposed event management method.
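The article itself does not include code, but the core idea is easy to sketch: events detected by different sensors are time-stamped intervals, and Allen’s interval relations (before, meets, overlaps, starts, during, finishes, equals, plus their inverses) decide which system reaction a pair of events should trigger. The Python sketch below is a minimal illustration under that assumption; the Interval type, the RULES table, and the event labels are hypothetical, not the authors’ implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Time interval of a detected event (timestamps in seconds)."""
    start: float
    end: float

def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation holding between intervals a and b.

    The seven base relations are tested from a's point of view; the
    six remaining ones are their inverses, obtained by swapping the
    arguments (e.g. "before-inverse" is Allen's "after").
    """
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start and a.end < b.end:
        return "starts"
    if a.start > b.start and a.end == b.end:
        return "finishes"
    if a.start > b.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.start < a.end < b.end:
        return "overlaps"
    # No base relation matched from a's viewpoint: invert.
    return allen_relation(b, a) + "-inverse"

# Hypothetical rule table: (event, event, required relation) -> reaction.
RULES = {
    ("hand_raised", "voice_command", "during"): "open_menu",
    ("person_enters", "lights_off", "overlaps"): "turn_lights_on",
}

def react(e1: str, i1: Interval, e2: str, i2: Interval):
    """Fire the reaction whose temporal rule matches the two events."""
    return RULES.get((e1, e2, allen_relation(i1, i2)))

if __name__ == "__main__":
    gesture = Interval(2.0, 3.0)  # hand raised from t=2s to t=3s
    speech = Interval(1.0, 5.0)   # voice command spanning t=1s..5s
    print(react("hand_raised", gesture, "voice_command", speech))  # open_menu
```

In the full framework, each acquisition module (camera, depth sensor, microphone, proximity sensor) would presumably emit such intervals after its own post-processing step, and the rule table would be one of the customizable modules the abstract mentions, with the predictive model stepping in when detections are uncertain.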





Acknowledgements

This work was supported in part by the MIUR under grant “Departments of Excellence 2018-2022” of the Department of Computer Science of Sapienza University.

Author information


Corresponding author

Correspondence to Danilo Avola.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Avola, D., Cinque, L., Del Bimbo, A. et al. MIFTel: a multimodal interactive framework based on temporal logic rules. Multimed Tools Appl 79, 13533–13558 (2020). https://doi.org/10.1007/s11042-019-08590-1

