Temporal relation algebra for audiovisual content analysis

  • Zein Al Abidin Ibrahim
  • Isabelle Ferrane
  • Philippe Joly


The context of this work is the characterization of the content and structure of audiovisual documents by analysing the temporal relationships between basic events resulting from different segmentations of the same document. To this end, we need to represent and reason about time. We propose a parametric representation of temporal relations between segments (points or intervals), in which the parameters characterize the relationship between two non-convex intervals corresponding to two segmentations in the video analysis domain. The relationship is represented by a co-occurrence matrix called the Temporal Relation Matrix (TRM). Each document is represented by a set of TRMs, computed between each pair of segmentations of the same document obtained using different features. The TRMs are then analysed to detect semantic events, to highlight clues about the structure of the video content, or to classify documents by type. To capture higher-level semantic events and document structure, we need to apply operations such as composition, disjunction, complement and intersection to the basic temporal relations and TRMs. These operations reveal more complex patterns, e.g. event 1 occurs at the same time as event 2, followed by event 3. In this paper, we define a temporal relation algebra, including its set of operations, based on the parametric representation and the TRM defined above. Several experiments have been conducted on different audio and video documents to show the efficiency of the proposed representation and of the defined operations for audiovisual content analysis.
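The abstract does not spell out the parametric representation itself, so what follows is only a minimal sketch that substitutes Allen's 13 basic interval relations for it. Under that assumption, it shows how a TRM-like co-occurrence structure could be counted between two segmentations of one document, and how disjunction, intersection and complement act on sets of relations. The names `allen_relation`, `relation_counts`, `CO_OCCUR` and the toy segmentations are hypothetical illustrations, not the authors' definitions. Composition is left out because it requires Allen's full 13×13 transitivity table.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical types: a segment is (start, end, label); a segmentation is a
# list of segments. All segments sharing a label form one non-convex interval.
Segment = Tuple[float, float, str]
Relation = FrozenSet[str]

# ASSUMPTION: Allen's 13 basic interval relations stand in for the paper's
# parametric representation, which is not reproduced here.
ALLEN: Relation = frozenset({
    "before", "after", "meets", "met_by", "overlaps", "overlapped_by",
    "starts", "started_by", "during", "contains", "finishes",
    "finished_by", "equals",
})

# Relations in which the two intervals share some duration.
CO_OCCUR: Relation = ALLEN - {"before", "after", "meets", "met_by"}


def allen_relation(a1: float, a2: float, b1: float, b2: float) -> str:
    """Allen relation of [a1, a2] with respect to [b1, b2]; assumes a1 < a2 and b1 < b2."""
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met_by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started_by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished_by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: proper partial overlap with distinct start and end points.
    return "overlaps" if a1 < b1 else "overlapped_by"


def relation_counts(seg_a: List[Segment],
                    seg_b: List[Segment]) -> Dict[Tuple[str, str], Counter]:
    """Hypothetical TRM-like structure: for each (label_a, label_b) pair,
    count how often each Allen relation holds between their segments."""
    counts: Dict[Tuple[str, str], Counter] = {}
    for a1, a2, label_a in seg_a:
        for b1, b2, label_b in seg_b:
            rel = allen_relation(a1, a2, b1, b2)
            counts.setdefault((label_a, label_b), Counter())[rel] += 1
    return counts


# Operations on disjunctive relation sets.
def disjunction(r: Relation, s: Relation) -> Relation:
    return r | s


def intersection(r: Relation, s: Relation) -> Relation:
    return r & s


def complement(r: Relation) -> Relation:
    return ALLEN - r


# Example: two toy segmentations of the same document.
speech = [(0.0, 10.0, "speech"), (12.0, 20.0, "speech")]
music = [(5.0, 15.0, "music")]
trm = relation_counts(speech, music)
print(trm[("speech", "music")])
print(sorted(complement(CO_OCCUR)))  # ['after', 'before', 'meets', 'met_by']
```

Counting every pair of segments lets `before` and `after` dominate on long documents, so a practical version would probably restrict which pairs are counted or weight them, which is where a parametric representation such as the one the abstract describes becomes useful.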


Audiovisual document analysis · Classification · Structuring · Representation · Event detection · Temporal relations algebra




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. LARIFA Team, Faculty of Sciences – Hadath, Lebanese University, Beirut, Lebanon
  2. SAMOVA Team, IRIT, University of Paul Sabatier, Toulouse, France
