
Information assimilation framework for event detection in multimedia surveillance systems

  • Regular Paper
Multimedia Systems


Most multimedia surveillance and monitoring systems today use multiple types of sensors to detect events of interest as they occur in the environment. However, because the sensors are diverse and asynchronous with respect to one another, information assimilation, that is, how to combine the information obtained from asynchronous and heterogeneous sources, is an important and challenging research problem. In this paper, we propose a framework for information assimilation that addresses three issues: "when", "what" and "how" to assimilate the information obtained from different media sources in order to detect events in multimedia surveillance systems. The proposed framework adopts a hierarchical probabilistic assimilation approach to detect atomic and compound events. To detect an event, our framework uses not only the media streams available at the current instant but also two important properties of those streams: first, the accumulated past history of whether they have been providing concurring or contradictory evidence, and second, the system designer's confidence in them. Experimental results show the utility of the proposed framework.
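The abstract describes weighting each media stream by two properties, its history of concurring or contradictory evidence and the system designer's confidence in it, before combining evidence probabilistically for atomic events and then for compound events. The sketch below illustrates only that general idea under stated assumptions; it is not the paper's formulation, whose equations are not reproduced here. The names `Stream`, `fuse`, `update_history` and `compound_probability`, the Laplace-smoothed agreement rate, the product of the two properties as a weight, the 0.5 decision threshold, and the independence assumption for compound events are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Stream:
    """Hypothetical model of a media stream: designer confidence plus agreement history."""
    name: str
    designer_confidence: float  # assumed prior in [0, 1], set by the system designer
    agreements: int = 0         # past instants where the stream concurred with the fused decision
    comparisons: int = 0        # past instants at which the stream was compared at all

    def agreement_rate(self) -> float:
        # Laplace-smoothed fraction of concurring evidence; 0.5 before any history exists.
        return (self.agreements + 1) / (self.comparisons + 2)

    def weight(self) -> float:
        # Assumed combination rule: product of designer confidence and agreement rate.
        return self.designer_confidence * self.agreement_rate()


def fuse(streams: list[Stream], probs: list[float]) -> float:
    """Weighted average of per-stream probabilities that an atomic event occurred."""
    if len(streams) != len(probs):
        raise ValueError("need exactly one probability per stream")
    total = sum(s.weight() for s in streams)
    if total == 0:
        # No stream carries any weight: fall back to an unweighted mean.
        return sum(probs) / len(probs)
    return sum(s.weight() * p for s, p in zip(streams, probs)) / total


def update_history(streams: list[Stream], probs: list[float],
                   fused: float, threshold: float = 0.5) -> None:
    """Record, per stream, whether its own decision matched the fused decision."""
    decision = fused >= threshold
    for s, p in zip(streams, probs):
        s.comparisons += 1
        if (p >= threshold) == decision:
            s.agreements += 1


def compound_probability(atomic_probs: list[float]) -> float:
    """Compound event as the conjunction of atomic events, assuming independence."""
    return math.prod(atomic_probs)
```

For example, with an audio stream (designer confidence 0.6) reporting 0.8 and a video stream (designer confidence 0.9) reporting 0.4, and no history yet, `fuse` returns 0.56. After one `update_history` call the audio stream has concurred with the fused decision and the video stream has not, so the same readings now fuse to about 0.63. The multiplicative weight is one simple way to let a highly trusted stream lose influence when it repeatedly contradicts the others.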


Author information

Corresponding author

Correspondence to Pradeep Kumar Atrey.

About this article

Cite this article

Atrey, P.K., Kankanhalli, M.S. & Jain, R. Information assimilation framework for event detection in multimedia surveillance systems. Multimedia Systems 12, 239–253 (2006).
