Interplay Between Visual and Audio Scene Analysis

  • Ziyou Xiong
  • Thomas S. Huang

Conclusions

We have argued that joint audio-visual scene analysis is necessary to address the difficult problem of computational auditory scene analysis (CASA), and that CASA will benefit from computer audio-visual scene analysis (CAVSA). We also propose a generative probabilistic model on the correlogram, a video-like representation of the audio signal, to separate the audio sources.
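To make the phrase "video-like representation of the audio signal" concrete, here is a minimal sketch of how a correlogram can be computed. It is an illustration under stated assumptions, not the authors' front end: the two-pole resonator filterbank (a crude stand-in for a cochlear or gammatone filterbank), the half-wave rectification, and the function names `resonator_bank` and `correlogram` are all hypothetical choices made here. Each output frame is a channel × lag image of short-time autocorrelations, so the sequence of frames forms a video, which is what allows image-style generative models to be applied to audio.

```python
import math


def resonator_bank(x, fs, centers, r=0.98):
    """Hypothetical front end: one two-pole resonator per centre frequency.

    This is a crude stand-in for a cochlear (e.g. gammatone) filterbank.
    Poles sit at r*exp(+/- j*theta); r < 1 keeps every filter stable.
    """
    channels = []
    for fc in centers:
        theta = 2 * math.pi * fc / fs
        a1, a2 = 2 * r * math.cos(theta), -r * r
        y1 = y2 = 0.0
        out = []
        for s in x:
            y = (1 - r) * s + a1 * y1 + a2 * y2
            y2, y1 = y1, y
            out.append(max(y, 0.0))  # half-wave rectification (rough hair-cell model)
        channels.append(out)
    return channels


def correlogram(x, fs, centers, frame_len, hop, max_lag):
    """Return frames[t][c][lag]: short-time autocorrelation of each channel.

    Each frames[t] is a (channel x lag) image; the list of frames is the
    "video" that a generative image model could be fitted to.
    """
    channels = resonator_bank(x, fs, centers)
    frames = []
    # Stop early enough that n + lag never runs past the end of the signal.
    for start in range(0, len(x) - frame_len - max_lag + 1, hop):
        frame = []
        for ch in channels:
            row = []
            for lag in range(max_lag):
                acc = 0.0
                for n in range(start, start + frame_len):
                    acc += ch[n] * ch[n + lag]
                row.append(acc)
            frame.append(row)
        frames.append(frame)
    return frames


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1600)]
    frames = correlogram(tone, fs, [200.0, 400.0, 800.0],
                         frame_len=400, hop=200, max_lag=80)
    row = frames[-1][0]  # 200 Hz channel, last frame
    # The period of a 200 Hz tone at 8 kHz is 40 samples.
    print(max(range(20, 61), key=lambda k: row[k]))
```

For a 200 Hz tone sampled at 8 kHz, the 200 Hz channel shows an autocorrelation peak at a lag of 40 samples (one period). In a mixture, sources with different pitches produce ridges at different lags across channels, which is the kind of structure a generative model on the correlogram could try to explain source by source. The specific model proposed in this work is not reproduced here.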

Keywords

Audio Signal, Probabilistic Inference, Scene Analysis, Markov Blanket, Generative Probabilistic Model
These keywords were added by machine and not by the authors.

Copyright information

© Springer Science+Business Media, Inc. 2005

Authors and Affiliations

  • Ziyou Xiong (1)
  • Thomas S. Huang (1)
  1. Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA
