Modeling Timing Structure in Multimedia Signals

  • Hiroaki Kawashima
  • Kimitaka Tsutsumi
  • Takashi Matsuyama
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4069)


Modeling and describing the temporal structure of multimedia signals, which are captured simultaneously by multiple sensors, is important for realizing human-machine interaction and motion generation. This paper proposes a method for modeling that temporal structure based on the temporal intervals of primitive signal patterns. Using the temporal differences between the beginning points and between the ending points of these intervals, we can explicitly express timing structure; that is, synchronization and mutual dependency among media signals. To verify its effectiveness, we applied the model to generating a video signal from an audio signal.
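As a rough illustration of the interval-based description in the abstract, the sketch below computes the difference between beginning points and the difference between ending points for pairs of intervals taken from two media signals. It is a minimal sketch, not the authors' implementation: the `Interval` type, the `timing_differences` function, the frame-based time unit, the mode labels, and the restriction to overlapping pairs are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    """One occurrence of a primitive signal pattern (hypothetical type)."""
    mode: str   # label of the primitive pattern
    begin: int  # first frame of the interval
    end: int    # last frame of the interval


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when the two intervals share at least one frame."""
    return a.begin <= b.end and b.begin <= a.end


def timing_differences(
    audio: List[Interval], video: List[Interval]
) -> List[Tuple[Tuple[str, str], Tuple[int, int]]]:
    """For every overlapping (audio, video) pair, return the mode pair and
    the (begin difference, end difference) between the two intervals.

    Restricting to overlapping pairs is an assumption of this sketch.
    """
    result = []
    for a in audio:
        for v in video:
            if overlaps(a, v):
                diffs = (a.begin - v.begin, a.end - v.end)
                result.append(((a.mode, v.mode), diffs))
    return result


# Toy interval sequences; the labels and frame numbers are made up.
audio = [Interval("a1", 0, 9), Interval("a2", 10, 24)]
video = [Interval("v1", 0, 12), Interval("v2", 13, 24)]

pairs = timing_differences(audio, video)
for mode_pair, (d_begin, d_end) in pairs:
    print(mode_pair, d_begin, d_end)
```

A begin difference of zero means the two patterns start together, while a negative end difference means the audio pattern ends before the video pattern does. Collecting these two-dimensional differences per mode pair is one way such timing relations could be summarized statistically.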


Audio Signal · Timing Structure · Media Signal · Mode Pair · Active Appearance Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hiroaki Kawashima (1)
  • Kimitaka Tsutsumi (1)
  • Takashi Matsuyama (1)
  1. Kyoto University, Kyoto, Japan
