Learning Multi-modal Dictionaries: Application to Audiovisual Data

  • Gianluca Monaci
  • Philippe Jost
  • Pierre Vandergheynst
  • Boris Mailhe
  • Sylvain Lesage
  • Rémi Gribonval
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4105)

Abstract

This paper presents a methodology for extracting meaningful synchronous structures from multi-modal signals. Simultaneous processing of multi-modal data can reveal information that is unavailable when the sources are handled separately. However, in natural high-dimensional data, the statistical dependencies between modalities are usually not obvious. Learning fundamental multi-modal patterns is an alternative to classical statistical methods. Recurrent patterns are typically shift invariant, so the learning should seek the best matching filters. We present a new algorithm for iteratively learning multi-modal generating functions that can be shifted to any position in the signal. The proposed algorithm is applied to audiovisual sequences and is shown to discover underlying structures in the data.
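
The idea of a shift-invariant, synchronous match can be illustrated with a small sketch. This is not the algorithm of the paper: it only shows, under simplifying assumptions, how a two-part pattern can be compared against two signals at every shift so that the best common position is found. The function names, the one-dimensional "video" feature, and the toy values are hypothetical, and both modalities are assumed to share a common time grid.

```python
def correlate_at(signal, atom, shift):
    """Inner product between the atom and the signal window starting at shift."""
    return sum(a * signal[shift + i] for i, a in enumerate(atom))


def best_joint_shift(audio, video, audio_atom, video_atom):
    """Return (shift, score) where the two-part atom best matches both signals.

    Assumption: one shift index refers to the same instant in both modalities.
    The score is the absolute inner product of the concatenated atom with the
    concatenated signal windows at that shift.
    """
    n_shifts = min(len(audio) - len(audio_atom),
                   len(video) - len(video_atom)) + 1
    scores = [
        abs(correlate_at(audio, audio_atom, s) + correlate_at(video, video_atom, s))
        for s in range(n_shifts)
    ]
    best = max(range(n_shifts), key=scores.__getitem__)
    return best, scores[best]


# Toy data (hypothetical): the same event occurs at index 2 in both modalities.
audio = [0, 0, 1, 2, 1, 0, 0, 0]
video = [0, 0, 0.5, 1, 0.5, 0, 0, 0]
shift, score = best_joint_shift(audio, video, [1, 2, 1], [0.5, 1, 0.5])
```

Scoring the two modalities jointly at a single shift favours structures that occur at the same time in both signals, which is the sense of "synchronous" used in the abstract. A learning procedure would additionally update the atoms from the matched windows, which this sketch does not attempt.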

Keywords

Video Sequence; Unconstrained Problem; Video Representation; Classical Statistical Method; Redundant Dictionary
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Gianluca Monaci (1)
  • Philippe Jost (1)
  • Pierre Vandergheynst (1)
  • Boris Mailhe (2)
  • Sylvain Lesage (2)
  • Rémi Gribonval (2)

  1. Signal Processing Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  2. IRISA-INRIA, Rennes, France
