Model-based similarity estimation of multidimensional temporal sequences

  • Romain Tavenard
  • Laurent Amsaleg
  • Guillaume Gravier


Content-based querying of multimedia databases whose items are sequences is difficult, especially in large-scale applications. A key point is estimating the similarity between a query sequence and the elements of the database. In this paper, we investigate two ways to compare multimedia sequences: one, taken from the literature, is computed in the feature space, while the other is computed in a model space, leading to a representation that is less sensitive to noise. We compare these approaches on a real audio dataset, and the results point to the utility of working in the model space.


  • Multidimensional feature sequences
  • Support vector regression
  • Temporal aspects
  • Similarity estimation in a model space



Copyright information

© Institut TELECOM and Springer-Verlag 2009

Authors and Affiliations

  • Romain Tavenard (1)
  • Laurent Amsaleg (2)
  • Guillaume Gravier (2)
  1. IRISA / ENS Cachan, Rennes Cedex, France
  2. CNRS / IRISA, Rennes Cedex, France
