Multimedia Tools and Applications, Volume 74, Issue 10, pp 3579–3598

Comparative study of methods for reducing dimensionality of MPEG-7 audio signature descriptors

  • Shingchern D. You
  • Wei-Hwa Chen


This paper studies how to reduce the dimensionality of MPEG-7 audio signature descriptors. With dimension-reduced descriptors, the comparison time for detecting copyrighted audio can be significantly reduced. The methods studied are block average, principal component analysis (PCA), Hadamard transform, Haar transform, and CDF (Cohen-Daubechies-Feauveau) 9/7 wavelet transform. For the latter four methods, we also examine whether different partition methods affect accuracy. The simulation results show that different reduction methods should use different partition strategies for best accuracy. We also compare the computational complexity of these methods. The experimental results show that, except for the CDF 9/7 method, the other four methods yield comparable accuracy for undistorted and MP3-coded audio. When computational complexity is also considered, the block average method is the better choice.
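As a rough illustration of two of the methods named above, the following minimal sketch reduces a short sequence by block average and by keeping only the approximation coefficients of an orthonormal Haar transform. It is not the authors' implementation: the function names, the sample values, the block size, and the choice to drop a trailing partial block are all assumptions made for the example, and the abstract does not specify how the descriptors are partitioned.

```python
import math


def block_average(values, block_size):
    """Average non-overlapping blocks of `block_size` consecutive values.

    Assumption for this sketch: a trailing partial block is dropped.
    """
    n_blocks = len(values) // block_size
    return [
        sum(values[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]


def haar_approximation(values, levels):
    """Keep only the approximation coefficients of an orthonormal Haar transform.

    Each level halves the length by combining neighbouring pairs and
    scaling by 1/sqrt(2); the detail coefficients are discarded.
    """
    coeffs = list(values)
    for _ in range(levels):
        if len(coeffs) % 2 != 0:
            raise ValueError("length must be divisible by 2 at every level")
        coeffs = [
            (coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
            for i in range(0, len(coeffs), 2)
        ]
    return coeffs


# Hypothetical descriptor values, chosen only for illustration.
descriptor = [0.9, 0.7, 0.4, 0.6, 0.2, 0.2, 0.8, 1.0]

reduced_avg = block_average(descriptor, block_size=4)   # 8 values -> 2 values
reduced_haar = haar_approximation(descriptor, levels=2)  # 8 values -> 2 values
```

With a block size of 2^k, the k-level Haar approximation coefficients equal the block averages scaled by 2^(k/2), so in this truncated form the two reductions carry the same information; they differ only when detail coefficients or a different partition are retained.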


MPEG-7 audio signature descriptor · Principal component analysis · Hadamard transform · Haar transform · CDF 9/7 wavelet



The authors thank the National Science Council of Taiwan for providing grants (NSC 94-2213-E-027-042 and NSC 101-2221-E-027-127) for this research.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan
  2. Hon-Hai Precision Industry Co. Ltd, New Taipei City, Taiwan
