
An enhanced direct chord transformation for music retrieval in the AAC transform domain with window switching

Multimedia Tools and Applications

Abstract

With the explosive growth in the number of music albums produced, music information retrieval has become a critical aspect of managing music data. Extracting frequency parameters directly from compressed files to represent music greatly improves processing speed when working with a large database. In this study, we focused on advanced audio coding (AAC) files: we analyzed the disparity in frequency representation between the discrete Fourier transform and the discrete cosine transform, considered the frequency resolution to select an appropriate frequency range, and developed a direct chroma feature-transformation method in the AAC transform domain. An additional challenge in using AAC files directly is long/short window switching; ignoring it may result in inaccurate frequency mapping and inefficient information retrieval. For short windows in particular, we propose a peak-competition method that enhances pitch information while excluding ambiguous frequency components when the eight subframes are combined. Moreover, for chroma feature segmentation, we propose a simple dynamic-segmentation method to replace the computationally complex beat tracking. Our experimental results show that the proposed method increased Top-1 search accuracy by approximately 7% over previously described transform-domain methods and performed nearly as well as state-of-the-art waveform-domain approaches.
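The abstract does not give the authors' formulas, so the following is only a minimal sketch, in Python with NumPy, of how AAC spectral coefficients could be folded into a 12-bin chroma vector and why window switching matters. It assumes the standard MDCT bin-centre frequency (k + 0.5)·fs/(2N) for N coefficients per frame (1024 for a long window, 128 for each of the eight short subframes). The range-selection rule (keep a bin only where one semitone is at least one bin wide, up to an assumed 5 kHz limit) and all function names are hypothetical. In particular, `combine_short_subframes` is a simple local-peak-picking stand-in, not the peak-competition method proposed in the paper.

```python
import numpy as np

# AAC framing (ISO/IEC 13818-7): a long window gives 1024 MDCT coefficients
# per frame; a short-window frame gives 8 subframes of 128 coefficients.
LONG_BINS = 1024
SHORT_BINS = 128
N_SHORT = 8


def mdct_bin_frequencies(n_bins, fs):
    """Centre frequency (Hz) of each MDCT bin: (k + 0.5) * fs / (2 * n_bins)."""
    k = np.arange(n_bins)
    return (k + 0.5) * fs / (2.0 * n_bins)


def chroma_map(n_bins, fs, f_max=5000.0):
    """Pitch class (0 = C, ..., 9 = A) of each bin, or -1 if the bin is excluded.

    Hypothetical range selection: keep a bin only if the semitone spacing
    at its frequency is at least the bin spacing and it lies below f_max.
    """
    freqs = mdct_bin_frequencies(n_bins, fs)
    bin_width = fs / (2.0 * n_bins)
    semitone_width = freqs * (2.0 ** (1.0 / 12.0) - 1.0)
    keep = (semitone_width >= bin_width) & (freqs <= f_max)
    # Nearest MIDI note (A4 = 440 Hz = note 69), folded into 12 pitch classes.
    midi = np.rint(69 + 12 * np.log2(freqs / 440.0)).astype(int)
    return np.where(keep, midi % 12, -1)


def chroma_from_coeffs(coeffs, fs, f_max=5000.0):
    """12-bin chroma vector: sum of squared coefficients per pitch class."""
    pcs = chroma_map(len(coeffs), fs, f_max)
    energy = np.asarray(coeffs, dtype=float) ** 2
    chroma = np.zeros(12)
    valid = pcs >= 0
    np.add.at(chroma, pcs[valid], energy[valid])  # accumulates repeated indices
    return chroma


def combine_short_subframes(sub_coeffs):
    """Hypothetical merge of 8 short subframes (shape (8, 128)) into one spectrum.

    For each bin, take the largest energy across subframes, then keep it
    only where it is a local peak along frequency; other bins are zeroed
    (the first and last bins are never treated as peaks). Returns
    amplitudes, so the result can be fed to chroma_from_coeffs.
    """
    energy = np.asarray(sub_coeffs, dtype=float) ** 2
    best = energy.max(axis=0)
    peak = np.zeros_like(best, dtype=bool)
    peak[1:-1] = (best[1:-1] > best[:-2]) & (best[1:-1] >= best[2:])
    return np.sqrt(np.where(peak, best, 0.0))


if __name__ == "__main__":
    fs = 44100
    print((chroma_map(LONG_BINS, fs) >= 0).sum())   # 215 usable bins
    print((chroma_map(SHORT_BINS, fs) >= 0).sum())  # 12 usable bins
```

The two counts illustrate the resolution problem behind window switching: at 44.1 kHz, a long window leaves 215 bins between roughly 377 Hz and 5 kHz that satisfy the resolution rule, whereas a short window leaves only 12, all above about 3 kHz.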

Figs. 1–14 (figures not included)

Author information

Correspondence to Pao-Chi Chang.

About this article

Cite this article

Chang, TM., Hsieh, CB. & Chang, PC. An enhanced direct chord transformation for music retrieval in the AAC transform domain with window switching. Multimed Tools Appl 74, 7921–7942 (2015). https://doi.org/10.1007/s11042-014-2031-1

  • DOI: https://doi.org/10.1007/s11042-014-2031-1
