An enhanced direct chord transformation for music retrieval in the AAC transform domain with window switching

Chang, Tai-Ming; Hsieh, Chia-Bin; Chang, Pao-Chi

doi:10.1007/s11042-014-2031-1

Published: 01 June 2014

Multimedia Tools and Applications, volume 74, pages 7921–7942 (2015)

Abstract

With the explosive growth in the number of music albums produced, retrieving music information has become a critical aspect of managing music data. Extracting frequency parameters directly from compressed files to represent music greatly improves processing speed on large databases. In this study, we focused on advanced audio coding (AAC) files. We analyzed the disparity in frequency expression between the discrete Fourier transform and the discrete cosine transform, considered the frequency resolution to select an appropriate frequency range, and developed a direct chroma feature-transformation method in the AAC transform domain. An added challenge in using AAC files directly is long/short window switching; ignoring it may result in inaccurate frequency mapping and inefficient information retrieval. For short windows in particular, we propose a peak-competition method that, when combining the eight subframes, enhances the pitch information and excludes ambiguous frequency components. Moreover, for chroma feature segmentation, we propose a simple dynamic-segmentation method to replace the complex computation of beat tracking. Our experimental results show that the proposed method increased the Top-1 accuracy rate by approximately 7% over previously described transform-domain methods and performed nearly as effectively as state-of-the-art waveform-domain approaches.
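To make the idea of a direct chroma transformation concrete, the following is a minimal Python sketch of folding transform-domain coefficients into a 12-bin chroma vector. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 44.1 kHz sample rate, and the `fmin`/`fmax` range are hypothetical choices, and the peak-competition and dynamic-segmentation steps are not reproduced because the abstract does not specify them. The sketch relies only on the standard MDCT property that bin k of a frame with N coefficients is centered at (k + 0.5) · fs / (2N). In AAC, N is 1024 for a long window and 128 for each of the eight short-window subframes.

```python
import math


def bin_center_hz(k, num_coeffs, sample_rate):
    """Center frequency in Hz of MDCT bin k for a frame of num_coeffs coefficients."""
    return (k + 0.5) * sample_rate / (2.0 * num_coeffs)


def pitch_class(freq_hz, ref_a4=440.0):
    """Pitch class 0..11 (0 = C, 9 = A) of the equal-tempered note nearest freq_hz."""
    midi = round(69 + 12 * math.log2(freq_hz / ref_a4))
    return midi % 12


def chroma_from_mdct(coeffs, sample_rate=44100, fmin=100.0, fmax=2000.0):
    """Fold MDCT coefficient energy into a 12-bin chroma vector.

    fmin and fmax are illustrative assumptions; the paper selects its own
    frequency range based on frequency resolution.
    """
    n = len(coeffs)
    chroma = [0.0] * 12
    for k, c in enumerate(coeffs):
        f = bin_center_hz(k, n, sample_rate)
        if fmin <= f <= fmax:
            chroma[pitch_class(f)] += c * c  # accumulate energy per pitch class
    return chroma


# Usage: a long-window frame (1024 coefficients) with energy only in bin 20,
# whose center frequency lies close to A4 (440 Hz).
frame = [0.0] * 1024
frame[20] = 1.0
chroma = chroma_from_mdct(frame)
```

The same function applied to a 128-coefficient short-window subframe has a bin spacing eight times coarser (about 172 Hz instead of about 21.5 Hz at 44.1 kHz), so several neighboring notes can share one bin at low frequencies. That loss of resolution is one way to see why short windows need separate handling.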

Author information

Authors and Affiliations

Department of Communication Engineering, National Central University, Jhong-Li, Taiwan
Tai-Ming Chang, Chia-Bin Hsieh & Pao-Chi Chang

Corresponding author

Correspondence to Pao-Chi Chang.

About this article

Cite this article

Chang, TM., Hsieh, CB. & Chang, PC. An enhanced direct chord transformation for music retrieval in the AAC transform domain with window switching. Multimed Tools Appl 74, 7921–7942 (2015). https://doi.org/10.1007/s11042-014-2031-1

Received: 29 October 2013
Revised: 05 March 2014
Accepted: 14 April 2014
Published: 01 June 2014
Issue Date: September 2015
DOI: https://doi.org/10.1007/s11042-014-2031-1
