Abstract
With the explosive growth in the number of music albums produced, music information retrieval has become a critical aspect of managing music data. Extracting frequency parameters directly from compressed files to represent music greatly improves processing speed when working with a large database. In this study, we focused on advanced audio coding (AAC) files. We analyzed the disparity in frequency expression between the discrete Fourier transform and the discrete cosine transform, considered the frequency resolution in order to select an appropriate frequency range, and developed a direct chroma feature-transformation method in the AAC transform domain. An added challenge of using AAC files directly is long/short window switching; ignoring it may result in inaccurate frequency mapping and inefficient information retrieval. For short windows in particular, we propose a peak-competition method that, when the eight subframes are combined, enhances the pitch information while excluding ambiguous frequency components. Moreover, for chroma feature segmentation, we propose a simple dynamic-segmentation method to replace the complex computation of beat tracking. Our experimental results show that the proposed method increased the Top-1 accuracy rate by approximately 7% over previously described transform-domain methods and performed nearly as effectively as state-of-the-art waveform-domain approaches.
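As a rough illustration of what a direct transform-domain chroma mapping involves, the sketch below folds the energy of MDCT coefficients into 12 pitch classes. It is not the method of this article: the function names, the 44.1 kHz sampling rate, the 55 to 2000 Hz range, and the nearest-semitone rule are all assumptions chosen for illustration. The only standard fact it relies on is that coefficient k of an N-coefficient MDCT frame is centred at (k + 0.5) · fs / (2N), a half-bin offset relative to a DFT.

```python
import math

def mdct_bin_frequency(k, num_coeffs, sample_rate):
    """Centre frequency in Hz of MDCT coefficient k (half-bin offset vs. a DFT)."""
    return (k + 0.5) * sample_rate / (2.0 * num_coeffs)

def pitch_class(freq_hz, ref_a4=440.0):
    """Pitch class 0..11 (0 = C) of the equal-tempered note nearest freq_hz."""
    semitones_from_a4 = round(12.0 * math.log2(freq_hz / ref_a4))
    return (semitones_from_a4 + 9) % 12  # A is 9 semitones above C

def chroma_from_mdct(coeffs, sample_rate=44100, f_min=55.0, f_max=2000.0):
    """Fold squared MDCT coefficients into a normalised 12-bin chroma vector.

    sample_rate, f_min and f_max are illustrative assumptions, not values
    taken from the article.
    """
    n = len(coeffs)
    chroma = [0.0] * 12
    for k, c in enumerate(coeffs):
        f = mdct_bin_frequency(k, n, sample_rate)
        if f_min <= f <= f_max:
            chroma[pitch_class(f)] += c * c
    total = sum(chroma)
    return [v / total for v in chroma] if total > 0 else chroma

# Hypothetical long-window frame (1024 coefficients) with one active bin near 440 Hz.
frame = [0.0] * 1024
frame[20] = 1.0  # centre frequency (20 + 0.5) * 44100 / 2048, about 441 Hz
chroma = chroma_from_mdct(frame)
```

Under these assumptions a 1024-coefficient frame has a bin spacing of about 21.5 Hz, whereas a 128-coefficient short-window subframe has a spacing of about 172 Hz. That gap is one way to see why window switching complicates frequency-to-pitch mapping.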
Cite this article
Chang, TM., Hsieh, CB. & Chang, PC. An enhanced direct chord transformation for music retrieval in the AAC transform domain with window switching. Multimed Tools Appl 74, 7921–7942 (2015). https://doi.org/10.1007/s11042-014-2031-1
DOI: https://doi.org/10.1007/s11042-014-2031-1