Multimedia Tools and Applications, Volume 72, Issue 3, pp 2299–2320

Data reduction of audio by exploiting musical repetition

  • Stuart Cunningham
  • Vic Grout


This paper presents and evaluates a method of audio compression specifically designed to exploit the natural repetition that occurs within musical audio. The system is named Audio Compression Exploiting Repetition (ACER). ACER is a perceptual technique, but rather than exploiting masking it applies the principles of Lempel-Ziv and run-length encoding to audio sequences in place of numeric or character strings. The ACER procedure uses a pseudo-exhaustive search process and spectral difference grading. Because ACER exploits musical structure, the amount of data reduction achieved varies from piece to piece. The system is described before results on a corpus of material are presented. The analysis shows that moderate data reduction is achieved when the system operates within parameters designed to maintain high levels of perceptual audio quality, and that accepting lower perceptual quality yields greater data reduction. Objective quality evaluations reveal a degradation in fidelity that depends on the compression parameters.
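
The general idea of replacing repeated audio with references can be illustrated with a minimal sketch. This is not the ACER implementation: the fixed block size, the naive DFT, the particular spectral difference measure, the threshold, and all function names (`magnitude_spectrum`, `spectral_difference`, `encode`, `decode`) are assumptions made for illustration only. Each block is compared against every previously stored block, and a block that is spectrally close enough to an earlier one is replaced by a reference to it.

```python
import cmath
import math


def magnitude_spectrum(block):
    """Naive O(n^2) DFT magnitudes; adequate for a small illustration."""
    n = len(block)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(block)))
        for k in range(n // 2 + 1)
    ]


def spectral_difference(a, b):
    """Hypothetical grading: summed magnitude difference, scaled by a's spectrum."""
    sa, sb = magnitude_spectrum(a), magnitude_spectrum(b)
    scale = sum(sa) or 1.0
    return sum(abs(x - y) for x, y in zip(sa, sb)) / scale


def encode(samples, block_size, threshold):
    """Store each distinct block once; repeated blocks become references."""
    stored = []  # literal blocks kept so far (acts as a dictionary)
    stream = []  # ("lit", index) or ("ref", index), both indexing into stored
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        best_index, best_diff = None, threshold
        # Exhaustive search over every previously stored block.
        for i, candidate in enumerate(stored):
            if len(candidate) != len(block):
                continue  # a short final block only matches equal lengths
            diff = spectral_difference(candidate, block)
            if diff <= best_diff:
                best_index, best_diff = i, diff
        if best_index is None:
            stored.append(block)
            stream.append(("lit", len(stored) - 1))
        else:
            stream.append(("ref", best_index))
    return stored, stream


def decode(stored, stream):
    """Rebuild the signal by expanding every entry from the stored blocks."""
    out = []
    for _, index in stream:
        out.extend(stored[index])
    return out
```

In this sketch a larger `threshold` lets more blocks be treated as repeats, which increases data reduction at the cost of fidelity, mirroring the trade-off described in the abstract. Because the comparison uses magnitude spectra only, a substituted block may differ in phase from the original, so the scheme is lossy whenever the threshold is above zero.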


Audio, Music, Compression, Repetition, Perceptual coding



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Creative and Applied Research for the Information Society (CARDS), Glyndŵr University, Wrexham, UK
