Data reduction of audio by exploiting musical repetition

Abstract

This paper presents and evaluates a method of audio compression specifically designed to exploit the natural repetition that occurs within musical audio. Our system is entitled Audio Compression Exploiting Repetition (ACER). ACER is a perceptual technique, but rather than exploiting masking it applies the principles of Lempel-Ziv and run-length encoding, with audio sequences taking the place of numeric or character strings. The ACER procedure applies a pseudo-exhaustive search process and spectral difference grading. Since ACER exploits musical structure, the amount of data reduction achieved varies from piece to piece. The system is described first, and results on a corpus of material are then presented. The analysis shows that moderate data reduction is achieved when the system operates within parameters designed to maintain high levels of perceptual audio quality, whilst settings that accept lower perceptual quality yield greater data reduction. Objective quality evaluations reveal a degradation in fidelity that varies with the compression parameters.
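
The general idea of dictionary-style substitution applied to audio blocks can be illustrated with a minimal Python sketch. It is not the ACER implementation: the fixed block size, the naive DFT, the mean absolute spectral difference, the linear scan over earlier blocks, and the names `encode`, `decode`, `block_size` and `threshold` are all assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrum(block):
    """Naive DFT magnitude spectrum (bins 0..n/2) of a list of samples."""
    n = len(block)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(block)))
        for k in range(n // 2 + 1)
    ]


def spectral_difference(a, b):
    """Mean absolute difference between two magnitude spectra."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def encode(samples, block_size, threshold):
    """Replace blocks that resemble an earlier stored block with a reference.

    Hypothetical scheme: a block is a "repeat" when its spectral difference
    to a previously stored block is at or below `threshold`.
    """
    stored_spectra = []  # spectra of the blocks kept literally, in order
    tokens = []
    n_full = len(samples) // block_size * block_size
    for start in range(0, n_full, block_size):
        block = samples[start:start + block_size]
        spec = magnitude_spectrum(block)
        match = None
        for idx, ref in enumerate(stored_spectra):  # linear search
            if spectral_difference(spec, ref) <= threshold:
                match = idx
                break
        if match is None:
            stored_spectra.append(spec)
            tokens.append(("literal", block))
        else:
            tokens.append(("repeat", match))
    tail = samples[n_full:]
    if tail:  # leftover samples shorter than one block are kept as-is
        tokens.append(("literal", tail))
    return tokens


def decode(tokens):
    """Rebuild the signal by copying stored blocks wherever a repeat occurs."""
    literals = []
    out = []
    for kind, value in tokens:
        if kind == "literal":
            literals.append(value)
            out.extend(value)
        else:
            out.extend(literals[value])
    return out
```

With a threshold near zero only near-identical blocks are substituted and the output is close to the input. Raising the threshold in this sketch allows looser matches, which reduces more data at the cost of fidelity, mirroring the trade-off the abstract reports. Because the comparison uses magnitudes only, phase differences between blocks are ignored here.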

Corresponding author

Correspondence to Stuart Cunningham.

Cite this article

Cunningham, S., Grout, V. Data reduction of audio by exploiting musical repetition. Multimed Tools Appl 72, 2299–2320 (2014). https://doi.org/10.1007/s11042-013-1504-y

Keywords

  • Audio
  • Music
  • Compression
  • Repetition
  • Perceptual coding