Enabling Interactive and Interoperable Semantic Music Applications

  • Jesús Corral García
  • Panos Kudumakis
  • Isabel Barbancho
  • Lorenzo J. Tardón
  • Mark Sandler
Part of the Springer Handbooks book series (SHB)


New interactive music services have emerged, but many of them rely on proprietary file formats. To enable interoperability among these services, the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) Moving Picture Experts Group (MPEG) issued a new standard, MPEG-A: Interactive Music Application Format (IM AF).

This chapter reviews the IM AF standard and its features, and describes in detail the design and implementation of an IM AF codec and its integration into Sonic Visualiser, a popular open source tool for audio analysis, annotation and visualization. A discussion follows that highlights the benefits of their combined features, such as automatic chord or melody extraction time-aligned with the song's lyrics. Furthermore, this integration provides the semantic music research community with a testbed for developing and comparing new Sonic Visualiser plug-ins, e.g., from singing voice-to-text conversion with automatic lyrics highlighting for karaoke applications to source separation-based extraction of musical instruments from a mixed song.


interactive music application format


ISO base media file format


international standard recording code


Sonic Visualiser



Panos Kudumakis acknowledges that this work was partially done during his visit to the University of Malaga in the context of the program Andalucía TECH: Campus of International Excellence and in conjunction with UK EPSRC project EP/H043101/1. This work has been partially funded by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2016-75866-C3-2-R.



Copyright information

© Springer-Verlag Berlin Heidelberg 2018

Authors and Affiliations

  • Jesús Corral García (1)
  • Panos Kudumakis (2)
  • Isabel Barbancho (3)
  • Lorenzo J. Tardón (1)
  • Mark Sandler (2)
  1. Departamento de Ingeniería de Comunicaciones, ETSI Telecomunicación, Universidad de Málaga, Malaga, Spain
  2. School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
  3. ATIC Research Group, Dep. Ingeniería de Comunicaciones, ETSI Telecomunicación, Universidad de Málaga, Malaga, Spain
