
Telecommunication Systems, Volume 47, Issue 3–4, pp 235–242

Standard audio format encapsulation (SAFE)

  • Homayoon Beigi
  • Judith A. Markowitz
Article

Abstract

One characteristic that distinguishes speaker recognition (identification, verification, classification, tracking, etc.) from other biometrics is that it is designed to operate with devices and over channels that were created for other technologies and functions. That characteristic supports broad, inexpensive, and speedy deployments. The explosion of mobile devices has exacerbated the mismatch problem and the challenges for interoperability. This paper presents a detailed proposal for interoperability that supports all types of audio interchange operations while limiting the audio formats to a small set of widely used, open standards. We call this proposal Standard Audio Format Encapsulation (SAFE). The SAFE proposal has been incorporated into speaker-recognition data interchange draft standards by the M1 (biometrics) committee of ANSI/INCITS and ISO/IEC JTC1/SC37 project 19794-13 (Voice data).

Keywords

Speaker biometrics; Speaker verification; Speaker identification; Speaker recognition; Audio interchange; Audio encapsulation



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Recognition Technologies, Inc., Yorktown Heights, USA
  2. J. Markowitz Consultants, Chicago, USA
