International Journal of Speech Technology, Volume 7, Issue 1, pp 81–91

Verbal Descriptors for VoIP Speech Sounds

  • Gregory W. Cermak


Laboratory studies with human observers were used to develop and apply a set of qualitative verbal descriptors for the sound of digital speech signals in a packet network under various loads of packet loss and jitter. The specific set of speech samples used was derived from 24 systems that were combinations of

– G.729A vs. G.711 codecs,

– produced by two manufacturers,

– with three levels of packet delay variation (“jitter”) introduced by a packet network emulator, and

– three levels of packet loss, also introduced by means of the emulator.

Each system was represented by a set of speech samples that had been recorded through it. A suite of verbal descriptors developed in a preliminary study was used to distinguish among the speech samples, and thereby among the systems. Speech samples that had gone through one kind of system had a different profile of descriptor ratings from speech that had gone through other kinds of systems. Statistical analysis showed that the descriptors discriminated among all the experimental variables and most of their interactions. The system variables with the greatest effect on sound character were packet loss and a combination of codec and manufacturer. The qualitative descriptors also predicted overall subjective quality of the speech.
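The abstract does not say which model linked the descriptor ratings to overall quality. As a minimal sketch, assuming an ordinary least-squares linear regression of per-condition mean quality on per-condition mean descriptor ratings, "the descriptors predicted overall quality" can be illustrated as follows. The descriptor names (`choppy`, `muffled`), the rating values, and the helper functions are hypothetical; they are not data or methods from the study.

```python
# Hypothetical illustration only: an OLS fit of overall quality on descriptor
# ratings. Descriptor names and numbers are invented, not taken from the study.
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so each row operation updates a and b together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(profiles: Sequence[Sequence[float]], quality: Sequence[float]) -> List[float]:
    """Fit quality ~ b0 + sum_k b_k * descriptor_k via the normal equations."""
    x = [[1.0] + list(p) for p in profiles]  # prepend an intercept column
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, quality)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs: Sequence[float], profile: Sequence[float]) -> float:
    """Predicted overall quality for one condition's descriptor profile."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], profile))


# Hypothetical per-condition mean ratings: [choppy, muffled].
PROFILES = [[1, 1], [2, 1], [3, 2], [4, 3], [5, 2]]
# Hypothetical mean quality scores, constructed as 4.5 - 0.5*choppy - 0.25*muffled.
QUALITY = [3.75, 3.25, 2.5, 1.75, 1.5]

coefs = fit_ols(PROFILES, QUALITY)
print([round(c, 3) for c in coefs])
```

Because the invented quality scores were built as an exact linear function of the two descriptors, the fit recovers that function; with real listener data the residuals would not be zero, and a real analysis would also report goodness of fit.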

Keywords: VoIP; speech quality; packet loss; jitter; network emulator; packet network





Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Gregory W. Cermak (1)
  1. User Centered Design, Mailcode LA0MS38, Verizon Laboratories, Waltham, USA
