Language Resources and Evaluation, Volume 44, Issue 4, pp 387–419

The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue

  • Sasha Calhoun
  • Jean Carletta
  • Jason M. Brenier
  • Neil Mayo
  • Dan Jurafsky
  • Mark Steedman
  • David Beaver


This paper describes a recently completed common resource for the study of spoken discourse, the NXT-format Switchboard Corpus. Switchboard is a long-standing corpus of telephone conversations (Godfrey et al. in SWITCHBOARD: Telephone speech corpus for research and development. In Proceedings of ICASSP-92, pp. 517–520, 1992). We have brought together transcriptions with existing annotations for syntax, disfluency, speech acts, animacy, information status, coreference, and prosody; along with substantial new annotations of focus/contrast, more prosody, syllables and phones. The combined corpus uses the format of the NITE XML Toolkit, which allows these annotations to be browsed and searched as a coherent set (Carletta et al. in Lang Resour Eval J 39(4):313–334, 2005). The resulting corpus is a rich resource for the investigation of the linguistic features of dialogue and how they interact. As well as describing the corpus itself, we discuss our approach to overcoming issues involved in such a data integration project, relevant to both users of the corpus and others in the language resource community undertaking similar projects.


Linguistic annotation, Language resources, Discourse, Prosody, Semantics, Spoken dialogue



This work was supported by Scottish Enterprise through the Edinburgh-Stanford Link, and via EU IST Cognitive Systems IP FP6-2004-IST-4-27657 “Paco-Plus” to Mark Steedman. Thanks to Bob Ladd, Florian Jaeger, Jonathan Kilgour, Colin Matheson and Shipra Dingare for useful discussions, advice and technical help in the development of the corpus and annotation standards; and to Joanna Keating, Joseph Arko and Hannele Nicholson for their hard work in annotating. Thanks also to the creators of existing Switchboard annotations who kindly agreed to include them in the corpus, including Joseph Picone, Malvina Nissim, Annie Zaenen, Joan Bresnan, Mari Ostendorf and their respective colleagues. Finally, thank you to the Linguistic Data Consortium for agreeing to release the corpus under a ShareAlike licence through their website, and for their work in finalising the corpus data and permissions for release.


  1. Aylett, M. P., & Turk, A. (2004). The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1), 31–56.
  2. Badino, L., & Clark, R. A. (2008). Automatic labeling of contrastive word pairs from spontaneous spoken English. In IEEE/ACL Workshop on Spoken Language Technology, Goa, India.
  3. Bard, E., Anderson, A., Sotillo, C., Aylett, M., Doherty-Sneddon, G., & Newlands, A. (2000). Controlling the intelligibility of referring expressions in dialogue. Journal of Memory and Language, 42(1), 1–22.
  4. Beckman, M., & Hirschberg, J. (1999). The ToBI annotation conventions. Accessed 9 June 2006
  5. Bell, A., Jurafsky, D., Fosler-Lussier, E., Girand, C., Gregory, M., & Gildea, D. (2003). Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. Journal of the Acoustical Society of America, 113(2), 1001–1024.
  6. Bird, S., & Liberman, M. (2001). A formal framework for linguistic annotation. Speech Communication, 33(1–2), 23–60.
  7. Boersma, P., & Weenink, D. (2006). Praat: doing phonetics by computer. Accessed 9 June 2006.
  8. Brants, S., Dipper, S., Hansen, S., Lezius, W., & Smith, G. (2002). The TIGER Treebank. In Proceedings of the workshop on Treebanks and linguistic theories, Sozopol.
  9. Brenier, J., & Calhoun, S. (2006). Switchboard prosody annotation scheme. Internal Publication, Stanford University and University of Edinburgh: Accessed 15 January 2008.
  10. Brenier, J., Nenkova, A., Kothari, A., Whitton, L., Beaver, D., & Jurafsky, D. (2006). The (non)utility of linguistic features for predicting prominence in spontaneous speech. In Proceedings of IEEE/ACL 2006 workshop on spoken language technology, Aruba.
  11. Bresnan, J., Cueni, A., Nikitina, T., & Baayen, R. H. (2007). Predicting the dative alternation. In G. Bouma, I. Kraemer, & J. Zwarts (Eds.), Cognitive foundations of interpretation (pp. 69–94). Amsterdam: Royal Netherlands Academy of Arts and Sciences.
  12. Buráňová, E., Hajičová, E., & Sgall, P. (2000). Tagging of very large corpora: Topic-focus articulation. In Proceedings of COLING conference (pp. 278–284), Saarbrücken, Germany.
  13. Calhoun, S. (2005). Annotation scheme for discourse relations in Paraphrase Corpus. Internal Publication, University of Edinburgh: Accessed 15 January 2008.
  14. Calhoun, S. (2006). Information structure and the prosodic structure of English: A probabilistic relationship. PhD thesis, University of Edinburgh.
  15. Calhoun, S. (2007). Predicting focus through prominence structure. In Proceedings of interspeech. Antwerp, Belgium.
  16. Calhoun, S. (2009). What makes a word contrastive: Prosodic, semantic and pragmatic perspectives. In D. Barth-Weingarten, N. Dehé, & A. Wichmann (Eds.), Where prosody meets pragmatics: Research at the interface, Vol. 8 of Studies in pragmatics (pp. 53–78). Emerald, Bingley.
  17. Calhoun, S. (2010). How does informativeness affect prosodic prominence? Language and Cognitive Processes. Special Issue on Prosody (to appear).
  18. Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2), 249–254.
  19. Carletta, J., Ashby, S., Bourban, S., Flynn, M., Guillemot, M., Hain, T., Kadlec, J., Karaiskos, V., Kraaij, W., Kronenthal, M., Lathoud, G., Lincoln, M., Lisowska, A., McCowan, M., Post, W., Reidsma, D., & Wellner, P. (2006). The AMI Meeting Corpus: A pre-announcement. In S. Renals & S. Bengio (Eds.), Machine learning for multimodal interaction: Second international workshop, Vol. 3869 of Lecture notes in computer science. Springer.
  20. Carletta, J., Dingare, S., Nissim, M., & Nikitina, T. (2004). Using the NITE XML toolkit on the Switchboard Corpus to study syntactic choice: A case study. In Proceedings of LREC2004, Lisbon, Portugal.
  21. Carletta, J., Evert, S., Heid, U., & Kilgour, J. (2005). The NITE XML Toolkit: Data model and query language. Language Resources and Evaluation Journal, 39(4), 313–334.
  22. Crystal, D. (1969). Prosodic systems and intonation in English. Cambridge, UK: Cambridge University Press.
  23. Deshmukh, N., Ganapathiraju, A., Gleeson, A., Hamaker, J., & Picone, J. (1998). Resegmentation of Switchboard. In Proceedings of ICSLP (pp. 1543–1546), Sydney, Australia.
  24. Dubey, A., Sturt, P., & Keller, F. (2005). Parallelism in coordination as an instance of syntactic priming: Evidence from corpus-based modeling. In HLT/EMNLP, Vancouver, Canada.
  25. Fisher, W. M. (1997). tsylb: NIST Syllabification Software. Accessed 9 October 2005.
  26. Godfrey, J., Holliman, E., & McDaniel, J. (1992). SWITCHBOARD: Telephone speech corpus for research and development. In Proceedings of ICASSP-92 (pp. 517–520).
  27. Godfrey, J. J., & Holliman, E. (1997). Switchboard-1 Release 2. Linguistic Data Consortium, Philadelphia. Catalog #LDC97S62.
  28. Graff, D., & Bird, S. (2000). Many uses, many annotations for large speech corpora: Switchboard and TDT as case studies. In LREC, Athens, Greece.
  29. Greenberg, S., Ellis, D., & Hollenback, J. (1996). Insights into spoken language gleaned from phonetic transcription of the Switchboard Corpus. In The fourth international conference on spoken language processing (pp. S24–S27), Philadelphia, PA.
  30. Halliday, M. (1968). Notes on transitivity and theme in English: Part 3. Journal of Linguistics, 4, 179–215.
  31. Harkins, D. (2003). Switchboard resegmentation project. Accessed 1 February 2005.
  32. Hedberg, N., & Sosa, J. M. (2001). The prosodic structure of topic and focus in spontaneous English dialogue. In Topic & focus: A workshop on intonation and meaning. University of California, Santa Barbara, July 2001. LSA Summer Institute.
  33. Jaeger, T. F., & Wasow, T. (2005). Processing as a source of accessibility effects on variation. In Proceedings of the 31st Berkeley Linguistics Society.
  34. Johnson, K. (2004). Massive reduction in conversational American English. In K. Yoneyama & K. Maekawa (Eds.), Spontaneous speech: Data and analysis. Proceedings of the 1st session of the 10th international symposium (pp. 29–54), Tokyo, Japan, 2004. The National International Institute for Japanese Language.
  35. Jurafsky, D., Bates, R., Coccaro, N., Martin, R., Meteer, M., Ries, K., Shriberg, E., Stolcke, A., Taylor, P., & Ess-Dykema, C. V. (1998). Switchboard discourse language modeling project report. Center for Speech and Language Processing, Johns Hopkins University, Baltimore, MD, 1998. Research Note No. 30.
  36. Jurafsky, D., Shriberg, E., & Biasca, D. (1997). Switchboard SWBD-DAMSL Labeling Project Coder’s Manual, Draft 13. Technical Report 97-02, University of Colorado Institute of Cognitive Science.
  37. Ladd, D. R. (2008). Intonational phonology (2nd edn.). Cambridge, UK: Cambridge University Press.
  38. Laprun, C., Fiscus, J. G., Garofolo, J., & Pajot, S. (2002). A practical introduction to ATLAS. In Proceedings of LREC, Las Palmas, Spain.
  39. Liberman, M. (1975). The intonational system of English. PhD thesis, MIT Linguistics, Cambridge, MA.
  40. Marcus, M., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19, 313–330.
  41. Marcus, M. P., Santorini, B., Marcinkiewicz, M. A., & Taylor, A. (1999). Treebank-3. Linguistic Data Consortium (LDC). Catalog #LDC99T42.
  42. Meteer, M., & Taylor, A. (1995). Disfluency annotation stylebook for the Switchboard Corpus. Ms., Department of Computer and Information Science, University of Pennsylvania, Accessed 30 September 2003.
  43. Michaelis, L. A., & Francis, H. S. (2004). Lexical subjects and the conflation strategy. In N. Hedberg & R. Zacharski (Eds.), Topics in the grammar-pragmatics interface: Papers in honor of Jeanette K. Gundel (pp. 19–48), Benjamins.
  44. Müller, C., & Strube, M. (2006). Multi-level annotation of linguistic data with MMAX2. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus technology and language pedagogy: New resources, new tools, new methods, English Corpus Linguistics (Vol. 3, pp. 197–214), Peter Lang.
  45. Nakatani, C., Hirschberg, J., & Grosz, B. (1995). Discourse structure in spoken language: Studies on speech corpora. In Working notes of the AAAI spring symposium on empirical methods in discourse interpretation and generation (pp. 106–112), Stanford, CA.
  46. Nenkova, A., Brenier, J., Kothari, A., Calhoun, S., Whitton, L., Beaver, D., & Jurafsky, D. (2007). To memorize or to predict: Prominence labeling in conversational speech. In NAACL human language technology conference, Rochester, NY.
  47. Nenkova, A., & Jurafsky, D. (2007). Automatic detection of contrastive elements in spontaneous speech. In IEEE workshop on automatic speech recognition and understanding (ASRU), Kyoto, Japan.
  48. Nissim, M. (2006). Learning information status of discourse entities. In Proceedings of the empirical methods in natural language processing conference, Sydney, Australia.
  49. Nissim, M., Dingare, S., Carletta, J., & Steedman, M. (2004). An annotation scheme for information status in dialogue. In Fourth language resources and evaluation conference, Lisbon, Portugal.
  50. Ostendorf, M., Shafran, I., Shattuck-Hufnagel, S., Carmichael, L., & Byrne, W. (2001). A prosodically labeled database of spontaneous speech. In Proceedings of the ISCA workshop on prosody in speech recognition and understanding (pp. 119–121), Red Bank, NJ.
  51. Pellom, B. (2001). SONIC: The University of Colorado continuous speech recognizer. Technical Report TR-CSLR-2001-01, University of Colorado at Boulder.
  52. Pierrehumbert, J., & Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In P. Cohen, J. Morgan, & M. Pollack (Eds.), Intentions in communication (pp. 271–311). MIT Press, Cambridge, MA.
  53. Pitrelli, J., Beckman, M., & Hirschberg, J. (1994). Evaluation of prosodic transcription labelling reliability in the ToBI framework. In Proceedings of the third international conference on spoken language processing (Vol. 2, pp. 123–126).
  54. Prince, E. (1992). The ZPG letter: Subjects, definiteness, and information-status. In S. Thompson & W. Mann (Eds.), Discourse description: Diverse analyses of a fund raising text (pp. 295–325). Philadelphia/Amsterdam: John Benjamins.
  55. Reitter, D. (2008). Context effects in language production: Models of syntactic priming in dialogue corpora. PhD thesis, University of Edinburgh.
  56. Reitter, D., Moore, J. D., & Keller, F. (2006). Priming of syntactic rules in task-oriented dialogue and spontaneous conversation. In Proceedings of the conference of the cognitive science society (pp. 685–690), Vancouver, Canada.
  57. Rooth, M. (1992). A theory of focus interpretation. Natural Language Semantics, 1, 75–116.
  58. Selkirk, E. (1995). Sentence prosody: Intonation, stress and phrasing. In J. Goldsmith (Ed.), The handbook of phonological theory (pp. 550–569). Cambridge, MA & Oxford: Blackwell.
  59. Shriberg, E. (1994). Preliminaries to a theory of speech disfluencies. PhD thesis, University of California at Berkeley.
  60. Shriberg, E., Taylor, P., Bates, R., Stolcke, A., Ries, K., Jurafsky, D., Coccaro, N., Martin, R., Meteer, M., & Ess-Dykema, C. (1998). Can prosody aid the automatic classification of dialog acts in conversational speech? Language and Speech, 41(3–4), 439–487.
  61. Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd edn.). McGraw-Hill.
  62. Sridhar, V. K. R., Nenkova, A., Narayanan, S., & Jurafsky, D. (2008). Detecting prominence in conversational speech: Pitch accent, givenness and focus. In Speech prosody, Campinas, Brazil.
  63. Steedman, M. (2000). Information structure and the syntax-phonology interface. Linguistic Inquiry, 31(4), 649–689.
  64. Taylor, P. (2000). Analysis and synthesis of intonation using the Tilt model. Journal of the Acoustical Society of America, 107, 1697–1714.
  65. Taylor, A., Marcus, M., & Santorini, B. (2003). The Penn Treebank: An overview.
  66. Terken, J., & Hirschberg, J. (1994). Deaccentuation of words representing ‘given’ information: Effects of persistence of grammatical role and surface position. Language and Speech, 37, 125–145.
  67. Vallduví, E., & Vilkuna, M. (1998). On rheme and kontrast. Syntax and Semantics, 29, 79–108.
  68. Weide, R. (1998). The Carnegie Mellon Pronouncing Dictionary [cmudict. 0.6]. Carnegie Mellon University: Accessed 9 October 2005.
  69. Yoon, T.-J., Chavarría, S., Cole, J., & Hasegawa-Johnson, M. (2004). Intertranscriber reliability of prosodic labeling on telephone conversation using ToBI. In Proceedings of ICSLP, Jeju, Korea.
  70. Zaenen, A., Carletta, J., Garretson, G., Bresnan, J., Koontz-Garboden, A., Nikitina, T., O’Connor, M., & Wasow, T. (2004). Animacy encoding in English: Why and how. In B. Webber & D. Byron (Eds.), ACL 2004 workshop on discourse annotation (pp. 118–125).
  71. Zhang, T., Hasegawa-Johnson, M., & Levinson, S. (2006). Extraction of pragmatic and semantic salience from spontaneous spoken English. Speech Communication, 48, 437–462.

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Sasha Calhoun (1, corresponding author)
  • Jean Carletta (2)
  • Jason M. Brenier (3)
  • Neil Mayo (2)
  • Dan Jurafsky (4)
  • Mark Steedman (2)
  • David Beaver (5)

  1. School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, Scotland, UK
  2. School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK
  3. Nuance Communications, Inc., Sunnyvale, USA
  4. Department of Linguistics, Stanford University, Stanford, USA
  5. Department of Linguistics, University of Texas at Austin, Austin, USA
