Language Resources and Evaluation, Volume 46, Issue 3, pp 461–491

REX-J: Japanese referring expression corpus of situated dialogs

  • Philipp Spanger
  • Masaaki Yasuhara
  • Ryu Iida
  • Takenobu Tokunaga
  • Asuka Terai
  • Naoko Kuriyama
Original Paper

Abstract

Identifying objects in conversation is a fundamental human capability, necessary for efficient collaboration on any real-world task. A deeper understanding of human referential behaviour is therefore indispensable for creating systems that collaborate with humans in a meaningful way. We present the construction of REX-J, a multi-modal Japanese corpus of referring expressions in situated dialogs, based on the collaborative task of solving the Tangram puzzle. The corpus contains 24 dialogs with over 4 hours of recordings and over 1,400 referring expressions. We outline the characteristics of the collected data and point out the important differences from previous corpora. The corpus records extra-linguistic information during the interaction (e.g. the position of pieces and the actions on the pieces) in synchronization with the participants’ utterances. This in turn allows us to discuss, from a new perspective, the importance of creating a unified model of linguistic and extra-linguistic information. To demonstrate the potential uses of this corpus, we present an analysis of a specific type of referring expression (“action-mentioning expression”) as well as the results of research into the generation of demonstrative pronouns. Finally, we discuss perspectives on further uses of the corpus and our planned future work, underlining that it is a valuable addition to the existing databases in the community for the study and modeling of referring expressions in situated dialog.

Keywords

Multi-modal corpus, Referring expressions, Collaborative task, Japanese

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Philipp Spanger (1)
  • Masaaki Yasuhara (1)
  • Ryu Iida (1)
  • Takenobu Tokunaga (1)
  • Asuka Terai (2)
  • Naoko Kuriyama (3)
  1. Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
  2. Global Edge Institute, Tokyo Institute of Technology, Tokyo, Japan
  3. Department of Human System Science, Tokyo Institute of Technology, Tokyo, Japan
