Utterance Retrieval Based on Recurrent Surface Text Patterns

Dubuisson Duplessis, Guillaume; Charras, Franck; Letard, Vincent; Ligozat, Anne-Laure; Rosset, Sophie

doi:10.1007/978-3-319-56608-5_16


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10193)

Included in the following conference series:

European Conference on Information Retrieval


Abstract

This paper investigates the use of recurrent surface text patterns to represent and index open-domain dialogue utterances for a retrieval system that can be embedded in a conversational agent. The approach involves both building a database of such patterns by mining a corpus of written dialogic interactions and exploiting this database in a generalised vector space model for utterance retrieval. It is a corpus-based, unsupervised, parameterless and language-independent process. Our study indicates that, on objective measures, the proposed model performs well compared with other retrieval models on the task of selecting dialogue examples derived from a large corpus of written dialogues.
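
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a "recurrent surface text pattern" can be approximated by a contiguous token n-gram occurring in at least two utterances, and it ranks candidates with plain cosine similarity over pattern counts rather than the generalised vector space model used in the paper. The abstract does not describe the mining algorithm or the similarity weighting, so the function names (`mine_patterns`, `vectorize`, `retrieve`) and the whitespace tokenizer are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(utterance):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return utterance.lower().split()


def ngrams(tokens):
    """Yield every contiguous token n-gram of the utterance as a tuple."""
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            yield tuple(tokens[start:end])


def mine_patterns(corpus):
    """Assumption: a pattern is 'recurrent' if it occurs in >= 2 utterances."""
    doc_freq = Counter()
    for utterance in corpus:
        # Use a set so each utterance counts a given n-gram only once.
        doc_freq.update(set(ngrams(tokenize(utterance))))
    return {pattern for pattern, df in doc_freq.items() if df >= 2}


def vectorize(utterance, patterns):
    """Represent an utterance as counts of the known patterns it contains."""
    return Counter(g for g in ngrams(tokenize(utterance)) if g in patterns)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(weight * v[p] for p, weight in u.items() if p in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, corpus, patterns):
    """Return the corpus utterance most similar to the query (first on ties)."""
    q = vectorize(query, patterns)
    return max(corpus, key=lambda u: cosine(q, vectorize(u, patterns)))


if __name__ == "__main__":
    corpus = [
        "how are you today",
        "are you ok",
        "what is your name",
        "tell me your name",
    ]
    patterns = mine_patterns(corpus)
    print(retrieve("do you know your name", corpus, patterns))
```

Enumerating all n-grams is quadratic in utterance length and is only practical for short utterances; a real system would need a more efficient mining step, and a generalised vector space model would additionally account for correlations between patterns.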

This work was funded by the JOKER project (www.chistera.eu/projects/joker).


Notes

1. See http://workshop.colips.org/re-wochat/ and http://workshop.colips.org/wochat/.


Author information

Authors and Affiliations

LIMSI, CNRS, Université Paris-Saclay, 91405, Orsay, France
Guillaume Dubuisson Duplessis, Franck Charras & Sophie Rosset
LIMSI, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91405, Orsay, France
Vincent Letard
LIMSI, CNRS, ENSIIE, Université Paris-Saclay, 91405, Orsay, France
Anne-Laure Ligozat


Corresponding author

Correspondence to Guillaume Dubuisson Duplessis.

Editor information

Editors and Affiliations

University of Glasgow, Glasgow, United Kingdom
Joemon M Jose
TU Delft - EWI/ST/WIS, Delft, The Netherlands
Claudia Hauff
Middle East Technical University, Ankara, Turkey
Ismail Sengor Altıngovde
Open University, Milton Keynes, United Kingdom
Dawei Song
Signal Media, London, United Kingdom
Dyaa Albakour
Toronto, Canada
Stuart Watt
JohnTait.net Ltd. and BCS IRSG, Sunderland, United Kingdom
John Tait


About this paper

Cite this paper

Dubuisson Duplessis, G., Charras, F., Letard, V., Ligozat, AL., Rosset, S. (2017). Utterance Retrieval Based on Recurrent Surface Text Patterns. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_16


DOI: https://doi.org/10.1007/978-3-319-56608-5_16
Published: 08 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56607-8
Online ISBN: 978-3-319-56608-5
eBook Packages: Computer Science, Computer Science (R0)
