Construction of Multimodal Dialog System via Knowledge Graph in Travel Domain

Wan, Jing; Yuan, Minghui; Dong, Zhenhao; Hou, Lei; Xie, Jiawang; Zhu, Hongyin; Wen, Qinghua

doi:10.1007/978-981-97-2421-5_28

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14334)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data

Abstract

When traveling to a foreign city, we often need an intelligent agent that can respond instantly and informatively to our queries. Such an agent must understand those queries and possess the knowledge to generate helpful responses. If the agent can also comprehend images, it can offer solutions from multiple perspectives. Knowledge graph-based multimodal dialog systems are a promising way to meet these requirements. In this paper, we present a solution for efficiently constructing a multimodal dialog system in the travel domain without large-scale datasets. The system’s main objective is to help users complete travel-related tasks, specifically attraction recommendation and route planning, which users frequently request while traveling. We introduce the Multimodal Chinese Tourism Knowledge Graph (MCTKG) and integrate image processing and recommendation technology into a dialog system. Our approach uses a modular design to construct the dialog system and leverages the rich information in the knowledge graph to improve the performance of each module. To the best of our knowledge, this is the first multimodal travel dialog system that provides users with personalized travel route recommendations. Experiments indicate that our dialog system effectively enhances the user’s travel experience.
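The abstract describes a modular dialog system whose modules draw on a knowledge graph. The following is a minimal, hypothetical sketch of that general idea, not the authors' implementation: the toy graph, the keyword-based understanding step, and every function name (`understand`, `recommend`, `respond`) are assumptions made for illustration, and none of the entities or relations are taken from MCTKG. Image processing and personalization are omitted.

```python
from dataclasses import dataclass, field

# Toy knowledge graph: (head, relation) -> list of tails.
# All entries are invented for illustration; they are NOT taken from MCTKG.
TOY_KG = {
    ("Beijing", "has_attraction"): ["Forbidden City", "Summer Palace", "Temple of Heaven"],
    ("Forbidden City", "category"): ["history"],
    ("Summer Palace", "category"): ["garden"],
    ("Temple of Heaven", "category"): ["history"],
}
KNOWN_CATEGORIES = ("history", "garden")


@dataclass
class DialogState:
    """Intent and slots extracted from a single user utterance."""
    intent: str = "unknown"
    slots: dict = field(default_factory=dict)


def understand(utterance: str) -> DialogState:
    """Keyword-based stand-in for a language-understanding module."""
    text = utterance.lower()
    state = DialogState()
    if "recommend" in text:
        state.intent = "recommend_attraction"
    elif "route" in text:
        state.intent = "plan_route"
    # Fill the city slot by matching entity names stored in the graph.
    for head, relation in TOY_KG:
        if relation == "has_attraction" and head.lower() in text:
            state.slots["city"] = head
    for category in KNOWN_CATEGORIES:
        if category in text:
            state.slots["category"] = category
    return state


def recommend(state: DialogState) -> list:
    """Look up attractions for the city, filtered by category if one was given."""
    candidates = TOY_KG.get((state.slots.get("city"), "has_attraction"), [])
    wanted = state.slots.get("category")
    if wanted is None:
        return list(candidates)
    return [a for a in candidates if wanted in TOY_KG.get((a, "category"), [])]


def respond(utterance: str) -> str:
    """Run the pipeline: understanding, graph lookup, templated response."""
    state = understand(utterance)
    picks = recommend(state)
    if state.intent == "recommend_attraction" and picks:
        return "You could visit: " + ", ".join(picks)
    if state.intent == "plan_route" and picks:
        # Graph order stands in for a real route planner here.
        return "Suggested route: " + " -> ".join(picks)
    if state.intent == "unknown":
        return "Could you tell me what you would like to do?"
    return "Sorry, I could not find a matching attraction."
```

For example, `respond("Please recommend history attractions in Beijing")` returns `"You could visit: Forbidden City, Temple of Heaven"`. Keeping each stage as a separate function mirrors the modular design mentioned in the abstract: any stage could be swapped for a learned model without changing the others.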

Author information

Authors and Affiliations

Beijing University of Chemical Technology, Beijing, 100029, China
Jing Wan, Minghui Yuan & Zhenhao Dong
Tsinghua University, Beijing, 100084, China
Lei Hou, Jiawang Xie, Hongyin Zhu & Qinghua Wen

Corresponding author

Correspondence to Lei Hou.

Editor information

Editors and Affiliations

Peng Cheng Laboratory, Shenzhen, China
Xiangyu Song
China University of Geosciences, Wuhan, China
Ruyi Feng
China University of Geosciences, Wuhan, China
Yunliang Chen
Deakin University, Burwood, VIC, Australia
Jianxin Li
University of Exeter, Exeter, UK
Geyong Min

About this paper

Cite this paper

Wan, J. et al. (2024). Construction of Multimodal Dialog System via Knowledge Graph in Travel Domain. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14334. Springer, Singapore. https://doi.org/10.1007/978-981-97-2421-5_28

DOI: https://doi.org/10.1007/978-981-97-2421-5_28
Published: 12 May 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2420-8
Online ISBN: 978-981-97-2421-5
eBook Packages: Computer Science, Computer Science (R0)
