Abstract
Conversational mining has attracted great interest recently owing to the explosion of social and other online media. This growth in text has been accompanied by advances in pre-trained language models, which help us to leverage these sources of information. Conversations are an interesting domain to analyse in terms of both complexity and value. Complexity arises because a conversation can be asynchronous and can involve multiple parties; conversations are also computationally intensive to process. We use unsupervised methods to develop a conversational pattern mining technique that does not require time-consuming, knowledge-demanding and resource-intensive labelling exercises. The task of identifying repeating patterns in sequences is well researched in the field of bioinformatics. In our work, we adapt this task to the field of Natural Language Processing and make several extensions to a motif detection algorithm. To demonstrate the application of the algorithm to a dynamic, real-world data set, we extract motifs from an open-source film script data source. We run an exploratory investigation into the types of motifs we are able to mine.
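To make the idea of repeating patterns in conversational sequences concrete, the sketch below counts recurring label n-grams across conversations. It is an illustration only and not the motif detection algorithm of this paper: it uses plain exact n-gram counting, and it assumes each utterance has already been mapped to a discrete label (for example, a cluster of sentence embeddings). The function name `frequent_motifs`, its parameters and the toy labels are hypothetical.

```python
from collections import Counter


def frequent_motifs(conversations, length, min_support):
    """Return label n-grams of the given length that occur in at
    least `min_support` distinct conversations.

    Assumption: every conversation is a list of discrete utterance
    labels; how those labels are obtained is outside this sketch.
    """
    support = Counter()
    for labels in conversations:
        # Collect each distinct window once per conversation, so the
        # count measures support across conversations rather than
        # raw frequency within one conversation.
        windows = {
            tuple(labels[i:i + length])
            for i in range(len(labels) - length + 1)
        }
        support.update(windows)
    return {motif: n for motif, n in support.items() if n >= min_support}


# Hypothetical toy data: three conversations as label sequences.
conversations = [
    ["greet", "ask", "answer", "thank"],
    ["ask", "answer", "ask", "answer"],
    ["greet", "ask", "answer"],
]
motifs = frequent_motifs(conversations, length=2, min_support=2)
```

With this toy input, the pairs ("greet", "ask") and ("ask", "answer") are the only length-2 patterns shared by at least two conversations. Exact matching of this kind cannot tolerate variation between occurrences, which is one reason probabilistic motif detection methods from bioinformatics are of interest for this task.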
Acknowledgements
The authors acknowledge the contribution of ABSA Bank, which sponsors the Data Science Chair.
Copyright information
© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Garber, N., Marivate, V. (2023). Conversational Pattern Mining Using Motif Detection. In: Ngatched Nkouatchah, T.M., Woungang, I., Tapamo, JR., Viriri, S. (eds) Pan-African Artificial Intelligence and Smart Systems. PAAISS 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 459. Springer, Cham. https://doi.org/10.1007/978-3-031-25271-6_22
DOI: https://doi.org/10.1007/978-3-031-25271-6_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25270-9
Online ISBN: 978-3-031-25271-6
eBook Packages: Computer Science (R0)