Reducing Event Variability in Logs by Clustering of Word Embeddings

  • David Sánchez-CharlesEmail author
  • Josep Carmona
  • Victor Muntés-Mulero
  • Marc Solé
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 308)


Several business-to-business and business-to-consumer services are provided as a human-to-human conversation in which the provider representative guides the conversation towards its resolution based on her experience, following internal guidelines. Several attempts to automatize these services are becoming popular, but they are currently limited to procedures and objectives set during design step. Process discovery techniques could provide the necessary mechanisms to monitor event logs derived from textual conversations and expand the capabilities of conversational bots. Still, variability of textual messages hinders the utility of process discovery techniques by producing non-understandable unstructured process models. In this paper, we propose the usage of word embedding for combining events that have a semantically similar name.


Unstructured processes Process discovery Machine learning Word embedding 



This work is funded by Secretaria de Universitats i Recerca of Generalitat de Catalunya, under the Industrial Doctorate Program 2013DI062, and the Spanish Ministry for Economy and Competitiveness, the European Union (FEDER funds) under grant COMMAS (Ref. TIN2013-46181-C2-1-R).


  1. 1.
    Adriansyah, A., Munoz-Gama, J., Carmona, J., Dongen, B.F., Aalst, W.M.: Measuring precision of modeled behavior. Inf. Syst. E-bus. Manag. 13(1), 37–67 (2015)CrossRefGoogle Scholar
  2. 2.
    Adriansyah, A., van Dongen, B.F., van der Aalst, W.M.P.: Conformance checking using cost-based fitness analysis. In: Proceedings of the 2011 IEEE 15th International Enterprise Distributed Object Computing Conference, EDOC 2011, Washington, DC, USA, pp. 55–64. IEEE Computer Society (2011)Google Scholar
  3. 3.
    Baier, T., Mendling, J., Weske, M.: Bridging abstraction layers in process mining. Inf. Syst. 46, 123–139 (2014)CrossRefGoogle Scholar
  4. 4.
    Baroni, M., Dinu, G., Kruszewski, G.: Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In: Proceedings of Association for Computational Linguistics (ACL), vol. 1 (2014)Google Scholar
  5. 5.
    Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3(Feb), 1137–1155 (2003)zbMATHGoogle Scholar
  6. 6.
    Jagadeesh Chandra Bose, R.P., van der Aalst, W.M.P.: Abstractions in process mining: a taxonomy of patterns. In: Dayal, U., Eder, J., Koehler, J., Reijers, H.A. (eds.) BPM 2009. LNCS, vol. 5701, pp. 159–175. Springer, Heidelberg (2009). CrossRefGoogle Scholar
  7. 7.
    Da Silva, G.A., Ferreira, D.R.: Applying hidden Markov models to process mining. Sistemas e Tecnologias de Informação. AISTI/FEUP/UPF (2009)Google Scholar
  8. 8.
    Günther, C.W., Rozinat, A., van der Aalst, W.M.P.: Activity mining by global trace segmentation. In: Rinderle-Ma, S., Sadiq, S., Leymann, F. (eds.) BPM 2009. LNBIP, vol. 43, pp. 128–139. Springer, Heidelberg (2010). CrossRefGoogle Scholar
  9. 9.
    Günther, C.W., van der Aalst W.M.P.: Mining activity clusters from low-level event logs. Beta, Research School for Operations Management and Logistics (2006)Google Scholar
  10. 10.
    He, Z., Liu, X., Lv, P., Wu, J.: Hidden softmax sequence model for dialogue structure analysis. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (2016)Google Scholar
  11. 11.
    Kenter, T., de Rijke, M.: Short text similarity with word embeddings. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM 2015, pp. 1411–1420. ACM, New York (2015)Google Scholar
  12. 12.
    Klinkmüller, C., Weber, I., Mendling, J., Leopold, H., Ludwig, A.: Increasing recall of process model matching by improved activity label matching. In: Daniel, F., Wang, J., Weber, B. (eds.) BPM 2013. LNCS, vol. 8094, pp. 211–218. Springer, Heidelberg (2013). CrossRefGoogle Scholar
  13. 13.
    Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31th International Conference on Machine Learning, ICML 2014, 21–26 June 2014, Beijing, China, pp. 1188–1196 (2014)Google Scholar
  14. 14.
    Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546 (2013)Google Scholar
  15. 15.
    Morelli, R.A., Bronzino, J.D., Goethe, J.W.: A computational speech-act model of human-computer conversations. In: Proceedings of the 1991 IEEE Seventeenth Annual Northeast Bioengineering Conference, pp. 263–264. IEEE (1991)Google Scholar
  16. 16.
    Sanchez-Charles, D.: Title and subtitles of wikipedia articles (2017).
  17. 17.
    Tax, N., Sidorova, N., Haakma, R., van der Aalst, W.M.P.: Event abstraction for process mining using supervised learning techniques. CoRR, abs/1606.07283 (2016)Google Scholar
  18. 18.
    van der Aalst, W.M.P.: Process Mining - Discovery Conformance and Enhancement of Business Processes. Springer, Berlin (2011)zbMATHGoogle Scholar
  19. 19.
    van der Aalst, W.M.P., Günther, C.W.: Finding structure in unstructured processes: the case for process mining. In: ACSD, pp. 3–12. IEEE Computer Society (2007)Google Scholar

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • David Sánchez-Charles
    • 1
    Email author
  • Josep Carmona
    • 2
  • Victor Muntés-Mulero
    • 1
  • Marc Solé
    • 1
  1. 1.CA Strategic Research, CA TechnologiesBarcelonaSpain
  2. 2.Universitat Politècnica de CatalunyaBarcelonaSpain

Personalised recommendations