Business Process Mining from E-Commerce Web Logs

  • Nicolas Poggi
  • Vinod Muthusamy
  • David Carrera
  • Rania Khalaf
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8094)


The dynamic nature of the Web and its increasing importance as an economic platform create the need for new methods and tools for business efficiency. Current Web analytics tools do not provide the necessary abstracted view of the underlying customer processes and critical paths of site visitor behavior. Such information can offer insights that help businesses react effectively and efficiently. We propose applying Business Process Management (BPM) methodologies to e-commerce Website logs, and present the challenges, results, and potential benefits of such an approach.

We use the Business Process Insight (BPI) platform, a collaborative process intelligence toolset that implements the discovery of loosely-coupled processes and includes novel process mining techniques suitable for the Web. Experiments are performed on custom click-stream logs from a large online travel and booking agency. We first compare Web clicks and BPM events, and then present a methodology to classify and transform URLs into events. We evaluate traditional and custom process mining algorithms to extract business models from real-life Web data. The resulting models present an abstracted view of the relation between pages, exit points, and critical paths taken by customers. Compared to current state-of-the-art Web analytics, such models offer important improvements and aid high-level decision making and optimization of e-commerce sites.
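As a rough illustration of what classifying URLs into events and relating pages to one another might look like, the following minimal Python sketch maps click URLs to event names with regular-expression rules and then counts which event directly follows which within each session. The URL patterns, the event names, and the `(session_id, timestamp, url)` log layout are all hypothetical and chosen for illustration only; they are not taken from the paper or from the BPI platform.

```python
import re
from collections import Counter

# Hypothetical URL patterns and event names (assumptions, not the paper's rules).
RULES = [
    (re.compile(r"^/search"), "Search"),
    (re.compile(r"^/product/\d+"), "ProductView"),
    (re.compile(r"^/cart"), "Cart"),
    (re.compile(r"^/checkout"), "Buy"),
]


def url_to_event(url):
    """Return the event name of the first matching rule, or "Other"."""
    for pattern, event in RULES:
        if pattern.search(url):
            return event
    return "Other"


def directly_follows(clicks):
    """Count event-to-event transitions within each session.

    clicks: iterable of (session_id, timestamp, url) tuples.
    """
    traces = {}
    # Order clicks by session and then by time so each trace is chronological.
    for session_id, _timestamp, url in sorted(clicks, key=lambda c: (c[0], c[1])):
        traces.setdefault(session_id, []).append(url_to_event(url))
    counts = Counter()
    for events in traces.values():
        # Pair each event with its immediate successor in the same session.
        counts.update(zip(events, events[1:]))
    return counts


if __name__ == "__main__":
    sample = [
        ("s1", 1, "/search?q=bcn"),
        ("s1", 2, "/product/42"),
        ("s1", 3, "/checkout"),
        ("s2", 1, "/search?q=nyc"),
        ("s2", 2, "/product/7"),
    ]
    for (src, dst), n in sorted(directly_follows(sample).items()):
        print(f"{src} -> {dst}: {n}")
```

Transition counts of this kind are a common starting point for process discovery; a session whose last event is not a purchase can be read as a candidate exit point.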





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Nicolas Poggi (1, 2)
  • Vinod Muthusamy (3)
  • David Carrera (1, 2)
  • Rania Khalaf (3)

  1. Technical University of Catalonia (UPC), Barcelona, Spain
  2. Barcelona Supercomputing Center (BSC), Barcelona, Spain
  3. IBM T. J. Watson Research Center, Yorktown, USA
