AB Testing for Process Versions with Contextual Multi-armed Bandit Algorithms

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10816)


Business process improvement ideas can be validated through sequential experiment techniques like AB Testing. Such approaches have the inherent risk of exposing customers to an inferior process version, which is why the inferior version should be discarded as quickly as possible. In this paper, we propose a contextual multi-armed bandit algorithm that can observe the performance of process versions and dynamically adjust the routing policy so that the customers are directed to the version that can best serve them. Our algorithm learns the best routing policy in the presence of complications such as multiple process performance indicators, delays in indicator observation, incomplete or partial observations, and contextual factors. We also propose a pluggable architecture that supports such routing algorithms. We evaluate our approach with a case study. Furthermore, we demonstrate that our approach identifies the best routing policy given the process performance and that it scales horizontally.
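The routing idea summarised above can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: it is a plain epsilon-greedy contextual bandit over a discrete context label, and the class name `ContextualRouter`, its methods, and the parameter `epsilon` are all hypothetical names chosen for illustration. It assumes that the process performance indicators have already been combined into a single numeric reward, and it handles delayed observation only in the simplest way, by letting `observe` be called at any time after `route`.

```python
import random
from collections import defaultdict


class ContextualRouter:
    """Illustrative epsilon-greedy router (hypothetical, not the paper's method).

    Keeps one running mean reward per (context, version) pair.
    """

    def __init__(self, versions, epsilon=0.1, seed=0):
        self.versions = list(versions)
        self.epsilon = epsilon            # probability of exploring at random
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, version) -> rewards seen
        self.means = defaultdict(float)   # (context, version) -> mean reward

    def route(self, context):
        """Choose a process version for a new instance with this context."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.versions)  # explore
        # Exploit: highest mean reward for this context (ties go to the
        # version listed first).
        return max(self.versions, key=lambda v: self.means[(context, v)])

    def observe(self, context, version, reward):
        """Record a reward; this may happen long after route() was called."""
        key = (context, version)
        self.counts[key] += 1
        # Incremental update of the running mean.
        self.means[key] += (reward - self.means[key]) / self.counts[key]


if __name__ == "__main__":
    router = ContextualRouter(["A", "B"], epsilon=0.1)
    version = router.route("new_customer")
    # ... the process instance runs; the indicator arrives later ...
    router.observe("new_customer", version, reward=0.8)
```

Separating `route` from `observe` is the design choice that lets rewards arrive late or not at all: an instance whose indicator is never observed simply contributes no update.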


Keywords: Multi-armed bandit, Business Process Management, AB Testing, Process Performance Indicators



The work of Claudio Di Ciccio has received funding from the EU H2020 programme under MSCA-RISE agreement 645751 (RISE_BPM).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Data61, CSIRO, Sydney, Australia
  2. University of New South Wales, Sydney, Australia
  3. Vienna University of Economics and Business, Vienna, Austria
