AB Testing for Process Versions with Contextual Multi-armed Bandit Algorithms
Business process improvement ideas can be validated through sequential experiment techniques such as AB Testing. Such approaches carry the inherent risk of exposing customers to an inferior process version, which is why the inferior version should be discarded as quickly as possible. In this paper, we propose a contextual multi-armed bandit algorithm that observes the performance of process versions and dynamically adjusts the routing policy so that customers are directed to the version that can best serve them. Our algorithm learns the best routing policy in the presence of complications such as multiple process performance indicators, delays in indicator observation, incomplete or partial observations, and contextual factors. We also propose a pluggable architecture that supports such routing algorithms. We evaluate our approach with a case study. Furthermore, we demonstrate that our approach identifies the best routing policy given the process performance and that it scales horizontally.
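The abstract does not detail the learning rule, so what follows is only a minimal sketch of how a contextual bandit router could cope with the complications listed above (several PPIs, delayed and partial observations, context). It uses disjoint LinUCB, a standard linear contextual bandit, as a stand-in; it is not necessarily the algorithm proposed in this paper. The names `ProcessVersionRouter`, `route`, and `observe`, the collapse of several PPIs into one reward via a weighted mean, and the rule of ignoring PPIs that have not been observed are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def _matvec(m: List[List[float]], v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """Ridge-regression state for one process version (disjoint LinUCB)."""

    def __init__(self, dim: int) -> None:
        # A starts as the identity, so A^-1 does too; b accumulates reward-weighted contexts.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def score(self, x: Sequence[float], alpha: float) -> float:
        theta = _matvec(self.a_inv, self.b)  # ridge estimate theta = A^-1 b
        width = math.sqrt(_dot(x, _matvec(self.a_inv, x)))  # confidence width
        return _dot(theta, x) + alpha * width  # upper confidence bound

    def update(self, x: Sequence[float], reward: float) -> None:
        # Sherman-Morrison rank-one update of A^-1 after A <- A + x x^T.
        # A^-1 stays symmetric, so x^T A^-1 equals (A^-1 x)^T.
        u = _matvec(self.a_inv, x)
        denom = 1.0 + _dot(x, u)
        n = len(x)
        self.a_inv = [[self.a_inv[i][j] - u[i] * u[j] / denom for j in range(n)]
                      for i in range(n)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


class ProcessVersionRouter:
    """Hypothetical router: picks a version per instance, learns from delayed PPIs."""

    def __init__(self, versions: Sequence[str], dim: int,
                 ppi_weights: Dict[str, float], alpha: float = 1.0) -> None:
        self.arms = {v: LinUCBArm(dim) for v in versions}
        self.ppi_weights = ppi_weights  # hypothetical PPI names, weights assumed > 0
        self.alpha = alpha
        # instance id -> (version, context) for instances whose PPIs are not in yet
        self.pending: Dict[str, Tuple[str, List[float]]] = {}

    def route(self, instance_id: str, context: Sequence[float]) -> str:
        x = list(context)
        # Ties go to the first version listed.
        version = max(self.arms, key=lambda v: self.arms[v].score(x, self.alpha))
        self.pending[instance_id] = (version, x)
        return version

    def observe(self, instance_id: str, ppis: Dict[str, Optional[float]]) -> bool:
        """Feed back PPIs whenever they arrive; missing ones may be None or absent."""
        seen = {k: v for k, v in ppis.items()
                if k in self.ppi_weights and v is not None}
        if instance_id not in self.pending or not seen:
            return False  # unknown instance or nothing usable yet: keep waiting
        version, x = self.pending.pop(instance_id)
        total = sum(self.ppi_weights[k] for k in seen)
        # Scalar reward: weighted mean over the PPIs actually observed.
        reward = sum(self.ppi_weights[k] * v for k, v in seen.items()) / total
        self.arms[version].update(x, reward)
        return True
```

In use, `route(instance_id, context)` would be called when a process instance starts and `observe(instance_id, ppis)` whenever its indicators become available, possibly much later; the `pending` map is what lets routing and learning proceed independently. Keeping `A^-1` current with a rank-one update avoids inverting a matrix on every routing request.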
Keywords: Multi-armed bandit, Business Process Management, AB Testing, Process Performance Indicators
The work of Claudio Di Ciccio has received funding from the EU H2020 programme under MSCA-RISE agreement 645751 (RISE_BPM).