Volume 102, Issue 1, pp 141–200

A new efficient approach for extracting the closed episodes for workload prediction in cloud

  • Maryam Amiri
  • Leyli Mohammad-Khanli
  • Raffaela Mirandola


Predicting the future workload of applications is an essential step in guiding resource provisioning in cloud environments. In our previous work, we proposed two prediction models based on pattern mining. This paper builds on that experience and focuses on the time and space complexity of the prediction model. Specifically, it presents a general approach to improving the efficiency of the pattern mining engine, which in turn improves the efficiency of the predictors. The approach consists of two steps: (1) to improve space complexity, redundant occurrences of patterns are defined, and algorithms are proposed to identify and omit them; (2) to improve time complexity, a new data structure, called the closed pattern backward tree, is presented for mining closed patterns directly. The approach not only improves the efficiency of our predictors but can also be employed in other fields of pattern mining. The performance of the proposed approach is evaluated on real and synthetic cloud workloads. The experimental results show that the proposed approach can significantly improve the efficiency of the pattern mining engine in comparison with common methods for extracting closed patterns.


Keywords

Closed episode, Cloud computing, Prediction, Pattern mining engine, Workload

Mathematics Subject Classification

68T10, 62-07



Acknowledgements

The GWA-T-12 Bitbrains traces are provided by Bitbrains IT Services Inc., a service provider that specializes in managed hosting and business computation for enterprises. We thank the GWA team and all those who graciously provided the data.

Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2019

Authors and Affiliations

  • Maryam Amiri (1)
  • Leyli Mohammad-Khanli (2), corresponding author
  • Raffaela Mirandola (3)

  1. Department of Computer Engineering, Faculty of Engineering, Arak University, Arak, Iran
  2. Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
  3. Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
