# A new efficient approach for extracting the closed episodes for workload prediction in cloud

- 65 Downloads
- 1 Citations

## Abstract

The prediction of the future workload of applications is an essential step guiding resource provisioning in cloud environments. In our previous works, we proposed two prediction models based on pattern mining. This paper builds on our previous experience and focuses on the issue of time and space complexities of the prediction model. Specifically, it presents a general approach to improve the efficiency of the pattern mining engine, which leads to improving the efficiency of the predictors. The approach is composed of two steps: (1) Firstly, to improve space complexity, redundant occurrences of patterns are defined and algorithms are suggested to identify and omit them. (2) To improve time complexity, a new data structure, called closed pattern backward tree, is presented for mining closed patterns directly. The approach not only improves the efficiency of our predictors, but also can be employed in different fields of pattern mining. The performance of the proposed approach is investigated based on real and synthetic workloads of cloud. The experimental results show that the proposed approach could improve the efficiency of the pattern mining engine significantly in comparison to common methods to extract closed patterns.

## Keywords

Closed episode Cloud computing Prediction Pattern mining engine Workload## Mathematics Subject Classification

68T10 62-07## Notes

### Acknowledgements

The GWA-T-12 Bitbrains traces are provided by Bitbrains IT Services Inc., which is a service provider that specializes in managed hosting and business computation for enterprises. We thank the GWA team and all those who have graciously provided the data for us.

## Supplementary material

## References

- 1.Petcu D, Vzquez-Poletti JL (2012) European research activities in cloud computing. Cambridge Scholars Publishing, CambridgeGoogle Scholar
- 2.Amiri M, Mohammad-Khanli L, Mirandola R (2018) An online learning model based on episode mining for workload prediction in cloud. Future Gener Comput Syst 87:83CrossRefGoogle Scholar
- 3.Amiri M, Mohammad-Khanli L (2017) Survey on prediction models of applications for resources provisioning in cloud. J Netw Comput Appl 82:93–113CrossRefGoogle Scholar
- 4.Jiang Y, Perng C-S, Li T, Chang RN (2013) Cloud analytics for capacity planning and instant VM provisioning. IEEE Trans Netw Serv Manag 10(3):312–325CrossRefGoogle Scholar
- 5.Cetinski K, Juric MB (2015) AME-WPC: advanced model for efficient workload prediction in the cloud. J Netw Comput Appl 55:191–201CrossRefGoogle Scholar
- 6.Amiri M, Feizi-Derakhshi MR, Mohammad-Khanli L (2017) IDS fitted Q improvement using fuzzy approach for resource provisioning in cloud. J Intell Fuzzy Syst 32(1):229–240CrossRefGoogle Scholar
- 7.Altevogt P, Denzel W, Kiss T (2016) Cloud modeling and simulation. Wiley-IEEE Press, LondonCrossRefGoogle Scholar
- 8.Yang J, Liu C, Shang Y, Cheng B, Mao Z, Liu C, Niu L, Chen J (2014) A cost-aware auto-scaling approach using the workload prediction in service clouds. Inf Syst Front 16(1):7–18CrossRefGoogle Scholar
- 9.Shi P, Wang H, Yin G, Fengshun L, Wang T (2012) Prediction-based federated management of multi-scale resources in cloud. Adv Inf Sci Serv Sci 4(6):324–334Google Scholar
- 10.Matsunaga A, Fortes JAB (2010) On the use of machine learning to predict the time and resources consumed by applications. In: Proceedings of the 2010 10th IEEE/ACM international conference on cluster, cloud and grid computing, Melbourne, Victoria, Australia, pp 495–504. IEEE Computer SocietyGoogle Scholar
- 11.Amiri M, Mohammad-Khanli L, Mirandola R (2018) A sequential pattern mining model for application workload prediction in cloud environment. J Netw Comput Appl 105:21–62CrossRefGoogle Scholar
- 12.Achar A, Ibrahim A, Sastry PS (2013) Pattern-growth based frequent serial episode discovery. Data Knowl Eng 87:91–108CrossRefGoogle Scholar
- 13.Yan X, Han J, Afshar R (2003) CloSpan: mining—closed sequential patterns in large datasets. In: Proceedings of the 2003 SIAM international conference on data mining, San Francisco, CA, USA, pp 166–177Google Scholar
- 14.Fahed L, Brun A, Boyer A (2014) Episode rules mining algorithm for distant event prediction. Technical Report hal-01062542, HALGoogle Scholar
- 15.Huang P, Liu CJ, Yang X, Xiao L, Chen J (2014) Wireless spectrum occupancy prediction based on partial periodic pattern mining. IEEE Trans Parallel Distrib Syst 25(7):1925–1934CrossRefGoogle Scholar
- 16.Li K, Fu Y (2014) Prediction of human activity by discovering temporal sequence patterns. IEEE Trans Pattern Anal Mach Intell 36(8):1644–1657CrossRefGoogle Scholar
- 17.Wright AP, Wright AT, McCoy AB, Sittig DF (2015) The use of sequential pattern mining to predict next prescribed medications. J Biomed Inf 53:73–80CrossRefGoogle Scholar
- 18.Gan W, Lin JCW, Fournier-Viger P, Chao HC, Yu PS (2018) A survey of parallel sequential pattern mining. CoRR, arXiv:1805.10515
- 19.Dinh D-T, Le B, Fournier-Viger P, Huynh V-N (2018) An efficient algorithm for mining periodic high-utility sequential patterns. Appl Intell 48(12):4694–4714CrossRefGoogle Scholar
- 20.Martin F, Méger N, Galichet S, Becourt N (2012) Forecasting failures in a data stream context application to vacuum pumping system prognosis. Trans Mach Learn Data Min 5(2):87–116Google Scholar
- 21.D’Andreagiovanni M, Baiardi F, Lipilini J, Ruggieri S, Tonelli F (2019) Sequential pattern mining for ict risk assessment and management. J Log Algebraic Methods Program 102:1–16MathSciNetCrossRefGoogle Scholar
- 22.Van T, Yoshitaka A, Le B (2018) Mining web access patterns with super-pattern constraint. Appl Intell 48(11):3902–3914CrossRefGoogle Scholar
- 23.Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289CrossRefGoogle Scholar
- 24.Rathore S, Goyal V (2015) Top-K high utility episode mining in complex event sequence. PhD thesisGoogle Scholar
- 25.Höppner F (2001) Discovery of temporal patterns. Learning rules about the qualitative behaviour of time series. In: Proceedings of the 5th European conference on principles of data mining and knowledge discovery, PKDD ’01. Springer, London, pp 192–203CrossRefGoogle Scholar
- 26.Papapetrou P, Kollios G, Sclaroff S, Gunopulos D (Nov 2005) Discovering frequent arrangements of temporal intervals. In: Fifth IEEE international conference on data mining (ICDM’05), Houston, TX, USA. IEEEGoogle Scholar
- 27.Batal I, Cooper GF, Fradkin D, Harrison J Jr, Moerchen F, Hauskrecht M (2016) An efficient pattern mining approach for event detection in multivariate temporal data. Knowl Inf Syst 46(1):115–150CrossRefGoogle Scholar
- 28.Winarko E, Roddick JF (2007) ARMADA: an algorithm for discovering richer relative temporal association rules from interval-based data. Data Knowl Eng 63(1):76–90
**(Data Warehouse and Knowledge Discovery, DAWAK’05)**CrossRefGoogle Scholar - 29.Papadopoulos S, Drosou A, Tzovaras D (2016) Fast frequent episode mining based on finite-state machines. In: Abdelrahman OH, Gelenbe E, Gorbil G, Lent R (eds) Information sciences and systems 2015. Springer International Publishing, Cham, pp 199–208CrossRefGoogle Scholar
- 30.Lin M-Y, Lee S-Y (2002) Fast discovery of sequential patterns by memory indexing. Springer, Berlin, pp 150–160zbMATHGoogle Scholar
- 31.Moskovitch R, Shahar Y (2009) Medical temporal-knowledge discovery via temporal abstraction. AMIA Annu Symp Proc 2009:452–456Google Scholar
- 32.Moskovitch R, Walsh C, Wang F, Hripcsak G, Tatonetti N (Nov 2015) Outcomes prediction via time intervals related patterns. In: 2015 IEEE international conference on data mining, pp 919–924Google Scholar
- 33.Sacchi L, Larizza C, Combi C, Bellazzi R (2007) Data mining with temporal abstractions: learning rules from time series. Data Min Knowl Discov 15(2):217–247MathSciNetCrossRefGoogle Scholar
- 34.Allen JF (1984) Towards a general theory of action and time. Artif Intell 23(2):123–154CrossRefGoogle Scholar
- 35.Patel D, Hsu W, Lee ML (2008) Mining relationships among interval-based events for classification. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, SIGMOD ’08. ACM, New York, NY, USA, pp 393–404Google Scholar
- 36.Batal I, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2012) Mining recent temporal patterns for event detection in multivariate time series data. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12. ACM, Beijing, China, pp 280–288Google Scholar
- 37.Ghosh S, Li J, Cao L, Ramamohanarao K (2017) Septic shock prediction for ICU patients via coupled HMM walking on sequential contrast patterns. J Biomed Inf 66:19–31CrossRefGoogle Scholar
- 38.Laxman S, Sastry P, Unnikrishnan K (2007) Discovering frequent generalized episodes when events persist for different durations. IEEE Trans Knowl Data Eng 19(9):1188–1201CrossRefGoogle Scholar
- 39.Tatti N, Cule B (2010) Mining closed strict episodes. In: Proceedings of the 2010 IEEE international conference on data mining, ICDM ’10. IEEE Computer Society, Washington, DC, USA, pp 501–510Google Scholar
- 40.Wu S-Y, Chen Y-L (2007) Mining nonambiguous temporal patterns for interval-based events. IEEE Trans Knowl Data Eng 19(6):742–758CrossRefGoogle Scholar
- 41.Laxman S, Sastry PS, Unnikrishnan KP (2005) Discovering frequent episodes and learning hidden markov models: a formal connection. IEEE Trans Knowl Data Eng 17(11):1505–1517CrossRefGoogle Scholar
- 42.Hwang K, Bai X, Shi M, Li Y, Chen WG, Wu Y (2016) Cloud performance modeling and benchmark evaluation of elastic scaling strategies. IEEE Trans Parallel Distrib Syst 27(1):130–143CrossRefGoogle Scholar
- 43.Tatti N, Cule B (2012) Mining closed strict episodes. Data Min Knowl Discov 25(1):34–66MathSciNetCrossRefGoogle Scholar
- 44.Zaki MJ (2001) Spade: an efficient algorithm for mining frequent sequences. Mach Learn 42(1–2):31–60CrossRefGoogle Scholar
- 45.Neapolitan RE, Neapolitan R, Naimipour K (2010) Foundations of algorithms. Jones & Bartlett Learning, BurlingtonzbMATHGoogle Scholar
- 46.Alam M, Shakil KA, Sethi S (2016) Analysis and clustering of workload in google cluster trace based on resource usage. In: 2016 IEEE international conference on computational science and engineering (CSE) and IEEE international conference on embedded and ubiquitous computing (EUC) and 15th international symposium on distributed computing and applications for business engineering (DCABES), pp 740–747. IEEEGoogle Scholar
- 47.Alexandru I, Hui L, Mathieu J, Shanny A, Catalin D, Lex W, Epema Dick HJ (2008) The grid workloads archive. Future Gener Comput Syst 24(7):672–686CrossRefGoogle Scholar
- 48.Shen S, van Beek V, Iosup A (2015) Statistical characterization of business-critical workloads hosted in cloud datacenters. In: 2015 15th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGrid), pp 465–474. IEEEGoogle Scholar
- 49.Li A, Yang X, Kandula S, Zhang M (2010) Cloudcmp: comparing public cloud providers. In: Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, pp 1–14. ACMGoogle Scholar