Abstract
Rare patterns are important in many real-world applications, such as the interpretation of biological data, the mining of rare association rules between diseases and their causes, and the detection of anomalies. However, discovering rare patterns can be challenging. In this paper, we present an efficient algorithm for mining minimal rare patterns from sparse and weakly correlated data. The algorithm non-trivially integrates and adapts the vertical frequent pattern mining algorithm VIPER to discover minimal rare patterns efficiently. Evaluation results show that our algorithm, RP-VIPER, outperforms existing horizontal rare pattern mining algorithms. The results also highlight the performance improvements brought by our optimization strategies.
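To make the idea of bitwise vertical mining concrete, the following is a minimal illustrative sketch in Python. It is not RP-VIPER itself and includes none of the optimizations the abstract refers to. It assumes the usual definition of a minimal rare pattern: an itemset whose support is below a threshold while every proper subset is frequent. The function names (to_bit_vectors, minimal_rare_patterns), the parameter minsup, and the toy transactions are hypothetical and chosen only for illustration. Each item is stored vertically as an integer bit vector in which bit t is set when transaction t contains the item, so the support of a candidate is the number of set bits in the bitwise AND of its parents' vectors.

```python
from itertools import combinations


def to_bit_vectors(transactions):
    """Vertical layout: one integer per item; bit t is set if transaction t has the item."""
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(vector):
    """Support is the number of set bits in the bit vector."""
    return bin(vector).count("1")


def minimal_rare_patterns(transactions, minsup):
    """Level-wise search (illustrative only, not the authors' algorithm).

    Returns a dict mapping each minimal rare pattern to its support.
    A pattern is reported when its support is below minsup and all of its
    proper subsets are frequent. Candidates with support 0 are reported too.
    """
    frequent = {}      # frozenset -> bit vector, patterns of the current size
    minimal_rare = {}  # frozenset -> support

    # Size-1 patterns: a rare single item is trivially minimal.
    for item, vec in to_bit_vectors(transactions).items():
        pattern = frozenset([item])
        if support(vec) >= minsup:
            frequent[pattern] = vec
        else:
            minimal_rare[pattern] = support(vec)

    size = 1
    while frequent:
        next_frequent = {}
        seen = set()
        for a, b in combinations(list(frequent), 2):
            candidate = a | b
            if len(candidate) != size + 1 or candidate in seen:
                continue
            seen.add(candidate)
            # Every subset one item smaller must be frequent; otherwise the
            # candidate has a rare proper subset and cannot be minimal.
            if any(candidate - {x} not in frequent for x in candidate):
                continue
            vec = frequent[a] & frequent[b]  # bitwise AND intersects transaction sets
            if support(vec) >= minsup:
                next_frequent[candidate] = vec
            else:
                minimal_rare[candidate] = support(vec)
        frequent = next_frequent
        size += 1
    return minimal_rare


# Toy data, invented for illustration.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
result = minimal_rare_patterns(transactions, minsup=2)
for pattern, sup in result.items():
    print(sorted(pattern), sup)
```

On this toy input with minsup = 2, the sketch reports {d} and {b, c}, each with support 1. The pattern {a, b, c} is never counted because its subset {b, c} is already rare, which is the pruning that makes the search for minimal rare patterns stop at the frequent/rare border.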
Acknowledgement
This work is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the University of Manitoba.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Capillar, E., Ishmam, C.A.M., Leung, C.K., Pazdor, A.G.M., Shrivastava, P., Truong, N.B.C. (2023). Bitwise Vertical Mining of Minimal Rare Patterns. In: Wrembel, R., Gamper, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2023. Lecture Notes in Computer Science, vol 14148. Springer, Cham. https://doi.org/10.1007/978-3-031-39831-5_13
DOI: https://doi.org/10.1007/978-3-031-39831-5_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39830-8
Online ISBN: 978-3-031-39831-5
eBook Packages: Computer Science, Computer Science (R0)