Abstract
Contrast pattern mining, which finds patterns that describe the differences between two classes of data, is an important task in many application scenarios. Because real-world data (e.g., electronic medical records) usually mixes nominal and numerical attributes, algorithms that mine contrast patterns from nominal-numerical mixed data are in great demand. Existing contrast pattern mining algorithms either handle only a single type of attribute or rely on prior knowledge to transform numerical attributes into nominal ones. As a result, the contrast patterns they find may have limited discriminative power, because they fail to exploit the information in the original data and their pattern forms are inflexible. In this paper, we propose a novel algorithm, CHPMiner, based on extended gene expression programming (GEP). CHPMiner mines a new kind of contrast pattern, the contrast hybrid pattern (CHP), which contains nominal attributes together with numerical relationships among numerical attributes. Specifically, CHPMiner uses two sub-expressions and a novel structure to combine nominal and numerical attributes. Moreover, CHPMiner uses a dedicated fitness function to guide the evolution toward highly discriminating CHPs. Experiments on four real-world datasets show that CHPMiner outperforms the baselines, and a case study further demonstrates its effectiveness.
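To make the notion of a pattern that combines nominal conditions with a numerical relationship more concrete, the following Python sketch evaluates one hand-written hybrid pattern on two small classes of records. It is illustrative only and is not CHPMiner: the `HybridPattern` class, the support-difference score, the attribute names, and the toy records are all assumptions introduced here, and the abstract does not specify the actual pattern encoding or fitness function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Record = Dict[str, Union[str, float]]


@dataclass
class HybridPattern:
    """Hypothetical hybrid pattern: nominal equalities plus one numerical relation."""
    nominal: Dict[str, str]            # attribute name -> required nominal value
    numeric: Callable[[Record], bool]  # relationship among numerical attributes

    def matches(self, record: Record) -> bool:
        # A record matches only if every nominal condition and the
        # numerical relationship hold at the same time.
        nominal_ok = all(record.get(k) == v for k, v in self.nominal.items())
        return nominal_ok and self.numeric(record)


def support(pattern: HybridPattern, records: List[Record]) -> float:
    """Fraction of records in one class that match the pattern."""
    if not records:
        return 0.0
    return sum(pattern.matches(r) for r in records) / len(records)


def contrast_score(pattern: HybridPattern,
                   positive: List[Record],
                   negative: List[Record]) -> float:
    """Assumed discrimination measure: support difference between the classes."""
    return support(pattern, positive) - support(pattern, negative)


# Toy, made-up records with one nominal attribute and two numerical ones.
positive = [
    {"sex": "F", "a": 5.0, "b": 2.0},
    {"sex": "F", "a": 4.0, "b": 1.0},
    {"sex": "M", "a": 6.0, "b": 1.0},
    {"sex": "F", "a": 1.0, "b": 3.0},
]
negative = [
    {"sex": "F", "a": 1.0, "b": 2.0},
    {"sex": "M", "a": 5.0, "b": 1.0},
    {"sex": "F", "a": 3.0, "b": 1.0},
    {"sex": "M", "a": 1.0, "b": 4.0},
]

# Pattern: sex == "F" AND a - 2*b > 0
pattern = HybridPattern(
    nominal={"sex": "F"},
    numeric=lambda r: r["a"] - 2 * r["b"] > 0,
)

print(support(pattern, positive))                   # 0.5
print(support(pattern, negative))                   # 0.25
print(contrast_score(pattern, positive, negative))  # 0.25
```

In an evolutionary setting such as GEP, a score of this kind could serve as the fitness that ranks candidate patterns, with the numerical relationship being evolved rather than written by hand as it is here.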
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (61972268), the Sichuan Science and Technology Program (2020YFG0034), and the Med-X Center for Informatics funding project of SCU (YGJC001).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fu, M., Duan, L., Yu, Z. (2022). Effective Mining of Contrast Hybrid Patterns from Nominal-numerical Mixed Data. In: Chen, W., Yao, L., Cai, T., Pan, S., Shen, T., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science, vol 13725. Springer, Cham. https://doi.org/10.1007/978-3-031-22064-7_26
DOI: https://doi.org/10.1007/978-3-031-22064-7_26
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22063-0
Online ISBN: 978-3-031-22064-7
eBook Packages: Computer Science, Computer Science (R0)