Effective Mining of Contrast Hybrid Patterns from Nominal-numerical Mixed Data

  • Conference paper
Advanced Data Mining and Applications (ADMA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13725)

Abstract

Contrast pattern mining, which finds patterns that describe the differences between two classes of data, is an important task in many scenarios. Because real-world data is usually a mixture of nominal and numerical attributes (e.g., electronic medical records), contrast pattern mining algorithms for nominal-numerical mixed data are in great demand. Existing contrast pattern mining algorithms either handle only a single type of attribute or transform numerical attributes into nominal ones using prior knowledge. As a result, the contrast patterns they find may have limited discriminative power, because the original data information is not fully exploited and the pattern forms are inflexible. In this paper, we propose a novel algorithm, CHPMiner, based on extended gene expression programming (GEP). CHPMiner mines a new kind of contrast pattern, the contrast hybrid pattern (CHP), which contains nominal attributes together with numerical relationships among numerical attributes. Specifically, CHPMiner develops two sub-expressions and a novel structure to combine nominal and numerical attributes. Moreover, CHPMiner uses a dedicated fitness function to guide evolution toward highly discriminating CHPs. Experiments on four real-world datasets show that CHPMiner outperforms the baselines, and a case study further demonstrates its effectiveness.
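
To make the idea of a pattern that mixes nominal conditions with a numerical relationship more concrete, the following Python sketch may help. It is an illustration only and not the CHPMiner algorithm: the abstract does not specify the GEP encoding or the fitness function, so the `HybridPattern` class, the toy records, the attribute names (`sex`, `a`, `b`), and the use of a plain support difference as the discrimination score are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# A record maps attribute names to nominal (str) or numerical (float) values.
Record = Dict[str, Union[str, float]]


@dataclass
class HybridPattern:
    """Hypothetical hybrid pattern: nominal conditions AND one numerical relationship."""
    nominal: Dict[str, str]            # attribute -> required nominal value
    numeric: Callable[[Record], bool]  # relationship among numerical attributes

    def matches(self, record: Record) -> bool:
        nominal_ok = all(record.get(attr) == val for attr, val in self.nominal.items())
        return nominal_ok and self.numeric(record)


def support(pattern: HybridPattern, records: List[Record]) -> float:
    """Fraction of records that the pattern matches (0.0 for an empty set)."""
    if not records:
        return 0.0
    return sum(pattern.matches(r) for r in records) / len(records)


def support_difference(pattern: HybridPattern,
                       positive: List[Record],
                       negative: List[Record]) -> float:
    """Assumed discrimination score: support in one class minus support in the other."""
    return support(pattern, positive) - support(pattern, negative)


# Toy data (invented for illustration, not taken from the paper's datasets).
positive = [
    {"sex": "F", "a": 5.0, "b": 2.0},
    {"sex": "F", "a": 4.0, "b": 1.0},
    {"sex": "M", "a": 6.0, "b": 1.0},
    {"sex": "F", "a": 2.0, "b": 2.0},
]
negative = [
    {"sex": "F", "a": 2.0, "b": 3.0},
    {"sex": "M", "a": 5.0, "b": 1.0},
    {"sex": "F", "a": 3.0, "b": 1.0},
    {"sex": "M", "a": 1.0, "b": 1.0},
]

# Pattern: sex = "F" and a - b > 1
pattern = HybridPattern(nominal={"sex": "F"},
                        numeric=lambda r: r["a"] - r["b"] > 1.0)

print(support(pattern, positive))                       # 0.5
print(support(pattern, negative))                       # 0.25
print(support_difference(pattern, positive, negative))  # 0.25
```

In this sketch the numerical relationship is hand-written; in an evolutionary approach such as GEP, an expression of this kind would instead be evolved, with a fitness function rewarding candidates that separate the two classes well.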

Notes

  1. Link to the source code: https://github.com/fumin-git/CHP-Miner.git.

  2. https://archive.ics.uci.edu/ml/datasets/Credit+Approval.

  3. https://archive.ics.uci.edu/ml/datasets/Cylinder+Bands.

  4. https://archive.ics.uci.edu/ml/datasets/census+income.

  5. https://archive.ics.uci.edu/ml/datasets/thyroid+disease.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61972268), the Sichuan Science and Technology Program (2020YFG0034), and the Med-X Center for Informatics funding project of SCU (YGJC001).

Author information

Corresponding author

Correspondence to Lei Duan.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fu, M., Duan, L., Yu, Z. (2022). Effective Mining of Contrast Hybrid Patterns from Nominal-numerical Mixed Data. In: Chen, W., Yao, L., Cai, T., Pan, S., Shen, T., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science, vol 13725. Springer, Cham. https://doi.org/10.1007/978-3-031-22064-7_26

  • DOI: https://doi.org/10.1007/978-3-031-22064-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22063-0

  • Online ISBN: 978-3-031-22064-7

  • eBook Packages: Computer Science (R0)