Efficiently Mining Interesting Emerging Patterns

  • Conference paper
Advances in Web-Age Information Management (WAIM 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2762)

Abstract

Emerging patterns (EPs) are itemsets whose supports change significantly from one class to another. They have been shown to be powerful distinguishing features and are very useful for constructing accurate classifiers. Previous EP mining approaches often produce a large number of EPs, which makes it very difficult to choose interesting ones manually. Usually, a post-processing filter step is applied to select interesting EPs based on some interestingness measures.
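
To make the notion of a support change concrete, the sketch below computes the support of an itemset in two classes and their ratio, commonly called the growth rate in the EP literature. The abstract does not spell out the formula, so the convention used here (0 when the itemset is absent from the target class, infinity when it is absent only from the background class) is an assumption based on the usual definition, and the function names and toy data are purely illustrative.

```python
from math import inf


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(itemset, background, target):
    """Ratio of the support in `target` to the support in `background`.

    Assumed convention: 0 if the itemset never occurs in `target`,
    infinity if it occurs in `target` but never in `background`.
    """
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_tg == 0:
        return 0.0
    if s_bg == 0:
        return inf
    return s_tg / s_bg


# Hypothetical toy data: each transaction is a set of items.
background = [frozenset(t) for t in (["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"])]
target = [frozenset(t) for t in (["a", "b"], ["a", "b", "c"], ["a", "b"], ["b", "c"])]

pattern = frozenset(["a", "b"])
print(support(pattern, background))              # support in the background class
print(support(pattern, target))                  # support in the target class
print(growth_rate(pattern, background, target))  # ratio of the two supports
```

An itemset would count as an EP for the target class when this ratio reaches a user-chosen minimum growth rate; the threshold itself is a parameter, not something fixed by the definition.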

In this paper, we first generalize the interestingness measures for EPs, including the minimum support, the minimum growth rate, the subset relationship between EPs, and the correlation based on common statistical measures such as the chi-squared value. We then develop an efficient algorithm for mining only those interesting EPs, where the chi-squared test is used as a heuristic to prune the search space. The experimental results show that our algorithm maintains efficiency even at low supports on data that is large, dense, and high-dimensional. They also show that the heuristic is admissible, because only unimportant EPs with low supports are ignored. Our work on EP-based classification confirms that the discovered interesting EPs are excellent candidates for building accurate classifiers.
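
As a rough illustration of the statistical measure mentioned above, the following sketch computes the Pearson chi-squared statistic for a 2x2 contingency table and compares it with the 5% critical value for one degree of freedom (3.841). The abstract does not say how the table is built or which threshold the algorithm uses, so the table layout (pattern present or absent versus class), the default threshold, and the function names are assumptions made for illustration only, not the authors' pruning procedure.

```python
# Critical value of the chi-squared distribution, 1 degree of freedom, 5% level.
CHI2_CRITICAL_5PCT_1DOF = 3.841


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the table [[a, b], [c, d]].

    Assumed layout: rows are "pattern present" / "pattern absent",
    columns are the two classes; cells hold transaction counts.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero marginal makes the statistic undefined; treat as no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def is_significant(a, b, c, d, threshold=CHI2_CRITICAL_5PCT_1DOF):
    """True if the counts deviate from independence at the given threshold."""
    return chi_squared_2x2(a, b, c, d) >= threshold


# Hypothetical counts: the pattern occurs in 30 of 40 target transactions
# and in 10 of 40 background transactions.
print(chi_squared_2x2(30, 10, 10, 30))
print(is_significant(30, 10, 10, 30))
print(is_significant(20, 20, 20, 20))
```

A search procedure could use such a check to decide whether to keep exploring a candidate, which is the general idea of using the test as a pruning heuristic.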

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fan, H., Ramamohanarao, K. (2003). Efficiently Mining Interesting Emerging Patterns. In: Dong, G., Tang, C., Wang, W. (eds) Advances in Web-Age Information Management. WAIM 2003. Lecture Notes in Computer Science, vol 2762. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45160-0_19

  • DOI: https://doi.org/10.1007/978-3-540-45160-0_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40715-7

  • Online ISBN: 978-3-540-45160-0

  • eBook Packages: Springer Book Archive
