K-Optimal Pattern Discovery: An Efficient and Effective Approach to Exploratory Data Mining

  • Conference paper
AI 2005: Advances in Artificial Intelligence (AI 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3809)

Abstract

Most data-mining techniques seek a single model that optimizes an objective function with respect to the data. In many real-world applications several models will equally optimize this function. However, they may not all equally satisfy a user’s preferences, which will be affected by background knowledge and pragmatic considerations that are infeasible to quantify into an objective function.

Thus, the program may make arbitrary and potentially suboptimal decisions. In contrast, methods for exploratory pattern discovery seek all models that satisfy user-defined criteria. This allows the user to select among these models, rather than relinquishing control to the program. Association rule discovery [1] is the best-known example of this approach. However, it is based on the minimum-support technique, by which the only patterns discovered are those that occur in the data more than a user-specified number of times. While this approach has proved very effective in many applications, it is subject to a number of limitations.

  • It creates an arbitrary discontinuity in the interestingness function, by which a single additional or missing case supporting a pattern can transform its assessment from uninteresting to most interesting.

  • Sometimes the most interesting patterns are very rare [3].

  • Minimum support may not be relevant to whether a pattern is interesting.

  • It is often difficult to find a minimum support level that results in sufficient but not excessive numbers of patterns being discovered.

  • It cannot handle dense data [2].

  • It limits the ability to efficiently prune the search space on the basis of constraints that are neither monotone nor anti-monotone with respect to support.
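The first two limitations can be illustrated with a small sketch. It is not taken from the paper: the toy transactions, the function names `support_counts` and `frequent_itemsets`, and the `max_size` cap are all assumptions made for illustration, and the brute-force counting stands in for a real frequent-itemset miner.

```python
from itertools import combinations


def support_counts(transactions, max_size=2):
    """Count how many transactions contain each itemset of up to max_size items."""
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    return counts


def frequent_itemsets(transactions, threshold, max_size=2):
    """Keep only itemsets that occur MORE than `threshold` times (minimum support)."""
    counts = support_counts(transactions, max_size)
    return {itemset: n for itemset, n in counts.items() if n > threshold}


# Hypothetical toy data, invented for this illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"caviar", "champagne"},
]

loose = frequent_itemsets(transactions, threshold=1)
strict = frequent_itemsets(transactions, threshold=2)
```

Raising the threshold from 1 to 2 drops `("bread", "milk")` and `("bread", "butter")`, each of which is supported by exactly two transactions, so a single supporting case decides whether they are reported at all. The pair `("caviar", "champagne")` always occurs together but only once, so neither threshold reports it.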

K-optimal pattern discovery [4,5,11,14,15,17-20] is an exploratory technique that finds the k patterns that optimize a user-selected objective function while respecting other user-specified constraints. This strategy avoids the above problems while empowering the user to select between preference criteria and to directly control the number of patterns that are discovered. It also supports statistically sound exploratory pattern discovery [8]. Its effectiveness is demonstrated by a large range of applications [5-10,12,13].
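As a rough sketch of the idea, assuming leverage as the user-selected objective and exhaustive enumeration of item pairs as the search, a k-optimal query might look like the following. The names `leverage` and `k_optimal_pairs`, the `constraint` predicate, and the toy data are hypothetical; the cited work uses more general patterns and efficient pruned search rather than brute force.

```python
import heapq
from itertools import combinations


def leverage(pair, transactions):
    """Leverage of an item pair: P(a and b) - P(a) * P(b)."""
    a, b = pair
    n = len(transactions)
    p_a = sum(a in t for t in transactions) / n
    p_b = sum(b in t for t in transactions) / n
    p_ab = sum(a in t and b in t for t in transactions) / n
    return p_ab - p_a * p_b


def k_optimal_pairs(transactions, k, objective=leverage,
                    constraint=lambda pair: True):
    """Return the k item pairs with the highest objective value.

    `constraint` is a user-supplied predicate; pairs that fail it are skipped.
    This is brute-force enumeration, used here only to show the interface.
    """
    items = sorted(set().union(*transactions))
    candidates = (p for p in combinations(items, 2) if constraint(p))
    return heapq.nlargest(k, candidates,
                          key=lambda p: objective(p, transactions))


# Hypothetical toy data, invented for this illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"caviar", "champagne"},
]

top_two = k_optimal_pairs(transactions, k=2)
no_caviar = k_optimal_pairs(transactions, k=1,
                            constraint=lambda p: "caviar" not in p)
```

Here the user fixes the number of patterns returned (`k`) instead of guessing a support threshold, and the rare but strongly associated pair `("caviar", "champagne")` ranks among the top two alongside `("bread", "butter")`.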

References

  1. Agrawal, R., Imielinski, T., Swami, A.N.: Mining Association Rules between Sets of Items in Large Databases. In: Proc. 1993 ACM SIGMOD Int. Conf. Management of Data, Washington, D.C., pp. 207–216 (1993)

  2. Bayardo Jr., R.J., Agrawal, R., Gunopulos, D.: Constraint-Based Rule Mining in Large, Dense Databases. Data Mining and Knowledge Discovery 4, 217–240 (2000)

  3. Cohen, E., Datar, M., Fujiwara, S., Gionis, A., Indyk, P., Motwani, R., Ullman, J.D., Yang, C.: Finding Interesting Associations without Support Pruning. In: Proceedings Int. Conf. Data Engineering, pp. 489–499 (2000)

  4. Han, J., Wang, J., Lu, Y., Tzvetkov, P.: Mining Top-K Frequent Closed Patterns without Minimum Support. In: Int. Conf. Data Mining, pp. 211–218 (2002)

  5. Hellström, T.: Learning Robotic Behaviors with Association Rules. WSEAS Transactions on Systems (2003) ISSN 1109-2777

  6. Eirinaki, M., Vazirgiannis, M., Varlamis, I.: SEWeP: using site semantics and a taxonomy to enhance the Web personalization process. In: Proc. KDD-2003: the SIGKDD Conference of Knowledge Discovery and Datamining, pp. 99–108. ACM Press, New York (2003)

  7. Jiao, J., Zhang, Y.: Product portfolio identification based on association rule mining. Computer-Aided Design 37, 149–172 (2005)

  8. McAullay, D., Williams, G.J., Chen, J., Jin, H.: A Delivery Framework for Health Data Mining and Analytics. In: Australian Computer Science Conference, pp. 381–390 (2005)

  9. Mennis, J., Liu, J.W.: Mining association rules in spatio-temporal data: an analysis of urban socioeconomic and land cover change. Transactions in GIS 9, 13–18 (2005)

  10. Raz, O.: Helping Everyday Users Find Anomalies in Data Feeds. Ph.D. Thesis, Software Engineering, Carnegie Mellon University (2004)

  11. Scheffer, T., Wrobel, S.: Finding the Most Interesting Patterns in a Database Quickly by Using Sequential Sampling. Journal of Machine Learning Research 3, 833–862 (2002)

  12. Siu, K.K.W., Butler, S.M., Beveridge, T., Gillam, J.E., Hall, C.J., Kaye, A.H., Lewis, R.A., Mannan, K., McLoughlin, G., Pearson, S., Round, A.R., Schultke, E., Webb, G.I., Wilkinson, S.J.: Identifying markers of pathology in SAXS data of malignant tissues of the brain. Nuclear Instruments and Methods in Physics Research A (in press)

  13. Tsironis, L., Bilalis, N., Moustakis, V.: Using inductive Machine Learning to support Quality Management. In: Proc. 3rd Int. Conf. Design and Analysis of Manufacturing Systems, Tinos Island, University of Aegean (2001)

  14. Webb, G.I.: Discovering associations with numeric variables. In: Proc. 7th ACM SIGKDD Int. Conf. Knowledge Discovery and Data mining, pp. 383–388. ACM Press, New York (2001)

  15. Webb, G.I.: OPUS: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research 3, 431–465 (1995)

  16. Webb, G.I.: Preliminary investigations into statistically valid exploratory rule discovery. In: Proc. Australasian Data Mining Workshop (AusDM 2003), University of Technology, Sydney, pp. 1–9 (2003)

  17. Webb, G.I., Butler, S., Newlands, D.: On detecting differences between groups. In: Proc. KDD-2003: The SIGKDD Conference of Knowledge Discovery and Datamining, pp. 256–265. ACM Press, New York (2003)

  18. Webb, G.I., Zhang, S.: K-Optimal-Rule-Discovery. Data Mining and Knowledge Discovery 10, 39–79 (2005)

  19. Webb, G.I.: Efficient search for association rules. In: Proc. KDD-2000: the SIGKDD Conf. Knowledge Discovery and Datamining, pp. 99–107. ACM Press, New York (2000)

  20. Wrobel, S.: An Algorithm for Multi-relational Discovery of Subgroups. In: Proc. Principles of Data Mining and Knowledge Discovery, pp. 78–87. Springer, Berlin (1997)


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Webb, G.I. (2005). K-Optimal Pattern Discovery: An Efficient and Effective Approach to Exploratory Data Mining. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science, vol 3809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11589990_1

  • DOI: https://doi.org/10.1007/11589990_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30462-3

  • Online ISBN: 978-3-540-31652-7

  • eBook Packages: Computer Science, Computer Science (R0)
