Metrics for Association Rule Clustering Assessment

Transactions on Large-Scale Data- and Knowledge-Centered Systems XVII

Abstract

Issues related to association mining have received attention, especially those aiming to discover and facilitate the search for interesting patterns. A promising approach in this context is the application of clustering in the pre-processing step. In this paper, eleven metrics are proposed to provide an assessment procedure that supports the evaluation of this kind of approach. To propose the metrics, a subjective evaluation was carried out. The metrics are important since they provide criteria to: (a) analyze the methodologies, (b) identify their positive and negative aspects, (c) carry out comparisons among them and, therefore, (d) help users select the most suitable solution for their problems. In addition, the metrics lead users to think about aspects related to the problems and provide a flexible way to solve them. Some experiments were carried out to show how the metrics can be used and how useful they are.

Notes

  1. In this work, it is assumed that a pattern is interesting if it is relevant and/or useful to the user; rules having high support and/or high confidence are not necessarily interesting to the user.

  2. Any other criteria could be adopted to select the \(h\)-top interesting rules.

  3. In this work, the labeling method considered is the one presented in [22].

  4. http://cran.r-project.org/web/packages/arules/index.html.

  5. In this work, each dendrogram obtained by Ward was cut at each one of the values of \(k\).

  6. http://www.borgelt.net/apriori.html.

  7. Rule set obtained through a traditional process.

  8. Rule set obtained from partitioned data.

References

  1. Wu, X., Kumar, V.: The Top Ten Algorithms in Data Mining. Chapman & Hall/CRC, Boca Raton (2009)

  2. Dadaser-Celik, F., Celik, M., Dokuz, A.S.: Associations between stream flow and climatic variables at Kizilirmak river basin in Turkey. Glob. NEST J. 14(3), 354–361 (2012)

  3. Xiao, G.: Association rules algorithm in bank risk assessment. In: Lee, J. (ed.) Advanced Electrical and Electronics Engineering. LNEE, vol. 87, pp. 675–681. Springer, Heidelberg (2011)

  4. Nuwangi, S.M., Oruthotaarachchi, C.R., Tilakaratna, J.M.P.P., Caldera, H.A.: Usage of association rules and classification techniques in knowledge extraction of diabetes. In: Proceedings of the 6th International Conference on Advanced Information Management and Service, pp. 372–377 (2010)

  5. Rajasekar, U., Weng, Q.: Application of association rule mining for exploring the relationship between urban land surface temperature and biophysical/social parameters. Photogram. Eng. Remote Sens. 75(3), 385–396 (2009)

  6. Changguo, Y., Nianzhong, W., Tailei, W., Qin, Z., Xiaorong, Z.: The research on the application of association rules mining algorithm in network intrusion detection. In: Proceedings of the 1st International Workshop on Education Technology and Computer Science, vol. 2, pp. 849–852 (2009)

  7. Koh, Y.S., Pears, R.: Rare association rule mining via transaction clustering. In: 7th Australasian Data Mining Conference. CRPIT, vol. 87, pp. 87–94 (2008)

  8. Maquee, A., Shojaie, A.A., Mosaddar, D.: Clustering and association rules in analyzing the efficiency of maintenance system of an urban bus network. Int. J. Syst. Assur. Eng. Manage. 3(3), 175–183 (2012)

  9. Farajian, M.A., Mohammadi, S.: Mining the banking customer behavior using clustering and association rules methods. Int. J. Ind. Eng. Prod. Res. 21(4), 239–245 (2010)

  10. Fan, L.: Research on classification mining method of frequent itemset. J. Convergence Inf. Technol. 5(8), 71–77 (2010)

  11. Plasse, M., Niang, N., Saporta, G., Villeminot, A., Leblond, L.: Combined use of association rules mining and clustering methods to find relevant links between binary rare attributes in a large data set. Comput. Stat. Data Anal. 52(1), 596–613 (2007)

  12. de Carvalho, V.O., dos Santos, F.F., Rezende, S.O.: Metrics to support the evaluation of association rule clustering. In: Bellatreche, L., Mohania, M.K. (eds.) DaWaK 2013. LNCS, vol. 8057, pp. 248–259. Springer, Heidelberg (2013)

  13. Aggarwal, C.C., Procopiuc, C., Yu, P.S.: Finding localized associations in market basket data. IEEE Trans. Knowl. Data Eng. 14(1), 51–62 (2002)

  14. Wang, K., Xu, C., Liu, B.: Clustering transactions using large items. In: 8th International Conference on Information and Knowledge Management, pp. 483–490 (1999)

  15. Yun, C.-H., Chuang, K.-T., Chen, M.-S.: An efficient clustering algorithm for market basket data based on small large ratios. In: 25th International Computer Software and Applications Conference on Invigorating Software Development, pp. 505–510 (2001)

  16. Wang, J., Karypis, G.: Summary: efficiently summarizing transactions for clustering. In: 4th IEEE International Conference on Data Mining, pp. 241–248 (2004)

  17. Yang, L.: Pruning and visualizing generalized association rules in parallel coordinates. IEEE Trans. Knowl. Data Eng. 17(1), 60–70 (2005)

  18. D’Enza, A.I., Palumbo, F., Greenacre, M.: Exploratory data analysis leading towards the most interesting binary association rules. In: 11th Symposium on Applied Stochastic Models and Data Analysis, pp. 256–265 (2005)

  19. Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I.: An extensive comparative study of cluster validity indices. Pattern Recogn. 46(1), 243–256 (2013)

  20. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17(2/3), 107–145 (2001)

  21. Carvalho, V.O., Biondi, D.S., Santos, F.F., Rezende, S.O.: Labeling methods for association rule clustering. In: Proceedings of the 14th International Conference on Enterprise Information Systems, pp. 105–109 (2012)

  22. Padua, R., Carvalho, V.O., Serapião, A.B.S.: Labeling association rule clustering through a genetic algorithm approach. In: Proceedings of the 17th East European Conference on Advances in Databases and Information Systems, pp. 45–52 (2013)

  23. Tan, P.-N., Kumar, V., Srivastava, J.: Selecting the right objective measure for association analysis. Inf. Syst. 29(4), 293–313 (2004)

  24. Xu, R., Wunsch, D.: Clustering. Computational Intelligence. IEEE Press/Wiley, New York (2008)

  25. Carvalho, V.O., Santos, F.F., Rezende, S.O., Padua, R.: PAR-COM: a new methodology for post-processing association rules. Lect. Notes Bus. Inf. Process. 102, 66–80 (2012)

  26. Carvalho, V.O., Santos, F.F., Rezende, S.O.: Post-processing association rules with clustering and objective measures. In: Proceedings of 13th International Conference on Enterprise Information Systems, vol. 1, pp. 54–63 (2011)


Acknowledgments

We wish to thank Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (process numbers 2010/07879-0 and 2011/19850-9) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) (process number DS-6345378/D) for the financial support. We also thank the reviewers for their valuable contributions.

Corresponding author

Correspondence to Veronica Oliveira de Carvalho.

Appendix: Questionnaire

Introduction. Many issues related to association rule mining have received attention in recent years, especially those aiming to discover and facilitate the search for the interesting patterns of the domain. One approach related to this issue is the application of clustering in the pre-processing step. In this case, as shown in the figure below, data are initially grouped into \(n\) groups (\(GD_1\),\(GD_2\),...,\(GD_n\)). From this initial clustering, the rules are then extracted within each group (cluster), obtaining \(n\) groups of rules (\(GR_1\),\(GR_2\),...,\(GR_n\)). The aim is to obtain potentially interesting rules that would not be extracted from unpartitioned data sets because they lack sufficient support, without overloading the user with a large number of patterns. To discover these same patterns from unpartitioned data sets, the user must set the minimum support to a low value, which causes a rapid increase in the number of rules. Thus, data are initially split and the rules are extracted within each group, so that each group expresses its own associations without interference from the other groups, which contain different association patterns. Distinct methodologies have been proposed to enable this process. Each methodology uses a different combination of clustering algorithms and similarity measures in order to obtain the groups of rules.

figure c
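
The process described above can be illustrated with a minimal Python sketch. It is not the implementation of any of the evaluated methodologies: the grouping is assumed to be already given by some clustering step, the miner is a brute-force one restricted to one-item antecedents and consequents, and the data, thresholds and names (`mine_rules`, `groups`, `rs_t`, `rs_p`) are hypothetical.

```python
from itertools import permutations


def mine_rules(transactions, min_sup, min_conf):
    """Brute-force miner for one-item -> one-item rules (a simplification).

    Returns {(antecedent, consequent): (support, confidence)}.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    rules = {}
    for a, c in permutations(items, 2):
        n_a = sum(1 for t in transactions if a in t)
        n_ac = sum(1 for t in transactions if a in t and c in t)
        support = n_ac / n
        confidence = n_ac / n_a  # n_a > 0 because a was taken from the data
        if support >= min_sup and confidence >= min_conf:
            rules[(a, c)] = (support, confidence)
    return rules


# Hypothetical toy data; the split into GD_1 and GD_2 is assumed to come
# from a clustering step that is not shown here.
groups = {
    "GD_1": [{"bread", "milk"}] * 5 + [{"bread", "eggs"}],
    "GD_2": [{"wine", "cheese"}] * 2,
}
unpartitioned = [t for g in groups.values() for t in g]

# RsT: rules mined from the unpartitioned data (traditional process)
rs_t = mine_rules(unpartitioned, min_sup=0.5, min_conf=0.8)

# RsP: rules mined within each group, giving GR_1 and GR_2
rs_p = {
    name.replace("GD", "GR"): mine_rules(g, min_sup=0.5, min_conf=0.8)
    for name, g in groups.items()
}

# wine -> cheese has support 2/8 = 0.25 in the unpartitioned data, so it is
# missing from rs_t, but it has support 1.0 inside GD_2 and appears in GR_2.
```

With the same thresholds, the rule between wine and cheese is only found after partitioning, which is the effect the paragraph above describes; lowering `min_sup` on the unpartitioned data would also recover it, at the cost of admitting more rules.
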

It is in this context that this evaluation should be done. Some scenarios that can occur in this scope are shown below, and we count on your contribution for a better understanding of the problem. In all cases, it is assumed that two rule sets are available in order to evaluate the presented scenarios: one extracted through a traditional process, RsT (see Note 7), and one extracted through clustering (the process described above), RsP (see Note 8). The examples presented below are merely illustrations of the scenarios and, therefore, should not be evaluated considering the knowledge they express. Based on this evaluation, the aim is to propose an assessment procedure to support the analysis of the existing methodologies.

Scenarios

  1. In your opinion, observing “Scenario-A” (Table 6), how do you consider the occurrence in RsP of rules obtained in RsT (cases in green and orange)? Both cases, green and orange, represent rules obtained in both sets, but the rules in orange are extracted more than once in RsP, across the groups. If you need to distinguish the green cases from the orange cases, please indicate it.

    ( ) desirable ( ) indifferent ( ) not desirable

    a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

    ( ) yes ( ) no

    b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

  2. In your opinion, observing “Scenario-A” (Table 6), how do you consider the non-occurrence in RsT of rules obtained in RsP (cases in purple and red)? Both cases, purple and red, represent rules obtained only in RsP, but the rules in red are extracted more than once in RsP, across the groups. If you need to distinguish the purple cases from the red cases, please indicate it.

    ( ) desirable ( ) indifferent ( ) not desirable

    a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

    ( ) yes ( ) no

    b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

Table 6. Scenario-A. This scenario was formulated based on the Sup data set described in Sect. 5. In this scenario the rules in RsP are presented in their own clusters, since the aim here is to highlight for the user the situations that can occur among the groups (repetitions of rules) and between RsT and RsP (occurrence/non-occurrence of rules between the sets).
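
The four colour cases of Scenario-A can be made concrete with a short Python sketch. It only restates the definitions given in questions 1 and 2 as set operations and is not one of the proposed metrics; the function name `scenario_a`, the rule strings and the group names are hypothetical.

```python
from collections import Counter


def scenario_a(rs_t, rs_p_groups):
    """Sort every rule of RsP into the four colour cases of Scenario-A."""
    # Number of groups in which each rule was extracted
    times_in_rsp = Counter(
        rule for rules in rs_p_groups.values() for rule in set(rules)
    )
    cases = {"green": set(), "orange": set(), "purple": set(), "red": set()}
    for rule, times in times_in_rsp.items():
        if rule in rs_t:
            # In both sets: orange if repeated across groups, green otherwise
            cases["orange" if times > 1 else "green"].add(rule)
        else:
            # Only in RsP: red if repeated across groups, purple otherwise
            cases["red" if times > 1 else "purple"].add(rule)
    return cases


# Hypothetical rule sets, with rules written as plain strings
rs_t = {"a -> b", "c -> d"}
rs_p_groups = {
    "GR_1": {"a -> b", "c -> d", "e -> f", "g -> h"},
    "GR_2": {"c -> d", "g -> h"},
}
cases = scenario_a(rs_t, rs_p_groups)
```

Rules that occur in RsT but in no group of RsP fall outside the four cases and are not reported by this sketch.
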

For questions “3” to “6”, consider that, for each rule set, RsP and RsT, only the subset containing the \(n\) most interesting rules of the domain is shown. These subsets can be identified, for example, automatically, based on a set of objective measures, assuming that objective measures are suitable for finding the most interesting knowledge of a given domain.

  3. In your opinion, observing “Scenario-B” (Table 7), how do you consider the non-occurrence in RsT of some (or all) of the \(n\) most interesting rules in RsP (cases in blue)? Notice that the blue rules belong only to the RsP set.

    ( ) desirable ( ) indifferent ( ) not desirable

    a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

    ( ) yes ( ) no

    b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

  4. In your opinion, observing “Scenario-B” (Table 7), how do you consider the reverse scenario, that is, the non-occurrence in RsP of some (or all) of the \(n\) most interesting rules in RsT (cases in orange)? Notice that the orange rules belong only to the RsT set.

    ( ) desirable ( ) indifferent ( ) not desirable

    a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

    ( ) yes ( ) no

    b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

  5. In your opinion, observing “Scenario-B” (Table 7), how do you consider the existing intersection between the \(n\) most interesting rules in RsP and the \(n\) most interesting rules in RsT (cases in red)?

    ( ) desirable ( ) indifferent ( ) not desirable

    a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

    ( ) yes ( ) no

    b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

  6. In your opinion, how would you consider the \(n\) most interesting rules in RsP being spread over a small number of clusters?

    ( ) desirable ( ) indifferent ( ) not desirable

    a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

    ( ) yes ( ) no

    b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

Table 7. Scenario-B. This scenario was formulated based on the Sup data set described in Sect. 5. In this scenario the rules in RsP are presented all together since, in this case, only the \(n\) most interesting rules in the set are exhibited to the user, regardless of the group from which they were extracted. The aim here is to highlight for the user the situations that can occur between the subsets containing the \(n\) most interesting rules.

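
The situations raised in questions 3 to 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not one of the proposed metrics: each rule is assumed to carry a single objective-measure score, a rule repeated across groups keeps its best score, and the blue and orange cases are read relative to the two displayed top-\(n\) subsets. The names `top_n` and `scenario_b`, the scores and the rule strings are hypothetical.

```python
def top_n(scores, n):
    """Return the n highest-scoring rules; ties are broken by rule name."""
    ranked = sorted(scores, key=lambda rule: (-scores[rule], rule))
    return set(ranked[:n])


def scenario_b(rs_t_scores, rs_p_groups, n):
    """Compare the top-n rules of RsT with the top-n rules of RsP."""
    # Merge the groups of RsP; a repeated rule keeps its best score (assumption)
    merged = {}
    for rules in rs_p_groups.values():
        for rule, score in rules.items():
            merged[rule] = max(score, merged.get(rule, score))
    top_t, top_p = top_n(rs_t_scores, n), top_n(merged, n)
    return {
        "blue": top_p - top_t,    # question 3: only among the top-n of RsP
        "orange": top_t - top_p,  # question 4: only among the top-n of RsT
        "red": top_t & top_p,     # question 5: in both top-n subsets
        # question 6: groups that contain at least one top-n rule of RsP
        "clusters_with_top_rules": {
            group for group, rules in rs_p_groups.items()
            if top_p.intersection(rules)
        },
    }


# Hypothetical scores from some objective measure
rs_t_scores = {"a -> b": 0.9, "c -> d": 0.8, "e -> f": 0.1}
rs_p_groups = {
    "GR_1": {"a -> b": 0.95, "x -> y": 0.2},
    "GR_2": {"g -> h": 0.85, "c -> d": 0.3},
}
result = scenario_b(rs_t_scores, rs_p_groups, n=2)
```

The size of `clusters_with_top_rules` relative to the total number of groups gives a rough view of how concentrated the most interesting rules are.
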
  7. In your opinion, the number of rules extracted through clustering, compared to the traditional process, should be:

    ( ) low ( ) average ( ) high

    a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

    ( ) yes ( ) no

    b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

  8. In your opinion, only in relation to RsP, do you consider that the clustering process should, as a consequence, enable each cluster to express a distinct topic of the domain?

    ( ) yes ( ) indifferent ( ) no

    a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

    ( ) yes ( ) no

    b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

  9. Can you identify any other scenario(s), not previously explored, that can be relevant to the presented context? Give an example of the scenario(s) that you identified.

    a. Do you think it is important to consider this (these) scenario(s) in an assessment procedure to be used in the presented context?

    ( ) yes ( ) no

  10. If you want to leave any comment/observation, please do it below.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

de Carvalho, V.O., dos Santos, F.F., Rezende, S.O. (2015). Metrics for Association Rule Clustering Assessment. In: Hameurlain, A., Küng, J., Wagner, R., Bellatreche, L., Mohania, M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XVII. Lecture Notes in Computer Science, vol 8970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46335-2_5

  • DOI: https://doi.org/10.1007/978-3-662-46335-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46334-5

  • Online ISBN: 978-3-662-46335-2

  • eBook Packages: Computer Science (R0)