Metrics for Association Rule Clustering Assessment

de Carvalho, Veronica Oliveira; dos Santos, Fabiano Fernandes; Rezende, Solange Oliveira

doi:10.1007/978-3-662-46335-2_5

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8970)

Abstract

Issues related to association mining have received attention, especially those aiming to discover and facilitate the search for interesting patterns. A promising approach in this context is the application of clustering in the pre-processing step. In this paper, eleven metrics are proposed to provide an assessment procedure that supports the evaluation of this kind of approach. To propose the metrics, a subjective evaluation was carried out. The metrics are important since they provide criteria to: (a) analyze the methodologies, (b) identify their positive and negative aspects, (c) carry out comparisons among them and, therefore, (d) help users select the most suitable solution for their problems. Besides, the metrics make users think about aspects related to the problems and provide a flexible way to solve them. Some experiments were carried out to show how the metrics can be used and how useful they are.

Notes

1. In this work, it is assumed that a pattern is interesting if it is relevant and/or useful to the user – rules having high support and/or high confidence are not necessarily interesting to the user.
2. Any other criterion could be adopted to select the \(h\)-top interesting rules.
3. In this work, it is considered that this labeling method is the one presented by [22].
4. http://cran.r-project.org/web/packages/arules/index.html.
5. In this work, each dendrogram obtained by Ward was cut considering each one of the values of \(k\).
6. http://www.borgelt.net/apriori.html.
7. Rule set obtained through a traditional process.
8. Rule set obtained from partitioned data.

References

Wu, X., Kumar, V.: The Top Ten Algorithms in Data Mining. Chapman & Hall/CRC, Boca Raton (2009)
Dadaser-Celik, F., Celik, M., Dokuz, A.S.: Associations between stream flow and climatic variables at Kizilirmak river basin in Turkey. Glob. NEST J. 14(3), 354–361 (2012)
Xiao, G.: Association rules algorithm in bank risk assessment. In: Lee, J. (ed.) Advanced Electrical and Electronics Engineering. LNEE, vol. 87, pp. 675–681. Springer, Heidelberg (2011)
Nuwangi, S.M., Oruthotaarachchi, C.R., Tilakaratna, J.M.P.P., Caldera, H.A.: Usage of association rules and classification techniques in knowledge extraction of diabetes. In: Proceedings of the 6th International Conference on Advanced Information Management and Service, pp. 372–377 (2010)
Rajasekar, U., Weng, Q.: Application of association rule mining for exploring the relationship between urban land surface temperature and biophysical/social parameters. Photogram. Eng. Remote Sens. 75(3), 385–396 (2009)
Changguo, Y., Nianzhong, W., Tailei, W., Qin, Z., Xiaorong, Z.: The research on the application of association rules mining algorithm in network intrusion detection. In: Proceedings of the 1st International Workshop on Education Technology and Computer Science, vol. 2, pp. 849–852 (2009)
Koh, Y.S., Pears, R.: Rare association rule mining via transaction clustering. In: 7th Australasian Data Mining Conference. CRPIT, vol. 87, pp. 87–94 (2008)
Maquee, A., Shojaie, A.A., Mosaddar, D.: Clustering and association rules in analyzing the efficiency of maintenance system of an urban bus network. Int. J. Syst. Assur. Eng. Manage. 3(3), 175–183 (2012)
Farajian, M.A., Mohammadi, S.: Mining the banking customer behavior using clustering and association rules methods. Int. J. Ind. Eng. Prod. Res. 21(4), 239–245 (2010)
Fan, L.: Research on classification mining method of frequent itemset. J. Convergence Inf. Technol. 5(8), 71–77 (2010)
Plasse, M., Niang, N., Saporta, G., Villeminot, A., Leblond, L.: Combined use of association rules mining and clustering methods to find relevant links between binary rare attributes in a large data set. Comput. Stat. Data Anal. 52(1), 596–613 (2007)
de Carvalho, V.O., dos Santos, F.F., Rezende, S.O.: Metrics to support the evaluation of association rule clustering. In: Bellatreche, L., Mohania, M.K. (eds.) DaWaK 2013. LNCS, vol. 8057, pp. 248–259. Springer, Heidelberg (2013)
Aggarwal, C.C., Procopiuc, C., Yu, P.S.: Finding localized associations in market basket data. IEEE Trans. Knowl. Data Eng. 14(1), 51–62 (2002)
Wang, K., Xu, C., Liu, B.: Clustering transactions using large items. In: 8th International Conference on Information and Knowledge Management, pp. 483–490 (1999)
Yun, C.-H., Chuang, K.-T., Chen, M.-S.: An efficient clustering algorithm for market basket data based on small large ratios. In: 25th International Computer Software and Applications Conference on Invigorating Software Development, pp. 505–510 (2001)
Wang, J., Karypis, G.: Summary: efficiently summarizing transactions for clustering. In: 4th IEEE International Conference on Data Mining, pp. 241–248 (2004)
Yang, L.: Pruning and visualizing generalized association rules in parallel coordinates. IEEE Trans. Knowl. Data Eng. 17(1), 60–70 (2005)
D’Enza, A.I., Palumbo, F., Greenacre, M.: Exploratory data analysis leading towards the most interesting binary association rules. In: 11th Symposium on Applied Stochastic Models and Data Analysis, pp. 256–265 (2005)
Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I.: An extensive comparative study of cluster validity indices. Pattern Recogn. 46(1), 243–256 (2013)
Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17(2/3), 107–145 (2001)
Carvalho, V.O., Biondi, D.S., Santos, F.F., Rezende, S.O.: Labeling methods for association rule clustering. In: Proceedings of the 14th International Conference on Enterprise Information Systems, pp. 105–109 (2012)
Padua, R., Carvalho, V.O., Serapião, A.B.S.: Labeling association rule clustering through a genetic algorithm approach. In: Proceedings of the 17th East European Conference on Advances in Databases and Information Systems, pp. 45–52 (2013)
Tan, P.-N., Kumar, V., Srivastava, J.: Selecting the right objective measure for association analysis. Inf. Syst. 29(4), 293–313 (2004)
Xu, R., Wunsch, D.: Clustering. Computational Intelligence. IEEE Press/Wiley, New York (2008)
Carvalho, V.O., Santos, F.F., Rezende, S.O., Padua, R.: PAR-COM: a new methodology for post-processing association rules. Lect. Notes Bus. Inf. Process. 102, 66–80 (2012)
Carvalho, V.O., Santos, F.F., Rezende, S.O.: Post-processing association rules with clustering and objective measures. In: Proceedings of 13th International Conference on Enterprise Information Systems, vol. 1, pp. 54–63 (2011)

Acknowledgments

We wish to thank Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (process numbers: 2010/07879-0 and 2011/19850-9) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) (process number DS-6345378/D) for the financial support. We also thank the reviewers for their valuable contributions.

Author information

Authors and Affiliations

Instituto de Geociências e Ciências Exatas, UNESP - Universidade Estadual Paulista, Rio Claro, Brazil
Veronica Oliveira de Carvalho
Instituto de Ciências Matemáticas e de Computação, USP - Universidade de São Paulo, São Carlos, Brazil
Fabiano Fernandes dos Santos & Solange Oliveira Rezende

Corresponding author

Correspondence to Veronica Oliveira de Carvalho.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner
LIAS/ISAE-ENSMA, Chasseneuil-du-Poitou, France
Ladjel Bellatreche
IBM India Research Lab, New Delhi, India
Mukesh Mohania

Appendix: Questionnaire

Introduction. Many issues related to association rule mining have received attention in recent years, especially those aiming to discover and facilitate the search for the interesting patterns of the domain. One approach related to this issue is the application of clustering in the pre-processing step. In this case, as shown in the figure below, data are initially grouped into \(n\) groups (\(GD_1\),\(GD_2\),...,\(GD_n\)). From this initial clustering, the rules are then extracted within each group (cluster), yielding \(n\) groups of rules (\(GR_1\),\(GR_2\),...,\(GR_n\)). The aim is to obtain potentially interesting rules that would not be extracted from unpartitioned data sets for lack of sufficient support, without overloading the user with a large number of patterns. To discover these same patterns from unpartitioned data sets, the user would have to set the minimum support to a low value, causing a rapid increase in the number of rules. Thus, data are initially split and the rules are extracted within each group, so that each group expresses its own associations without interference from the other groups, which contain different association patterns. Distinct methodologies have been proposed to enable this process. Each methodology uses a different combination of clustering algorithms and similarity measures to obtain the groups of rules.
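The partition-then-mine process described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy transactions, the pre-assigned groups, the thresholds and the helper `mine_rules` (which only mines single-item rules a -> b) are all hypothetical, and the clustering step itself is assumed to have been done elsewhere.

```python
from itertools import permutations

def mine_rules(transactions, min_sup, min_conf):
    """Return the set of single-item rules (a, b), read a -> b,
    whose support and confidence meet the given thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = set()
    for a, b in permutations(items, 2):
        n_a = sum(1 for t in transactions if a in t)
        n_ab = sum(1 for t in transactions if a in t and b in t)
        if n_a and n_ab / n >= min_sup and n_ab / n_a >= min_conf:
            rules.add((a, b))
    return rules

# Hypothetical toy data, already split into two groups (GD_1, GD_2).
gd1 = [{"bread", "milk"}] * 8
gd2 = [{"tea", "lemon"}] * 2
groups = [gd1, gd2]
all_data = gd1 + gd2

# RsT: rules mined from the unpartitioned data (traditional process).
rs_t = mine_rules(all_data, min_sup=0.5, min_conf=0.8)

# GR_1..GR_n: rules mined inside each group; RsP is their union.
gr = [mine_rules(g, min_sup=0.5, min_conf=0.8) for g in groups]
rs_p = set().union(*gr)

# Rules that only surface after partitioning.
only_in_rs_p = rs_p - rs_t
```

In this toy case the tea/lemon rules have a support of only 0.2 in the full data set, so they are missed by the traditional process at the chosen threshold but are found inside their own group.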

It is in this context that this evaluation should be done. Some scenarios that can occur in this scope are shown below, and your contribution is requested for a better understanding of the problem. In all cases, it is assumed that two rule sets are available to evaluate the presented scenarios: one extracted through the traditional process, RsT (see Note 7), and one extracted through clustering (the process described above), RsP (see Note 8). The examples presented below are merely illustrations of the scenarios and, therefore, should not be evaluated considering the knowledge they express. Based on this evaluation, the aim is to propose an assessment procedure to support the analysis of the existing methodologies.

Scenarios

1. In your opinion, observing “Scenario-A” (Table 6), how do you regard the occurrence in RsP of rules obtained in RsT (cases in green and orange)? Both cases, green and orange, represent rules obtained in both sets, but the rules in orange are extracted more than once in RsP across the groups. If you need to distinguish the green cases from the orange cases, please indicate it.

( ) desirable ( ) indifferent ( ) not desirable

a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

( ) yes ( ) no

b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

2. In your opinion, observing “Scenario-A” (Table 6), how do you regard the non-occurrence in RsT of rules obtained in RsP (cases in purple and red)? Both cases, purple and red, represent rules obtained only in RsP, but the rules in red are extracted more than once in RsP across the groups. If you need to distinguish the purple cases from the red cases, please indicate it.

( ) desirable ( ) indifferent ( ) not desirable

a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

( ) yes ( ) no

b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

Table 6. Scenario-A. This scenario was formulated based on the Sup data set described in Sect. 5. In this scenario the rules in RsP are presented in their own clusters, since the aim here is to highlight for the user the situations that can occur among the groups (repetitions of rules) and between RsT and RsP (occurrence/non-occurrence of rules between the sets).
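The four color-coded cases of Scenario-A can be expressed as simple set and counting operations. The sketch below is only an illustration under stated assumptions: the function `scenario_a`, the rule strings and the toy groups are hypothetical and do not come from the Sup data set.

```python
from collections import Counter

def scenario_a(rs_t, rule_groups):
    """Classify every rule of RsP into the four Scenario-A cases.

    rs_t        -- set of rules from the traditional process (RsT)
    rule_groups -- list of sets, one per cluster (GR_1..GR_n)
    """
    # Number of clusters in which each rule was extracted.
    counts = Counter(rule for group in rule_groups for rule in group)
    cases = {"green": set(), "orange": set(), "purple": set(), "red": set()}
    for rule, k in counts.items():
        if rule in rs_t:
            # In both sets: once in RsP (green) or repeated (orange).
            cases["green" if k == 1 else "orange"].add(rule)
        else:
            # Only in RsP: once (purple) or repeated (red).
            cases["purple" if k == 1 else "red"].add(rule)
    return cases

# Hypothetical rules, written as plain strings for readability.
rs_t = {"a->b", "c->d"}
rule_groups = [{"a->b", "c->d", "e->f", "g->h"}, {"c->d", "g->h"}]
cases = scenario_a(rs_t, rule_groups)
```

Counting per cluster rather than per rule set is what separates the repeated cases (orange, red) from the single-occurrence ones (green, purple).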

For questions “3” to “6”, consider that for each rule set, RsP and RsT, only the subset containing the \(n\) most interesting rules of the domain is shown. These subsets can be identified, for example, automatically, based on a set of objective measures, assuming that objective measures are suitable for finding the most interesting knowledge of a given domain.

3. In your opinion, observing “Scenario-B” (Table 7), how do you regard the non-occurrence in RsT of some (or none) of the \(n\) most interesting rules in RsP (cases in blue)? Notice that the blue rules belong only to the RsP set.

( ) desirable ( ) indifferent ( ) not desirable

a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

( ) yes ( ) no

b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

4. In your opinion, observing “Scenario-B” (Table 7), how do you regard the reverse scenario, that is, the non-occurrence in RsP of some (or none) of the \(n\) most interesting rules in RsT (cases in orange)? Notice that the orange rules belong only to the RsT set.

( ) desirable ( ) indifferent ( ) not desirable

a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

( ) yes ( ) no

b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

5. In your opinion, observing “Scenario-B” (Table 7), how do you regard the existing intersection between the \(n\) most interesting rules in RsP and the \(n\) most interesting rules in RsT (cases in red)?

( ) desirable ( ) indifferent ( ) not desirable

a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

( ) yes ( ) no

b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

6. In your opinion, how would you regard the spread of the \(n\) most interesting rules in RsP across a small number of clusters?

( ) desirable ( ) indifferent ( ) not desirable

a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

( ) yes ( ) no

b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

Table 7. Scenario-B. This scenario was formulated based on the Sup data set described in Sect. 5. In this scenario the rules in RsP are presented all together since, in this case, only the \(n\) most interesting rules in the set are shown to the user, regardless of the group from which they were extracted. The aim here is to highlight for the user the situations that can occur between the subsets containing the \(n\) most interesting rules.
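The comparisons behind questions 3 to 6 can be sketched in Python as follows. This is a hedged illustration, not one of the paper's metrics: the helper names `top_n`, `scenario_b` and `clusters_covering`, the rule strings and the scores are hypothetical, and a single numeric score per rule stands in for whatever objective measure is used to rank interestingness.

```python
def top_n(scored_rules, n):
    """Return the n rules with the highest score (dict: rule -> score)."""
    ranked = sorted(scored_rules, key=scored_rules.get, reverse=True)
    return set(ranked[:n])

def scenario_b(scores_t, scores_p, n):
    """Split the top-n rules of RsT and RsP into the Scenario-B cases."""
    top_t, top_p = top_n(scores_t, n), top_n(scores_p, n)
    return {
        "blue": top_p - top_t,    # top-n only in RsP (question 3)
        "orange": top_t - top_p,  # top-n only in RsT (question 4)
        "red": top_p & top_t,     # shared top-n rules (question 5)
    }

def clusters_covering(top_rules, rule_groups):
    """Count the clusters holding at least one top-n rule (question 6)."""
    return sum(1 for group in rule_groups if group & top_rules)

# Hypothetical scores from some objective measure.
scores_t = {"a->b": 0.9, "c->d": 0.7, "e->f": 0.2}
scores_p = {"a->b": 0.9, "g->h": 0.8, "c->d": 0.1}
cases = scenario_b(scores_t, scores_p, n=2)

rule_groups = [{"a->b", "g->h"}, {"c->d"}]
spread = clusters_covering(top_n(scores_p, 2), rule_groups)
```

A low `spread` means the most interesting rules of RsP are concentrated in few clusters, which is the situation question 6 asks about.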

7. In your opinion, should the number of rules extracted through clustering, compared to the traditional process, be:

( ) low ( ) average ( ) high

a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

( ) yes ( ) no

b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

8. In your opinion, only in relation to RsP, do you consider that the clustering process should, as a consequence, enable each cluster to express a distinct topic of the domain?

( ) yes ( ) indifferent ( ) no

a. Do you think it is important to consider this scenario in an assessment procedure to be used in the presented context?

( ) yes ( ) no

b. Would you like to make any comment about the scenario (advantage, disadvantage, etc.)?

9. Can you identify any other scenario(s), not previously explored, that may be relevant to the presented context? Give an example of the scenario(s) that you identified.

a. Do you think it is important to consider this (these) scenario(s) in an assessment procedure to be used in the presented context?

( ) yes ( ) no

10. If you want to leave any comment/observation, please do so below.

Cite this chapter

de Carvalho, V.O., dos Santos, F.F., Rezende, S.O. (2015). Metrics for Association Rule Clustering Assessment. In: Hameurlain, A., Küng, J., Wagner, R., Bellatreche, L., Mohania, M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XVII. Lecture Notes in Computer Science, vol 8970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46335-2_5

DOI: https://doi.org/10.1007/978-3-662-46335-2_5
Published: 30 January 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46334-5
Online ISBN: 978-3-662-46335-2
eBook Packages: Computer Science, Computer Science (R0)