A novel algorithm for mining couples of enhanced association rules based on the number of output couples and its application

Máša, Petr; Rauch, Jan

doi:10.1007/s10844-023-00820-1

Research
Published: 01 November 2023

Journal of Intelligent Information Systems (2023)

Abstract

Besides the need for more advanced predictive methods, there is an increasing demand for easily interpretable results. Couples of enhanced association rules (a generalization of association rules, apriori, and frequent itemsets) are excellent candidates for this task. They can be interpreted in various ways, subgroup discovery being one example. A typical problem in rule mining is that the resulting ruleset contains too few or too many rules, so analysts must usually iterate 5–15 times to get a reasonable number of rules. Inspired by research in the related area of frequent itemsets, which aims to simplify input and to mine frequent itemsets without parameters, we propose a novel algorithm that does not select rules by parameters such as support and confidence; instead, it finds the best rules for a given range of the required number of rules in the output. We propose this algorithm for couples of rules, that is, for the SD4ft-Miner procedure. It benefits from Cleverminer, a brand-new Python implementation of methods of mechanizing hypothesis formation, which makes the algorithm easy to implement. We have verified the algorithm in several applications on eight public data sets. The algorithm was originally motivated by a case study, which was the reason we developed it. Although the implementation is in Python, the algorithm itself can be applied to a broader class of methods in any language. The algorithm needs few iterations: in all experiments, at most 10 were required. Possible enhancements to the algorithm are also outlined.
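
The idea of searching for a rule count rather than fixing thresholds up front can be illustrated with a minimal sketch. The abstract does not specify how parameters are updated between iterations, so the bisection below is an assumption and is not the authors' algorithm. The names `tune_threshold`, `count_rules`, and `toy_count` are hypothetical; `count_rules` stands in for any miner run that returns the number of rules found for a given quality threshold and is assumed to return fewer rules as the threshold rises.

```python
from typing import Callable, Optional, Tuple


def tune_threshold(
    count_rules: Callable[[float], int],
    min_rules: int,
    max_rules: int,
    low: float = 0.0,
    high: float = 1.0,
    max_iterations: int = 100,
) -> Tuple[Optional[float], int, int]:
    """Bisect one quality threshold until the rule count is in range.

    Assumption: count_rules is non-increasing in the threshold.
    Returns (threshold or None, last rule count, iterations used).
    """
    count = -1
    for iteration in range(1, max_iterations + 1):
        threshold = (low + high) / 2
        count = count_rules(threshold)
        if min_rules <= count <= max_rules:
            return threshold, count, iteration
        if count > max_rules:
            low = threshold   # too many rules: make the threshold stricter
        else:
            high = threshold  # too few rules: relax the threshold
    return None, count, max_iterations


# Toy stand-in for a miner: 200 hypothetical rules with qualities 0/200 .. 199/200.
QUALITIES = [i / 200 for i in range(200)]


def toy_count(threshold: float) -> int:
    """Count the toy rules whose quality reaches the threshold."""
    return sum(q >= threshold for q in QUALITIES)


threshold, count, iterations = tune_threshold(toy_count, min_rules=10, max_rules=20)
print(threshold, count, iterations)
```

With this toy data the search stops after four miner runs. A real task would replace `toy_count` with a call that runs the mining procedure and returns the size of its output, and would typically have more than one parameter to adjust.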

Data Availability

The dataset used is publicly accessible and is referenced in the manuscript. The repository with detailed results and source code enabling replication of the experiments is also publicly available.

Notes

We use the terms rule and SD4ft-rule interchangeably, as this article is about SD4ft-Miner and the rules it finds.
In the proposed algorithm, the maximum number of iterations is a parameter; it is set to 100.
No specification means subsets 1–1 for nominal attributes and sequences 1–1 for ordinal attributes (note that, from the rule-mining point of view, these two definitions are equivalent).
Rules are displayed in the order in which they are returned by the CleverMiner package ver. 1.0.2, as this version currently provides no way to order them; ordering is expected to be done manually in postprocessing.
Attributes without specification are again subsets 1–1 or sequences 1–1.
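
The equivalence of subsets 1–1 and sequences 1–1 mentioned in the notes can be checked with a small sketch. It assumes that a "subset" is any combination of an attribute's categories and a "sequence" is a run of consecutive categories in the attribute's order, each with a length between a minimum and a maximum. The helper names `subsets` and `sequences` are hypothetical and are not the CleverMiner API.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def subsets(categories: Sequence[str], min_len: int, max_len: int) -> List[Tuple[str, ...]]:
    """All combinations of categories with min_len..max_len members."""
    return [
        combo
        for size in range(min_len, max_len + 1)
        for combo in combinations(categories, size)
    ]


def sequences(categories: Sequence[str], min_len: int, max_len: int) -> List[Tuple[str, ...]]:
    """All runs of consecutive categories with min_len..max_len members."""
    n = len(categories)
    return [
        tuple(categories[start:start + size])
        for size in range(min_len, max_len + 1)
        for start in range(n - size + 1)
    ]


cats = ["low", "medium", "high"]

# Length 1-1: both definitions yield exactly the single categories.
print(subsets(cats, 1, 1) == sequences(cats, 1, 1))

# Length 1-2: subsets also allow the non-adjacent pair ("low", "high").
print(set(subsets(cats, 1, 2)) - set(sequences(cats, 1, 2)))
```

Under these assumptions the two definitions differ only once lengths above 1 are allowed, which is why the default 1–1 setting can be described either way.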

Acknowledgements

Not Applicable.

Funding

No funding was received. The authors received no financial support for the research or the authorship of this manuscript.

Author information

Authors and Affiliations

Prague University of Economics and Business, Prague, Czechia
Petr Máša & Jan Rauch

Contributions

Petr Máša: Algorithm described in Section 5, design of the system of analytic tasks, running analytic tasks, related work sections, editing, repository; Jan Rauch: design of the system of analytic tasks, related work sections, editing.

Corresponding author

Correspondence to Petr Máša.

Ethics declarations

Conflicts of interest

The authors declare no competing interests.

Ethics approval and consent to participate

Not Applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Máša, P., Rauch, J. A novel algorithm for mining couples of enhanced association rules based on the number of output couples and its application. J Intell Inf Syst (2023). https://doi.org/10.1007/s10844-023-00820-1

Received: 03 June 2023
Revised: 28 August 2023
Accepted: 28 September 2023
Published: 01 November 2023
DOI: https://doi.org/10.1007/s10844-023-00820-1
