Abstract
Besides the need for more advanced predictive methods, there is an increasing demand for easily interpretable results. Couples of enhanced association rules (a generalization of Apriori-style association rules and frequent itemsets) are excellent candidates for this task. They can be interpreted in various ways; subgroup discovery is one example. A typical problem in rule mining is that the resulting ruleset contains too few or too many rules, so analysts usually have to iterate 5–15 times to obtain a reasonable number of rules. Inspired by research on frequent itemsets that aims to simplify the input and make mining parameter-free, we propose a novel algorithm that does not rely on parameters such as support and confidence; instead, it finds the best rules for a given required range of the number of output rules. We propose this algorithm for couples of rules, i.e., for the SD4ft-Miner procedure. It benefits from CleverMiner, a brand-new Python implementation of methods of mechanizing hypothesis formation, which makes the algorithm easy to implement. We have verified the algorithm in several applications on eight public datasets. Our original use case was a case study, which was also the reason we developed the algorithm. Although the implementation is in Python, the algorithm itself can be used with a broader class of methods and in any language. The algorithm needs few iterations: in all experiments, at most 10 were required. Possible enhancements to the algorithm are also outlined.
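The core idea is to adjust a mining setting until the number of output rules falls into a required range, instead of asking the analyst to guess thresholds. The sketch below is not the authors' algorithm (that is given in the full paper); it is a generic bisection on a single minimal threshold, under the assumption that the rule count never increases as the threshold is raised, which holds for a minimal base (support) threshold when all other settings are fixed. `search_threshold`, `count_rules`, and `toy_count` are hypothetical names: in practice `count_rules` would run a mining task (for example with CleverMiner) and return how many rules it found. `max_iter` defaults to 100, mirroring the iteration cap mentioned in the notes.

```python
from typing import Callable, Optional, Tuple


def search_threshold(
    count_rules: Callable[[float], int],
    min_rules: int,
    max_rules: int,
    low: float = 0.0,
    high: float = 1.0,
    max_iter: int = 100,
) -> Tuple[Optional[float], int]:
    """Bisect a minimal threshold until the rule count lies in [min_rules, max_rules].

    Assumes count_rules(t) is non-increasing in t (raising a minimal
    support-like threshold can only remove rules). Returns the threshold
    found and the number of iterations used; the threshold is None if the
    target range was not reached within max_iter iterations.
    """
    for iteration in range(1, max_iter + 1):
        mid = (low + high) / 2
        n = count_rules(mid)
        if n < min_rules:
            high = mid  # too few rules: relax (lower) the threshold
        elif n > max_rules:
            low = mid  # too many rules: tighten (raise) the threshold
        else:
            return mid, iteration
    return None, max_iter


# Toy stand-in for a real mining run: 1000 hypothetical rules whose
# relative supports are 0.000, 0.001, ..., 0.999.
supports = [i / 1000 for i in range(1000)]


def toy_count(threshold: float) -> int:
    return sum(s >= threshold for s in supports)


threshold, iterations = search_threshold(toy_count, min_rules=20, max_rules=40)
print(threshold, toy_count(threshold), iterations)  # 0.96875 31 5
```

Bisection is used here only because it is the simplest scheme that exploits monotonicity. If many rules share the same threshold value, the count can jump over the whole target range; the loop then exhausts `max_iter` and returns `None`, which a practical implementation would have to handle, for example by widening the range.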
Data Availability
The dataset used is publicly accessible and referenced in the manuscript. The repository with detailed results and source code enabling replication of the experiments is also publicly available.
Notes
We use the terms rule and SD4ft-rule interchangeably, as this article is about SD4ft-Miner and the rules it finds.
In the proposed algorithm, the maximum number of iterations is a parameter; it is set to 100.
If no specification is given, subset 1–1 is used for nominal attributes and sequence 1–1 for ordinal attributes (from the rule-mining point of view, these two definitions are equivalent).
Rules are displayed in the order in which CleverMiner package version 1.0.2 returns them, because this version does not provide a way to order them; ordering is expected to be done manually in postprocessing.
Attributes without a specification again use subsets 1–1 or sequences 1–1.
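The equivalence of subset 1–1 and sequence 1–1 follows from how coefficients of literals are generated: a subset coefficient may be any set of categories of the allowed lengths, while a sequence coefficient must be a run of consecutive categories in their natural order. With length 1–1, both reduce to a single category, so they yield the same literals. The sketch below illustrates this; the function names and the `grades` attribute are hypothetical and do not reflect CleverMiner's API.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def subset_coefficients(categories: Sequence[str], min_len: int, max_len: int) -> List[Tuple[str, ...]]:
    """All subsets of categories with min_len..max_len elements (order ignored)."""
    return [combo
            for k in range(min_len, max_len + 1)
            for combo in combinations(categories, k)]


def sequence_coefficients(categories: Sequence[str], min_len: int, max_len: int) -> List[Tuple[str, ...]]:
    """All runs of consecutive categories with min_len..max_len elements.

    The categories must be listed in their natural (ordinal) order.
    """
    return [tuple(categories[i:i + k])
            for k in range(min_len, max_len + 1)
            for i in range(len(categories) - k + 1)]


grades = ["low", "medium", "high"]  # hypothetical ordinal attribute
print(subset_coefficients(grades, 1, 1))    # [('low',), ('medium',), ('high',)]
print(sequence_coefficients(grades, 1, 1))  # [('low',), ('medium',), ('high',)]
# With longer coefficients the two types differ: ('low', 'high') is a subset but not a sequence.
print(len(subset_coefficients(grades, 1, 2)), len(sequence_coefficients(grades, 1, 2)))  # 6 5
```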
Acknowledgements
Not Applicable.
Funding
No funding was received. The authors received no financial support for the research or the authorship of this manuscript.
Author information
Contributions
Petr Máša: the algorithm described in Section 5, design of the system of analytic tasks, running the analytic tasks, related work sections, editing, and the repository; Jan Rauch: design of the system of analytic tasks, related work sections, and editing.
Ethics declarations
Conflicts of interest
The authors declare no competing interests.
Ethics approval and consent to participate
Not Applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Máša, P., Rauch, J. A novel algorithm for mining couples of enhanced association rules based on the number of output couples and its application. J Intell Inf Syst (2023). https://doi.org/10.1007/s10844-023-00820-1
DOI: https://doi.org/10.1007/s10844-023-00820-1