Building a model to exploit association rules and analyze purchasing behavior based on rough set theory

The Journal of Supercomputing

Abstract

In recent years, the information technology industry has grown strongly worldwide, and the resulting explosion in the amount of data poses a new challenge. Although a huge amount of data is available, the information actually extracted from it is scarce, and the implications behind the data have not been fully exploited. Researchers have therefore sought new ways to exploit the information contained in databases. The concept of knowledge discovery in databases was first introduced in the late 1980s. It is the process of detecting latent, previously unknown, and useful knowledge in large databases, and it overcomes the limitation of traditional database models, whose query tools cannot uncover new information hidden in the data. In the early 1980s, Z. Pawlak proposed rough set theory, which has a solid mathematical basis. The theory has since been adopted by many research groups in information technology, in particular for knowledge discovery in databases. Rough set theory is useful for data classification and association rule discovery, and especially for problems involving ambiguous and uncertain data. In this theory, a data set is represented as an information system, or table. For large tables containing imperfect, redundant, continuous, or symbolic data, rough set theory makes it possible to detect hidden knowledge in these "raw" blocks of data. The discovered knowledge is expressed in the form of rules and patterns.
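As an illustration of the rough set notions mentioned above, the following minimal Python sketch computes the lower and upper approximations of a target set, together with the standard dependency degree of a decision attribute on a set of condition attributes. The table `TABLE`, its attribute names, and the helper functions are hypothetical and chosen only for this example; they are not taken from the authors' C# software.

```python
from collections import defaultdict

# Hypothetical toy information system: one row per customer, with two
# condition attributes ("age", "income") and one decision attribute ("buys").
TABLE = {
    "c1": {"age": "young", "income": "high", "buys": "yes"},
    "c2": {"age": "young", "income": "high", "buys": "no"},
    "c3": {"age": "old", "income": "low", "buys": "no"},
    "c4": {"age": "old", "income": "high", "buys": "yes"},
    "c5": {"age": "young", "income": "low", "buys": "no"},
}


def equivalence_classes(table, attributes):
    """Group objects that are indiscernible on the given attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximations of the set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


def dependency_degree(table, conditions, decision):
    """Fraction of objects classified with certainty (positive region / universe)."""
    positive = set()
    for decision_class in equivalence_classes(table, [decision]):
        positive |= approximations(table, conditions, decision_class)[0]
    return len(positive) / len(table)


buyers = {obj for obj, row in TABLE.items() if row["buys"] == "yes"}
lower, upper = approximations(TABLE, ["age", "income"], buyers)
boundary = upper - lower  # objects that cannot be classified with certainty
gamma = dependency_degree(TABLE, ["age", "income"], "buys")
```

In this toy table, customers `c1` and `c2` share the same condition values but differ in their decision, so they fall into the boundary region rather than the lower approximation; this is how the theory represents uncertain data.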
After finding the most general rules for data representation, one can calculate the strength of the rules and the dependence between attributes in the information system. In this paper, the authors study recommendation systems, rough set theory, approximation theory, and fuzzy rough set theory, and on that basis build a partial model. The resulting software enables users to mine association rules from their databases, thereby supporting appropriate purchase or import decisions. The system lets users configure which database features to analyze, loads data from SQL Server with Apache Spark, and exports the statistics to a website for reporting.
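The strength of an association rule is commonly measured by its support and confidence. The sketch below shows these two measures in plain Python on a handful of made-up purchase transactions; the data and function names are assumptions for illustration only, and the sketch does not reproduce the Apache Spark pipeline described in the paper.

```python
# Hypothetical purchase transactions, one set of items per basket.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Rule under test: {bread} -> {milk}
rule_support = support(TRANSACTIONS, {"bread", "milk"})
rule_confidence = confidence(TRANSACTIONS, {"bread"}, {"milk"})
```

A rule is usually kept only if both values exceed user-chosen minimum thresholds, which is the filtering step that Apriori-style algorithms apply at scale.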

Availability of data and materials

Please contact the corresponding author for data requests. The C# code and a sample rough set database are available.

References

  1. Fayyad U (1997) Data mining and knowledge discovery in databases: implications for scientific databases. In: Proceedings of the Ninth International Conference on Scientific and Statistical Database Management, pp 2–11. https://doi.org/10.1109/SSDM.1997.621141

  2. Garani G, Chernov A, Savvas I, Butakova M (2019) A data warehouse approach for business intelligence. In: 2019 IEEE 28th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), pp 70–75. https://doi.org/10.1109/WETICE.2019.00022

  3. Zhang Q, Xie Q, Wang G (2016) A survey on rough set theory and its applications. CAAI Trans Intell Technol 1(4):323–333. https://doi.org/10.1016/j.trit.2016.11.001

  4. Kusiak A (2001) Rough set theory: a data mining tool for semiconductor manufacturing. IEEE Trans Electron Packag Manuf 24(1):44–50. https://doi.org/10.1109/6104.924792

  5. Pawlak Z (1982) Rough sets. Int J Comput Inf Sci 11:341–356. https://doi.org/10.1007/BF01001956

  6. Patel H, Patel D (2017) Crop prediction framework using rough set theory. Int J Eng Technol 9:2505–2513. https://doi.org/10.21817/ijet/2017/v9i3/1709030266

  7. Grzymala-Busse JW (2005) Rough set theory with applications to data mining. https://doi.org/10.1007/11364160_7

  8. Nair B, Mohandas V, Sakthivel N (2010) A decision tree-rough set hybrid system for stock market trend prediction. Int J Comput Appl. https://doi.org/10.5120/1106-1449

  9. Khanzadi M, Gholamian M (2018) Building a rough sets-based prediction model for classifying large-scale construction projects based on sustainable success index. Eng Constr Archit Manag. https://doi.org/10.1108/ECAM-05-2016-0110

  10. Tiwari S, Pandit R, Richhariya V (2012) Predicting future trends in stock market by decision tree rough-set based hybrid system with HHMM. Int J Electron Comput Sci Eng 1:1578–1587

  11. Talasila V, Madhubabu K, Mahadasyam M, Atchala N, Kande L (2020) The prediction of diseases using rough set theory with recurrent neural network in big data analytics. Int J Intell Eng Syst 13:10–18. https://doi.org/10.22266/ijies2020.1031.02

  12. Isinkaye FO, Folajimi YO, Ojokoh BA (2015) Recommendation systems: principles, methods and evaluation. Egypt Inform J 16(3):261–273. https://doi.org/10.1016/j.eij.2015.06.005

  13. Düntsch I, Gediga G (1998) Uncertainty measures of rough set prediction. Artif Intell 106(1):109–137. https://doi.org/10.1016/S0004-3702(98)00091-5

  14. Yu D, Xu Z, Pedrycz W (2020) Bibliometric analysis of rough sets research. Appl Soft Comput 94:1–10. https://doi.org/10.1016/j.asoc.2020.106467

  15. Vidhya KA, Geetha TV (2017) Rough set theory for document clustering: a review. J Intell Fuzzy Syst 32(3):2165–2185. https://doi.org/10.3233/JIFS-162006

  16. Ang KK, Quek C (2005) Stock trading using PSEC and RSPOP: a novel evolving rough set-based neuro-fuzzy approach. In: 2005 IEEE Congress on Evolutionary Computation, vol 2, pp 1032–1039. https://doi.org/10.1109/CEC.2005.1554804

  17. Andhalkar S, Momin BF (2018) Rough set theory and its extended algorithms. In: 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), pp 1434–1438. https://doi.org/10.1109/ICCONS.2018.8663100

  18. Chaudhuri A, De K, Chatterjee D (2013) Discovering stock price prediction rules of bombay stock exchange using rough fuzzy multi layer perception networks. https://arxiv.org/abs/1307.1895

  19. Ibedou I, Abbas SE (2020) Fuzzy rough sets with a fuzzy ideal. J Egypt Math Soc 28:1–13. https://doi.org/10.1186/s42787-020-00096-2

  20. Rybinski H, Podsiadło M (2015) Application of fuzzy rough sets to financial time series forecasting. https://doi.org/10.1007/978-3-319-19941-2_38

  21. Behmanesh M, Adibi P, Karshenas H (2021) Weighted least squares twin support vector machine with fuzzy rough set theory for imbalanced data classification. https://arxiv.org/abs/2105.01198

  22. Zhang K, Zhan J, Wu W-Z (2020) Novel fuzzy rough set models and corresponding applications to multi-criteria decision-making. Fuzzy Sets Syst 383:92–126. https://doi.org/10.1016/j.fss.2019.06.019

  23. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: SIGMOD Conference. https://doi.org/10.1145/170036.170072

  24. Fan J, Li D (1998) An overview of data mining and knowledge discovery. J Comput Sci Technol 13:348–368. https://doi.org/10.1007/BF02946624

  25. Huh J-H (2018) Big data analysis for personalized health activities: machine learning processing for automatic keyword extraction approach. Symmetry 10:93. https://doi.org/10.3390/sym10040093

  26. Yingzhuo X, Xuewen W (2021) Research on community consumer behavior based on association rules analysis. In: 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP), pp 1213–1216. https://doi.org/10.1109/ICSP51882.2021.9408917

  27. Ai D, Pan H, Li X, Gao Y, He D (2018) Association rule mining algorithms on high-dimensional datasets. Artif Life Robot 23:423–427. https://doi.org/10.1007/s10015-018-0437-y

  28. Dhandayudam P, Krishnamurthi I (2013) Customer behavior analysis using rough set approach. J Theor Appl Electron Commerce Res 8:21–33. https://doi.org/10.4067/S0718-18762013000200003

  29. Zhang Y, Zhao Z, Yu J, Wang K (2015) Research on E-commerce consumer behavior prediction based on rough sets. Int J u- e-Serv Sci Technol 8:69–76. https://doi.org/10.14257/ijunesst.2015.8.4.08

  30. Hassan NRS, Ibrahim SFM (2012) Forecasting stock market trends using rough set. 9(1):1–20. https://doi.org/10.21608/jsfc.2012.26367

  31. Shaikh E, Mohiuddin I, Alufaisan Y, Nahvi I (2019) Apache Spark: a big data processing engine, pp 1–6. https://doi.org/10.1109/MENACOMM46666.2019.8988541

  32. Wang F, Wen Y, Guo T, Liu J, Cao B (2020) Collaborative filtering and association rule mining-based market basket recommendation on spark. Concurr Comput Pract Exp. https://doi.org/10.1002/cpe.5565

  33. https://docs.microsoft.com/en-us/dotnet/spark/

  34. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 487–499

  35. Sun D, Teng S, Zhang W, Zhu H (2007) An algorithm to improve the effectiveness of apriori, pp 385–390. https://doi.org/10.1109/COGINF.2007.4341914

  36. Albuquerque L, Roque F, Valente Neto F, Koroiva R, Buss D, Baptista D, Hepp L, Kuhlmann M, Sundar S, Covich A, Pinto J (2021) Large-scale prediction of tropical stream water quality using rough sets theory. Ecol Inform 61:101226. https://doi.org/10.1016/j.ecoinf.2021.101226

  37. Cheng C-H, Chen Y-H, Liu J-W (2009) Classifying Cinnamomums using rough sets classifier based on interval-discretization. Plant Syst Evol 280:89–97. https://doi.org/10.1007/s00606-009-0161-0

  38. Yao Y (2020) Three-way granular computing, rough sets, and formal concept analysis. Int J Approx Reason 116:106–125. https://doi.org/10.1016/j.ijar.2019.11.002

  39. Stanczyk U, Zielosko B (2020) Heuristic-based feature selection for rough set approach. Int J Approx Reason 125:187–202. https://doi.org/10.1016/j.ijar.2020.07.005

  40. Chelly Dagdia Z, Zarges C, Beck G et al (2020) A scalable and effective rough set theory-based approach for big data pre-processing. Knowl Inf Syst 62:3321–3386. https://doi.org/10.1007/s10115-020-01467-y

  41. Golan RH, Ziarko W (1995) A methodology for stock market analysis utilizing rough set theory. In: Proceedings of 1995 Conference on Computational Intelligence for Financial Engineering (CIFEr), pp 32–40. https://doi.org/10.1109/CIFER.1995.495230

  42. Mardani A, Nilashi M, Antucheviciene J, Tavana M, Bausys R, Ibrahim O (2017) Recent fuzzy generalisations of rough sets theory: a systematic review and methodological critique of the literature. Complexity. https://doi.org/10.1155/2017/1608147

  43. Novák V (2020) Topology in the alternative set theory and rough sets via fuzzy type theory. Mathematics 8:1–22. https://doi.org/10.3390/math8030432

  44. Ducange P, Fazzolari M, Marcelloni F (2020) An overview of recent distributed algorithms for learning fuzzy models in Big Data classification. J Big Data. https://doi.org/10.1186/s40537-020-00298-6

  45. Chelly Dagdia Z, Zarges C, Beck G, Lebbah M (2020) A scalable and effective rough set theory-based approach for big data pre-processing. Knowl Inf Syst 62:1–66. https://doi.org/10.1007/s10115-020-01467-y

Author information

Corresponding author

Correspondence to Jun-Ho Huh.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tran, D.T., Huh, JH. Building a model to exploit association rules and analyze purchasing behavior based on rough set theory. J Supercomput 78, 11051–11091 (2022). https://doi.org/10.1007/s11227-021-04275-5

  • DOI: https://doi.org/10.1007/s11227-021-04275-5
