Comprehensive mining of frequent itemsets for a combination of certain and uncertain databases

  • Original Research
International Journal of Information Technology

Abstract

Frequent Itemset Mining can be performed with sequential algorithms such as Apriori on a standalone system, or with parallel algorithms such as Count Distribution on a distributed system. Because of the communication overhead of parallel algorithms and the exponential growth of candidate itemsets, many algorithms have been developed for mining frequent itemsets over either certain or uncertain databases. However, no algorithm developed so far generates frequent itemsets over a combination of both kinds of database. We earlier proposed the MasterApriori algorithm, which computes Approximate Frequent Items for a combination of certain and uncertain databases, using Apriori for the certain database and Expected-support-based UApriori for the uncertain database. In this paper, we extend that work by using Poisson- and Normal-distribution-based UApriori for the uncertain database. In the proposed algorithms, the sites across which the data is distributed communicate only once, which reduces communication overhead. The scalability and efficiency of the proposed algorithms are evaluated on standard and synthetic databases, and their performance is compared in terms of the time taken and the number of frequent items generated by each algorithm.
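The abstract names three ways of scoring an itemset in the uncertain database: expected support (as in MasterApriori) and the Poisson and Normal distribution approximations. The following is a minimal Python sketch of those three measures, not the authors' implementation. It assumes that items within a transaction occur independently, so a transaction contains an itemset with probability equal to the product of its item probabilities, and the itemset's support count is then a Poisson binomial variable. The combination step is hypothetical and not taken from the paper: each site sends one summary, once (an exact count from the certain site; the expected support and variance from the uncertain site), and the certain count is treated as a fixed offset against the minimum support. All function names are illustrative.

```python
import math


def itemset_prob(transaction, itemset):
    """P(uncertain transaction contains every item), assuming independent items."""
    p = 1.0
    for item in itemset:
        p *= transaction.get(item, 0.0)
    return p


def uncertain_summary(db, itemset):
    """Mean (expected support) and variance of the Poisson binomial support count."""
    probs = [itemset_prob(t, itemset) for t in db]
    mean = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    return mean, var


def poisson_tail(lam, k):
    """P(X >= k) for X ~ Poisson(lam), used to approximate the support count."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):  # accumulate P(X = 1) .. P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def normal_tail(mean, var, k):
    """P(X >= k) via a Normal approximation with continuity correction."""
    if k <= 0:
        return 1.0
    if var == 0.0:
        return 1.0 if mean >= k else 0.0
    z = (k - 0.5 - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)


def is_frequent_combined(certain_db, uncertain_db, itemset, min_sup, min_prob,
                         method="normal"):
    """Hypothetical one-round combination of a certain and an uncertain site."""
    # Certain site: exact support count, sent once.
    count = sum(1 for t in certain_db if set(itemset) <= t)
    # Uncertain site: (mean, variance), sent once; enough for all three tests.
    mean, var = uncertain_summary(uncertain_db, itemset)
    need = min_sup - count  # support still required from the uncertain part
    if method == "expected":
        return count + mean >= min_sup
    if method == "poisson":
        return poisson_tail(mean, need) >= min_prob
    if method == "normal":
        return normal_tail(mean, var, need) >= min_prob
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    certain = [{"a", "b"}, {"a"}, {"a", "b", "c"}]
    uncertain = [{"a": 0.9, "b": 0.8}, {"a": 0.5, "b": 0.5}, {"b": 1.0}]
    for m in ("expected", "poisson", "normal"):
        print(m, is_frequent_combined(certain, uncertain, ("a", "b"), 3, 0.6, m))
```

With this toy data, the itemset {a, b} has a certain count of 2 and an expected uncertain support of 0.72 + 0.25 = 0.97. The expected-support test therefore compares 2.97 against a minimum support of 3 and rejects it. The Poisson and Normal tests instead ask how likely the uncertain part is to contribute at least one more occurrence, and both accept the itemset at a frequentness-probability threshold of 0.6. In this sketch, sending only the mean and variance (rather than every transaction probability) is what limits the exchange to one message per site.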

Figs. 1–9 (images not included in this preview)

Author information

Corresponding author

Correspondence to Samar Wazir.

About this article

Cite this article

Wazir, S., Beg, M.M.S. & Ahmad, T. Comprehensive mining of frequent itemsets for a combination of certain and uncertain databases. Int. j. inf. tecnol. 12, 1205–1216 (2020). https://doi.org/10.1007/s41870-019-00310-0

  • DOI: https://doi.org/10.1007/s41870-019-00310-0