Minimal weighted infrequent itemset mining-based outlier detection approach on uncertain data stream

  • Saihua Cai
  • Ruizhi Sun (Email author)
  • Shangbo Hao
  • Sicong Li
  • Gang Yuan
Multi-Source Data Understanding (MSDU)

Abstract

Outliers degrade the accuracy of data-based prediction and other data-driven processing, so they must be detected effectively and as early as possible to improve the credibility of the data. In recent years, numerous outlier detection approaches have been proposed for static and precise data; however, this prior work did not consider the uncertainty and weight information of each item. Moreover, traditional outlier detection approaches use only the deviation degree of each data element as the criterion for determining outliers; as a result, the detected outliers do not fit the definition of an outlier (i.e., data that appear rarely and differ from most of the other data). To address these problems, this paper proposes a minimal weighted infrequent itemset mining-based outlier detection approach for uncertain data streams, called MWIFIM–OD–UDS. It detects implicit outliers by taking into account the low occurrence frequency, the uncertainty and the weight of each itemset, together with the characteristics of the data stream. Specifically, a matrix structure-based approach called MWIFIM–UDS is first proposed to mine the minimal weighted infrequent itemsets (MWiFIs) from an uncertain data stream; the MWIFIM–OD–UDS method then detects outliers based on the mined MWiFIs and the designed deviation indexes. Experimental results show that the proposed MWIFIM–OD–UDS method outperforms the frequent itemset mining-based outlier detection methods FindFPOF and LFP in terms of runtime and detection accuracy.
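To make the notion of a minimal weighted infrequent itemset concrete, the following is a minimal sketch and not the matrix-based MWIFIM–UDS method itself, whose details the abstract does not give. It assumes that each transaction in a window maps items to existence probabilities, that the expected support of an itemset is the sum over transactions of the product of its item probabilities, and that the weight of an itemset is the mean of its item weights. These definitions, the function names and the toy data are illustrative assumptions. An itemset is reported when its expected weighted support falls below the threshold while every proper subset stays at or above it.

```python
from itertools import combinations
from math import prod


def expected_weighted_support(itemset, transactions, weights):
    """Mean item weight times expected support, normalised by window size.

    Assumed definition (not taken from the paper): the expected support is
    the sum over transactions of the product of item existence probabilities.
    """
    weight = sum(weights[i] for i in itemset) / len(itemset)
    exp_sup = sum(prod(t.get(i, 0.0) for i in itemset) for t in transactions)
    return weight * exp_sup / len(transactions)


def minimal_weighted_infrequent_itemsets(transactions, weights, min_support):
    """Brute-force level-wise search (hypothetical helper, for illustration).

    Returns itemsets whose expected weighted support is below min_support
    while all of their proper subsets are at or above it.
    """
    items = sorted({i for t in transactions for i in t})
    frequent = set()          # itemsets that are frequent with all subsets frequent
    minimal_infrequent = []
    for size in range(1, len(items) + 1):
        found_frequent = False
        for candidate in combinations(items, size):
            # Skip candidates that have a subset outside the frequent set.
            if size > 1 and any(
                sub not in frequent for sub in combinations(candidate, size - 1)
            ):
                continue
            if expected_weighted_support(candidate, transactions, weights) < min_support:
                minimal_infrequent.append(candidate)
            else:
                frequent.add(candidate)
                found_frequent = True
        if not found_frequent:
            break             # no larger candidate can have all subsets frequent
    return minimal_infrequent


# Toy window of an uncertain stream: item -> existence probability.
window = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.8, "b": 0.9, "c": 0.2},
    {"a": 1.0, "b": 0.5},
    {"a": 0.7, "c": 0.1},
]
item_weights = {"a": 0.5, "b": 0.8, "c": 1.0}

result = minimal_weighted_infrequent_itemsets(window, item_weights, min_support=0.35)
print(result)  # [('c',), ('a', 'b')]
```

In this toy window, item c is rare on its own, while a and b are individually frequent but their combination is not, so both are reported. A deviation-based outlier score would then be computed from such itemsets; the abstract does not define the deviation indexes, so that step is left out here.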

Keywords

Minimal infrequent itemset mining; Outlier detection; Uncertain weighted data stream; Deviation index

Notes

Acknowledgements

This work was supported in part by the Chinese Universities Scientific Fund under grant number 2017XD001 and the Fundamental Research Funds for the Central Universities under grant number 2018XD004.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  • Saihua Cai (1)
  • Ruizhi Sun (1, 2), Email author
  • Shangbo Hao (1)
  • Sicong Li (1)
  • Gang Yuan (1)
  1. College of Information and Electrical Engineering, China Agricultural University, Beijing, China
  2. Key Laboratory of Agricultural Information Acquisition Technology, Ministry of Agriculture, Beijing, China
