An incremental framework to extract coverage patterns for dynamic databases

Abstract

Pattern mining is an important data mining task that involves extracting interesting associations from large transactional databases. In practice, a transactional database D is updated over time through the addition and deletion of transactions. Consequently, some previously discovered patterns may become invalid, while new patterns may emerge. This has motivated significant research in incremental mining, whose goal is to mine patterns efficiently when D is updated with additions and/or deletions of transactions, rather than mining all patterns again from scratch. Active research efforts are under way to develop incremental algorithms for extracting frequent patterns, sequential patterns and utility patterns. Another important type of pattern is the coverage pattern (CP), which has significant applications in areas such as banner advertising, search engine advertising and visibility mining. However, no existing work addresses incremental mining of CPs. The main contributions of this work are twofold. First, we introduce the problem of incremental mining of CPs. Second, we propose an approach, designated Comprehensive Coverage Pattern Mining, for efficiently extracting CPs under the incremental paradigm. We also performed extensive experiments on two real click-stream datasets and one synthetic dataset to demonstrate the overall effectiveness of the proposed approach.
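To make the incremental setting concrete, the following is a minimal sketch and not the Comprehensive Coverage Pattern Mining approach itself, whose details are not given in this abstract. It assumes a simplified notion of coverage: the fraction of transactions in D that contain at least one item of a pattern. The class `CoverageIndex` and its methods are hypothetical names introduced only for illustration. The point it shows is that an addition or deletion touches only the items of the affected transaction, so nothing is rebuilt from scratch.

```python
from collections import defaultdict


class CoverageIndex:
    """Toy index (hypothetical): maps each item to the ids of transactions containing it."""

    def __init__(self):
        self.transactions = {}            # tid -> frozenset of items
        self.tids = defaultdict(set)      # item -> set of tids

    def add(self, tid, items):
        """Insert one transaction, touching only the items it contains."""
        if tid in self.transactions:
            raise KeyError(f"duplicate transaction id: {tid}")
        self.transactions[tid] = frozenset(items)
        for item in self.transactions[tid]:
            self.tids[item].add(tid)

    def delete(self, tid):
        """Remove one transaction, again touching only the items it contains."""
        for item in self.transactions.pop(tid):
            self.tids[item].discard(tid)

    def coverage(self, pattern):
        """Assumed simplified measure: share of transactions with at least one item of pattern."""
        if not self.transactions:
            return 0.0
        covered = set()
        for item in pattern:
            covered |= self.tids.get(item, set())
        return len(covered) / len(self.transactions)


index = CoverageIndex()
index.add(1, {"a", "b"})
index.add(2, {"b", "c"})
index.add(3, {"d"})
index.add(4, {"a", "d"})
before = index.coverage({"a", "c"})   # transactions 1, 2 and 4 out of 4

# Update D: one deletion and one addition, with no full rescan.
index.delete(2)
index.add(5, {"e"})
after = index.coverage({"a", "c"})    # transactions 1 and 4 out of 4
```

In this toy run the pattern {a, c} drops from covering three of four transactions to two of four after the update, which illustrates how a previously discovered pattern can stop meeting a coverage threshold once D changes.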

Author information

Corresponding author

Correspondence to Komallapalli Kaushik.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Cite this article

Kaushik, K., Reddy, P.K., Mondal, A. et al. An incremental framework to extract coverage patterns for dynamic databases. Int J Data Sci Anal (2021). https://doi.org/10.1007/s41060-021-00262-4

Keywords

  • Data mining
  • Pattern mining
  • Coverage patterns
  • Dynamic databases
  • Incremental mining