HDSHUI-miner: a novel algorithm for discovering spatial high-utility itemsets in high-dimensional spatiotemporal databases

Applied Intelligence

Abstract

Spatial high-utility itemset (SHUI) mining is an important big data analysis technique. It aims to discover all geographically interesting itemsets with high utility in a spatiotemporal database. The SHUI-Miner algorithm was previously proposed to find these itemsets. Unfortunately, it suffers from performance issues when applied to high-dimensional spatiotemporal databases. Motivated by this observation, this paper extends the state-of-the-art method by proposing a novel algorithm called the high-dimensional SHUI-Miner (HDSHUI-Miner). Our algorithm employs several novel pruning strategies to reduce the search space and the computational cost of finding the desired itemsets. Experimental results on seven real-world databases demonstrate that HDSHUI-Miner outperforms SHUI-Miner in terms of memory consumption, runtime, and scalability. Finally, we present two real-world case studies that illustrate the usefulness of the proposed algorithm.
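
The abstract describes SHUI mining only informally. The Python sketch below brute-forces the problem on a small, made-up database to make its two constraints concrete. It assumes a common formulation: the utility of an itemset is the sum of its items' utilities over every transaction that contains the whole itemset, and an itemset is spatially interesting when every pair of its items lies within a user-specified maximum distance (`max_dist`). All data, function names and thresholds here are hypothetical, and this is a naive baseline, not SHUI-Miner or HDSHUI-Miner.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: each item has an (x, y) location.
locations = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.0),
    "c": (0.0, 1.5),
    "d": (5.0, 5.0),
}

# Each transaction maps an item to its utility in that transaction
# (e.g. quantity times unit profit, already multiplied out).
transactions = [
    {"a": 5, "b": 3, "d": 4},
    {"a": 2, "c": 6},
    {"a": 4, "b": 2, "c": 1},
    {"b": 7, "d": 2},
]


def is_spatial(itemset, max_dist):
    """Assumed spatial constraint: every pair of items lies within max_dist."""
    return all(dist(locations[p], locations[q]) <= max_dist
               for p, q in combinations(itemset, 2))


def utility(itemset):
    """Sum the itemset's item utilities over transactions containing all its items."""
    return sum(sum(t[i] for i in itemset)
               for t in transactions if all(i in t for i in itemset))


def naive_shui(min_util, max_dist):
    """Exhaustively enumerate every itemset (exponential in the number of items)."""
    items = sorted(locations)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            if not is_spatial(itemset, max_dist):
                continue  # skip itemsets whose items are too far apart
            u = utility(itemset)
            if u >= min_util:
                result[itemset] = u
    return result


print(naive_shui(min_util=12, max_dist=2.0))
# prints {('b',): 12, ('a', 'b'): 14, ('a', 'c'): 13}
```

With `min_util=12` and `max_dist=2.0`, the itemset `('b', 'd')` reaches utility 16 but is rejected because `d` lies far from `b`. Because the loop visits all 2^n − 1 itemsets over n items, its cost grows exponentially with the number of items, which is the cost that dedicated pruning strategies aim to avoid.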

Figs. 1–15 and Algorithms 1–5 (images not available)

Data Availability

The databases generated and/or analysed during the current study are available from the dataset repository of SPMF, a well-known open-source data mining library [44]. We also used one additional real-world database, Drought, available at [45].

Code Availability

To ensure the repeatability of our experiments, we have made the complete evaluation results, together with the databases and algorithm implementations, publicly available on GitHub [47].

Notes

  1. The downward closure property states that all nonempty subsets of an interesting itemset are also interesting itemsets [2]. Itemset mining algorithms widely use this property to reduce their search space and computational cost. It is also known as the Apriori property or the anti-monotone property.
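
To make the note concrete, here is a minimal Apriori-style sketch in Python on hypothetical toy transactions, using support (the number of transactions containing an itemset) as the interestingness measure. It shows where the downward closure property enters: a candidate k-itemset is discarded without counting its support if any of its (k−1)-subsets is infrequent. The data and function names are illustrative only.

```python
from itertools import combinations

# Hypothetical toy transactions, each a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(min_sup):
    """Level-wise frequent itemset mining with downward-closure pruning."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_sup]
    frequent = list(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: by downward closure, a candidate with any infrequent
        # (k-1)-subset cannot be frequent, so its support is never counted.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_sup]
        frequent.extend(level)
        k += 1
    return frequent


print(sorted(sorted(s) for s in apriori(min_sup=3)))
# prints [['a'], ['a', 'b'], ['a', 'c'], ['b'], ['b', 'c'], ['c']]
```

Utility, by contrast, does not satisfy this property: a superset can have a higher utility than its subsets. High-utility itemset mining algorithms therefore generally prune with anti-monotone upper bounds on utility, such as transaction-weighted utilization, rather than with utility itself.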

References

  1. Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD Record, vol 22, pp 207–216

  2. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large data bases, VLDB, vol 1215, pp 487–499

  3. Luna JM, Fournier-Viger P, Ventura S (2019) Frequent itemset mining: a 25 years review. Wiley Interdiscip Rev Data Min Knowl Discov 9(6)

  4. Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the SIAM international conference on data mining, pp 482–486

  5. Ahmed CF, Tanbeer SK, Jeong B-S (2010) Mining high utility web access sequences in dynamic web log data. In: International conference on software engineering, artificial intelligence, networking and parallel/distributed computing. SNPD ’10, pp 76–81

  6. Tseng VS, Shie B-E, Wu C-W, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786

  7. Liu Y-C, Cheng C-P, Tseng VS (2013) Mining differential top-k co-expression patterns from time course comparative gene expression datasets. BMC Bioinforma 14(1):230

  8. Gan W, Lin JC-W, Fournier-Viger P, Chao H-C, Hong T-P, Fujita H (2018) A survey of incremental high-utility itemset mining. Wiley Interdiscip Rev Data Min Knowl Discov 8(2)

  9. Uday Kiran R, Yashwanth Reddy T, Fournier-Viger P, Toyoda M, Krishna Reddy P, Kitsuregawa M (2019) Efficiently finding high utility-frequent itemsets using cutoff and suffix utility. In: PAKDD, pp 191–203

  10. Lin JC, Djenouri Y, Srivastava G, Li Y, Yu PS (2022) Scalable mining of high-utility sequential patterns with three-tier mapreduce model. ACM Trans Knowl Discov Data 16(3):60:1–60:26. https://doi.org/10.1145/3487046

  11. Lin JC, Djenouri Y, Srivastava G, Yun U, Fournier-Viger P (2021) A predictive ga-based model for closed high-utility itemset mining. Appl Soft Comput 108:107422. https://doi.org/10.1016/j.asoc.2021.107422

  12. Lin JC, Li Y, Fournier-Viger P, Djenouri Y, Zhang J (2020) Efficient chain structure for high-utility sequential pattern mining. IEEE Access 8:40714–40722. https://doi.org/10.1109/ACCESS.2020.2976662

  13. Lin JC, Gan W, Fournier-Viger P, Hong T, Tseng VS (2016) Fast algorithms for mining high-utility itemsets with various discount strategies. Adv Eng Inform 30(2):109–126. https://doi.org/10.1016/j.aei.2016.02.003

  14. Wu JM, Srivastava G, Wei M, Yun U, Lin JC (2021) Fuzzy high-utility pattern mining in parallel and distributed hadoop framework. Inf Sci 553:31–48. https://doi.org/10.1016/j.ins.2020.12.004

  15. Fournier-Viger P, Zhang Y, Lin JC, Dinh D, Le HB (2020) Mining correlated high-utility itemsets using various measures. Log J IGPL 28(1):19–32. https://doi.org/10.1093/jigpal/jzz068

  16. Yin J, Zheng Z, Cao L (2012) USpan: an efficient algorithm for mining high utility sequential patterns. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’12, pp 660–668

  17. Nouioua M, Fournier-Viger P, Wu C-W, Lin C-W, Gan W (2021) FHUQI-Miner: fast high utility quantitative itemset mining. Appl Intell 51:1–25. https://doi.org/10.1007/s10489-021-02204-w

  18. Verma A, Dawar S, Kumar R, Navathe S, Goyal V (2021) High-utility and diverse itemset mining. Appl Intell 51(7):4649–4663. https://doi.org/10.1007/s10489-020-02063-x

  19. Wu JM-T, Li Z, Srivastava G, Yun U, Lin JC-W (2022) Analytics of high average-utility patterns in the industrial internet of things. Appl Intell 52(6):6450–6463. https://doi.org/10.1007/s10489-021-02751-2

  20. Lin JC, Djenouri Y, Srivastava G (2021) Efficient closed high-utility pattern fusion model in large-scale databases. Inf Fusion 76:122–132. https://doi.org/10.1016/j.inffus.2021.05.011

  21. Lin JC, Zhang J, Fournier-Viger P, Hong T, Zhang J (2017) A two-phase approach to mine short-period high-utility itemsets in transactional databases. Adv Eng Inform 33:29–43. https://doi.org/10.1016/j.aei.2017.04.007

  22. Fournier-Viger P, Lin JC, Duong Q, Dam T (2016) PHM: mining periodic high-utility itemsets. In: Industrial conference on data mining, pp 64–79

  23. Kiran RU, Zettsu K, Toyoda M, Fournier-Viger P, Reddy PK, Kitsuregawa M (2019) Discovering spatial high utility itemsets in spatiotemporal databases. In: Proceedings of the 31st international conference on scientific and statistical database management. SSDBM ’19. Association for Computing Machinery, New York, pp 49–60. https://doi.org/10.1145/3335783.3335789

  24. Kiran RU, Ito S, Dao M-S, Zettsu K, Wu C-W, Watanobe Y, Paik I, Thang TC (2020) Distributed mining of spatial high utility itemsets in very large spatiotemporal databases using spark in-memory computing architecture. In: 2020 IEEE international conference on big data (big data), pp 4724–4733. https://doi.org/10.1109/BigData50022.2020.9377946

  25. Bommisetty SC, Penugonda R, Rage UK, Dao MS, Zettsu K (2021) Discovering spatial high utility itemsets in high-dimensional spatiotemporal databases. In: Fujita H, Selamat A, Lin JC-W, Ali M (eds) Advances and trends in artificial intelligence. Artificial intelligence practices. Springer, Cham, pp 53–65

  26. Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87

  27. Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 14(1)

  28. Aggarwal CC (2014) Applications of frequent pattern mining. In: Aggarwal CC, Han J (eds) Frequent pattern mining. Springer, Cham, pp 443–467. https://doi.org/10.1007/978-3-319-07821-2_18

  29. Fournier-Viger P, Lin JC-W, Kiran RU, Koh YS (2017) A survey of sequential pattern mining. Data Sci Pattern Recog 1(1):54–77

  30. Kiran RU, Shrivastava S, Fournier-Viger P, Zettsu K, Toyoda M, Kitsuregawa M (2020) Discovering frequent spatial patterns in very large spatiotemporal databases. In: Proceedings of the 28th international conference on advances in geographic information systems. SIGSPATIAL ’20. Association for Computing Machinery, New York, pp 445–448. https://doi.org/10.1145/3397536.3422206

  31. Aggarwal A, Toshniwal D (2019) Frequent pattern mining on time and location aware air quality data. IEEE Access 7:98921–98933. https://doi.org/10.1109/ACCESS.2019.2930004

  32. Ding W, Eick CF, Wang J, Yuan X (2006) A framework for regional association rule mining in spatial datasets. In: 6th international conference on data mining (ICDM’06), pp 851–856. https://doi.org/10.1109/ICDM.2006.5

  33. Mohan P, Shekhar S, Shine JA, Rogers JP, Jiang Z, Wayant N (2011) A neighborhood graph based approach to regional co-location pattern discovery: a summary of results. In: Proceedings of the 19th ACM SIGSPATIAL international conference on advances in geographic information systems. GIS ’11. Association for Computing Machinery, New York, pp 122–132. https://doi.org/10.1145/2093973.2093991

  34. Sengstock C, Gertz M (2013) Spatial itemset mining: a framework to explore itemsets in geographic space. In: Catania B, Guerrini G, Pokorný J (eds) Advances in databases and information systems. Springer, Berlin, pp 148–161

  35. Tran-The H, Zettsu K (2017) Discovering co-occurrence patterns of heterogeneous events from unevenly-distributed spatiotemporal data. In: 2017 IEEE international conference on big data (Big Data), pp 1006–1011. https://doi.org/10.1109/BigData.2017.8258023

  36. Chan R, Yang Q, Shen Y-D (2003) Mining high utility itemsets. In: 3rd IEEE international conference on data mining, pp 19–26. https://doi.org/10.1109/ICDM.2003.1250893

  37. Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM international conference on information and knowledge management. ACM, pp 55–64

  38. Fournier-Viger P, Wu C-W, Zida S, Tseng VS (2014) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. https://doi.org/10.1007/978-3-319-08326-1_9

  39. Lin JC-W, Zhang J, Fournier-Viger P, Hong T-P, Zhang J (2017) A two-phase approach to mine short-period high-utility itemsets in transactional databases. Adv Eng Inform 33:29–43. https://doi.org/10.1016/j.aei.2017.04.007

  40. Zida S, Fournier-Viger P, Lin JC-W, Wu C-W, Tseng VS (2017) EFIM: a fast and memory efficient algorithm for high-utility itemset mining. Knowl Inf Syst 51(2):595–625

  41. Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM international conference on information and knowledge management. CIKM ’12. Association for Computing Machinery, New York, pp 55–64. https://doi.org/10.1145/2396761.2396773

  42. Tung NT, Nguyen LTT, Nguyen TDD, Vo B (2022) An efficient method for mining multi-level high utility itemsets. Appl Intell 52(5):5475–5496. https://doi.org/10.1007/s10489-021-02681-z

  43. Krishnamoorthy S (2017) HMiner: efficiently mining high utility itemsets. Expert Syst Appl 90:168–183

  44. Fournier-Viger P (2020) SPMF: a java open-source data mining library. http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php. Accessed 4 June 2020

  45. National Center for Atmospheric Research, University Corporation for Atmospheric Research: Standardized precipitation index (SPI) for global land surface (1949–2012) (2013) Research data archive at the national center for atmospheric research, computational and information systems laboratory, Boulder CO

  46. Atmospheric Environmental Regional Observation System: AEROS. http://soramame.taiki.go.jp/

  47. Kiran RU (2022) PAMI: Pattern mining. https://github.com/udayRage/PAMI/tree/main/PAMI/highUtilitySpatialPattern/basic. Accessed 10 Sept 2022


Acknowledgements

We acknowledge that the Congestion and Pollution databases, previously used in the SHUI-Miner [23] and distributed SHUI-Miner [24] papers, have been reused in our experimental evaluation, with appropriate citations of those databases.

Funding

This research was funded by JSPS Kakenhi 21K12034.

Author information

Corresponding author

Correspondence to Rage Uday Kiran.

Ethics declarations

Conflict of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Emerging Topics in Artificial Intelligence Selected from IEA/AIE2021. Guest Editors: Ali Selamat and Jerry Chun-Wei Lin.

Uday Kiran Rage, Veena Pamalla and Ravikumar Penugonda contributed equally to this work.

Uday proposed the idea. Veena introduced optimizations to reduce the search space. Venus and Sai implemented the algorithms and conducted the experiments. Ravi verified the experiments. Dao and Zettsu shared the real-world datasets.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Uday Kiran, R., Veena, P., Ravikumar, P. et al. HDSHUI-miner: a novel algorithm for discovering spatial high-utility itemsets in high-dimensional spatiotemporal databases. Appl Intell 53, 8536–8561 (2023). https://doi.org/10.1007/s10489-022-04436-w

  • DOI: https://doi.org/10.1007/s10489-022-04436-w
