Adaptive Cluster Based Discovery of High Utility Itemsets

Lakhawat, Piyush; Somani, Arun

doi:10.1007/978-3-030-15640-4_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 976)

Included in the following conference series:

International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management

Abstract

Utility Itemset Mining (UIM) is a key analysis technique for data modeled by the transactional data model. Improving the time and space efficiency of itemset mining is important, but so is predicting future itemsets accurately. This problem is increasingly relevant now that both scientific and business competitive advantage commonly derive from early access to knowledge through advanced predictive ability. Our most recent work established that prior knowledge of the approximate cluster structure of a dataset, used implicitly in the mining process, can lead to accurate prediction of future itemsets. That approach evaluates the individual strength of each transaction with respect to itemset prediction and reshapes the transaction utilities accordingly. Here we extend that work by observing that, when there is a specific intended prediction window, this reshaping of transaction utilities should adapt to the anticipated cluster structure. We define novel concepts for making such an anticipation and integrate time series forecasting into the evaluation. We perform additional illustrative experiments to demonstrate the application of our improved technique and discuss future directions for this work.

References

1. Agrawal, R., Shafer, J.C.: Parallel mining of association rules. IEEE Trans. Knowl. Data Eng. 6, 962–969 (1996)
2. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases, VLDB 1994, vol. 1215, pp. 487–499 (1994)
3. Ahmed, C.F., Tanbeer, S.K., Jeong, B.-S., Lee, Y.-K.: Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans. Knowl. Data Eng. 21(12), 1708–1721 (2009)
4. Alves, R., Rodriguez-Baena, D.S., Aguilar-Ruiz, J.S.: Gene association analysis: a survey of frequent pattern mining from gene expression data. Brief. Bioinform. 11(2), 210–224 (2009)
5. Andreopoulos, B., An, A., Wang, X., Schroeder, M.: A roadmap of clustering algorithms: finding a match for a biomedical application. Brief. Bioinform. 10(3), 297–314 (2009)
6. BMSWebView1: SPMF: an open-source data mining library (2016). http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php. Accessed 14 June 2016
7. Brijs, T., Swinnen, G., Vanhoof, K., Wets, G.: Using association rules for product assortment decisions: a case study. In: Knowledge Discovery and Data Mining, pp. 254–260 (1999)
8. Brin, S., Motwani, R., Ullman, J.D., Tsur, S.: Dynamic itemset counting and implication rules for market basket data. In: ACM SIGMOD Record, vol. 26, pp. 255–264. ACM (1997)
9. Chan, R.C., Yang, Q., Shen, Y.-D.: Mining high utility itemsets. In: Third IEEE International Conference on Data Mining, ICDM 2003, pp. 19–26. IEEE (2003)
10. Chen, K., Liu, L.: The “Best k” for entropy-based categorical data clustering (2005)
11. Guha, S., Rastogi, R., Shim, K.: ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of 15th International Conference on Data Engineering, pp. 512–521. IEEE (1999)
12. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: ACM SIGMOD Record, vol. 29, pp. 1–12. ACM (2000)
13. Huang, Z.: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min. Knowl. Discov. 2(3), 283–304 (1998)
14. Lakhawat, P., Mishra, M., Somani, A.: A clustering based prediction scheme for high utility itemsets. In: Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, pp. 123–134. INSTICC, SciTePress (2017)
15. Lakhawat, P., Mishra, M., Somani, A.K.: A novel clustering algorithm to capture utility information in transactional data. In: KDIR, pp. 456–462 (2016)
16. Li, H.-F., Huang, H.-Y., Chen, Y.-C., Liu, Y.-J., Lee, S.-Y.: Fast and memory efficient mining of high utility itemsets in data streams. In: Eighth IEEE International Conference on Data Mining, ICDM 2008, pp. 881–886. IEEE (2008)
17. Liao, S.-H., Chu, P.-H., Hsiao, P.-Y.: Data mining techniques and applications-a decade review from 2000 to 2011. Expert. Syst. Appl. 39(12), 11303–11311 (2012)
18. Liu, Y., Liao, W.-K., Choudhary, A.: A fast high utility itemsets mining algorithm. In: Proceedings of the 1st International Workshop on Utility-based Data Mining, pp. 90–99. ACM (2005)
19. Naulaerts, S., et al.: A primer to frequent itemset mining for bioinformatics. Brief. Bioinform. 16(2), 216–231 (2015)
20. Ngai, E.W., Xiu, L., Chau, D.C.: Application of data mining techniques in customer relationship management: a literature review and classification. Expert. Syst. Appl. 36(2), 2592–2602 (2009)
21. RetailDataset: Frequent itemset mining dataset repository (2016). http://fimi.ua.ac.be/data/. Accessed 14 June 2016
22. Seabold, S., Perktold, J.: StatsModels: econometric and statistical modeling with python. In: 9th Python in Science Conference (2010)
23. Toivonen, H., et al.: Sampling large databases for association rules. VLDB 96, 134–145 (1996)
24. Tseng, V.S., Wu, C.-W., Fournier-Viger, P., Yu, P.S.: Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Trans. Knowl. Data Eng. 27(3), 726–739 (2015)
25. Tseng, V.S., Wu, C.-W., Shie, B.-E., Yu, P.S.: Up-growth: an efficient algorithm for high utility itemset mining. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 253–262. ACM (2010)
26. Yan, H., Chen, K., Liu, L., Yi, Z.: Scale: a scalable framework for efficiently clustering transactional data. Data Min. Knowl. Discov. 20(1), 1–27 (2010)
27. Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)

Acknowledgements

The research reported in this paper is funded in part by the Philip and Virginia Sproul Professorship Endowment and the Jerry R. Junkins Endowments at Iowa State University. The research computation is supported by the HPC@ISU equipment at Iowa State University, some of which has been purchased through funding provided by NSF under MRI grant number CNS 1229081 and CRI grant number 1205413. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies. We also thank Dr. Mayank Mishra, former Post Doctoral Fellow at Iowa State University, for his contributions, suggestions and feedback during the initial phase of this work. He was also a co-author of the conference paper [14], of which this work is an extension.

Author information

Authors and Affiliations

Iowa State University, Ames, IA, 50011, USA
Piyush Lakhawat & Arun Somani

Corresponding author

Correspondence to Piyush Lakhawat.

Editor information

Editors and Affiliations

Instituto de Telecomunicações, Lisbon, Portugal
Ana Fred
University of Madeira, Funchal, Portugal
David Aveiro
Delft University of Technology, Delft, The Netherlands
Jan L. G. Dietz
Henley Business School, University of Reading, Reading, UK
Kecheng Liu
University of Coimbra, Coimbra, Portugal
Jorge Bernardino
Federal University of Pernambuco, Recife, Brazil
Ana Salgado
INSTICC and Instituto Politecnico de Setúbal, Setúbal, Portugal
Joaquim Filipe

Appendix

$C$ is the set of all given clusters. A cluster $C_k \in C$ is a subset of the transactions in $D$.

$$\begin{aligned} C_k = \{T_1, T_2 \ldots T_k |T_i \in D \} \end{aligned}$$

(23)

$$\begin{aligned} I_{C_k}=\{a_i|a_i\in T_j\wedge T_j\in C_k \}=\text {item types in } C_k \end{aligned}$$

(24)

Cluster utility ($CU$), the relative utility ($ru$) of an item type in a cluster, and the affinity between clusters are defined as follows:

$$\begin{aligned} CU(C_k)=\sum _{T_j\in C_k }TU(T_j)=\text {Cluster utility of } C_k \end{aligned}$$

(25)

CU is an overall measure of importance of a cluster, since it is the sum of utilities of all transactions in it.

$$\begin{aligned} \forall a_i\in I_{C_k}, ru(a_i, C_k)=\frac{\sum _{a_i\in I_{C_k}\wedge T_j\in C_k}au(a_i, T_j)}{CU(C_k)} \end{aligned}$$

(26)

$ru$ is the relative importance (utility being a measure of importance) given to $a_i$ among all item types $I_{C_k}$ in $C_k$.
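As an illustration of Eqs. (25) and (26), the following is a minimal sketch rather than the authors' implementation. It assumes that each transaction is stored as a mapping from item type to its utility $au(a_i, T_j)$, and that $TU(T_j)$ is the sum of those item utilities; the definition of $TU$ is not part of this appendix, so that choice is an assumption. The function names are hypothetical.

```python
from collections import defaultdict

# Assumed representation: a transaction maps each item type a_i
# to its utility au(a_i, T_j) in that transaction.
Transaction = dict[str, float]


def transaction_utility(t: Transaction) -> float:
    """TU(T_j), assumed here to be the sum of item utilities in T_j."""
    return sum(t.values())


def cluster_utility(cluster: list[Transaction]) -> float:
    """CU(C_k): sum of TU over all transactions in the cluster (Eq. 25)."""
    return sum(transaction_utility(t) for t in cluster)


def relative_utilities(cluster: list[Transaction]) -> dict[str, float]:
    """ru(a_i, C_k) for every item type in the cluster (Eq. 26)."""
    cu = cluster_utility(cluster)
    if cu == 0:
        raise ValueError("cluster utility is zero; ru is undefined")
    totals: dict[str, float] = defaultdict(float)
    for t in cluster:
        for item, utility in t.items():
            totals[item] += utility
    return {item: total / cu for item, total in totals.items()}


# Toy cluster with two transactions (made-up utilities).
cluster = [{"a": 4.0, "b": 6.0}, {"a": 2.0, "c": 8.0}]
cu = cluster_utility(cluster)
ru = relative_utilities(cluster)
print(cu, ru)
```

Under the stated assumption on $TU$, the relative utilities of a cluster sum to 1, which makes them comparable across clusters of different total utility.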

For clusters $C_i$ and $C_j$:

$$\begin{aligned} affinity(C_i, C_j)=\sum _{a\in I_{C_i}\wedge a\in I_{C_j}}\min (ru(a,C_i),ru(a,C_j)) \end{aligned}$$

(27)

Affinity is the sum of the shared relative utility of the item types common to the two clusters. $min_{aff}$ and $min_{uty}$ are tunable parameters of the algorithm: $min_{aff}$ determines the termination criterion of the clustering, and $min_{uty}$ determines the final selection criterion for the clusters.
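The affinity of Eq. (27) can be sketched as follows. This is an illustrative example only: it assumes the relative utilities of each cluster are already available as a mapping from item type to $ru$ value, and the function name and toy values are hypothetical. How $min_{aff}$ and $min_{uty}$ enter the clustering loop is not specified in this appendix, so they are not modeled here.

```python
def affinity(ru_i: dict[str, float], ru_j: dict[str, float]) -> float:
    """Sum of min(ru(a, C_i), ru(a, C_j)) over item types common to both clusters."""
    common_items = ru_i.keys() & ru_j.keys()  # item types present in both clusters
    return sum(min(ru_i[a], ru_j[a]) for a in common_items)


# Toy relative utilities for two clusters (made-up values).
ru_1 = {"a": 0.5, "b": 0.5}
ru_2 = {"a": 0.25, "c": 0.75}

# Only "a" is shared, so the affinity is min(0.5, 0.25).
print(affinity(ru_1, ru_2))
```

If the relative utilities of each cluster sum to 1, the affinity lies between 0 (no shared item types) and 1 (identical relative utility profiles).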

Cite this paper

Lakhawat, P., Somani, A. (2019). Adaptive Cluster Based Discovery of High Utility Itemsets. In: Fred, A., et al. Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2017. Communications in Computer and Information Science, vol 976. Springer, Cham. https://doi.org/10.1007/978-3-030-15640-4_8

DOI: https://doi.org/10.1007/978-3-030-15640-4_8
Published: 15 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15639-8
Online ISBN: 978-3-030-15640-4
eBook Packages: Computer Science, Computer Science (R0)
