Abstract
Given a database of transactions, where each transaction is a set of items, maximal frequent itemset mining aims to find all itemsets that are frequent, meaning that their items co-occur in transactions more often than a given threshold, and that are maximal, meaning that they are not contained in any other frequent itemset. Such itemsets are of particular interest because every frequent itemset is a subset of some maximal frequent itemset, so they summarize the frequent itemsets compactly. We study the problem of efficiently finding such itemsets with the added constraint that only the top-k most diverse ones should be returned. An itemset is diverse if its items belong to many different categories according to a given hierarchy of item categories. We propose a solution that relies on a purpose-built index structure called the FP*-tree and an accompanying bound-based algorithm. An extensive experimental study offers insight into the performance of the solution, indicating that it can outperform an existing method by orders of magnitude and that it scales to large databases of transactions.
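To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the FP*-tree or the bound-based algorithm proposed in the paper; it only illustrates what "frequent", "maximal", and "top-k most diverse" mean on a toy database. Several things here are assumptions made for illustration: the function names, the toy data, the convention that an itemset is frequent when its support count is at least `min_support`, and the diversity measure (the number of distinct categories at a single level of the hierarchy), which may differ from the measure the paper defines.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets with support >= min_support.

    The >= convention is an assumption of this sketch.
    """
    items = sorted(set().union(*transactions))
    result = []
    for size in range(1, len(items) + 1):
        level = [frozenset(c) for c in combinations(items, size)
                 if support(frozenset(c), transactions) >= min_support]
        if not level:
            # Support is anti-monotone: if no itemset of this size is
            # frequent, no larger itemset can be frequent either.
            break
        result.extend(level)
    return result


def maximal(itemsets):
    """Keep only itemsets that are not strictly contained in another one."""
    return [s for s in itemsets if not any(s < other for other in itemsets)]


def diversity(itemset, category_of):
    """Assumed measure: number of distinct categories covered by the items."""
    return len({category_of[i] for i in itemset})


def top_k_diverse_maximal(transactions, category_of, min_support, k):
    """Return the k maximal frequent itemsets with the highest diversity."""
    candidates = maximal(frequent_itemsets(transactions, min_support))
    # Sort by diversity, highest first; break ties by item names so the
    # result is deterministic.
    candidates.sort(key=lambda s: (-diversity(s, category_of), sorted(s)))
    return candidates[:k]


# Hypothetical toy database and one-level category mapping.
transactions = [
    frozenset({"milk", "bread", "apple"}),
    frozenset({"milk", "bread", "apple"}),
    frozenset({"milk", "cheese"}),
    frozenset({"milk", "cheese"}),
    frozenset({"bread", "apple"}),
]
category_of = {"milk": "dairy", "cheese": "dairy",
               "bread": "bakery", "apple": "fruit"}

result = top_k_diverse_maximal(transactions, category_of, min_support=2, k=1)
print(result)
```

On this toy data the maximal frequent itemsets are {apple, bread, milk} and {cheese, milk}; the first spans three categories and the second only one, so the first is returned for k = 1. The enumeration is exponential in the number of items, which is the cost that index structures and pruning bounds are meant to avoid.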
Notes
1. The support of an itemset can also be defined as the fraction of transactions that contain it. For simplicity, we use the count of transactions, which is equivalent when the database \(\mathcal{D}\) is fixed.
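The equivalence stated in this note can be written out as follows. The symbols \(\mathrm{supp}\), \(\mathrm{relsupp}\), and \(\sigma\) are notation assumed here for illustration, as is the convention that an itemset is frequent when its support is at least the threshold; the same argument holds for a strict inequality.

\[
\mathrm{supp}(X) = \left|\{\, T \in \mathcal{D} : X \subseteq T \,\}\right|,
\qquad
\mathrm{relsupp}(X) = \frac{\mathrm{supp}(X)}{|\mathcal{D}|}
\]

\[
\mathrm{supp}(X) \ge \sigma
\iff
\mathrm{relsupp}(X) \ge \frac{\sigma}{|\mathcal{D}|}
\]

Because \(|\mathcal{D}|\) is a positive constant for a fixed database, dividing both sides by it does not change which itemsets pass the threshold.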
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, D., Luo, D., Jensen, C.S., Huang, J.Z. (2019). Efficiently Mining Maximal Diverse Frequent Itemsets. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11447. Springer, Cham. https://doi.org/10.1007/978-3-030-18579-4_12
DOI: https://doi.org/10.1007/978-3-030-18579-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18578-7
Online ISBN: 978-3-030-18579-4
eBook Packages: Computer Science (R0)