Abstract
High-utility patterns mined from multiple, previously unknown databases can be clustered to identify the most valid patterns. Sources include the internet, journals, and enterprise data. Here, a grid-based clustering method (CLIQUE) is used to aggregate patterns mined from multiple databases. The proposed model forms clusters from all the utility values of a pattern to determine its interestingness and the correct interval of its utility measure. The set of all patterns is collected by first mining each database individually, at the local level. A problem arises when the same pattern is identified by different databases but with different utility values. In that case, the presence of multiple utility values makes it difficult to decide whether the pattern should be considered valid. Hence, an aggregation model is applied to test whether a pattern satisfies the utility threshold set by a domain expert. We found that the proposed aggregation model effectively clusters the interesting patterns and discards those that do not satisfy the threshold condition. The proposed model also optimizes the utility interval of the valid patterns.
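The aggregation idea can be illustrated with a minimal sketch. It is not the chapter's algorithm: it assumes a one-dimensional, CLIQUE-style grid over the utility values that the local databases report for a pattern, and the names `dense_intervals`, `aggregate`, `cell_width`, `min_density`, and `min_utility` are hypothetical, as is the rule of keeping a pattern when the lower bound of its most populated dense interval meets the threshold.

```python
from collections import Counter


def dense_intervals(utilities, cell_width, min_density):
    """Return (low, high, count) for each run of adjacent dense grid cells."""
    # Map every utility value to a fixed-width grid cell (a 1-D unit).
    counts = Counter(int(u // cell_width) for u in utilities)
    # A cell is dense when it holds at least min_density of all values.
    dense = sorted(
        cell for cell, n in counts.items() if n / len(utilities) >= min_density
    )
    # Merge neighbouring dense cells into clusters.
    clusters = []
    for cell in dense:
        if clusters and cell == clusters[-1][-1] + 1:
            clusters[-1].append(cell)
        else:
            clusters.append([cell])
    return [
        (cells[0] * cell_width,
         (cells[-1] + 1) * cell_width,
         sum(counts[c] for c in cells))
        for cells in clusters
    ]


def aggregate(local_patterns, cell_width, min_density, min_utility):
    """Keep patterns whose most populated dense interval meets the threshold.

    The acceptance rule is an assumption made for this sketch only.
    """
    valid = {}
    for pattern, utilities in local_patterns.items():
        clusters = dense_intervals(utilities, cell_width, min_density)
        if not clusters:
            continue  # no dense region: the utilities are too scattered
        low, high, _ = max(clusters, key=lambda c: c[2])
        if low >= min_utility:
            valid[pattern] = (low, high)
    return valid


# Hypothetical utilities reported for two patterns by several local databases.
local_patterns = {
    ("a", "b"): [42, 45, 47, 51, 90],
    ("c",): [5, 8, 60],
}
print(aggregate(local_patterns, cell_width=10, min_density=0.4, min_utility=30))
```

With these made-up inputs, pattern `("a", "b")` is kept with the interval `(40, 50)`, while `("c",)` is discarded because its dense interval starts below the threshold.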
Copyright information
© 2019 Springer-Verlag GmbH Germany, part of Springer Nature
About this chapter
Cite this chapter
Muley, A., Gudadhe, M. (2019). Clustering-Based Aggregation of High-Utility Patterns from Unknown Multi-database. In: Gavrilova, M., Tan, C. (eds) Transactions on Computational Science XXXIV. Lecture Notes in Computer Science, vol 11820. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-59958-7_2
DOI: https://doi.org/10.1007/978-3-662-59958-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-59957-0
Online ISBN: 978-3-662-59958-7
eBook Packages: Computer Science, Computer Science (R0)