Clustering-Based Aggregation of High-Utility Patterns from Unknown Multi-database

Muley, Abhinav; Gudadhe, Manish

doi:10.1007/978-3-662-59958-7_2

Part of the book series: Lecture Notes in Computer Science (TCOMPUTATSCIE, volume 11820)

Abstract

High-utility patterns mined from different, unknown databases can be clustered to identify the most valid patterns. Such sources include the internet, journals, and enterprise data. Here, a grid-based clustering method (CLIQUE) is used to aggregate patterns mined from multiple databases. The proposed model forms clusters over all the utility values of a pattern to determine its interestingness and the correct interval of its utility measure. The set of all patterns is collected by first mining each database individually, at the local level. A problem arises when the same pattern is identified by all of the databases but with different utility values. In this case, it becomes difficult to decide whether the pattern should be considered valid, because multiple utility values are present. Hence, an aggregation model is applied to test whether a pattern satisfies the utility threshold set by a domain expert. We found that the proposed aggregation model effectively clusters the interesting patterns and discards those that do not satisfy the threshold condition. The proposed model accurately optimizes the utility interval of the valid patterns.
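The aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model and not a full CLIQUE implementation: it is a one-dimensional, grid-and-density simplification applied to the utility values that several databases report for a single pattern. The function names, the parameters `n_cells` and `density`, the assumption that utilities are normalized to [0, 1], and the rule that a pattern is valid when the lower bound of its densest utility interval meets the threshold are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def dense_utility_interval(
    utilities: List[float],
    n_cells: int = 10,      # hypothetical grid resolution
    density: float = 0.3,   # hypothetical density threshold (fraction of points)
) -> Optional[Tuple[float, float]]:
    """Return the utility interval of the largest dense cluster, or None."""
    if not utilities:
        return None

    # Grid step: place each utility in one of n_cells equal-width cells on [0, 1].
    cells: List[List[float]] = [[] for _ in range(n_cells)]
    for u in utilities:
        if not 0.0 <= u <= 1.0:
            raise ValueError("utilities are assumed to be normalized to [0, 1]")
        index = min(int(u * n_cells), n_cells - 1)  # u == 1.0 goes in the last cell
        cells[index].append(u)

    # Density step: a cell is dense if it holds at least `density` of all points.
    min_points = density * len(utilities)

    # Cluster step: runs of adjacent dense cells are merged into one cluster;
    # the cluster containing the most utility values is kept.
    best: List[float] = []
    current: List[float] = []
    for cell in cells:
        if cell and len(cell) >= min_points:
            current.extend(cell)
        else:
            if len(current) > len(best):
                best = current
            current = []
    if len(current) > len(best):
        best = current

    if not best:
        return None
    return (min(best), max(best))


def is_valid_pattern(
    utilities: List[float],
    min_utility: float,
    n_cells: int = 10,
    density: float = 0.3,
) -> bool:
    """Assumed validity rule: the dense interval's lower bound meets the threshold."""
    interval = dense_utility_interval(utilities, n_cells, density)
    return interval is not None and interval[0] >= min_utility


# Utilities reported for one pattern by five local databases (made-up values).
reported = [0.62, 0.65, 0.68, 0.11, 0.95]
interval = dense_utility_interval(reported)
valid = is_valid_pattern(reported, min_utility=0.5)
```

In this sketch the two outlying utilities fall in sparse cells and are ignored, so the pattern is judged on the interval where most databases agree. A real CLIQUE run would search dense units across subspaces of a multi-dimensional space, which this one-dimensional version does not attempt.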

Author information

Authors and Affiliations

St. Vincent Pallotti College of Engineering and Technology, Nagpur, India
Abhinav Muley & Manish Gudadhe

Corresponding author

Correspondence to Abhinav Muley.

Editor information

Editors and Affiliations

Department of Computer Science, University of Calgary, Calgary, AB, Canada
Marina L. Gavrilova
Sardina Systems OÜ, Tallinn, Estonia
C.J. Kenneth Tan

About this chapter

Cite this chapter

Muley, A., Gudadhe, M. (2019). Clustering-Based Aggregation of High-Utility Patterns from Unknown Multi-database. In: Gavrilova, M., Tan, C. (eds) Transactions on Computational Science XXXIV. Lecture Notes in Computer Science, vol 11820. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-59958-7_2

DOI: https://doi.org/10.1007/978-3-662-59958-7_2
Published: 29 August 2019
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-59957-0
Online ISBN: 978-3-662-59958-7
eBook Packages: Computer Science, Computer Science (R0)
