
Clustering-Based Aggregation of High-Utility Patterns from Unknown Multi-database

Transactions on Computational Science XXXIV

Part of the book series: Lecture Notes in Computer Science (TCOMPUTATSCIE, volume 11820)


Abstract

High-utility patterns mined from different, unknown databases can be clustered to identify the most valid patterns. The sources of such databases include the internet, journals, and enterprise data. Here, a grid-based clustering method (CLIQUE) is used to aggregate patterns mined from multiple databases. The proposed model forms clusters from all the utility values of a pattern to determine its interestingness and the correct interval of its utility measure. The set of all patterns is collected by first mining each database individually, at the local level. A problem arises when the same pattern is identified by all of the databases but with different utility values: the presence of multiple utility values makes it difficult to decide whether the pattern should be considered valid. Hence, an aggregation model is applied to test whether a pattern satisfies the utility threshold set by a domain expert. We found that the proposed aggregation model effectively clusters all of the interesting patterns and discards those that do not satisfy the threshold condition. The proposed model accurately optimizes the utility interval of the valid patterns.
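The aggregation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified one-dimensional, grid-based density step in the spirit of CLIQUE, and it assumes that utilities are normalized to [0, 1]. The function names `dense_interval` and `aggregate`, the parameters `n_cells` and `min_density`, and the rule that a pattern is valid when the lower bound of its dense interval meets the threshold `min_util` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def dense_interval(utilities: List[float], n_cells: int = 5,
                   min_density: float = 0.4) -> Optional[Tuple[float, float]]:
    """Return the widest run of adjacent dense grid cells over [0, 1], or None.

    Assumption: utilities are already normalized to the range [0, 1].
    """
    if not utilities:
        return None

    # Count how many local utility values fall into each equal-width cell.
    counts = [0] * n_cells
    for u in utilities:
        idx = min(int(u * n_cells), n_cells - 1)  # u == 1.0 goes to the last cell
        counts[idx] += 1

    # A cell is dense if it holds at least min_density of all values.
    dense = [c / len(utilities) >= min_density for c in counts]

    # Merge adjacent dense cells and keep the widest run (first one on ties).
    best = None
    start = None
    for i, is_dense in enumerate(dense + [False]):  # sentinel closes the last run
        if is_dense and start is None:
            start = i
        elif not is_dense and start is not None:
            if best is None or (i - start) > (best[1] - best[0]):
                best = (start, i)
            start = None

    if best is None:
        return None
    return best[0] / n_cells, best[1] / n_cells


def aggregate(patterns: Dict[str, List[float]], min_util: float,
              n_cells: int = 5, min_density: float = 0.4
              ) -> Dict[str, Tuple[float, float]]:
    """Keep patterns whose dense utility interval starts at or above min_util."""
    valid = {}
    for name, utils in patterns.items():
        interval = dense_interval(utils, n_cells, min_density)
        if interval is not None and interval[0] >= min_util:
            valid[name] = interval
    return valid


if __name__ == "__main__":
    # Hypothetical utilities reported for each pattern by four local databases.
    local_results = {
        "AB": [0.62, 0.65, 0.68, 0.15],  # three databases agree near 0.6 to 0.8
        "CD": [0.10, 0.90, 0.50, 0.30],  # no agreement, so no dense cell
        "EF": [0.22, 0.25, 0.28],        # agreement, but below the threshold
    }
    print(aggregate(local_results, min_util=0.5))
```

With these made-up inputs only `AB` survives: its values concentrate in one grid cell that lies above the threshold, whereas `CD` has no dense cell and `EF` is dense only below the threshold. The surviving cell range plays the role of the pattern's utility interval in this sketch.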



Author information

Corresponding author

Correspondence to Abhinav Muley.


Copyright information

© 2019 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter


Cite this chapter

Muley, A., Gudadhe, M. (2019). Clustering-Based Aggregation of High-Utility Patterns from Unknown Multi-database. In: Gavrilova, M., Tan, C. (eds) Transactions on Computational Science XXXIV. Lecture Notes in Computer Science, vol 11820. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-59958-7_2


  • DOI: https://doi.org/10.1007/978-3-662-59958-7_2


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-59957-0

  • Online ISBN: 978-3-662-59958-7

  • eBook Packages: Computer Science (R0)
