Abstract
To answer queries involving multiple large databases, it may be necessary to mine the relevant databases en bloc. In this chapter, we present an effective solution to clustering multiple large databases. We propose two measures of similarity between a pair of databases and study their main properties. We then design an algorithm for clustering multiple databases based on one of these similarity measures. We also present a coding, referred to as IS coding, that represents itemsets space-efficiently. This coding allows more frequent itemsets to participate in determining the similarity between two databases, which makes the clustering process more accurate. We also show that IS coding attains maximum efficiency in most cases arising in the mining process. The proposed clustering algorithm improves on existing clustering algorithms in terms of time complexity. Overall, the clustering process is made more efficient through several strategies: reducing the execution time of the clustering algorithm, using a more suitable similarity measure, and storing frequent itemsets space-efficiently.
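The abstract does not define the two similarity measures or the clustering algorithm, so the sketch below is illustrative only and is not the chapter's method. It rests on three assumptions.

- Each database is summarized by a map from its frequent itemsets to their supports.
- Similarity is measured by two plausible stand-in measures: a Jaccard ratio over the sets of frequent itemsets, and a support-weighted ratio.
- Clustering is threshold-based single linkage: databases whose similarity is at least a level `delta` are linked, and each connected component forms a cluster.

The names `Itemsets`, `sim_jaccard`, `sim_support`, `cluster`, and `delta` are hypothetical. IS coding is not modeled here.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List

# Hypothetical summary of one database: frequent itemset -> support in [0, 1].
Itemsets = Dict[FrozenSet[str], float]


def sim_jaccard(a: Itemsets, b: Itemsets) -> float:
    """Share of frequent itemsets common to both databases (0.0 if both are empty)."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def sim_support(a: Itemsets, b: Itemsets) -> float:
    """Support-weighted ratio: sum of minimum supports over sum of maximum supports.

    An itemset that is not frequent in a database contributes support 0 there.
    """
    union = a.keys() | b.keys()
    num = sum(min(a.get(x, 0.0), b.get(x, 0.0)) for x in union)
    den = sum(max(a.get(x, 0.0), b.get(x, 0.0)) for x in union)
    return num / den if den > 0 else 0.0


def cluster(dbs: Dict[str, Itemsets], delta: float,
            sim: Callable[[Itemsets, Itemsets], float] = sim_support) -> List[List[str]]:
    """Single-linkage clustering at level delta.

    Databases whose pairwise similarity is at least delta are linked, and each
    connected component of the resulting graph is one cluster (via union-find).
    """
    names = sorted(dbs)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in combinations(names, 2):
        if sim(dbs[u], dbs[v]) >= delta:
            parent[find(u)] = find(v)  # merge the two components

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy, made-up supports for three databases.
    D1 = {frozenset({"a"}): 0.50, frozenset({"b"}): 0.40, frozenset({"a", "b"}): 0.30}
    D2 = {frozenset({"a"}): 0.45, frozenset({"b"}): 0.40, frozenset({"a", "b"}): 0.25}
    D3 = {frozenset({"c"}): 0.60, frozenset({"d"}): 0.50}
    print(sim_jaccard(D1, D2))  # 1.0
    print(cluster({"D1": D1, "D2": D2, "D3": D3}, delta=0.8))  # [['D1', 'D2'], ['D3']]
```

With these toy values, the support-weighted similarity of D1 and D2 is 1.1/1.2, about 0.92, so the two databases merge at `delta = 0.8` but stay separate at `delta = 0.95`.

Single linkage is the simplest choice here because it only needs connected components. A stricter variant could instead require every pair inside a cluster to meet `delta`.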
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Adhikari, A., Adhikari, J. (2015). Clustering Multiple Databases Induced by Local Patterns. In: Advances in Knowledge Discovery in Databases. Intelligent Systems Reference Library, vol 79. Springer, Cham. https://doi.org/10.1007/978-3-319-13212-9_15
DOI: https://doi.org/10.1007/978-3-319-13212-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13211-2
Online ISBN: 978-3-319-13212-9
eBook Packages: Engineering, Engineering (R0)