Clustering Multiple Databases Induced by Local Patterns

Adhikari, Animesh; Adhikari, Jhimli

doi:10.1007/978-3-319-13212-9_15


Part of the book series: Intelligent Systems Reference Library (ISRL, volume 79)


Abstract

To answer queries posed over multiple large databases, it may be necessary to mine the relevant databases en bloc. In this chapter, we present an effective solution to clustering multiple large databases. We present two measures of similarity between a pair of databases and study their main properties. We then design an algorithm for clustering multiple databases based on an introduced similarity measure. We also present a coding, referred to as IS coding, that represents itemsets space-efficiently. Coding of this nature enables more frequent itemsets to participate in determining the similarity between two databases, so the clustering process becomes more accurate. We also show that IS coding attains maximum efficiency in most cases of the mining process. The clustering algorithm has a lower time complexity than existing clustering algorithms. Overall, the efficiency of the clustering process is improved through several strategies: reducing the execution time of the clustering algorithm, using a more suitable similarity measure, and storing frequent itemsets space-efficiently.
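The abstract does not define the two similarity measures, so the following is only an illustrative sketch of how a pair of databases could be compared through their frequent itemsets. The function names, the dictionary representation, and both formulas (a Jaccard-style overlap and a min/max support ratio) are assumptions made for this example and should not be read as the chapter's actual definitions.

```python
from typing import Dict, FrozenSet

# Assumed representation: frequent itemset -> support (fraction of transactions).
Itemsets = Dict[FrozenSet[str], float]


def overlap_similarity(a: Itemsets, b: Itemsets) -> float:
    """Share of frequent itemsets that are frequent in both databases."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    return len(a.keys() & b.keys()) / len(union)


def support_similarity(a: Itemsets, b: Itemsets) -> float:
    """Ratio of summed minimum supports to summed maximum supports.

    An itemset that is not frequent in a database is treated as having
    support 0 there (an assumption of this sketch).
    """
    union = a.keys() | b.keys()
    total_max = sum(max(a.get(x, 0.0), b.get(x, 0.0)) for x in union)
    if total_max == 0:
        return 0.0
    total_min = sum(min(a.get(x, 0.0), b.get(x, 0.0)) for x in union)
    return total_min / total_max


# Hypothetical frequent itemsets mined from two databases.
d1: Itemsets = {frozenset("a"): 0.5, frozenset("b"): 0.4, frozenset("ab"): 0.3}
d2: Itemsets = {frozenset("a"): 0.6, frozenset("c"): 0.2}

print(overlap_similarity(d1, d2))   # one shared itemset out of four
print(support_similarity(d1, d2))   # 0.5 / 1.5
```

Both functions need only the mined itemsets and their supports, not the raw transactions, which is in keeping with the idea of clustering databases by their local patterns. A clustering step could then group databases whose pairwise similarity exceeds a chosen threshold.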



Author information

Authors and Affiliations

Department of Computer Science, Parvatibai Chowgule College, Margao, Goa, India
Animesh Adhikari
Department of Computer Science, Narayan Zantye College, Bicholim, Goa, India
Jhimli Adhikari


Corresponding author

Correspondence to Animesh Adhikari.


About this chapter

Cite this chapter

Adhikari, A., Adhikari, J. (2015). Clustering Multiple Databases Induced by Local Patterns. In: Advances in Knowledge Discovery in Databases. Intelligent Systems Reference Library, vol 79. Springer, Cham. https://doi.org/10.1007/978-3-319-13212-9_15


DOI: https://doi.org/10.1007/978-3-319-13212-9_15
Published: 28 December 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13211-2
Online ISBN: 978-3-319-13212-9
eBook Packages: Engineering, Engineering (R0)
