Clustering Multiple Databases Induced by Local Patterns

Advances in Knowledge Discovery in Databases

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 79)

Abstract

To answer queries posed over multiple large databases, it may be necessary to mine the relevant databases en bloc. In this chapter, we present an effective solution to clustering multiple large databases. We present two measures of similarity between a pair of databases and study their main properties. We then design an algorithm for clustering multiple databases based on one of the introduced similarity measures. We also present a coding, referred to as IS coding, that represents itemsets space-efficiently. A coding of this nature enables more frequent itemsets to participate in determining the similarity between two databases, so the resulting clustering process becomes more accurate. We also show that the IS coding attains maximum efficiency in most cases of the mining process. The clustering algorithm improves on existing clustering algorithms in terms of time complexity. The efficiency of the clustering process is improved through several strategies: reducing the execution time of the clustering algorithm, using a more suitable similarity measure, and storing frequent itemsets space-efficiently.
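The abstract does not define its two similarity measures, so the following is only a minimal sketch of the general idea of comparing two databases through their frequent itemsets. It assumes a Jaccard-style ratio (shared frequent itemsets over all frequent itemsets), which may differ from the measures in the chapter. The function names `frequent_itemsets` and `itemset_similarity`, the brute-force miner, and the toy databases are all illustrative assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner: map each frequent itemset to its relative support.

    Assumption: `transactions` is a non-empty list of sets of items.
    Suitable only for toy data; real miners avoid full enumeration.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t) / n
            if support >= min_support:
                result[itemset] = support
                found = True
        if not found:
            # No frequent itemset of this size, so none of a larger size.
            break
    return result


def itemset_similarity(fis_a, fis_b):
    """Jaccard-style similarity (an assumed measure, not the chapter's):
    shared frequent itemsets divided by all frequent itemsets."""
    union = set(fis_a) | set(fis_b)
    if not union:
        return 0.0
    return len(set(fis_a) & set(fis_b)) / len(union)


# Two hypothetical toy databases, each a list of transactions.
db1 = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
db2 = [{"a", "b"}, {"a", "b"}, {"b", "d"}, {"a", "d"}]

fis1 = frequent_itemsets(db1, min_support=0.5)
fis2 = frequent_itemsets(db2, min_support=0.5)
print(itemset_similarity(fis1, fis2))  # 0.5 for these toy databases
```

A pairwise similarity of this kind could then feed a clustering step that groups databases whose similarity exceeds a chosen threshold. The more frequent itemsets that can be stored and compared, the more informative the ratio becomes, which is the motivation the abstract gives for a space-efficient coding.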

Author information

Corresponding author

Correspondence to Animesh Adhikari.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Adhikari, A., Adhikari, J. (2015). Clustering Multiple Databases Induced by Local Patterns. In: Advances in Knowledge Discovery in Databases. Intelligent Systems Reference Library, vol 79. Springer, Cham. https://doi.org/10.1007/978-3-319-13212-9_15

  • DOI: https://doi.org/10.1007/978-3-319-13212-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13211-2

  • Online ISBN: 978-3-319-13212-9

  • eBook Packages: Engineering, Engineering (R0)
