
Heuristic clustering of database objects according to multi-valued attributes

  • Jukka Teuhola
Object Oriented Databases II
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1308)

Abstract

This paper studies the clustering of objects on the basis of set-valued attributes, so that objects in the same cluster share as many attribute values as possible. Our primary application is the clustering of objects participating in a many-to-many relationship. Since the exact optimum is computationally too hard to find, a relatively simple and fast heuristic is developed. The sets of attribute values are represented by non-unique, fixed-size signatures, which form the basis for clustering. The objects to be clustered are stored on the leaf pages of a binary tree, where each internal node contains a pair of signatures that direct the search for a suitable leaf. The core of the method is a page-splitting algorithm, which tries to reconcile two goals: enhancing clustering and keeping the tree balanced. In the random case, there is little chance of beneficial clustering. However, the dependencies and correlations between real-life objects enable us to achieve a notable increase in performance.
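
The two structures named in the abstract, fixed-size signatures and a binary tree whose internal nodes hold a pair of signatures, can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the 64-bit width, the one-bit-per-value hashing (a common superimposed-coding scheme), the overlap-based choice of branch, and all names (`signature`, `overlap`, `Leaf`, `Node`, `find_leaf`) are assumptions made for illustration, and the page-splitting heuristic itself is not reproduced.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Iterable, List, Union

SIG_BITS = 64  # assumed signature width; the abstract only says "fixed-size"


def signature(values: Iterable[str], bits: int = SIG_BITS) -> int:
    """Superimpose one hashed bit per attribute value into a bit mask.

    Different value sets may yield the same mask, so signatures are non-unique.
    """
    sig = 0
    for value in values:
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % bits
        sig |= 1 << position
    return sig


def overlap(a: int, b: int) -> int:
    """Number of bit positions set in both signatures."""
    return bin(a & b).count("1")


@dataclass
class Leaf:
    """A leaf page holding the clustered objects (here: their value sets)."""
    objects: List[frozenset] = field(default_factory=list)


@dataclass
class Node:
    """An internal node with one signature per subtree."""
    left_sig: int
    right_sig: int
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def find_leaf(root: Union[Node, Leaf], sig: int) -> Leaf:
    """Descend towards the subtree whose signature shares more bits with sig.

    Ties go left; this tie-breaking rule is an arbitrary assumption.
    """
    node = root
    while isinstance(node, Node):
        if overlap(sig, node.left_sig) >= overlap(sig, node.right_sig):
            node = node.left
        else:
            node = node.right
    return node


# Usage: two leaf pages, each summarised by the signature of its values.
left_page, right_page = Leaf(), Leaf()
root = Node(signature(["a", "b", "c"]), signature(["x", "y", "z"]),
            left_page, right_page)
obj = frozenset({"a", "b"})
find_leaf(root, signature(obj)).objects.append(obj)
```

In this sketch an object whose values all occur on one side is routed to that side, because every bit of its signature is also set in that side's signature. How the node signatures are maintained and how a full page is split are exactly the parts that the paper's heuristic addresses.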

Keywords

Clustering; Multi-valued attributes; Many-to-many relationships; Signature files; Page splitting; Object databases

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Jukka Teuhola, Turku Centre for Computer Science (TUCS), University of Turku, Turku, Finland
