Heuristic clustering of database objects according to multi-valued attributes

  • Jukka Teuhola
Object Oriented Databases II
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1308)


This paper studies clustering of objects on the basis of set-valued attributes, so that objects in the same cluster share as many attribute values as possible. Our primary application is clustering of objects participating in a many-to-many relationship. Since the precise optimum is computationally too hard to find, a relatively simple and fast heuristic is developed. The sets of attribute values are represented by non-unique, fixed-size signatures, which constitute the basis for clustering. Objects to be clustered are stored on the leaf pages of a binary tree, where each internal node contains a pair of signatures directing the search for a suitable leaf. The core of the method is a page splitting algorithm, which tries to combine two goals: to enhance clustering and to keep the tree balanced. For randomly distributed attribute values, there is little chance of beneficial clustering. However, the dependencies and correlations between real-life objects enable us to achieve a notable increase in performance.
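
The abstract does not say how the signatures are built or how a pair of signatures directs the search, so the following is only a minimal sketch under stated assumptions. It assumes superimposed coding (each attribute value sets one bit of a fixed-width bit mask, in the spirit of the signature-file literature) and a routing rule that prefers the child whose signature shares the most bits with the object. The names `SIG_BITS`, `value_bit`, `signature`, and `choose_child` are hypothetical and are not taken from the paper.

```python
import hashlib

SIG_BITS = 64  # assumed signature width; the abstract only says "fixed-size"


def value_bit(value: str, sig_bits: int = SIG_BITS) -> int:
    """Map one attribute value to a bit position (assumed hashing scheme)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % sig_bits


def signature(values, sig_bits: int = SIG_BITS) -> int:
    """Superimpose (bitwise OR) the bits of all values in a set.

    The result is non-unique: different sets can yield the same signature.
    """
    sig = 0
    for v in values:
        sig |= 1 << value_bit(v, sig_bits)
    return sig


def popcount(x: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def choose_child(obj_sig: int, left_sig: int, right_sig: int) -> str:
    """Pick the child to descend into (assumed rule, not the paper's).

    Prefer the child sharing more bits with the object; on a tie, prefer
    the child whose signature would gain fewer new bits; then go left.
    """
    left_key = (popcount(obj_sig & left_sig), -popcount(obj_sig & ~left_sig))
    right_key = (popcount(obj_sig & right_sig), -popcount(obj_sig & ~right_sig))
    return "left" if left_key >= right_key else "right"
```

Under these assumptions, an object with signature `0b0110` facing an internal node with the pair `(0b0111, 0b1000)` would be routed left, because it shares two bits with the left signature and none with the right. The paper's page splitting algorithm, which decides how such signature pairs come about, is not reproduced here.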


Keywords: Clustering, Multi-valued attributes, Many-to-many relationships, Signature files, Page splitting, Object databases





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Jukka Teuhola, Turku Centre for Computer Science (TUCS), University of Turku, Turku, Finland
