Duplicate detection and deletion in the extended NF2 data model

  • K. Küspert
  • G. Saake
  • L. Wegner
Data Organizations For Extended DBMSs
Part of the Lecture Notes in Computer Science book series (LNCS, volume 367)

Abstract

A current research topic in the area of relational databases is the design of systems based on the Non First Normal Form (NF2) data model. One particular development, the so-called extended NF2 data model, even permits structured values like lists and tuples to be included as attributes in relations. It is thus well suited to representing complex objects for non-standard database applications. The Advanced Information Management Prototype, a DBMS that uses this model, is currently being implemented at the IBM Heidelberg Scientific Center. In this paper we examine the problem of detecting and deleting duplicates within this data model. Several alternative approaches are evaluated, and a new method based on sorting complex objects is proposed that is both time- and space-efficient.
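To make the idea of sort-based duplicate deletion over structured values concrete, here is a minimal sketch in Python. It is not the method proposed in the paper, whose details the abstract does not give. It only illustrates the general technique under stated assumptions: sets are modelled as `frozenset`, lists as `list`, tuples as `tuple`, and the names `canonical_key` and `delete_duplicates` are hypothetical.

```python
from itertools import groupby


def canonical_key(value):
    """Map a nested value to a key that is totally ordered and comparable.

    Assumption of this sketch: sets are unordered, lists and tuples are ordered.
    """
    if isinstance(value, (set, frozenset)):
        # Sort the element keys so that equal sets always yield equal keys.
        return ("set", tuple(sorted(canonical_key(v) for v in value)))
    if isinstance(value, list):
        return ("list", tuple(canonical_key(v) for v in value))
    if isinstance(value, tuple):
        return ("tuple", tuple(canonical_key(v) for v in value))
    # Atomic value: tag it with its type name so atoms of different types
    # can still be ordered against each other.
    return ("atom", type(value).__name__, value)


def delete_duplicates(objects):
    """Sort objects by canonical key, then keep one object per run of equal keys."""
    decorated = sorted(((canonical_key(o), o) for o in objects),
                       key=lambda pair: pair[0])
    return [next(group)[1]
            for _, group in groupby(decorated, key=lambda pair: pair[0])]


# Hypothetical complex objects: (name, set of topics, list of numbers).
rows = [
    ("Smith", frozenset({"db", "ir"}), [1, 2]),
    ("Jones", frozenset({"os"}), [3]),
    ("Smith", frozenset({"ir", "db"}), [1, 2]),  # duplicate of the first row
    ("Smith", frozenset({"db", "ir"}), [2, 1]),  # distinct: list order matters
]
unique_rows = delete_duplicates(rows)
```

After sorting, duplicates are adjacent, so a single linear pass removes them. This sketch builds all keys in memory, so it says nothing about the time and space behaviour claimed for the paper's method.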

Copyright information

© Springer-Verlag 1989

Authors and Affiliations

  • K. Küspert (1)
  • G. Saake (1)
  • L. Wegner (1, 2)

  1. IBM Heidelberg Scientific Center, Heidelberg, West Germany
  2. FB Mathematik, Gh Kassel - Universität, Kassel, West Germany
