An Approach to Classify Semi-Structured Objects

  • Elisa Bertino
  • Giovanna Guerrini
  • Isabella Merlo
  • Marco Mesiti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1628)


Several advanced applications, such as those dealing with the Web, need to handle data whose structure is not known a priori. This requirement severely limits the applicability of traditional database techniques, which assume that the structure of the data (e.g., the database schema) is known before the data are entered into the database. Moreover, in traditional database systems, whenever a data item (e.g., a tuple or an object) is entered, the application specifies the collection (e.g., a relation or a class) to which the data item belongs. Collections are the basis for handling queries and indexing, and therefore a proper classification of data items into collections is crucial. In this paper, we address this issue in the context of an extended object-oriented data model. We propose an approach to classify objects that are created without specifying the class they belong to into the most appropriate class of the schema, that is, the class closest to the object state. In particular, we introduce the notion of weak membership of an object in a class and define two measures, the conformity and heterogeneity degrees, which our classification algorithm exploits to identify, among the classes of which an object is a weak member, the most appropriate one in which to classify it.
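
The overall shape of such a classification step can be illustrated with a minimal Python sketch. The abstract does not define weak membership or the two degrees, so every definition below is an assumption made only for illustration and is not the paper's formalization: weak membership is taken to mean that the object shares at least one attribute name with the class, the conformity degree is the fraction of the class's attributes that the object provides, and the heterogeneity degree is the fraction of the object's attributes that the class does not declare. The names `ClassDef`, `is_weak_member`, `conformity`, `heterogeneity`, and `classify` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassDef:
    """Hypothetical schema class: a name plus the attribute names it declares."""
    name: str
    attributes: frozenset


def is_weak_member(obj: dict, cls: ClassDef) -> bool:
    # Assumption: sharing at least one attribute name counts as weak membership.
    return bool(cls.attributes & set(obj))


def conformity(obj: dict, cls: ClassDef) -> float:
    # Assumption: fraction of the class's attributes that the object provides.
    if not cls.attributes:
        return 0.0
    return len(cls.attributes & set(obj)) / len(cls.attributes)


def heterogeneity(obj: dict, cls: ClassDef) -> float:
    # Assumption: fraction of the object's attributes the class does not declare.
    if not obj:
        return 0.0
    return len(set(obj) - cls.attributes) / len(obj)


def classify(obj: dict, schema: list) -> Optional[ClassDef]:
    """Pick, among the classes the object is a weak member of, the one with the
    highest conformity; ties are broken by the lowest heterogeneity."""
    candidates = [c for c in schema if is_weak_member(obj, c)]
    if not candidates:
        return None
    return max(candidates, key=lambda c: (conformity(obj, c), -heterogeneity(obj, c)))


# Usage with made-up classes and an object created without a declared class.
schema = [
    ClassDef("Person", frozenset({"name", "age"})),
    ClassDef("Employee", frozenset({"name", "age", "salary", "dept"})),
]
obj = {"name": "Ann", "age": 30, "email": "ann@example.org"}
print(classify(obj, schema).name)  # Person
```

In this toy setting the object supplies every attribute of `Person` but only half of those of `Employee`, so `Person` is selected. How the two degrees are actually defined and combined in the paper may differ from the lexicographic ranking assumed here.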



Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Elisa Bertino
    • 1
  • Giovanna Guerrini
    • 2
  • Isabella Merlo
    • 2
  • Marco Mesiti
    • 3
  1. Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Milano, Italy
  2. Dipartimento di Informatica e Scienze dell’Informazione, Università di Genova, Genova, Italy
  3. Bell Communications Research, NJ, USA
