A Proposal for the Automatic Generation of Instances from Unstructured Text

  • Roxana Danger
  • I. Sanz
  • Rafael Berlanga-Llavori
  • José Ruiz-Shulcloper
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3287)


An ontology is a conceptual representation of a domain resulting from a consensus within a community. One of its main applications is the integration of heterogeneous information sources available on the Web by means of the semantic annotation of web documents. This is the cornerstone of the emerging Semantic Web. However, most of the information on the Web currently consists of text documents with little or no structure, which makes their manual annotation impracticable. This paper addresses the problem of mapping text fragments onto a given ontology in order to generate ontology instances that semantically describe this kind of resource. By applying this mapping, we can automatically populate a Semantic Web consisting of text documents that relate to a specific ontology. We have evaluated our approach on a real-application ontology and a text collection, both in the archeology domain. Results show the effectiveness of the method as well as its usefulness.


Information Extraction, Automatic Generation, Text Document, Semantic Annotation, Oriented Graph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Roxana Danger (1)
  • I. Sanz (2)
  • Rafael Berlanga-Llavori (2)
  • José Ruiz-Shulcloper (3)
  1. University of Oriente, Santiago de Cuba, Cuba
  2. Universitat Jaume I, Castellón, Spain
  3. Institute of Cybernetics, Mathematics and Physics, La Habana, Cuba
