A Web Information Extraction System to DB Prototyping

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2553)

Abstract

Database prototyping is a technique widely used both to validate user requirements and to verify certain application functionality. These tasks usually require populating the underlying data structures with sample data that may, in addition, need to satisfy certain constraints. Although some existing approaches have already automated this population task by means of random data generation, the lack of semantic meaning in the resulting data may interfere with both user validation and designer verification.

To solve this problem and make the resulting prototypes more intuitive, this paper presents a population system that, starting from the information contained in a UML-compliant Domain Conceptual Model, applies Information Extraction techniques to compile meaningful information sets from texts available on the Internet. The system is based on the semantic information extracted from the EWN lexical resource and includes, among other features, a named entity recognition system and an ontology that speed up the prototyping process and improve the quality of the sample data.
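
As a rough illustration of the idea only, and not the system described in the paper, the sketch below populates a table from text rather than from random values. Everything in it is an assumption made for the example: the `Attribute` class stands in for one attribute of a domain conceptual model, the `PERSON` and `CITY` labels are illustrative categories, and the trigger-word regular expressions are a toy substitute for a real named entity recognizer and lexical resource.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-in for one attribute of a domain conceptual model:
# each attribute is tagged with the semantic category its values should have.
@dataclass(frozen=True)
class Attribute:
    name: str
    category: str  # illustrative labels such as "PERSON" or "CITY"


# Toy trigger-word rules (an assumption, not the paper's recognizer).
PATTERNS = {
    "PERSON": re.compile(r"\b(?:Mr|Mrs|Dr)\. ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "CITY": re.compile(r"\bcity of ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
}


def extract(text, category):
    """Return the distinct matches for one category, in order of appearance."""
    seen = []
    for match in PATTERNS[category].finditer(text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def populate(attributes, text, rows):
    """Build up to `rows` sample records, one extracted value per attribute."""
    columns = {a.name: extract(text, a.category) for a in attributes}
    count = min([rows] + [len(values) for values in columns.values()])
    return [
        {name: values[i] for name, values in columns.items()}
        for i in range(count)
    ]


if __name__ == "__main__":
    sample = (
        "Dr. Ana Ruiz moved to the city of Alicante. "
        "Mr. John Smith visited the city of Madrid."
    )
    schema = [Attribute("owner", "PERSON"), Attribute("city", "CITY")]
    for record in populate(schema, sample, rows=5):
        print(record)
```

The records pair values by position only, so a person and a city in the same row are not necessarily related in the source text. The point is simply that each column holds values of the right semantic kind instead of random strings.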

This paper has been supported by the Spanish government, projects TIC2000-0664-C02-01/02 and TIC2001-3530-C02-01/02.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moreda, P., Muñoz, R., Martínez-Barco, P., Cachero, C., Palomar, M. (2002). A Web Information Extraction System to DB Prototyping. In: Andersson, B., Bergholtz, M., Johannesson, P. (eds) Natural Language Processing and Information Systems. NLDB 2002. Lecture Notes in Computer Science, vol 2553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36271-1_2

  • DOI: https://doi.org/10.1007/3-540-36271-1_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00307-6

  • Online ISBN: 978-3-540-36271-5

  • eBook Packages: Springer Book Archive
