A Web Information Extraction System to DB Prototyping

Moreda, P.; Muñoz, R.; Martínez-Barco, P.; Cachero, C.; Palomar, Manuel

doi:10.1007/3-540-36271-1_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2553)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

Database prototyping is a technique widely used both to validate user requirements and to verify certain application functionality. These tasks usually require populating the underlying data structures with sample data that may additionally need to satisfy certain restrictions. Although some existing approaches have already automated this population task by means of random data generation, the resulting data lack semantic meaning, which may hinder both user validation and designer verification.

To solve this problem and make the resulting prototypes more intuitive, this paper presents a population system that, starting from the information contained in a UML-compliant Domain Conceptual Model, applies Information Extraction techniques to compile meaningful information sets from texts available on the Internet. The system is based on the semantic information extracted from the EWN lexical resource and includes, among other features, a named entity recognition system and an ontology that speed up the prototyping process and improve the quality of the sample data.

This paper has been supported by the Spanish government, projects TIC2000-0664-C02-01/02 and TIC2001-3530-C02-01/02.

Author information

Authors and Affiliations

Grupo de investigación del Procesamiento del Lenguaje y Sistemas de Información Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
P. Moreda, R. Muñoz, P. Martínez-Barco, C. Cachero & Manuel Palomar

Editor information

Editors and Affiliations

Department of Computer and Systems Sciences, Royal Institute of Technology, Forum 100, 16440, Kista, Sweden
Birger Andersson, Maria Bergholtz & Paul Johannesson

About this paper

Cite this paper

Moreda, P., Muñoz, R., Martínez-Barco, P., Cachero, C., Palomar, M. (2002). A Web Information Extraction System to DB Prototyping. In: Andersson, B., Bergholtz, M., Johannesson, P. (eds) Natural Language Processing and Information Systems. NLDB 2002. Lecture Notes in Computer Science, vol 2553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36271-1_2

DOI: https://doi.org/10.1007/3-540-36271-1_2
Published: 28 February 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00307-6
Online ISBN: 978-3-540-36271-5
eBook Packages: Springer Book Archive
