Web-Based Sources for an Annotated Corpus Building and Composite Proper Name Identification

  • Sofía N. Galicia-Haro
  • Alexander Gelbukh
  • Igor A. Bolshakov
Conference paper

DOI: 10.1007/978-3-540-24681-7_14

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3034)
Cite this paper as:
Galicia-Haro S.N., Gelbukh A., Bolshakov I.A. (2004) Web-Based Sources for an Annotated Corpus Building and Composite Proper Name Identification. In: Favela J., Menasalvas E., Chávez E. (eds) Advances in Web Intelligence. AWIC 2004. Lecture Notes in Computer Science, vol 3034. Springer, Berlin, Heidelberg

Abstract

Collections of texts annotated on several levels are useful resources, but developing them for languages like Spanish requires a huge effort. In this work, we present the initial step, lexical-level annotation, in the compilation of an annotated Mexican corpus from Web-based sources. We also describe a method, based on heterogeneous knowledge and simple Web-based sources, for the proper name identification that such annotation requires. We focus on composite entities: names with coordinated constituents, names with several prepositional phrases, and names of songs, books, movies, etc. Preliminary results are presented.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Sofía N. Galicia-Haro (1)
  • Alexander Gelbukh (2, 3)
  • Igor A. Bolshakov (2)

  1. Faculty of Sciences, UNAM, Ciudad Universitaria, Mexico City, Mexico
  2. Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico
  3. Department of Computer Science and Engineering, Chung-Ang University, Seoul, Korea
