Applying a Finite Automata Acquisition Algorithm to Named Entity Recognition

  • Muntsa Padró
  • Lluís Padró
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4002)


In this work, the Causal-State Splitting Reconstruction (CSSR) algorithm, originally conceived to model stationary processes by learning finite-state automata from data sequences, is applied for the first time to an NLP task, namely Named Entity Recognition (NER). The results obtained are slightly below those of the best systems presented in the CoNLL 2002 shared task; however, given the simplicity of the features used, they are very promising.
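The abstract only names Causal-State Splitting Reconstruction (CSSR). As a rough illustration of how it learns states from a symbol sequence, here is a minimal Python sketch of its first (sufficiency) phase. Histories are extended one symbol further into the past and grouped into the same state whenever their next-symbol distributions look alike. Several assumptions apply. The function names and the `tol` parameter are hypothetical. Where CSSR applies a statistical significance test at some level α, this sketch substitutes a fixed total-variation threshold that ignores sample size. The determinization and transient-state removal phases are omitted entirely. This is not the authors' implementation or their NER setup.

```python
from collections import Counter, defaultdict


def next_symbol_counts(seq, max_len):
    """Count the symbol that follows every history (suffix of the past) of length 0..max_len."""
    counts = defaultdict(Counter)
    for i in range(len(seq)):
        for length in range(max_len + 1):
            if i - length < 0:
                break
            history = tuple(seq[i - length:i])  # the `length` symbols just before position i
            counts[history][seq[i]] += 1
    return counts


def pooled(state, counts):
    """Pool the next-symbol counts of all histories assigned to a state."""
    total = Counter()
    for hist in state:
        total.update(counts[hist])
    return total


def tv_distance(c1, c2, alphabet):
    """Total-variation distance between two empirical next-symbol distributions."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    return 0.5 * sum(abs(c1[a] / n1 - c2[a] / n2) for a in alphabet)


def cssr_sufficiency(seq, max_len, tol=0.1):
    """Simplified CSSR sufficiency phase; `tol` replaces CSSR's significance test (assumption)."""
    alphabet = sorted(set(seq))
    counts = next_symbol_counts(seq, max_len)
    states = [[()]]  # start with a single state holding the empty history
    for length in range(max_len):
        for state in list(states):
            for hist in [h for h in state if len(h) == length]:
                for a in alphabet:
                    child = (a,) + hist  # look one symbol further into the past
                    if child not in counts:
                        continue  # history never observed in the data
                    # Test the parent state first, then every other state in order.
                    candidates = [state] + [s for s in states if s is not state]
                    target = next(
                        (s for s in candidates
                         if tv_distance(counts[child], pooled(s, counts), alphabet) <= tol),
                        None,
                    )
                    if target is None:
                        states.append([child])  # no existing state predicts like this history
                    else:
                        target.append(child)
    return states


if __name__ == "__main__":
    for i, state in enumerate(cssr_sufficiency("ab" * 50, max_len=2)):
        print(i, state)
```

On the period-2 sequence `"ab" * 50` this yields three groups. The first is the initial state holding only the empty history. The second collects histories ending in `a`, after which the next symbol is always `b`. The third collects histories ending in `b`. The full algorithm would discard the first group as transient and then split states until every transition is deterministic.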

Having established that this algorithm is viable for NLP tasks, we plan to improve the results obtained on the NER task and to apply the algorithm to other NLP sequence recognition tasks, such as PoS tagging, chunking, and subcategorization pattern acquisition.


Keywords: Hidden Markov Model; Causal State; Training Corpus; Named Entity Recognition; Entity Recognition


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Muntsa Padró (1)
  • Lluís Padró (1)

  1. TALP Research Center, Universitat Politècnica de Catalunya, Spain
