Machine Translation, Volume 32, Issue 1–2, pp 31–43

Combining rule-based and statistical mechanisms for low-resource named entity recognition

  • Ryan Gabbard
  • Jay DeYoung
  • Constantine Lignos
  • Marjorie Freedman
  • Ralph Weischedel


We describe a multifaceted approach to named entity recognition that can be deployed with minimal data resources and a handful of hours of non-expert annotation. We describe how this approach was applied in the 2016 LoReHLT evaluation and demonstrate that both statistical and rule-based approaches contribute to our performance. We also demonstrate across many languages the value of selecting the sentences to be annotated when training on small amounts of data.


Keywords: Named entity recognition, Low-resource NLP, Annotation



This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-15-C-0113. The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. (Approved for Public Release by DARPA on Aug 29, 2017 (DISTAR Approval #28392), Distribution Unlimited)



Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  1. Raytheon BBN Technologies, Cambridge, USA
