Unsupervised Named Entity Recognition and Disambiguation: An Application to Old French Journals

  • Yusra Mosallam
  • Alaa Abi-Haidar
  • Jean-Gabriel Ganascia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8557)


In this paper we introduce our method of Unsupervised Named Entity Recognition and Disambiguation (UNERD), which we test on a recently digitized, unlabeled corpus of French journals comprising 260 issues from the 19th century. Our study focuses on detecting person, location, and organization names in text. Our original method combines a French entity knowledge base with a statistical contextual disambiguation approach. We show that our method outperforms supervised approaches trained on small amounts of annotated data. This matters because manual annotation is expensive and time-consuming, especially for foreign languages and specialized domains.
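The abstract names two ingredients, a knowledge-base lookup and statistical contextual disambiguation, without spelling out how they interact. The sketch below is a hypothetical illustration of one way such a pipeline could fit together. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy knowledge base `KB`, the sample sentences `DOCS`, the helper names (`tokenize`, `context`, `train_profiles`, `disambiguate`), the context window of two tokens, Yarowsky-style bootstrapping from unambiguous knowledge-base matches, and the IDF-weighted scoring of context words.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy knowledge base mapping surface forms to candidate types
# (PER = person, LOC = location, ORG = organization). Illustrative only;
# a real system would load a large French entity resource instead.
KB = {
    "Paris": {"LOC", "PER"},   # ambiguous: a city, but also a surname
    "Lyon": {"LOC"},
    "Hugo": {"PER"},
    "Zola": {"PER"},
    "Académie": {"ORG"},
}

# Hypothetical unlabeled sentences standing in for a digitized corpus.
DOCS = [
    "M. Hugo a publié un roman",
    "M. Zola a publié un article",
    "le train part de Lyon ce matin",
    "il habite à Lyon depuis longtemps",
]


def tokenize(text):
    """Split text into word tokens (accented letters are kept)."""
    return re.findall(r"\w+", text)


def context(tokens, i, window=2):
    """Lowercased tokens within `window` positions of tokens[i], excluding it."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [tokens[j].lower() for j in range(lo, hi) if j != i]


def train_profiles(docs, window=2):
    """Build per-type context profiles and IDF weights without any labels.

    Contexts of mentions that the KB lists with exactly one type are taken
    as evidence for that type (an assumed Yarowsky-style bootstrapping step).
    """
    profiles = defaultdict(Counter)
    doc_freq = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        doc_freq.update({t.lower() for t in tokens})
        for i, tok in enumerate(tokens):
            types = KB.get(tok, set())
            if len(types) == 1:
                (etype,) = types
                profiles[etype].update(context(tokens, i, window))
    idf = {w: math.log(len(docs) / df) for w, df in doc_freq.items()}
    return profiles, idf


def disambiguate(tokens, i, profiles, idf, window=2):
    """Return the type of tokens[i], or None if the KB does not know it."""
    candidates = KB.get(tokens[i], set())
    if len(candidates) <= 1:
        return next(iter(candidates), None)
    ctx = context(tokens, i, window)

    def score(etype):
        # IDF-weighted relative frequency of each context word in the profile.
        prof = profiles[etype]
        total = sum(prof.values()) or 1
        return sum(idf.get(w, 0.0) * prof[w] / total for w in ctx)

    # Sorting makes ties deterministic: the alphabetically first type wins.
    return max(sorted(candidates), key=score)


if __name__ == "__main__":
    profiles, idf = train_profiles(DOCS)
    for sentence, idx in [("M. Paris a publié un livre", 1),
                          ("il arrive à Paris ce matin", 3)]:
        print(sentence, "->", disambiguate(tokenize(sentence), idx, profiles, idf))
```

Mentions with a single type in the knowledge base are labeled directly; only ambiguous ones such as "Paris" go through context scoring, so "M. Paris a publié…" leans toward a person and "il arrive à Paris…" toward a location. When no context word carries evidence, the sketch falls back to the alphabetically first type, a placeholder that a real system would replace with a better-motivated default.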


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yusra Mosallam (1)
  • Alaa Abi-Haidar (2)
  • Jean-Gabriel Ganascia (2)

  1. DMKM Masters, UPMC, Paris, France
  2. ACASA, LIP6, UPMC, Paris, France