Domain-General Versus Domain-Specific Named Entity Recognition: A Case Study Using TEXT

  • Cheng Yang Lim
  • Ian K. T. Tan
  • Bhawani Selvaretnam
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11909)

Abstract

Named entity recognition (NER) seeks to identify named entities within bodies of text and classify them into language categories, such as nouns, that reflect locations, organizations, and people. Because NER is language dependent, most NER systems take a domain-general approach: they are designed around a language rather than a specific target domain. The current use of informal language on social media creates a need to compare the performance of domain-general and domain-specific NER systems. A domain-specific NER system for the vehicle traffic domain, TEXT, is described, and its performance is compared with that of domain-general NER. The evaluation results show that the domain-specific NER significantly outperforms the domain-general NER, which performs adequately only in common scenarios.

Keywords

Domain-general · Domain-specific · Named Entity Recognition · Traffic · Information extraction

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Malaysia
  2. School of IT, Monash University Malaysia, Subang Jaya, Malaysia
  3. Valiantlytix Sdn Bhd, Petaling Jaya, Malaysia