NER in Tweets Using Bagging and a Small Crowdsourced Dataset

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8686)

Abstract

Named entity recognition (NER) systems for Twitter are very sensitive to cross-sample variation, and the performance of off-the-shelf systems varies from reasonable (F1: 60–70%) to completely useless (F1: 40–50%) across available Twitter datasets. This paper introduces a semi-supervised wrapper method for robust learning of sequential problems with many negative examples, such as NER, and shows that using a simple conditional random fields (CRF) model and a small crowdsourced dataset [4] leads to good NER performance across datasets.

Keywords

Twitter, semi-supervised learning, bagging, crowdsourcing, named entity recognition, unlabeled data

References

  1. Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
  2. Collins, M.: Discriminative training methods for Hidden Markov Models. In: EMNLP (2002)
  3. Eisenstein, J.: What to do about bad language on the internet. In: NAACL (2013)
  4. Finin, T., Murnane, W., Karandikar, A., Keller, N., Martineau, J., Dredze, M.: Annotating named entities in Twitter data with crowdsourcing. In: NAACL Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk (2010)
  5. Finkel, J., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by Gibbs sampling. In: ACL (2005)
  6. Hovy, D., Berg-Kirkpatrick, T., Vaswani, A., Hovy, E.: Learning whom to trust with MACE. In: NAACL (2013)
  7. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: ICML (2001)
  8. Liu, X., Zhang, S., Wei, F., Zhou, M.: Recognizing named entities in tweets. In: ACL (2011)
  9. Owoputi, O., O’Connor, B., Dyer, C., Gimpel, K., Schneider, N., Smith, N.A.: Improved part-of-speech tagging for online conversational text with word clusters. In: NAACL (2013)
  10. Piskorski, J., Ehrmann, M.: Named entity recognition in targeted Twitter streams in Polish. In: ACL Workshop on Balto-Slavic NLP (2013)
  11. Poibeau, T., Kosseim, L.: Proper name extraction from non-journalistic texts. In: CLIN (2000)
  12. Ritter, A., Clark, S., Etzioni, M., Etzioni, O.: Named entity recognition in tweets: an experimental study. In: EMNLP (2011)
  13. Rodrigues, F., Pereira, F., Ribeiro, B.: Sequence labeling with multiple annotators. Machine Learning, 1–17 (2013)
  14. Sha, F., Pereira, F.: Shallow parsing with conditional random fields. In: HLT-NAACL (2003)
  15. Suzuki, J., Isozaki, H.: Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. In: ACL, Columbus, Ohio, pp. 665–673 (2008)
  16. Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for semi-supervised learning. In: ACL (2010)
  17. Wang, C.-K., Hsu, B.-J., Chang, M.-W., Kiciman, E.: Simple and knowledge-intensive generative model for named entity recognition. Technical report, Microsoft Research (2013)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Center for Language Technology, University of Copenhagen, Denmark
