Biographies or Blenders: Which Resource Is Best for Cross-Domain Sentiment Analysis?

  • Natalia Ponomareva
  • Mike Thelwall
Conference paper

DOI: 10.1007/978-3-642-28604-9_40

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7181)
Cite this paper as:
Ponomareva N., Thelwall M. (2012) Biographies or Blenders: Which Resource Is Best for Cross-Domain Sentiment Analysis? In: Gelbukh A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7181. Springer, Berlin, Heidelberg

Abstract

Domain adaptation is usually discussed from the point of view of new algorithms that minimise performance loss when applying a classifier trained on one domain to another. However, finding pertinent data similar to the test domain is equally important for achieving high accuracy in a cross-domain task. This study proposes an algorithm for automatic estimation of performance loss in the context of cross-domain sentiment classification. We present and validate several measures of domain similarity specially designed for the sentiment classification task. We also introduce a new characteristic, called domain complexity, as another independent factor influencing performance loss, and propose various functions for its approximation. Finally, a linear regression for modeling accuracy loss is built and tested in different evaluation settings. As a result, we are able to predict the accuracy loss with an average error of 1.5% and a maximum error of 3.4%.
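
The final step described in the abstract, a linear regression that models accuracy loss from domain similarity and domain complexity, can be illustrated with a minimal sketch. The abstract does not give the actual similarity measures, complexity functions, or data, so everything below is an assumption for illustration: the two feature names (`similarity`, `complexity_diff`), the helper names, and the synthetic numbers, which are generated from a made-up linear rule and are not results from the paper.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_loss_model(features, losses):
    """Ordinary least squares via the normal equations.

    features: list of (similarity, complexity_diff) pairs (hypothetical names).
    losses:   observed accuracy loss for each source/target domain pair.
    Returns [intercept, w_similarity, w_complexity_diff].
    """
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, losses)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_loss(coef, feature):
    """Predicted accuracy loss for one (similarity, complexity_diff) pair."""
    return coef[0] + sum(c * f for c, f in zip(coef[1:], feature))


# Synthetic placeholder values, NOT data from the paper.
features = [(0.9, 0.1), (0.7, -0.2), (0.5, 0.3), (0.3, 0.0), (0.8, -0.1), (0.4, 0.2)]
losses = [10.0 - 8.0 * s + 5.0 * c for s, c in features]

coef = fit_loss_model(features, losses)
errors = [abs(predict_loss(coef, f) - y) for f, y in zip(features, losses)]
mean_error = sum(errors) / len(errors)
max_error = max(errors)
```

Reporting both `mean_error` and `max_error` mirrors the way the abstract summarises prediction quality (an average and a maximum error). A realistic evaluation would compute these errors on domain pairs held out from fitting rather than on the training pairs used here.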

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Natalia Ponomareva
    • 1
  • Mike Thelwall
    • 1
  1. Statistical Cybermetrics Research Group, University of Wolverhampton, UK
