TISA: Topic Independence Scoring Algorithm

Martineau, Justin Christopher; Cheng, Doreen; Finin, Tim

doi:10.1007/978-3-642-39712-7_43

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7988)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs.

To create a truly domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best-performing sentiment model published on the popular 25-category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model's accuracy is highly uniform, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISA's models are truly domain independent: they require no changes or human intervention to accurately classify documents in never-before-seen topic areas.
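The evaluation described above reports an average accuracy over held-out topic areas together with how much that accuracy varies between topics. The following is a minimal sketch of that kind of per-topic summary, not the TISA algorithm itself, which the abstract does not specify. The function names, the toy classifier, and the toy reviews are all hypothetical, and the use of population variance is an assumption about how the spread might be computed.

```python
from collections import defaultdict
from statistics import mean, pvariance


def per_topic_accuracy(examples, predict):
    """Return {topic: accuracy} for an iterable of (topic, text, label) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for topic, text, label in examples:
        total[topic] += 1
        if predict(text) == label:
            correct[topic] += 1
    return {topic: correct[topic] / total[topic] for topic in total}


def summarize(accuracies):
    """Return (mean accuracy, population variance) across topics."""
    values = list(accuracies.values())
    return mean(values), pvariance(values)


# Hypothetical stand-in for a trained sentiment model: 1 = positive, 0 = negative.
def toy_predict(text):
    return 1 if "good" in text.lower() else 0


# Hypothetical held-out reviews; not taken from the Amazon dataset.
toy_examples = [
    ("books", "A good read", 1),
    ("books", "Dull and slow", 0),
    ("kitchen", "Good blender", 1),
    ("kitchen", "Works well", 1),  # the toy rule gets this one wrong
]

topic_acc = per_topic_accuracy(toy_examples, toy_predict)
avg, var = summarize(topic_acc)
print(topic_acc, avg, var)
```

A low spread across held-out topics is what the abstract uses as evidence that accuracy should carry over to new topic areas; the average alone would hide a model that does well on some topics and poorly on others.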

Author information

Authors and Affiliations

Samsung Information Systems North America, USA
Justin Christopher Martineau & Doreen Cheng
University of Maryland Baltimore County, USA
Tim Finin

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Martineau, J.C., Cheng, D., Finin, T. (2013). TISA: Topic Independence Scoring Algorithm. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2013. Lecture Notes in Computer Science, vol 7988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39712-7_43

DOI: https://doi.org/10.1007/978-3-642-39712-7_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39711-0
Online ISBN: 978-3-642-39712-7
eBook Packages: Computer Science, Computer Science (R0)
