Explainable and Transferrable Text Categorization

Eljasik-Swoboda, Tobias; Engel, Felix; Hemmje, Matthias

doi:10.1007/978-3-030-54595-6_1

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1255)

Included in the following conference series:

International Conference on Data Management Technologies and Applications

Abstract

Automated argument stance (pro/contra) detection is a challenging text categorization problem, especially if the arguments are to be detected for new topics. In previous research, we designed and evaluated an explainable machine-learning-based classifier. It achieved 96% F1 for argument stance recognition within the same topic and 60% F1 for previously unseen topics, which informed our hypothesis that there are two sets of features in argument stance recognition: general features and topic-specific features. An advantage of the described system is its quick transferability to new problems. Besides providing further details about the developed C3 TFIDF-SVM classifier, we investigate the classifier's effectiveness for different text categorization problems spanning two natural languages. In addition to quick transferability, a key feature of the described approach is the generation of human-readable explanations of why specific results were achieved. We further investigate how understandable the classifier's explanations are by conducting a survey.

Acknowledgements

This work has been funded by the Deutsche Forschungsgemeinschaft (DFG) within the project Empfehlungsrationalisierung, Grant Number 376059226, as part of the Priority Program “Robust Argumentation Machines (RATIO)” (SPP-1999).

Author information

Authors and Affiliations

Faculty of Mathematics and Computer Science, University of Hagen, Hagen, Germany
Tobias Eljasik-Swoboda
FTK e.V. Forschungsinstitut für Telekommunikation und Kooperation, Dortmund, Germany
Felix Engel & Matthias Hemmje

Corresponding author

Correspondence to Tobias Eljasik-Swoboda.

Editor information

Editors and Affiliations

MODESTE/ESEO, Angers, France
Slimane Hammoudi
Hochschule Niederrhein, University of Applied Sciences, Krefeld, Nordrhein-Westfalen, Germany
Christoph Quix
Centre for Informatics and Systems, University of Coimbra, Coimbra, Portugal
Jorge Bernardino

Cite this paper

Eljasik-Swoboda, T., Engel, F., Hemmje, M. (2020). Explainable and Transferrable Text Categorization. In: Hammoudi, S., Quix, C., Bernardino, J. (eds) Data Management Technologies and Applications. DATA 2019. Communications in Computer and Information Science, vol 1255. Springer, Cham. https://doi.org/10.1007/978-3-030-54595-6_1

DOI: https://doi.org/10.1007/978-3-030-54595-6_1
Published: 30 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54594-9
Online ISBN: 978-3-030-54595-6
eBook Packages: Computer Science, Computer Science (R0)