Hierarchical Attention Network with XGBoost for Recognizing Insufficiently Supported Argument

  • Derwin Suhartono
  • Aryo Pradipta Gema
  • Suhendro Winton
  • Theodorus David
  • Mohamad Ivan Fanany
  • Aniati Murni Arymurthy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10607)


In this paper, we present an empirical analysis of the Hierarchical Attention Network (HAN) as a feature extractor that works jointly with eXtreme Gradient Boosting (XGBoost) as the classifier to recognize insufficiently supported arguments in a publicly available dataset. Besides HAN + XGBoost, we experimented with several other deep learning models: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), bidirectional LSTM, and bidirectional GRU. All results are reported with the best hyper-parameters. We present three key findings: (1) Shallow models work significantly better than deep models when only a small dataset is available. (2) An attention mechanism can improve a deep model's results. On average, it improves the Area Under the Receiver Operating Characteristic Curve (ROC-AUC) score of a Recurrent Neural Network (RNN) by a margin of 18.94%. The hierarchical attention network gave a ROC-AUC score 2.25% higher than the non-hierarchical one. (3) Using XGBoost as a replacement for the last fully connected layer improved the macro F1 score by 5.26%. Overall, our best setting achieves a 1.88% improvement over the state-of-the-art result.
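To make the idea of hierarchical attention as a feature extractor more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It pools word vectors into sentence vectors and sentence vectors into a single document vector, using attention weights at both levels. Several simplifications are assumptions made here for brevity: there is no recurrent encoder, the learned projection before the `tanh` is omitted, and the context vectors `word_context` and `sentence_context`, which would normally be learned, are fixed toy values. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Return the attention-weighted sum of `vectors` and the weights used.

    Each vector is scored by the dot product of tanh(vector) with `context`.
    Assumption: the learned linear projection is left out to keep this short.
    """
    scores = [
        sum(math.tanh(h) * c for h, c in zip(vec, context))
        for vec in vectors
    ]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, vectors))
        for d in range(dim)
    ]
    return pooled, weights


def document_features(document, word_context, sentence_context):
    """Word-level attention per sentence, then sentence-level attention."""
    sentence_vectors = [
        attention_pool(words, word_context)[0] for words in document
    ]
    return attention_pool(sentence_vectors, sentence_context)


# Toy document: 2 sentences, each a list of 2-dimensional word vectors.
document = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]],
]
word_context = [1.0, 0.0]      # hypothetical, normally learned
sentence_context = [0.0, 1.0]  # hypothetical, normally learned

features, sentence_weights = document_features(
    document, word_context, sentence_context
)
```

In a setup like the one the abstract describes, a document vector such as `features` would serve as one input row for a gradient-boosted tree classifier that takes the place of the final fully connected layer. The sketch stops before that step so that it runs with the standard library alone.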


Hierarchical Attention Network · XGBoost · Insufficiently supported argument · Shallow learning · Deep learning



This research was fully funded by the “Penelitian Disertasi Doktor” grant from the Ministry of Research, Technology and Higher Education of Indonesia under contract number 039A/VR.RTT/VI/2017.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Derwin Suhartono (1, 2)
  • Aryo Pradipta Gema (1)
  • Suhendro Winton (1)
  • Theodorus David (1)
  • Mohamad Ivan Fanany (2)
  • Aniati Murni Arymurthy (2)

  1. Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia
  2. Machine Learning and Computer Vision (MLCV) Laboratory, Universitas Indonesia, Depok, Indonesia
