Machine Learning, Volume 107, Issue 5, pp 797–824

Learning with rationales for document classification

  • Manali Sharma
  • Mustafa Bilgic


We present a simple yet effective approach for document classification that incorporates rationales elicited from annotators into the training of any off-the-shelf classifier. We show empirically on several document classification datasets that our classifier-agnostic approach, which makes no assumptions about the underlying classifier, can effectively incorporate rationales into the training of multinomial naïve Bayes, logistic regression, and support vector machines. In addition to being classifier-agnostic, our method performs comparably to previous classifier-specific approaches for incorporating rationales and feature annotations. Finally, we propose and evaluate an active learning method tailored specifically for the learning-with-rationales framework.
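To make the classifier-agnostic idea concrete, here is a minimal sketch of one plausible reading of the approach: emphasize the annotator's rationale words by up-weighting their feature values and down-weighting the rest, then train any standard classifier (here, a small multinomial naïve Bayes) on the modified vectors. The toy corpus, the weights `R` and `O`, and the exact scaling scheme are illustrative assumptions, not the paper's precise recipe.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: each training document comes with the set of
# rationale words the annotator highlighted as the reason for its label.
train = [
    ("the plot was wonderful and moving", 1, {"wonderful"}),
    ("a dull and tedious film", 0, {"dull", "tedious"}),
    ("wonderful acting and a great plot", 1, {"wonderful", "great"}),
    ("tedious pacing and a dull script", 0, {"tedious", "dull"}),
]

R, O = 10.0, 0.1  # assumed emphasis weights: rationale words up, the rest down

# Accumulate per-class weighted pseudo-counts; any off-the-shelf classifier
# could be trained on these modified feature values instead.
counts = defaultdict(Counter)
vocab = set()
for text, label, rationale in train:
    for w in text.split():
        vocab.add(w)
        counts[label][w] += R if w in rationale else O

def predict(text):
    """Multinomial naive Bayes with Laplace smoothing over the weighted counts.
    Class priors are uniform here because the toy classes are balanced."""
    scores = {}
    for c in counts:
        total = sum(counts[c].values())
        scores[c] = sum(
            math.log((counts[c][w] + 1.0) / (total + len(vocab)))
            for w in text.split() if w in vocab
        )
    return max(scores, key=scores.get)

print(predict("a wonderful film"))  # → 1 (the rationale word dominates)
```

Because the rationale information is folded into the feature values before training, the same weighted vectors can be handed to logistic regression or an SVM without any change to those learners, which is the sense in which the approach is classifier-agnostic.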


Keywords: Document classification · Learning with rationales · Active learning



This material is based upon work supported by the National Science Foundation CAREER Award No. 1350337.



Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Illinois Institute of Technology, Chicago, USA
