
A Cost-Reducing Partial Labeling Estimator in Text Classification Problem

  • Conference paper
  • Published in: Advances in Information and Communication (FICC 2020)
  • Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1130)

Abstract

This paper proposes a new approach to text classification problems in which learning with partial labels is beneficial. Instead of assigning each training sample a set of candidate labels, we attach negative-oriented labels to ambiguous training examples when they are unlikely to fall into certain classes. We construct two new maximum likelihood estimators with a self-correction property and prove that, under some conditions, the new estimators converge faster. We also discuss the advantages of applying one of the new estimators to a fully supervised learning problem. The proposed method has potential applications in many areas, such as crowdsourcing, natural language processing, and medical image analysis.
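To make the labeling scheme concrete, the sketch below shows one possible encoding of negative-oriented labels in Python. The +1/−1/0 convention and the helper `negative_oriented_label` are illustrative assumptions, not the paper's notation.

```python
import numpy as np

# Hypothetical encoding of "negative-oriented" partial labels for 4 classes.
# +1: the example is known to belong to this class,
# -1: the example is known NOT to belong to this class,
#  0: no information. The exact encoding used in the paper may differ.
NUM_CLASSES = 4

def negative_oriented_label(excluded_classes, num_classes=NUM_CLASSES):
    """Label an ambiguous example by marking the classes it is unlikely
    to fall into, rather than listing candidate labels."""
    y = np.zeros(num_classes, dtype=int)
    y[list(excluded_classes)] = -1
    return y

# An annotator cannot decide between classes 0 and 2, but is confident
# the document belongs to neither class 1 nor class 3:
y_partial = negative_oriented_label({1, 3})
print(y_partial)  # [ 0 -1  0 -1]
```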



Author information

Correspondence to Zhibo Dai.

Appendices

A Proof of Theorem 1

Proof

Under the assumption \(\sum _{j=1}^v x_j = m\), we can rewrite (3.10) as:

$$\hat{\theta }_{i_j} = \frac{\sum _{d\in C_i}x_j}{\sum _{d\in C_i}m}=\frac{\sum _{d\in C_i}x_j}{|C_i|m}.$$

Since \(d=(x_1,x_2,\ldots ,x_v)\) follows a multinomial distribution, for \(d\) in class \(C_i\) we have \(E[x_j] = m\theta _{i_j}\) and \(E[x_j^2] = m\theta _{i_j}(1-\theta _{i_j}+m\theta _{i_j})\).

  1.
    $$E[\hat{\theta }_{i_j}]=E\Big [\frac{\sum _{d\in C_i}x_j}{|C_i|m}\Big ]=\frac{\sum _{d\in C_i}E[x_j]}{|C_i|m}=\frac{\sum _{d\in C_i}m\theta _{i_j}}{|C_i|m}=\theta _{i_j}.$$

    Thus \(\hat{\theta }_{i_j}\) is unbiased.

  2.

    By (1), we have:

    $$E[|\hat{\theta }_{i_j}-\theta _{i_j}|^2]=E[\hat{\theta }_{i_j}^2]-2\theta _{i_j}E[\hat{\theta }_{i_j}]+\theta _{i_j}^2=E[\hat{\theta }_{i_j}^2]-\theta _{i_j}^2.$$

    Then

    $$\begin{aligned} \hat{\theta }_{i_j}^2=\frac{(\sum _{d\in C_i}x_j)^2}{|C_i|^2m^2}=\frac{\sum _{d\in C_i}x_j^2+\sum _{d_1\ne d_2\in C_i}x_j^{d_1}x_j^{d_2}}{|C_i|^2m^2}, \end{aligned}$$
    (A.1)

    where the sum in the second term runs over ordered pairs of distinct documents \(d_1\ne d_2\) in \(C_i\), and \(d_i=(x_1^{d_i},x_2^{d_i},\ldots ,x_v^{d_i})\) for \(i=1,2\). Since:

    $$E\Big [\frac{\sum _{d\in C_i}x_j^2}{|C_i|^2m^2}\Big ]=\frac{|C_i|m\theta _{i_j}(1-\theta _{i_j}+m\theta _{i_j})}{|C_i|^2m^2}=\frac{\theta _{i_j}(1-\theta _{i_j}+m\theta _{i_j})}{|C_i|m},$$

    and

    $$E\Big [\frac{\sum _{d_1\ne d_2\in C_i}x_j^{d_1}x_j^{d_2}}{|C_i|^2m^2}\Big ]=\frac{|C_i|(|C_i|-1)m^2\theta _{i_j}^2}{|C_i|^2m^2}=\frac{(|C_i|-1)\theta _{i_j}^2}{|C_i|}.$$

    Taking the expectation of (A.1) and plugging in these two identities gives:

    $$E[\hat{\theta }_{i_j}^2] = \frac{\theta _{i_j}(1-\theta _{i_j})}{|C_i|m} + \theta _{i_j}^2,$$

    thus: \(E[|\hat{\theta }_{i_j}-\theta _{i_j}|^2] = \frac{\theta _{i_j}(1-\theta _{i_j})}{|C_i|m}\).

   \(\square \)
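As a quick numerical sanity check on Theorem 1, the sketch below simulates multinomial documents, forms \(\hat{\theta }_{i_j}\) as in (3.10), and compares the empirical mean squared error against the closed form \(\theta _{i_j}(1-\theta _{i_j})/(|C_i|m)\). The vocabulary size, word distribution, document length \(m\), and class size \(|C_i|\) are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([0.5, 0.3, 0.2])  # toy word distribution for class C_i
m = 50                             # words per document
n_docs = 40                        # |C_i|
n_trials = 20000

# theta_hat[j] = (total count of word j over C_i) / (|C_i| * m), as in (3.10)
estimates = np.empty((n_trials, theta.size))
for t in range(n_trials):
    docs = rng.multinomial(m, theta, size=n_docs)  # one row per document d
    estimates[t] = docs.sum(axis=0) / (n_docs * m)

empirical_mse = ((estimates - theta) ** 2).mean(axis=0)
predicted_mse = theta * (1 - theta) / (n_docs * m)

print("bias          :", estimates.mean(axis=0) - theta)  # ~0 (unbiased)
print("empirical MSE :", empirical_mse)
print("predicted MSE :", predicted_mse)  # should closely match the line above
```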

B Figures

See Figs. 1, 2, 3 and 4.

Fig. 1.

We take the 10 largest groups in the Reuters-21578 dataset (a) and the 20 Newsgroups dataset (b), and use 20% of the data as the training set, among which \(|S_1|=|S_2|\). The y-axis is accuracy and the x-axis is the class index.

Fig. 2.

We take the 10 largest groups in the Reuters-21578 dataset (a) and the 20 Newsgroups dataset (b), and use 90% of the data as the \(S_2\) training set. The y-axis is accuracy and the x-axis is the class index.

Fig. 3.

We take the 10 largest groups in the Reuters-21578 dataset (a) and the 20 Newsgroups dataset (b), and use 10% of the data as the \(S_1\) training set. The y-axis is accuracy and the x-axis is the class index.

Fig. 4.

We take the 10 largest groups in the Reuters-21578 dataset (a) and the 20 Newsgroups dataset (b), and use 10% of the data as the \(S_1\) training set. We evaluate on the training set. The y-axis is accuracy and the x-axis is the class index.
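For orientation, here is a minimal sketch of a data split like the one described in the captions, using scikit-learn's copy of the 20 Newsgroups corpus. The captions restrict to the 10 largest classes (omitted here for brevity), and the roles of \(S_1\) and \(S_2\) are defined in the main text, which is not reproduced on this page; the roles in the comments below (fully labeled \(S_1\), negative-oriented \(S_2\)) are therefore assumptions.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split

# Load the full 20 Newsgroups corpus.
data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))

# Keep 20% as training data (as in Fig. 1), the rest for testing.
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, train_size=0.2, stratify=data.target, random_state=0
)

# Split the training set into two equal halves, |S_1| = |S_2|; by assumption,
# S_1 keeps its full labels while S_2 receives negative-oriented labels.
X_s1, X_s2, y_s1, y_s2 = train_test_split(
    X_train, y_train, train_size=0.5, stratify=y_train, random_state=0
)
print(len(X_s1), len(X_s2), len(X_test))
```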


Copyright information

© 2020 Springer Nature Switzerland AG


Cite this paper

Chen, J. et al. (2020). A Cost-Reducing Partial Labeling Estimator in Text Classification Problem. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication. FICC 2020. Advances in Intelligent Systems and Computing, vol 1130. Springer, Cham. https://doi.org/10.1007/978-3-030-39442-4_37
