An Active Learning Based on Uncertainty and Density Method for Positive and Unlabeled Data

  • Jun Luo
  • Wenan Zhou
  • Yu Du
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11334)

Abstract

Active learning selects the most informative unlabeled samples for manual annotation in order to enlarge the training set. Many active learning methods have been proposed, but most of them assume that labeled data are available for every class. Only a few methods work with positive and unlabeled data, and their computational complexity is high, so they do not scale well to big data. In this paper, we propose an active learning approach that works well when only a small number of positive samples are available in big data. We use data preprocessing to remove most of the outliers, which simplifies the density calculation relative to the KNN algorithm, and our proposed sample selection strategy, Min-Uncertainty Density (MDD), helps select unlabeled samples that are more uncertain and lie in higher-density regions with less computation. We also propose a combined semi-supervised and active learning technique (MDD-SSAL) that automatically annotates some confident unlabeled samples in each iteration, reducing the number of manually annotated samples. Experimental results indicate that our proposed method is competitive with other similar methods.
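
The general idea of combining uncertainty with density, and of automatically labeling confident samples, can be illustrated with a minimal sketch. This is not the authors' MDD or MDD-SSAL procedure, whose details are not given in the abstract. The Gaussian-kernel density, the product of uncertainty and density as the score, the 0.95 confidence threshold, and all function names below are assumptions made purely for illustration.

```python
import numpy as np

def uncertainty(p_pos):
    """Uncertainty of a binary prediction: 1 at p = 0.5, 0 at p = 0 or 1."""
    p = np.asarray(p_pos, dtype=float)
    return 1.0 - np.abs(2.0 * p - 1.0)

def density(X):
    """Mean Gaussian-kernel similarity of each sample to all other samples.

    Assumed stand-in for a density measure; quadratic in the number of
    samples, so it is only suitable for small illustrative inputs.
    """
    X = np.asarray(X, dtype=float)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sim = np.exp(-sq_dist)
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    return sim.sum(axis=1) / max(len(X) - 1, 1)

def select_queries(X, p_pos, n_queries):
    """Indices of the samples with the highest uncertainty * density score."""
    score = uncertainty(p_pos) * density(X)
    return np.argsort(-score)[:n_queries]

def confident_indices(p_pos, threshold=0.95):
    """Indices whose predicted probability is confidently positive or negative.

    In a semi-supervised step these samples would be labeled automatically
    instead of being sent to a human annotator.
    """
    p = np.asarray(p_pos, dtype=float)
    return np.where((p >= threshold) | (p <= 1.0 - threshold))[0]

# Toy data: three clustered points and one isolated outlier.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
p_pos = [0.5, 0.9, 0.5, 0.5]  # hypothetical classifier outputs
queries = select_queries(X, p_pos, n_queries=2)
```

In this toy example the isolated point is maximally uncertain but has near-zero density, so it is not selected for annotation; the two uncertain points inside the cluster are chosen instead.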

Keywords

  • Active learning
  • Positive and unlabeled data
  • Semi-supervised learning
  • Big data

Notes

Acknowledgments

Supported by the National Science and Technology Major Project (2018ZX03001019-003) and the National Natural Science Foundation of China (Grant No. 61372088).


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, People’s Republic of China
