A Study Based on Distributed Supervised Machine Learning System for Text Classification

  • Jingyi Xu
  • Duo Li
  • Shiwen Yu
  • Xue Bai
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 124)

Abstract

Complex data exposes weaknesses of the centralized supervised machine learning system (CSMLS) in performance, self-adaptability, and scalability for text classification. To address these weaknesses, this paper proposes a novel distributed supervised machine learning system (DSMLS) model. DSMLS fuses the predictions of its diverse classification agents on the basis of data distribution consistency, classifier performance, and evidence belief. Experiments show that DSMLS outperforms CSMLS: in the best case, it reduces training time by 21.5% and improves F1 by 8.4%.
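
The abstract does not spell out the fusion step, so the following is only a minimal sketch of one common way to fuse agent predictions with evidence theory: each agent's class scores are treated as a basic probability assignment, discounted by a reliability weight (standing in here for classifier performance), and merged with Dempster's rule of combination. The class labels, scores, reliability values, and the function names discount and combine are illustrative assumptions, not the authors' implementation.

```python
from itertools import product


def discount(mass, reliability, frame):
    """Shafer discounting: keep `reliability` of each mass and
    move the remainder onto the whole frame (total ignorance)."""
    out = {focal: reliability * m for focal, m in mass.items()}
    out[frame] = out.get(frame, 0.0) + (1.0 - reliability)
    return out


def combine(m1, m2):
    """Dempster's rule of combination for two basic probability
    assignments whose focal elements are frozensets of labels."""
    fused = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to contradictory evidence
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict")
    return {focal: m / (1.0 - conflict) for focal, m in fused.items()}


# Hypothetical two-class frame and two hypothetical classification agents.
frame = frozenset({"sports", "finance"})
agent_a = {frozenset({"sports"}): 0.8, frozenset({"finance"}): 0.2}
agent_b = {frozenset({"sports"}): 0.6, frozenset({"finance"}): 0.4}

# Assumed reliabilities, e.g. each agent's F1 on held-out data.
fused = combine(discount(agent_a, 0.9, frame), discount(agent_b, 0.7, frame))
best = max(fused, key=fused.get)
print(sorted(best), round(fused[best], 4))
```

Discounting before combination keeps a weak agent from vetoing a strong one: the less reliable an agent is, the more of its mass goes to the whole frame, which never conflicts with anything.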

Keywords

Classifier Performance; Combination Rule; Evidence Theory; Basic Probability Assignment; Machine Learning Research
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jingyi Xu (1)
  • Duo Li (2)
  • Shiwen Yu (1)
  • Xue Bai (3)

  1. ICL, Peking University, Beijing, China
  2. DCLL, Peking University, Beijing, China
  3. Information Center of Health Bureau, Beijing, China
