The Application of Digital Archives Classification with Progressive M-SVM to Wisdom School Building

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 107)


When the training set is very large, conventional SVM classification cannot meet real-time requirements, so designing a more efficient SVM algorithm is an important research problem. We improve the construction of the digital archive corpus, the Chinese word segmentation step, and the multi-step text feature selection based on TF, IDF and Information Gain. Experiments show that the improved M-SVM method achieves better results.
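The abstract names TF, IDF and Information Gain as the feature selection criteria but does not give the formulas used. As a minimal sketch, assuming the textbook definitions (relative term frequency, idf = log(N / df), and Information Gain as the drop in class entropy when documents are split on term presence), the scoring could look like the following. The function names and the idea of passing pre-segmented token lists are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf * idf} dict per document.

    docs is a list of token lists (word segmentation is assumed done).
    tf is the relative frequency in the document, idf is log(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Class entropy minus the entropy left after splitting on term presence."""
    with_term = [y for doc, y in zip(docs, labels) if term in doc]
    without_term = [y for doc, y in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = sum(
        len(part) / n * entropy(part) for part in (with_term, without_term) if part
    )
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Keep the k terms with the highest Information Gain (hypothetical helper)."""
    vocab = sorted({t for doc in docs for t in doc})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]
```

In a pipeline of this shape, the terms kept by `select_features` would define the vector dimensions, and the `tf_idf` weights restricted to those terms would form the input to the multi-class SVM.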


Wisdom school; Digital archive’s corpus; Feature selection; Progressive M-SVM



Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. Tourism College of Zhejiang, Hangzhou, China
