SVM based adaptive learning method for text classification from positive and unlabeled documents

Peng, Tao; Zuo, Wanli; He, Fengling

doi:10.1007/s10115-007-0107-1

Regular Paper
Published: 06 September 2007

Knowledge and Information Systems
Volume 16, pages 281–301 (2008)

Tao Peng¹, Wanli Zuo¹,² & Fengling He¹,²

Abstract

Automatic text classification is one of the most important tools in Information Retrieval. This paper presents a novel text classifier that learns from positive and unlabeled (PU) examples. The primary challenge of this problem, compared with the classical text classification problem, is that no labeled negative documents are available in the training set. First, we identify many more reliable negative documents, with a very low error rate, using an improved 1-DNF algorithm. Second, we build a set of classifiers by iteratively applying the SVM algorithm to a training set that is augmented at each iteration. Third, unlike previous PU-oriented text classification work, we construct the final classifier as a weighted vote of all the classifiers generated during the iterations, instead of choosing one of them as the final classifier. Finally, we discuss an approach based on PSO (Particle Swarm Optimization) for determining the weights of this vote, which can discover the best combination of weights. In addition, we built a focused crawler based on link-contexts, guided by different classifiers, to evaluate our method. Several comprehensive experiments were conducted using the Reuters data set and thousands of web pages. Experimental results show that our method achieves a higher F₁-measure than PEBL, and that a focused web crawler guided by our PSO-based classifier outperforms several other classifiers in both harvest rate and target recall.
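The weighted vote and the PSO search over its weights can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything concrete here is an assumption: each classifier is taken to emit a +1/-1 vote per document, the fitness is F₁ on a small labeled validation set, and the swarm is a standard global-best PSO with an inertia weight. The function names (weighted_vote, f1_score, pso_weights) and the parameter values are hypothetical, chosen only for illustration.

```python
import random

def weighted_vote(weights, votes):
    """Combine per-classifier votes (+1/-1) for one document into a final label."""
    score = sum(w * v for w, v in zip(weights, votes))
    return 1 if score > 0 else -1

def f1_score(weights, vote_matrix, labels):
    """F1 of the weighted-vote classifier with respect to the positive class."""
    tp = fp = fn = 0
    for votes, y in zip(vote_matrix, labels):
        pred = weighted_vote(weights, votes)
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == -1:
            fp += 1
        elif pred == -1 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def pso_weights(vote_matrix, labels, n_particles=20, n_iters=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for vote weights in [0, 1] that maximize F1 (assumed settings)."""
    rng = random.Random(seed)
    dim = len(vote_matrix[0])
    # Random initial swarm; particle 0 is the equal-weight vote, so the
    # result can never be worse than the unweighted baseline.
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    pos[0] = [1.0 / dim] * dim
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [f1_score(p, vote_matrix, labels) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp each weight to [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = f1_score(pos[i], vote_matrix, labels)
            if fit > best_fit[i]:
                best_fit[i], best_pos[i] = fit, pos[i][:]
                if fit > g_fit:
                    g_fit, g_pos = fit, pos[i][:]
    return g_pos, g_fit

# Toy validation data: rows are documents, columns are three classifiers' votes.
votes = [[1, 1, -1], [1, -1, -1], [-1, 1, 1],
         [-1, -1, 1], [1, -1, 1], [-1, 1, -1]]
labels = [1, 1, -1, -1, 1, -1]
weights, fitness = pso_weights(votes, labels)
print(weights, fitness)
```

In this toy data the first classifier agrees with every label, so a vote that trusts it more than the other two scores higher than the equal-weight vote. This is the situation in which learning the weights, rather than picking a single classifier or averaging uniformly, would be expected to help.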


References

Bing L, Wee SL, Philip S Yu, Xiaoli L (2002) Partially supervised classification of text documents. In: Proceedings of the nineteenth international conference on machine learning (ICML), Sydney, Australia, pp 384–397
Bing L, Yang D, Xiaoli L, Wee SL, Philip S Yu (2003) Building text classifiers using positive and unlabeled examples. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM 2003), Melbourne, Florida, USA, pp 179–188
Carlisle A, Dozier G (2001) An Off-The-Shelf PSO. In: Proceedings of the workshop on particle swarm optimization, Indianapolis, pp 1–6
Craven M, DiPasquo D, Freitag D, McCallum A, Mitchell T, Nigam K, Slattery S (1998) Learning to extract symbolic knowledge from the World Wide Web. In: Proceedings of the fifteenth national conference on artificial intelligence (AAAI-98), Madison, USA, pp 509–516
De Falco I, Della Cioppa A, Iazzetta A, Tarantino E (2005) An evolutionary approach for automatically extracting intelligible classification rules. Knowl Inf Syst 7(2):179–201
Denis F (1998) PAC learning from positive statistical queries. In: Proceedings of the 9th international conference on algorithmic learning theory. Lecture Notes in Artificial Intelligence. vol 1501, Springer, Heidelberg, pp 112–126
Denis F, Gilleron R, Tommasi M (2002) Text classification from positive and unlabeled examples. In: Proceedings of the conference on information processing and management of uncertainty in knowledge-based systems (IPMU), Annecy, France, pp 1927–1934
DeComite F, Denis F, Gilleron R (1999) Positive and unlabeled examples help learning. In: Proceedings of the 10th international conference on algorithmic learning theory, Tokyo, Japan, pp 219–230
Eberhart RC, Shi Y (2000) Comparing inertia weights and constriction factors in particle swarm optimization. In: Proceedings of the 2000 congress on evolutionary computation. Washington, DC, vol 1, pp 84–88
Hwanjo Y, Jiawei H, Kevin Chen-Chuan Chang (2002) PEBL: Positive example based learning for Web page classification using SVM. In: Proceedings of the 8th international conference on knowledge discovery and data mining (KDD’02), Edmonton, Canada, pp 239–248
Han ES, Karypis G, Kumar V (2001) Text categorization using weight adjusted k-nearest neighbor classification. In: Proceedings of the 5th Pacific-Asia conference on knowledge discovery and data mining, Hong Kong, pp 53–65
Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks, Perth, Australia, vol 4, pp 1942–1948
Lin WY, Kuo IC (2004) A genetic selection algorithm for OLAP data cubes. Knowl Inf Syst 6(1):83–102
Larry MM, Malik Yousef (2001) One-Class SVMs for document classification. J Mach Learn Res 2:139–154
Lewis DD, Gale WA (1994) A sequential algorithm for training text classifiers. In: SIGIR ’94: Proceedings of the seventeenth annual international ACM SIGIR conference on research and development in information retrieval, Dublin, Ireland, pp 3–12
Lang K (1995) NewsWeeder: Learning to Filter Netnews. In: Machine Learning: Proceedings of the twelfth international conference (ICML ’95), San Francisco, CA, USA, pp 331–339
Lewis DD, Ringuette M (1994) A comparison of two learning algorithms for text categorization. In: Third annual symposium on document analysis and information retrieval, Las Vegas, US, pp 81–93
Letouzey F, Denis F, Gilleron R (2000) Learning from positive and unlabeled examples. In: Proceedings of the 11th international conference on algorithmic learning theory, Sydney, Australia, pp 71–85
Mukherjea S (2004) Discovering and analyzing World Wide Web collections. Knowl Inf Syst 6(2):230–241
Muslea I, Minton S, Knoblock CA (2002) Active + semi-supervised learning = robust multi-view learning. In: Proceedings of the nineteenth international conference on machine learning, Morgan Kaufmann Publishers Inc, San Francisco, USA, pp 435–442
Merwe VD, Engelbrecht AP (2003) Data clustering using particle swarm optimisation. In: Proceedings of IEEE congress on evolutionary computation (CEC 2003), Canberra, Australia, pp 215–220
Omran M, Salman A, Engelbrecht AP (2002) Image classification using particle swarm optimization. In: Proceedings of the 4th Asia-Pacific conference on simulated evolution and learning (SEAL 2002), Singapore, pp 370–374
Pazzani MJ, Muramatsu J, Billsus D (1996) Syskill & Webert: Identifying interesting Web sites. In: Proceedings of the thirteenth national conference on artificial intelligence (AAAI-96), Portland, USA. AAAI Press/MIT Press, Cambridge, MA, pp 54–61
Srinivasan P, Menczer F, Pant G (2005) A general evaluation framework for topical crawlers. Inf Retr 8(3):417–447
Salton G, Buckley C (1988) Term weighting approaches in automatic text retrieval. Inf Process Manage 24(5):513–523
Thorsten Joachims (1998) Text categorization with support vector machines: learning with many relevant features. In: Proceedings of ECML-98, 10th European conference on machine learning, Heidelberg, Germany, pp 137–142
Xiaoli L, Bing L (2003) Learning to classify text using positive and unlabeled data. In: Proceedings of the eighteenth international joint conference on artificial intelligence (IJCAI-03), Acapulco, Mexico, pp 587–594
Yang Y, Pedersen JP (1997) Feature selection in statistical learning of text categorization. In: Proceedings of the fourteenth international conference on machine learning, Nashville, Tennessee, USA, pp 412–420
Zhihua Z, Shifu C, Zhaoqian C (2000) FANNC: A fast adaptive neural network classifier. Knowl Inf Syst 2(1):115–129

Author information

Authors and Affiliations

College of Computer Science and Technology, Jilin University, No.2699 Qianjin Road, Changchun, Jilin, 130012, China
Tao Peng, Wanli Zuo & Fengling He
Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun, 130012, China
Wanli Zuo & Fengling He

Corresponding author

Correspondence to Tao Peng.

About this article

Cite this article

Peng, T., Zuo, W. & He, F. SVM based adaptive learning method for text classification from positive and unlabeled documents. Knowl Inf Syst 16, 281–301 (2008). https://doi.org/10.1007/s10115-007-0107-1

Received: 05 April 2006
Revised: 29 May 2007
Accepted: 05 August 2007
Published: 06 September 2007
Issue Date: September 2008
DOI: https://doi.org/10.1007/s10115-007-0107-1
