Automated Learning of RVM for Large Scale Text Sets: Divide to Conquer

  • Catarina Silva
  • Bernardete Ribeiro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4224)


Three methods are investigated and presented for automated learning of Relevance Vector Machines (RVM) in large scale text sets. The probabilistic Bayesian nature of RVM allows both predictive distributions on test instances and model-based selection, yielding a parsimonious solution. However, scaling up the algorithm is not feasible in most digital information processing applications. We look at the properties of the baseline RVM algorithm and propose new scaling approaches based on choosing appropriate working sets that retain the most informative data. Incremental, ensemble and boosting algorithms are deployed to improve classification performance by taking advantage of the large training set available. Results on Reuters-21578 show performance gains while maintaining sparse solutions that can be deployed in distributed environments.
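
To make the "divide to conquer" idea concrete, the following is a minimal sketch of one way an ensemble over training chunks could look: the training set is split into smaller working sets, one classifier is trained per chunk, and predictions are combined by majority vote. It is not the authors' algorithm. The abstract does not specify how working sets are chosen or how outputs are combined, so the random split, the voting rule and every function name here (`train_perceptron`, `split_into_chunks`, `train_ensemble`, `predict_ensemble`) are assumptions made for illustration. The base learner is a plain perceptron standing in for an RVM, which would additionally yield a sparse set of relevance vectors and probabilistic outputs.

```python
import random
from collections import Counter


def train_perceptron(samples, epochs=20):
    """Stand-in base learner (NOT an RVM): a plain perceptron.

    `samples` is a non-empty list of (features, label) pairs,
    with features as a list of floats and label in {-1, +1}.
    """
    dim = len(samples[0][0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified (or on the boundary): update
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias


def predict_one(model, x):
    """Return +1 or -1 for a single feature vector."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


def split_into_chunks(samples, n_chunks, seed=0):
    """Shuffle and deal the samples into n_chunks working sets.

    Assumption: a random split. The paper selects working sets that
    retain the most informative data, which is not reproduced here.
    Requires len(samples) >= n_chunks so that no chunk is empty.
    """
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_chunks] for i in range(n_chunks)]


def train_ensemble(samples, n_chunks=5):
    """Train one base model per chunk; each sees only part of the data."""
    return [train_perceptron(chunk) for chunk in split_into_chunks(samples, n_chunks)]


def predict_ensemble(models, x):
    """Combine the per-chunk models by majority vote (use an odd n_chunks)."""
    votes = Counter(predict_one(model, x) for model in models)
    return votes.most_common(1)[0][0]
```

Because each model is trained on a single chunk, the per-chunk training jobs are independent and could run on separate machines, which is the property that makes this kind of scheme attractive for large text collections.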







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Catarina Silva (1, 2)
  • Bernardete Ribeiro (2)
  1. School of Technology and Management, Polytechnic Institute of Leiria, Morro do Lena – Alto do Vieiro, Leiria, Portugal
  2. Department of Informatics Engineering, Center for Informatics and Systems (CISUC), University of Coimbra, Coimbra, Portugal
