
A Fast Data Preprocessing Procedure for Support Vector Regression

  • Zhifeng Hao
  • Wen Wen
  • Xiaowei Yang
  • Jie Lu
  • Guangquan Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4224)

Abstract

A fast data preprocessing procedure (FDPP) for support vector regression (SVR) is proposed in this paper. In the proposed method, the dataset is first divided into several subsets, and K-means clustering is then performed within each subset. The resulting clusters are classified by their group size: centroids of small clusters are eliminated, and the remaining centroids are used to train the SVR. The relationship between group size and noisy clusters is discussed, and simulations are presented. The results show that FDPP removes most of the noise, preserves useful statistical information, and reduces the number of training samples. Most importantly, FDPP runs very fast while maintaining the good regression performance of SVR.
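
The full text is not available in this preview, but the abstract describes the procedure in enough detail for a rough illustration. The sketch below follows that outline using scikit-learn's KMeans and SVR; clustering jointly over inputs and targets, as well as the subset count, cluster count, and group-size threshold, are assumptions made for illustration rather than values taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

def fdpp(X, y, n_subsets=4, n_clusters=20, min_size=3):
    """Sketch of FDPP: cluster each subset, drop centroids of small
    clusters, and return the reduced (input, target) training set."""
    # Stack inputs and targets so each centroid carries an (x, y) pair;
    # whether the paper clusters jointly over (x, y) is an assumption here.
    data = np.hstack([X, y.reshape(-1, 1)])
    rng = np.random.default_rng(0)
    rng.shuffle(data)  # shuffle rows before splitting into subsets

    kept = []
    for subset in np.array_split(data, n_subsets):
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(subset)
        sizes = np.bincount(km.labels_, minlength=n_clusters)
        # Centroids of small clusters are treated as likely noise and dropped.
        kept.append(km.cluster_centers_[sizes >= min_size])
    kept = np.vstack(kept)
    return kept[:, :-1], kept[:, -1]

# Usage: train SVR on the reduced, denoised sample.
X = np.random.rand(2000, 3)
y = np.sin(6 * X[:, 0]) + 0.1 * np.random.randn(2000)
X_red, y_red = fdpp(X, y)
model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X_red, y_red)
```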

Keywords

Support Vector Machine · Support Vector Regression · Small Group Size · Support Vector Machine Regression · Independent Noise

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Zhifeng Hao (1)
  • Wen Wen (2)
  • Xiaowei Yang (1, 3)
  • Jie Lu (3)
  • Guangquan Zhang (3)

  1. School of Mathematical Science, South China University of Technology, Guangzhou, China
  2. College of Computer Science and Engineering, South China University of Technology, Guangzhou, China
  3. Faculty of Information Technology, University of Technology Sydney, Broadway, Australia
