Data selection using support vector regression

Advances in Atmospheric Sciences

Abstract

Geophysical data sets are growing at an ever-increasing rate, requiring computationally efficient data selection (thinning) methods that preserve essential information. Satellites such as WindSat provide large data sets for assessing the accuracy and computational efficiency of data selection techniques. A new data thinning technique, based on support vector regression (SVR), is developed and tested. To manage large online satellite data streams, observations from WindSat are formed into subsets by Voronoi tessellation, and each subset is then thinned by SVR (TSVR). Three experiments are performed. The first confirms the viability of TSVR for a relatively small sample by comparing it to several commonly used data thinning methods (random selection, averaging and Barnes filtering); TSVR produces a 10% thinning rate (90% data reduction), low mean absolute errors (MAE) and large correlations with the original data. A second experiment, using a larger data set, shows TSVR retrievals with MAE < 1 m s−1 and correlations ⩾ 0.98. TSVR was an order of magnitude faster than the commonly used thinning methods. A third experiment applies a two-stage pipeline to TSVR to accommodate online data. The pipeline subsets reconstruct the wind field with the same accuracy as in the second experiment, and the pipeline is an order of magnitude faster than the nonpipeline TSVR. Therefore, pipeline TSVR is two orders of magnitude faster than commonly used thinning methods that ingest the entire data set. This study demonstrates that TSVR pipeline thinning is an accurate and computationally efficient alternative to commonly used data selection techniques.
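The subset-then-thin workflow and its scoring can be illustrated with a minimal sketch. It is not the authors' implementation: the per-cell SVR step is replaced by random selection (one of the baseline methods named in the abstract), the wind speed field is synthetic, the reconstruction uses the nearest retained observation, and all function names, seed counts and parameters are hypothetical. It shows only how nearest-seed assignment yields Voronoi subsets, how a 10% thinning rate is applied per subset, and how MAE and correlation against the original data are computed.

```python
import math
import random


def voronoi_assign(points, seeds):
    """Label each (x, y) point with the index of its nearest seed.

    The points sharing a label form one Voronoi cell of the seeds.
    """
    labels = []
    for x, y in points:
        nearest = min(
            range(len(seeds)),
            key=lambda k: (x - seeds[k][0]) ** 2 + (y - seeds[k][1]) ** 2,
        )
        labels.append(nearest)
    return labels


def thin_cell(indices, rate, rng):
    """Stand-in for the per-cell SVR step: keep a random fraction of a cell."""
    keep = max(1, round(rate * len(indices)))
    return rng.sample(indices, keep)


def nearest_value(point, kept_points, kept_values):
    """Reconstruct a value at `point` from the closest retained observation."""
    x, y = point
    j = min(
        range(len(kept_points)),
        key=lambda k: (x - kept_points[k][0]) ** 2 + (y - kept_points[k][1]) ** 2,
    )
    return kept_values[j]


def mae(truth, estimate):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic "wind speed" observations (assumption: not WindSat data).
rng = random.Random(0)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(1000)]
speeds = [8.0 + 3.0 * math.sin(x) * math.cos(y) for x, y in points]
seeds = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(10)]

# Form Voronoi subsets, then thin each subset to roughly 10% of its points.
labels = voronoi_assign(points, seeds)
kept = []
for cell in range(len(seeds)):
    members = [i for i, lab in enumerate(labels) if lab == cell]
    if members:  # a seed may own no observations
        kept.extend(thin_cell(members, 0.10, rng))

# Rebuild the field at every original location and score it.
kept_points = [points[i] for i in kept]
kept_speeds = [speeds[i] for i in kept]
rebuilt = [nearest_value(p, kept_points, kept_speeds) for p in points]

achieved_rate = len(kept) / len(points)
err = mae(speeds, rebuilt)
r = pearson(speeds, rebuilt)
print(f"kept {achieved_rate:.1%}, MAE {err:.2f} m/s, r {r:.2f}")
```

Because each cell is thinned independently, the subsets could be processed one at a time or in parallel as they arrive, which is the property a pipelined variant would rely on. Replacing `thin_cell` with a routine that fits an SVR per cell and retains its support vectors would bring the sketch closer to the approach the abstract describes, though the details of that step are not given here.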

Author information

Corresponding author

Correspondence to Michael B. Richman.

About this article

Cite this article

Richman, M.B., Leslie, L.M., Trafalis, T.B. et al. Data selection using support vector regression. Adv. Atmos. Sci. 32, 277–286 (2015). https://doi.org/10.1007/s00376-014-4072-9

  • DOI: https://doi.org/10.1007/s00376-014-4072-9
