Abstract
Geophysical data sets are growing at an ever-increasing rate, requiring computationally efficient data selection (thinning) methods that preserve essential information. Satellites such as WindSat provide large data sets for assessing the accuracy and computational efficiency of data selection techniques. A new data thinning technique, based on support vector regression (SVR), is developed and tested. To manage large online satellite data streams, observations from WindSat are partitioned into subsets by Voronoi tessellation, and each subset is then thinned by SVR (TSVR). Three experiments are performed. The first confirms the viability of TSVR for a relatively small sample, comparing it with several commonly used data thinning methods (random selection, averaging and Barnes filtering); TSVR produces a 10% thinning rate (90% data reduction), low mean absolute errors (MAE) and large correlations with the original data. A second experiment, using a larger data set, shows TSVR retrievals with MAE < 1 m s−1 and correlations ≥ 0.98. TSVR was an order of magnitude faster than the commonly used thinning methods. A third experiment applies a two-stage pipeline to TSVR to accommodate online data. The pipeline reconstructs the wind field with the same accuracy as in the second experiment and is an order of magnitude faster than nonpipeline TSVR. Therefore, pipeline TSVR is two orders of magnitude faster than commonly used thinning methods that ingest the entire data set. This study demonstrates that TSVR pipeline thinning is an accurate and computationally efficient alternative to commonly used data selection techniques.
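The tessellate-then-thin idea can be sketched in a few lines. What follows is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. The wind field is synthetic, and the function names (`voronoi_cells`, `fit_svr`, `tsvr_thin`) and the values of `gamma`, `C` and `eps` are hypothetical. Voronoi cells are formed by assigning each observation to its nearest seed point, which is exactly Voronoi cell membership. The ε-SVR bias is absorbed into the kernel so that the dual can be solved by simple coordinate descent. Observations with a non-zero dual coefficient are the support vectors and form the thinned data set. Each cell's SVR, which depends only on those support vectors, reconstructs the field so that MAE and correlation can be computed against the original values.

```python
import math
import random


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two 2-D points."""
    return math.exp(-gamma * ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2))


def voronoi_cells(points, seeds):
    """Group point indices by nearest seed, i.e. by that seed's Voronoi cell."""
    cells = [[] for _ in seeds]
    for i, (px, py) in enumerate(points):
        k = min(range(len(seeds)),
                key=lambda s: (px - seeds[s][0]) ** 2 + (py - seeds[s][1]) ** 2)
        cells[k].append(i)
    return cells


def fit_svr(xs, ys, gamma, C, eps, sweeps=200, tol=1e-6):
    """Epsilon-SVR dual solved by coordinate descent on beta = alpha - alpha*.

    Assumption: adding 1 to the kernel absorbs the bias term, which removes the
    sum(beta) = 0 constraint and makes each coordinate update closed form.
    Returns (beta, fitted values at the training points).
    """
    n = len(xs)
    K = [[rbf(xs[i], xs[j], gamma) + 1.0 for j in range(n)] for i in range(n)]
    beta = [0.0] * n
    f = [0.0] * n  # f[i] = sum_j K[i][j] * beta[j], kept up to date
    for _ in range(sweeps):
        biggest = 0.0
        for i in range(n):
            r = ys[i] - (f[i] - K[i][i] * beta[i])  # residual excluding point i's own term
            new = math.copysign(max(abs(r) - eps, 0.0), r) / K[i][i]  # soft-threshold by eps
            new = min(C, max(-C, new))  # box constraint -C <= beta_i <= C
            delta = new - beta[i]
            if delta:
                beta[i] = new
                for j in range(n):
                    f[j] += K[j][i] * delta
                biggest = max(biggest, abs(delta))
        if biggest < tol:  # stop once no coefficient moves appreciably
            break
    return beta, f


def tsvr_thin(points, values, seeds, gamma=2.0, C=10.0, eps=0.5):
    """Thin each Voronoi cell with its own SVR and keep only the support vectors."""
    kept, recon = [], [0.0] * len(values)
    for cell in voronoi_cells(points, seeds):
        if not cell:
            continue
        xs = [points[i] for i in cell]
        mean = sum(values[i] for i in cell) / len(cell)
        ys = [values[i] - mean for i in cell]  # centre the cell so the fit starts near zero
        beta, fitted = fit_svr(xs, ys, gamma, C, eps)
        for local, i in enumerate(cell):
            recon[i] = fitted[local] + mean
            if beta[local] != 0.0:  # non-zero coefficient: a support vector, retained
                kept.append(i)
    return sorted(kept), recon


def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in for WindSat observations: 400 random sites with a
    # smooth wind-speed field (m/s) plus small noise. Not real data.
    points = [(rng.uniform(0, 4), rng.uniform(0, 4)) for _ in range(400)]
    speeds = [8 + 3 * math.sin(1.5 * x) * math.cos(1.5 * y) + rng.gauss(0, 0.2)
              for x, y in points]
    seeds = [(gx, gy) for gx in (2 / 3, 2, 10 / 3) for gy in (2 / 3, 2, 10 / 3)]
    kept, recon = tsvr_thin(points, speeds, seeds)
    print(f"kept {len(kept)}/{len(points)} obs, "
          f"MAE={mae(recon, speeds):.2f} m/s, r={pearson(recon, speeds):.3f}")
```

Because each cell is fitted independently, cells can be processed as soon as their observations arrive, which is the property a pipelined version would exploit; the sketch does not model the two-stage pipeline itself. The fraction retained is governed mainly by `eps`: observations inside the ε-insensitive tube receive a zero coefficient and are discarded, so a wider tube thins more aggressively.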
Cite this article
Richman, M.B., Leslie, L.M., Trafalis, T.B. et al. Data selection using support vector regression. Adv. Atmos. Sci. 32, 277–286 (2015). https://doi.org/10.1007/s00376-014-4072-9