Exploring Synergetic Effects of Dimensionality Reduction and Resampling Tools on Hyperspectral Imagery Data Classification

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6871)

Abstract

The present paper addresses the classification of hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA and a supervised filter are applied to reduce the number of spectral bands. This preliminary study investigates the benefits of combining several techniques to tackle both the imbalance and the high-dimensionality problems, and evaluates which order of application leads to the best classification performance. Experimental results demonstrate the significance of using these two preprocessing tools together to improve the performance of hyperspectral imagery classification. Although the most effective order appears to be resampling first, followed by feature selection (or extraction), this question still needs a much more thorough investigation in the future.
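The abstract compares two orders of applying the same preprocessing steps. The following is a minimal sketch of both orders under stated assumptions: it uses NumPy, substitutes plain random oversampling for the resampling method (which the abstract does not name), computes PCA via an SVD, and omits the supervised filter entirely. All function names and the toy data are hypothetical and illustrative, not the authors' implementation.

```python
import numpy as np


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority-class rows until every class has as
    many samples as the largest class. A simple stand-in resampler, not
    necessarily the method used in the paper."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            idx = np.flatnonzero(y == c)
            extra = rng.choice(idx, size=target - n, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.vstack(X_parts), np.concatenate(y_parts)


def fit_pca(X, k):
    """Return (mean, components) of a k-component PCA fitted on X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def pca_transform(X, mean, components):
    """Project centred samples onto the retained principal directions."""
    return (X - mean) @ components.T


def resample_then_reduce(X, y, k, rng):
    """Order 1: rebalance first, so PCA is fitted on the rebalanced class mix."""
    Xr, yr = random_oversample(X, y, rng)
    mean, comps = fit_pca(Xr, k)
    return pca_transform(Xr, mean, comps), yr, (mean, comps)


def reduce_then_resample(X, y, k, rng):
    """Order 2: PCA is fitted on the original imbalanced data, then rebalance."""
    mean, comps = fit_pca(X, k)
    Xr, yr = random_oversample(pca_transform(X, mean, comps), y, rng)
    return Xr, yr, (mean, comps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: 200 "pixels", 50 spectral bands, 3 imbalanced classes.
    X = rng.normal(size=(200, 50))
    y = np.array([0] * 150 + [1] * 40 + [2] * 10)
    for name, pipeline in [("resample -> PCA", resample_then_reduce),
                           ("PCA -> resample", reduce_then_resample)]:
        Xt, yt, _ = pipeline(X, y, 10, rng)
        print(name, Xt.shape, np.bincount(yt))  # (450, 10) [150 150 150]
```

Both pipelines return the same number of rows and the same class counts. The only difference is the data on which PCA is fitted. In the first order, the duplicated minority samples shift the mean and covariance toward the minority classes, so the retained components may describe those classes better. In the second order, the projection is dominated by the majority class. In a real experiment, resampling should be applied to the training split only, so that duplicated samples do not leak into the evaluation data.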

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sánchez, J.S., García, V., Mollineda, R.A. (2011). Exploring Synergetic Effects of Dimensionality Reduction and Resampling Tools on Hyperspectral Imagery Data Classification. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_38

  • DOI: https://doi.org/10.1007/978-3-642-23199-5_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23198-8

  • Online ISBN: 978-3-642-23199-5

  • eBook Packages: Computer Science (R0)
