Solar Physics, Volume 283, Issue 1, pp 113–141

On Dimensionality Reduction for Indexing and Retrieval of Large-Scale Solar Image Data

  • J. M. Banda
  • R. A. Angryk
  • P. C. H. Martens


This work investigates the applicability of several dimensionality reduction techniques for large-scale solar data analysis. Using a solar benchmark dataset that contains images of multiple types of phenomena, we investigate linear and nonlinear dimensionality reduction methods in order to reduce our storage and processing costs and maintain a good representation of our data in a new vector space. We present a comparative analysis of several dimensionality reduction methods and different numbers of target dimensions by utilizing different classifiers in order to determine the degree of data dimensionality reduction that can be achieved with these methods, and to discover the method that is the most effective for solar images. After determining the optimal number of dimensions, we then present preliminary results on indexing and retrieval of the dimensionally reduced data.
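The comparison described above (reduce the data to several target dimensionalities, then check how well a classifier still separates the classes) can be sketched in a few lines. This is a minimal illustration, not the authors' experimental setup: PCA and a nearest-neighbor classifier stand in for the wider set of methods and classifiers the study compares, the data are synthetic Gaussian clusters rather than solar image parameters, and every function name below is hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the training mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto the previously fitted principal directions."""
    return (X - mean) @ components.T


def knn_predict(X_train, y_train, X_test, k=1):
    """Classify each test row by majority vote among its k nearest training rows."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


def accuracy_by_dimension(X_train, y_train, X_test, y_test, dims):
    """Map each target dimensionality to the classification accuracy it yields."""
    results = {}
    for d in dims:
        # Fit the projection on training data only, then apply it to both sets.
        mean, components = pca_fit(X_train, d)
        Z_train = pca_transform(X_train, mean, components)
        Z_test = pca_transform(X_test, mean, components)
        predictions = knn_predict(Z_train, y_train, Z_test)
        results[d] = float((predictions == y_test).mean())
    return results


def make_toy_data(n_per_class=60, n_features=64, n_classes=4, seed=0):
    """Synthetic stand-in for image feature vectors (assumption: Gaussian clusters)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=3.0, size=(n_classes, n_features))
    X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)
    order = rng.permutation(len(y))
    return X[order], y[order]


if __name__ == "__main__":
    X, y = make_toy_data()
    split = int(0.7 * len(y))
    scores = accuracy_by_dimension(
        X[:split], y[:split], X[split:], y[split:], dims=[1, 2, 4, 8, 16, 64]
    )
    for d, acc in scores.items():
        print(f"{d:3d} dimensions: accuracy {acc:.3f}")
```

Reading the printed accuracies from the largest dimensionality downward shows where the classifier starts to degrade, which is one simple way to pick a target number of dimensions. Fitting the projection on the training split alone keeps the test split out of the reduction step.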


Dimensionality reduction, Indexing, Retrieval, Content-based image retrieval (CBIR)



This work was supported in part by two NASA Grant Awards: i) No. 08-SDOSC08-0008, funded from NNH08ZDA001N-SDOSC solicitation, and ii) No. NNX11AM13A, funded from NNH11ZHA003C solicitation. We would also like to thank our internal reviewers Michael Schuh, Richard McAllister, and Alexander Engell and the anonymous reviewers during the submission process.



Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  • J. M. Banda (1)
  • R. A. Angryk (1)
  • P. C. H. Martens (2, 3)

  1. Department of Computer Science, Montana State University, Bozeman, USA
  2. Department of Physics, Montana State University, Bozeman, USA
  3. Harvard-Smithsonian Center for Astrophysics, Cambridge, USA
