A Comparative Study of Clustering Methods for Active Region Detection in Solar EUV Images

Abstract

The growing volume of solar data provided by new satellites makes it necessary to develop methods that automate the detection of solar features. Here we present a two-phase method for automatically detecting active regions in solar extreme ultraviolet (EUV) images. In the first phase, the bright regions in the image are segmented using seeded region growing. In the second phase, these bright regions are clustered into active regions. Partition-based clustering (both hard and fuzzy) and hierarchical clustering are compared in this work. The aim of the clustering phase is to assign each segmented region to a group, which reduces the total number of active regions and facilitates the documentation or subsequent monitoring of these regions. We use two indicators to validate the partitioning: i) the number of detected clusters approximates the number of active regions reported by the National Oceanic and Atmospheric Administration (NOAA), and ii) the area that defines each cluster overlaps with the area of a NOAA active region. Experiments were performed on over 6000 images from SOHO/EIT (195 Å). The best results were obtained using hierarchical clustering. The method detects a set of active regions in an image of the solar corona that successfully matches the number of NOAA regions. We will use these regions to perform real-time monitoring and flare detection.

References

  1. Aboudarham, J., Scholl, I., Fuller, N., Fouesneau, M., Galametz, M., Gonon, F., Maire, A., Leroy, Y.: 2008, Automatic detection and tracking of filaments for a solar feature database. Ann. Geophys. 26, 243 – 248.

  2. Adams, R., Bischof, L.: 1994, Seeded region growing. IEEE Trans. Pattern Anal. Mach. Intell. 16, 641 – 647.

  3. Alonso Moral, J.M.: 2007, Interpretable fuzzy systems modeling with cooperation between expert and induced knowledge. Ph.D. thesis, Universidad Politécnica de Madrid.

  4. Anderberg, M.: 1973, Cluster Analysis for Applications, Academic Press, New York, 395.

  5. Aranda, M.C., Caballero, C.: 2010, Automatic detection of active region on EUV solar images using fuzzy clustering. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds.) Computational Intelligence for Knowledge-Based Systems Design, Lecture Notes in Computer Science 6178, Springer, Berlin, 69 – 78.

  6. Arthur, D., Vassilvitskii, S.: 2007, k-means++: the advantages of careful seeding. In: Gabow, H. (ed.) Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, 1027 – 1035.

  7. Babuska, R., der Venn, P.J.V., Kaymak, U.: 2002, Improved variance estimation for Gustafson Kessel clustering. In: Proceedings of the 2002 IEEE International Conference on Fuzzy Systems, 1081 – 1085.

  8. Barra, V., Delouille, V., Hochedez, J.-F.: 2008, Segmentation of extreme ultraviolet solar images via multichannel fuzzy clustering. Adv. Space Res. 42, 917 – 925.

  9. Barra, V., Delouille, V., Kretzschmar, M., Hochedez, J.F.: 2009, Fast and robust segmentation of solar EUV images: Algorithm and results for solar cycle 23. Astron. Astrophys. 505, 361 – 371.

  10. Benkhalil, A., Zharkova, V., Ipson, S., Zharkov, S.: 2006, Active region detection and verification with the solar feature catalogue. Solar Phys. 235, 87 – 106.

  11. Bezdek, J.C.: 1981, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 256.

  12. Bezdek, J.C., Dunn, J.C.: 1975, Optimal fuzzy partitions: A heuristic for estimating the parameters in a mixture of normal distribution. IEEE Trans. Comput. 24, 835 – 838.

  13. Bezdek, J.C., Ehrlich, R., Full, W.: 1984, FCM: Fuzzy c-means algorithm. Comput. Geosci. 10, 191 – 203.

  14. Chou, C., Su, M., Lai, E.: 2004, A new cluster validity measure and its application to image compression. Pattern Anal. Appl. 7, 205 – 220.

  15. Colak, T., Qahwaji, R.: 2008, Automated McIntosh-based classification of sunspot groups using MDI images. Solar Phys. 248, 277 – 296.

  16. Colak, T., Qahwaji, R.: 2009, Automated solar activity prediction: A hybrid computer platform using machine learning and solar imaging for automated prediction of solar flares. Space Weather 7, S06001.

  17. Davies, D.L., Bouldin, D.W.: 1979, A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1, 224 – 227.

  18. Delaboudinière, J.-P., Artzner, G.E., Brunaud, J., Gabriel, A.H., Hochedez, J.F., Millier, F., et al.: 1995, EIT: Extreme-ultraviolet Imaging Telescope for the SOHO mission. Solar Phys. 162, 291 – 312.

  19. Dunn, J.C.: 1974, Well-separated clusters and optimal fuzzy partitions. J. Cybern. 4, 95 – 104.

  20. Fukuyama, Y., Sugeno, M.: 1989, A new method of choosing the number of clusters for the fuzzy c-means method. In: Proceedings of Fifth Fuzzy System Symposium, 247 – 250.

  21. Fuller, N., Aboudarham, J., Bentley, R.: 2005, Filament recognition and image cleaning on Meudon Hα spectroheliograms. Solar Phys. 227, 61 – 73.

  22. Gath, I., Geva, A.B.: 1989, Unsupervised optimal fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 11, 773 – 781.

  23. Ghaemi, R., Sulaiman, N., Ibrahim, H., Mustapha, N.: 2009, A survey: Clustering ensembles techniques. Proc. World Acad. Sci., Eng. Technol. 38, 644 – 653.

  24. Gustafson, D.E., Kessel, W.C.: 1978, Fuzzy clustering with a fuzzy covariance matrix. In: IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes, 761 – 766.

  25. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: 2002a, Cluster validity methods part I. ACM SIGMOD Rec. 31, 40 – 45.

  26. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: 2002b, Cluster validity methods part II. ACM SIGMOD Rec. 31, 19 – 27.

  27. Hartigan, J.: 1975, Clustering Algorithms, Wiley, New York, 351.

  28. Higgins, P., Gallagher, P., McAteer, R., Bloomfield, D.: 2010, Solar magnetic feature detection and tracking for space weather monitoring. Adv. Space Res. 47, 2105 – 2117.

  29. Jain, A., Dubes, R.: 1988, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, 320.

  30. Joshi, A., Srivastava, N., Mathew, S.: 2010, Automated detection of filaments and their disappearance using full-disc Hα images. Solar Phys. 262, 425 – 436.

  31. Kaufman, L., Rousseeuw, P.J.: 1987, Clustering by means of medoids. Technical Report, Vrije Universiteit.

  32. Kaufman, L., Rousseeuw, P.J.: 1990, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley-Interscience, New York, 342.

  33. Krista, L., Gallagher, P.: 2009, Automated coronal hole detection using local intensity thresholding techniques. Solar Phys. 256, 87 – 100.

  34. MacQueen, J.B.: 1967, Some methods for classification and analysis of multivariate observations. In: Le Cam, L.M., Neyman, J. (eds.) Proc. Fifth Berkeley Symp. Mathematical Statistics and Probability 1, Univ. California Press, Berkeley, 281 – 297.

  35. McAteer, R., Gallagher, P., Ireland, J., Young, C.: 2005, Automated boundary-extraction and region-growing techniques applied to solar magnetograms. Solar Phys. 228, 55 – 66.

  36. Nguyen, T., Willis, C., Paddon, D., Nguyen, H.: 2006, A hybrid system for learning sunspot recognition and classification. In: Proceedings of the 2006 International Conference on Hybrid Information Technology 2, Washington, 257 – 264.

  37. Nieniewski, M.: 2004, Extraction of diffuse objects from images by means of watershed and region merging: example of solar images. IEEE Trans. Syst. Man Cybern., Part B, Cybern. 34, 796 – 801.

  38. Otsu, N.: 1979, A threshold selection method from grey level histograms. IEEE Trans. Syst. Man Cybern. 9, 62 – 66.

  39. Pesnel, W.D., Thompson, B.J., Chamberlin, P.C.: 2012, The solar dynamics observatory (SDO). Solar Phys. 275, 3 – 15.

  40. Qahwaji, R., Colak, T.: 2006, Automatic detection and verification of solar features. Int. J. Imaging Syst. Technol. 4, 199 – 210.

  41. Ridpath, I.: 2012, A Dictionary of Astronomy, 2nd edn., Oxford Univ. Press, New York.

  42. Robbrecht, E., Berghmans, D., van der Linden, R.: 2006, Objective CME detection over the solar cycle: A first attempt. Adv. Space Res. 38, 475 – 479.

  43. Scherrer, P.H., Bogart, R.S., Bush, R.I., Hoeksema, J.T., Kosovichev, A.G., Schou, J., et al.: 1995, The Solar Oscillation Investigation – Michelson-Doppler Imager. Solar Phys. 162, 129 – 188.

  44. Sharma, S.: 1996, Applied Multivariate Techniques, Wiley, New York, 225.

  45. Steinhaus, H.: 1956, Sur la division des corps matériels en parties. Bull. Acad. Pol. Sci 1, 801 – 804.

  46. Sych, R., Nakariakov, V., Karlicky, M., Anfinogentov, S.: 2009, Relationship between wave processes in sunspots and quasi-periodic pulsations in active region flares. Astron. Astrophys. 505, 791 – 799.

  47. Tibshirani, R., Walther, G., Hastie, T.: 2001, Estimating the number of clusters in a data set via the gap statistic. J. Roy. Stat. Soc. B 63, 411 – 423.

  48. Verbeeck, C., Higgins, T., Colak, T., Watson, T., Delouille, V., Mampaey, B., Qahwaji, R.: 2011, A multi-wavelength analysis of active regions and sunspots by comparison of automatic detection algorithms. Solar Phys. 283, 67 – 95.

  49. Veronig, A., Steinegger, M., Otruba, W., Hanslmeier, A., Messerotti, M., Temmer, M., Gonzi, S., Brunner, G.: 2000, Automatic image processing in the frame of a solar flare alerting system. Hvar Obs. Bull. 24, 195 – 205.

  50. Watson, F., Fletcher, L., Dalla, S., Marshall, S.: 2009, Modelling the longitudinal asymmetry in sunspot emergence: The role of the Wilson depression. Solar Phys. 260, 5 – 19.

  51. Wu, K.-L., Yang, M.-S.: 2005, A cluster validity index for fuzzy clustering. Pattern Recognit. Lett. 26, 1275 – 1291.

  52. Xie, X.L., Beni, G.: 1991, A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 13, 841 – 846.

  53. Xu, R., Wunsch, D.: 2008, Clustering, Wiley-IEEE Press, New York, 368.

  54. Yeung, K.Y., Haynor, D.R., Ruzzo, W.L.: 2001, Validating clustering for gene expression data. Bioinformatics 17, 309 – 318.

  55. Young, C., Gallagher, P.: 2008, Multiscale edge detection in the corona. Solar Phys. 248, 457 – 469.

  56. Zharkov, S., Zharkova, V.: 2011, Statistical properties of Hα flares in relation to sunspots and active regions in the cycle 23. J. Atmos. Solar-Terr. Phys. 73, 264 – 270.

  57. Zharkov, S., Zharkova, V., Ipson, S., Benkhalil, A.: 2004, Automated recognition of sunspots on the SOHO/MDI white light solar images. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds.) Knowledge-Based Intelligent Information and Engineering Systems, Springer, Berlin, 446 – 452.

Acknowledgements

This work was funded by the project TIC07-02861 of the Junta de Andalucía (Spain).

Author information

Corresponding author

Correspondence to C. Caballero.

Appendices

Appendix A: \(\mathrm{GSRG}^{2}_{\mathrm{h}}\): Seed Selection

In the \(\mathrm{GSRG}^{2}_{\mathrm{h}}\) method, seed selection is carried out using a histogram-based technique that exploits the shape of the histogram of the solar images after preprocessing. In the preprocessed solar images, most pixels have a value close to zero, so the highest intensity values have few occurrences in the histogram.

The threshold that determines the value of the seeds is calculated from the points of the histogram. The histogram of a preprocessed solar image decreases, and therefore the straight line fitted to these points has a negative slope. If the threshold is placed at the beginning of the fall of the histogram, the intensity level of the selected pixels will be too low, so that irrelevant pixels (noise) are taken as relevant objects. On the other hand, if the threshold is placed at the end of the fall, relevant objects are taken as noise. Therefore, the ideal threshold is close to the beginning of the fall, but not exactly at the beginning.

Let \(f_{i}\) be the function obtained by a least-squares linear fit to the points of the histogram, and let \(a_{i}\) be the slope of \(f_{i}\). Furthermore, let \(F_{C}\) be the family of functions \(f_{i}\) whose slope satisfies the condition \(a_{i} < \sigma_{c}\), where the threshold \(\sigma_{c}\) is defined with a very small value. If any function \(f_{i}\) satisfies the previous condition, then the threshold \(\sigma_{c}\) is modified by an increasing factor (\(\Delta_{c}\)).
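The fitting step can be illustrated with a minimal sketch. It assumes that each \(f_{i}\) is fitted to a window of consecutive histogram bins; the window width, the function names, and the numerical value of the slope threshold are hypothetical choices made for illustration and are not taken from the paper. The selection applies the condition on the slope literally as stated above and does not implement the adjustment by the increasing factor.

```python
from typing import List, Sequence, Tuple


def ls_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the least-squares straight line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def window_slopes(hist: Sequence[float], width: int) -> List[Tuple[int, int, float]]:
    """Fit a line to every run of `width` consecutive histogram bins.

    Returns (first_bin, last_bin, slope) per window. `width` (>= 2) is an
    assumed parameter; the text does not say how many points each fit uses.
    """
    fits = []
    for start in range(len(hist) - width + 1):
        xs = list(range(start, start + width))
        ys = hist[start:start + width]
        fits.append((start, start + width - 1, ls_slope(xs, ys)))
    return fits


def select_family(fits: List[Tuple[int, int, float]], sigma_c: float):
    """Keep the fits whose slope a_i satisfies a_i < sigma_c."""
    return [fit for fit in fits if fit[2] < sigma_c]


# Toy decreasing histogram; the values are made up for the example.
hist = [100, 60, 30, 12, 5, 3, 2, 1]
fits = window_slopes(hist, width=3)
family = select_family(fits, sigma_c=-10.0)  # sigma_c value is hypothetical
```

With this toy histogram, only the windows on the steep initial part of the fall pass the slope condition.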

The family of functions \(g_{i}\) is also calculated, where each \(g_{i}\) is fitted to the histogram points that follow those used to calculate the corresponding \(f_{i}\). The functions \(g_{i}\) are used to determine the trend of the points after each of the pre-selected functions \(f_{i}\). To this end, the threshold \(\sigma_{p}\) is defined as a negative value, small enough to retain the functions with a small slope and to discard the functions with a large fall, which are often the first.

Finally, let \(P\) be the family of functions \(f_{i}\) of the set \(F_{C}\) that also satisfy the condition \(c_{i} < \sigma_{p}\), where \(c_{i}\) is defined as the slope of the function \(g_{i}\). If any function \(f_{i}\) satisfies the previous conditions, then the threshold \(\sigma_{p}\) is modified using an increasing factor (\(\Delta_{p}\)). The value of the threshold is determined by the average of the values \(x_{j}\) and \(x_{k}\) from the first point \((x_{j}, x_{k})\) of the set \(P\). The first point of the set \(P\) is selected because this point has the smallest values of \(x_{j}\) and \(x_{k}\).

Appendix B: Seeded Region Growing

In this appendix the pseudocode of the seeded region growing (SRG) algorithm is presented, which is described in detail in Adams and Bischof (1994).
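As an illustration only, the following is a simplified Python sketch of seeded region growing in the spirit of Adams and Bischof (1994); it is not the pseudocode of the paper. It assumes a 2-D grey-level image stored as a list of rows, 4-connectivity, and seeds given as a mapping from pixel coordinates to positive region labels. Two simplifications are made: the difference between a candidate pixel and the region mean is computed when the pixel is queued and is not updated afterwards, and a pixel that borders several regions is assigned to whichever region claims it first instead of being marked as a boundary. The function name and data layout are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def seeded_region_growing(image: List[List[float]],
                          seeds: Dict[Pixel, int]) -> List[List[int]]:
    """Grow labelled regions from seed pixels until every reachable pixel is labelled.

    Returns a label map with the same shape as `image`; 0 means unlabelled.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    total: Dict[int, float] = {}  # sum of intensities per region
    count: Dict[int, int] = {}    # number of pixels per region
    heap: List[Tuple[float, int, int, int]] = []  # (delta, row, col, label)

    def push_neighbours(r: int, c: int, label: int) -> None:
        """Queue the unlabelled 4-neighbours of (r, c) as candidates for `label`."""
        mean = total[label] / count[label]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                delta = abs(image[nr][nc] - mean)
                heapq.heappush(heap, (delta, nr, nc, label))

    # Label the seeds first so that the region means are known.
    for (r, c), label in seeds.items():
        labels[r][c] = label
        total[label] = total.get(label, 0.0) + image[r][c]
        count[label] = count.get(label, 0) + 1
    for (r, c), label in seeds.items():
        push_neighbours(r, c, label)

    # Always absorb the candidate that is most similar to its region.
    while heap:
        _, r, c, label = heapq.heappop(heap)
        if labels[r][c] != 0:
            continue  # already claimed by some region
        labels[r][c] = label
        total[label] += image[r][c]
        count[label] += 1
        push_neighbours(r, c, label)
    return labels


# Toy image: a bright patch in the upper-left corner on a dark background.
image = [[9, 9, 1, 1],
         [9, 8, 1, 0],
         [1, 1, 0, 0]]
label_map = seeded_region_growing(image, {(0, 0): 1, (2, 3): 2})
```

The priority queue keeps the candidates ordered by their similarity to the adjacent region, which is the ordering idea of the original algorithm; in this toy image the bright 2 × 2 patch is labelled 1 and the remaining pixels are labelled 2.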

About this article

Cite this article

Caballero, C., Aranda, M.C. A Comparative Study of Clustering Methods for Active Region Detection in Solar EUV Images. Sol Phys 283, 691–717 (2013). https://doi.org/10.1007/s11207-013-0239-2

Keywords

  • Active region
  • Clustering method
  • Segmentation