
Parametrization of Sunspot Groups Based on Machine-Learning Approach

Solar Physics

Abstract

Sunspot groups observed in white light appear as complex structures. Analysis of these structures is usually based on simple morphological descriptors that capture only generic properties and miss information about fine details. We present a machine-learning approach to introduce a complete yet compact description of sunspot groups. The idea is to map sunspot-group images into an appropriate lower-dimensional (latent) space. We apply a combination of a Variational Autoencoder and Principal Component Analysis to obtain a set of 285 latent descriptors. We demonstrate that the standard descriptors are embedded into the latent ones. Thus, latent features can be considered as an extended description of sunspot groups and, in our opinion, can expand the possibilities for research on sunspot groups. In particular, we demonstrate an application for the estimation of the sunspot-group complexity. The proposed parametrization model is generic and can be applied to the investigation of other traces of solar activity observed in various spectral lines.



Data Availability

Key components of this work, which are the parametrization model and the dataset of sunspot groups and latent vectors, are available in the public GitHub repository github.com/observethesun/sunspot_groups and can be used to reproduce the results and for further research.

Notes

  1. In application to neural networks, one- or multidimensional arrays are called tensors.

  2. Strictly speaking, it is not necessary that the dimensionality of the output tensor is lower than the dimensionality of the input data.

  3. The model consists of 3 hidden layers with 128, 64, and 32 neurons and the ELU activation function. The output layer has a single neuron with a linear activation. We use the MSE loss function for the regression problems and binary cross-entropy for the classification problem.

  4. solarcyclescience.com/activeregions.html

  5. sunspots.irsol.usi.ch/db/
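
The fully connected model described in Note 3 can be sketched as a forward pass in NumPy. This is only an illustration under assumptions, not the authors' implementation: the input size of 285 (the number of latent descriptors), the random weight initialization, and the names `elu`, `init_mlp`, `forward`, and `mse` are hypothetical, and training is omitted.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    # np.minimum keeps exp() from overflowing on large positive inputs.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def init_mlp(n_inputs, hidden=(128, 64, 32), seed=0):
    """Random weights for 3 hidden layers and a single-neuron output layer."""
    rng = np.random.default_rng(seed)
    sizes = (n_inputs, *hidden, 1)
    return [(rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Hidden layers use ELU; the output neuron is linear."""
    for w, b in params[:-1]:
        x = elu(x @ w + b)
    w, b = params[-1]
    return (x @ w + b)[:, 0]

def mse(pred, target):
    """Mean squared error, the loss named in Note 3 for regression."""
    return float(np.mean((pred - target) ** 2))

# Assumed input: a batch of 4 latent vectors with 285 components each.
params = init_mlp(n_inputs=285)
batch = np.random.default_rng(1).normal(size=(4, 285))
pred = forward(params, batch)
```

Here `pred` holds one scalar prediction per input vector; with untrained random weights the values are meaningless and serve only to show the layer shapes.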

References

  • Abd, M., Majed, S., Zharkova, V.: 2010, Automated Classification of Sunspot Groups with Support Vector Machines. ISBN 978-90-481-9150-5.

  • Baldi, P., Hornik, K.: 1989, Neural networks and principal component analysis: Learning from examples without local minima. Neural Netw. 2, 53.

  • Bao, X., Lucas, J., Sachdeva, S., Grosse, R.B.: 2020, Regularized linear autoencoders recover the principal components, eventually. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems 33, Curran Associates, Red Hook, 6971.

  • Chen, Y., Manchester, W.B., Hero, A.O., Toth, G., DuFumier, B., Zhou, T., Wang, X., Zhu, H., Sun, Z., Gombosi, T.I.: 2019, Identifying solar flare precursors using time series of SDO/HMI images and SHARP parameters. Space Weather 17, 1404.

  • Colak, T., Qahwaji, R.: 2008, Automated McIntosh-based classification of sunspot groups using MDI images. Solar Phys. 248, 277.

  • Dosovitskiy, A., Brox, T.: 2016, Generating images with perceptual similarity metrics based on deep networks. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, Curran Associates, Red Hook, 658. ISBN 9781510838819.

  • Hale, G.E., Ellerman, F., Nicholson, S.B., Joy, A.H.: 1919, The magnetic polarity of sun-spots. Astrophys. J. 49, 153.

  • Hoyt, D.V., Schatten, K.H.: 1998, Group sunspot numbers: A new solar activity reconstruction. Solar Phys. 179, 189.

  • Illarionov, E., Kosovichev, A., Tlatov, A.: 2020, Machine-learning approach to identification of coronal holes in solar disk images and synoptic maps. Astrophys. J. 903, 115.

  • Illarionov, E., Tlatov, A., Sokoloff, D.: 2015, The properties of the tilts of bipolar solar regions. Solar Phys. 290, 351.

  • Johnson, J., Alahi, A., Fei-Fei, L.: 2016, Perceptual losses for real-time style transfer and super-resolution. In: European Conference on Computer Vision.

  • Johnstone, I.M., Paul, D.: 2018, PCA in high dimensions: An orientation. Proc. IEEE 106, 1277.

  • Kingma, D.P., Welling, M.: 2019, An introduction to variational autoencoders. Found. Trends Mach. Learn. 12, 307.

  • Makarenko, N., Malkova, D., Machin, M., Knyazeva, I., Makarenko, I.: 2014, Methods of computational topology for the analysis of dynamics of active regions of the Sun. J. Math. Sci. 203, 806.

  • McIntosh, P.S.: 1990, The classification of sunspot groups. Solar Phys. 125, 251.

  • Moon, K.R., Li, J.J., Delouille, V., De Visscher, R., Watson, F., Hero, A.O.: 2016, Image patch analysis of sunspots and active regions. I. Intrinsic dimension and correlation analysis. J. Space Weather Space Clim. 6, A2.

  • Muñoz-Jaramillo, A., Senkpeil, R.R., Windmueller, J.C., Amouzou, E.C., Longcope, D.W., Tlatov, A.G., Nagovitsyn, Y.A., Pevtsov, A.A., Chapman, G.A., Cookson, A.M., Yeates, A.R., Watson, F.T., Balmaceda, L.A., DeLuca, E.E., Martens, P.C.H.: 2015, Small-scale and global dynamos and the area and flux distributions of active regions, sunspot groups, and sunspots: A multi-database study. Astrophys. J. 800, 48.

  • Murphy, K.P.: 2012, Machine Learning: A Probabilistic Perspective, MIT Press, Cambridge. ISBN 0262018020.

  • Pearson, K.: 1901, LIII. On lines and planes of closest fit to systems of points in space. Phil. Mag. 2, 559.

  • Sadykov, V.M., Kitiashvili, I.N., Dalda, A.S., Oria, V., Kosovichev, A.G., Illarionov, E.: 2021, Compression of solar spectroscopic observations: A case study of Mg II k spectral line profiles observed by NASA’s IRIS satellite. In: 2021 International Conference on Content-Based Multimedia Indexing (CBMI), 1.

  • Schou, J., Scherrer, P.H., Bush, R.I., Wachter, R., Couvidat, S., Rabello-Soares, M.C., Bogart, R.S., Hoeksema, J.T., Liu, Y., Duvall, T.L., Akin, D.J., Allard, B.A., Miles, J.W., Rairden, R., Shine, R.A., Tarbell, T.D., Title, A.M., Wolfson, C.J., Elmore, D.F., Norton, A.A., Tomczyk, S.: 2012, Design and ground calibration of the helioseismic and magnetic imager (HMI) instrument on the solar dynamics observatory (SDO). Solar Phys. 275, 229.

  • Simonyan, K., Zisserman, A.: 2015, Very deep convolutional networks for large-scale image recognition.

  • Snell, J., Ridgeway, K., Liao, R., Roads, B.D., Mozer, M.C., Zemel, R.S.: 2017, Learning to generate images with perceptual similarity metrics. In: 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China. IEEE, 4277.

  • Stenning, D., Lee, T., van Dyk, D., Kashyap, V., Sandell, J., Young, C.: 2013, Morphological feature extraction for statistical learning with applications to solar image data. Stat. Anal. Data Min. 6, 329.

  • Ternullo, M., Contarino, L., Romano, P., Zuccarello, F.: 2006, A statistical analysis of sunspot groups hosting M and X flares. Astron. Nachr. 327, 36.

Acknowledgments

The authors are grateful to the reviewers for valuable comments and suggestions. The research was carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University.

Funding

EI acknowledges the support of RSF grant 20-72-00106.

Author information

Corresponding author

Correspondence to Egor Illarionov.

Ethics declarations

Disclosure of Potential Conflicts of Interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Complexity of Sunspot Groups

Here, we elaborate on an application of the parametrization model to the estimation of sunspot-group complexity. The main idea is that more complex structures should require more components of the latent vector \(Z\) for accurate reconstruction. We convert this idea into the following procedure. First, we consider the latent vector \(\mu \) at the output of the VAE encoder (recall that it has size 4096). Next, we measure the distance between this vector and its projection onto the first principal component (PC) and refer to it as the initial reconstruction error. Then, we measure the reconstruction error for the basis of the first two PCs, the first three PCs, and so on. Because each additional PC enlarges the subspace onto which the vector is projected, the reconstruction error cannot increase as the number of PCs grows.
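
The procedure above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' code: synthetic data stand in for the encoder outputs \(\mu \), the sizes are reduced (64 dimensions and 10 PCs instead of 4096 and 285), the distance is taken to be Euclidean, the data are centered before projection, and the names `pca_basis` and `reconstruction_errors` are hypothetical.

```python
import numpy as np

def pca_basis(data, n_components):
    """Return the data mean and the leading principal axes (as rows)."""
    mean = data.mean(axis=0)
    # Rows of vt are orthonormal principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_errors(mu, mean, axes):
    """Distance between mu and its projection onto the first k PCs, k = 1..K."""
    centered = mu - mean
    coeffs = axes @ centered              # coordinates of mu in the PC basis
    # With orthonormal axes, the squared residual after k PCs equals
    # |centered|^2 minus the sum of the first k squared coordinates.
    residual_sq = centered @ centered - np.cumsum(coeffs ** 2)
    return np.sqrt(np.clip(residual_sq, 0.0, None))

rng = np.random.default_rng(0)
# Synthetic stand-in for encoder outputs: 500 vectors of size 64.
latent = rng.normal(size=(500, 64)) * np.linspace(3.0, 0.1, 64)
mean, axes = pca_basis(latent, n_components=10)
errors = reconstruction_errors(latent[0], mean, axes)
```

Here `errors[0]` plays the role of the initial reconstruction error, and `errors / errors[0]` gives a normalized curve of the kind plotted in Figure 15.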

In Figure 15 we show different decreasing patterns that arise from increasing the number of PCs from 1 to 285. Note that the colors in Figure 15 correspond to sunspot-group images shown in the first row of Figure 4. Intuitively, the complexity of sunspot-group structures increases in the first row of Figure 4. This impression is supported by Figure 15 where we observe that the first line drops more rapidly than the second one, the second line drops more rapidly than the third one and so on. Thus, we conclude that the decreasing pattern of the reconstruction error correlates with visual estimation of sunspot-group complexity.

Figure 15

Reconstruction error normalized to the initial reconstruction error for various numbers of principal components and for sunspot-group images shown in the first row of Figure 4. Numbers in the color legend correspond to the position of the sunspot-group image in the first row of Figure 4.

To quantify the complexity, we find the number of PCs at which the reconstruction error falls to half of the initial reconstruction error. Figure 16 shows the distribution of the measured complexity over all sunspot groups, visualized in the space of latent parameters \(Z_{1}\) and \(Z_{2}\). Comparing Figure 16 with Figure 9, we conclude that the proposed complexity measure behaves as expected: it is low for single-spot groups and groups with small areas and increases for large multispot groups.
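
The complexity measure can be illustrated with a short Python sketch. The function name `complexity`, the two error curves, and the convention of returning `None` when the threshold is never reached are assumptions made for this example; the text does not specify how such cases are handled.

```python
import numpy as np

def complexity(errors, fraction=0.5):
    """Smallest number of PCs whose error is <= fraction * initial error.

    errors[k - 1] is the reconstruction error with the first k PCs, so
    errors[0] is the initial reconstruction error. Returns None when the
    threshold is not reached within the available PCs (assumed convention).
    """
    errors = np.asarray(errors, dtype=float)
    below = np.flatnonzero(errors <= fraction * errors[0])
    return int(below[0]) + 1 if below.size else None

# Hypothetical error curves, normalized to the initial error.
simple_group = [1.0, 0.4, 0.2, 0.1]
complex_group = [1.0, 0.9, 0.7, 0.45]

print(complexity(simple_group))   # 2
print(complexity(complex_group))  # 4
```

A curve that drops quickly yields a small value, and a slowly decaying curve yields a large one, which matches the ordering of the lines in Figure 15.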

Figure 16

Distribution of the complexity over all sunspot groups visualized in the space of latent parameters \(Z_{1}\) and \(Z_{2}\). Complexity is defined as the number of principal components at which the reconstruction error is half the initial reconstruction error.

Appendix B: Classification of Sunspot Groups

As suggested in the main text, the latent parameters can be useful for sunspot classification. Proper investigation of this idea requires a verified annotation of sunspot groups, e.g., following the Zurich or McIntosh classification systems. The dataset we use in this research does not contain such labels.

Although there are external datasets with sunspot-group classes (e.g., the NOAA/USAF (see Note 4) or Locarno (see Note 5) catalogs), we stress that separate research is required to establish a proper correspondence. There are at least two reasons why this process is not trivial. First, there is a certain time lag between observations in different catalogs. Given the rapid evolution of sunspot groups at early stages, this time lag can cause systematic inconsistencies. Secondly, the difference in the resolution of telescopes (especially between satellite and ground-based instruments) can strongly affect the estimation of the number of small spots and the identification of sunspot cores.

Nevertheless, to demonstrate the possibility of using latent parameters to classify sunspots, in this study we introduce a synthetic classification that mimics the McIntosh one. Specifically, we assign sunspot classes according to Table 1. The distribution of the classes in the space of latent parameters \(Z_{1}\) and \(Z_{2}\) is shown in Figure 17.

Figure 17

Distribution of sunspot-group labels assigned according to Table 1 in the space of latent parameters \(Z_{1}\) and \(Z_{2}\).

Table 1 Algorithm for sunspot-group labeling. These labels are used as targets for the classification-model training.

Then, we train a simple fully connected neural-network model (similar to the one previously used to estimate sunspot-group properties) to predict classes based on latent vectors alone. We reserved 30% of samples for model validation and show the classification metrics in Table 2. We find that the accuracy varies substantially between classes and is 0.75 on average.

Table 2 Validation metrics. Overall accuracy is 0.75.

There are several effects that, in our opinion, limit the accuracy. First, there is a strong class imbalance in the dataset. As a result, we obtain only moderate scores for the rare classes. Secondly, the shallow neural network we used for the demonstration may be too simple to adequately decode latent vectors. We find that deeper models easily fall into strong overfitting. For a real application, it looks reasonable to complement the latent vector with some simple sunspot-group properties so that the model can benefit from both simple and deep sunspot-group descriptors. Thirdly, the confusion matrix shown in Figure 18 reveals that the model often confuses close classes (e.g., D and E or E and F). Given that the difference between these classes is only in the elongation of the group, the classification model can easily be improved using explicit sunspot-group properties.

Figure 18

Confusion matrix for the validation subset. True classes correspond to rows, predicted classes correspond to columns.
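
For readers who want to reproduce this kind of evaluation, a confusion matrix with the same row and column convention can be computed as in the sketch below. The labels are invented for illustration and are not the results of this study; the function names `confusion_matrix`, `overall_accuracy`, and `per_class_recall` are hypothetical.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t], index[p]] += 1
    return matrix

def overall_accuracy(matrix):
    """Fraction of all samples on the diagonal (correct predictions)."""
    return matrix.trace() / matrix.sum()

def per_class_recall(matrix):
    """Fraction of each true class (row) that was predicted correctly."""
    return matrix.diagonal() / matrix.sum(axis=1)

classes = ["D", "E", "F"]  # hypothetical subset of class labels
y_true = ["D", "D", "E", "E", "E", "F", "F", "F"]
y_pred = ["D", "E", "E", "E", "F", "F", "F", "E"]
cm = confusion_matrix(y_true, y_pred, classes)
```

Off-diagonal entries next to the diagonal, such as `cm[0, 1]` for a true D predicted as E, correspond to the confusion between close classes discussed above.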

Finally, we would like to note that in practice sunspot-group classes reflect the evolutionary stage of the group rather than the instantaneous characteristics. This means that a correct classification model should also rely on the group’s prehistory. In our opinion, latent vectors can be a useful tool for studying the dynamics of sunspot groups, and we leave this study for future work.

About this article

Cite this article

Illarionov, E., Tlatov, A. Parametrization of Sunspot Groups Based on Machine-Learning Approach. Sol Phys 297, 19 (2022). https://doi.org/10.1007/s11207-022-01955-0

  • DOI: https://doi.org/10.1007/s11207-022-01955-0
