Parametrization of Sunspot Groups Based on Machine-Learning Approach

Illarionov, Egor; Tlatov, Andrey

doi:10.1007/s11207-022-01955-0

Published: 14 February 2022

Solar Physics, Volume 297, article number 19 (2022)

Abstract

Sunspot groups observed in white light appear as complex structures. Analysis of these structures is usually based on simple morphological descriptors that only capture generic properties and miss information about fine details. We present a machine-learning approach to introduce a complete yet compact description of sunspot groups. The idea is to map sunspot-group images into an appropriate lower-dimensional (latent) space. We apply a combination of a Variational Autoencoder and Principal Component Analysis to obtain a set of 285 latent descriptors. We demonstrate that the standard descriptors are embedded into the latent ones. Thus, latent features can be considered as an extended description of sunspot groups and, in our opinion, can expand the possibilities for research on sunspot groups. In particular, we demonstrate an application for the estimation of the sunspot-group complexity. The proposed parametrization model is generic and can be applied to the investigation of other traces of solar activity observed in various spectral lines.

Data Availability

Key components of this work, which are the parametrization model and the dataset of sunspot groups and latent vectors, are available in the public GitHub repository github.com/observethesun/sunspot_groups and can be used to reproduce the results and for further research.

Notes

1. In application to neural networks, one- or multidimensional arrays are called tensors.
2. Strictly speaking, the dimensionality of the output tensor need not be lower than the dimensionality of the input data.
3. The model consists of three hidden layers with 128, 64, and 32 neurons and the ELU activation function. The output layer has a single neuron with linear activation. We use the MSE loss function for regression problems and binary cross-entropy for the classification problem.
4. solarcyclescience.com/activeregions.html
5. sunspots.irsol.usi.ch/db/

Acknowledgments

The authors are grateful to the reviewers for valuable comments and suggestions. The research was carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University.

Funding

EI acknowledges the support of RSF grant 20-72-00106.

Author information

Authors and Affiliations

Moscow State University, Moscow, Russia
Egor Illarionov
Moscow Center of Fundamental and Applied Mathematics, Moscow, Russia
Egor Illarionov
Kislovodsk Mountain Astronomical Station, Kislovodsk, Russia
Andrey Tlatov

Corresponding author

Correspondence to Egor Illarionov.

Ethics declarations

Disclosure of Potential Conflicts of Interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Complexity of Sunspot Groups

Here, we elaborate on an application of the parametrization model to the estimation of sunspot-group complexity. The main idea is that more complex structures should require more components of the latent vector \(Z\) for accurate reconstruction. We convert this idea into the following procedure. First, we consider the latent vector \(\mu \) at the output of the VAE encoder (recall that it has size 4096). Then, we measure the distance between this vector and its projection onto the first principal component (PC) and refer to it as the initial reconstruction error. Next, we measure the reconstruction error given the basis of the first two PCs, the first three PCs, and so on. Clearly, the reconstruction error decreases as the number of PCs increases.
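The procedure above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code from the authors' repository: the function names `pca_basis` and `reconstruction_errors` are hypothetical, the distance is assumed to be Euclidean and measured after subtracting the dataset mean, and random toy vectors of size 64 stand in for the 4096-dimensional encoder outputs.

```python
import numpy as np

def pca_basis(latents):
    """Return the mean and principal axes (as rows) of a set of latent vectors."""
    mean = latents.mean(axis=0)
    # SVD of the centered data: rows of vt are orthonormal principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(latents - mean, full_matrices=False)
    return mean, vt

def reconstruction_errors(mu, mean, axes, max_pcs):
    """Distance between mu and its projection onto the first k PCs, k = 1..max_pcs."""
    centered = mu - mean
    coeffs = axes[:max_pcs] @ centered  # coordinates of mu along each PC
    # The axes are orthonormal, so the squared residual after k PCs equals the
    # squared norm minus the cumulative sum of the squared coefficients.
    residual_sq = centered @ centered - np.cumsum(coeffs ** 2)
    return np.sqrt(np.clip(residual_sq, 0.0, None))

# Toy stand-in for encoder outputs (assumption: 200 vectors of size 64).
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 64)) * np.linspace(3.0, 0.1, 64)
mean, axes = pca_basis(latents)
errors = reconstruction_errors(latents[0], mean, axes, max_pcs=20)
```

Here `errors[0]` plays the role of the initial reconstruction error, and the sequence is non-increasing by construction, since each added PC can only remove part of the residual.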

In Figure 15 we show different decreasing patterns that arise from increasing the number of PCs from 1 to 285. Note that the colors in Figure 15 correspond to sunspot-group images shown in the first row of Figure 4. Intuitively, the complexity of sunspot-group structures increases in the first row of Figure 4. This impression is supported by Figure 15 where we observe that the first line drops more rapidly than the second one, the second line drops more rapidly than the third one and so on. Thus, we conclude that the decreasing pattern of the reconstruction error correlates with visual estimation of sunspot-group complexity.

To quantify the complexity, we take the number of PCs at which the reconstruction error falls to half of the initial reconstruction error. Figure 16 shows the distribution of the measured complexity over all sunspot groups, visualized in the space of latent parameters \(Z_{1}\) and \(Z_{2}\). Comparing Figure 16 with Figure 9, we conclude that the proposed complexity measure behaves as expected: it is low for single-spot groups and groups with small areas and increases for large multispot groups.
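As a minimal sketch of this measure, the function below returns the smallest number of PCs at which the error has dropped to half of its initial value. The name `complexity` is hypothetical, and two details are assumptions not stated in the text: the threshold is treated as "at most half", and `None` is returned when the error never reaches that level within the available PCs.

```python
import numpy as np

def complexity(errors):
    """Smallest number of PCs at which the error is at most half of errors[0].

    errors[k - 1] is the reconstruction error using the first k PCs.
    Returns None if the error never drops to half within the given range.
    """
    errors = np.asarray(errors, dtype=float)
    below = np.nonzero(errors <= 0.5 * errors[0])[0]
    return int(below[0]) + 1 if below.size else None

fast = [10.0, 4.0, 2.0, 1.0]  # error collapses quickly: simple structure
slow = [10.0, 9.0, 7.0, 5.0]  # error decays slowly: complex structure
print(complexity(fast), complexity(slow))  # prints "2 4"
```

A group whose error curve drops rapidly therefore receives a low score, in line with the visual ordering discussed for Figure 15.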

Appendix B: Classification of Sunspot Groups

As suggested in the main text, the latent parameters can be useful for sunspot classification. Proper investigation of this idea requires a verified annotation of sunspot groups, e.g., following the Zurich or McIntosh classification systems. The dataset we use in this research does not contain such labels.

Although there are external datasets with sunspot-group classes (e.g., the NOAA/USAF (Note 4) or Locarno (Note 5) catalogs), we stress that separate research is required to establish a proper correspondence. There are at least two reasons why this process is not trivial. First, there is a certain time lag between observations in different catalogs. Given the rapid evolution of sunspot groups at early stages, this time lag can cause systematic inconsistencies. Second, the difference in the resolution of telescopes (especially between satellite and ground-based instruments) can strongly affect the estimated number of small spots and the identification of sunspot cores.

Nevertheless, to demonstrate the possibility of using latent parameters to classify sunspots, in this study we introduce a synthetic classification that mimics the McIntosh one. Specifically, we assign sunspot classes according to Table 1. The distribution of the classes in the space of latent parameters \(Z_{1}\) and \(Z_{2}\) is shown in Figure 17.

Table 1 Algorithm for sunspot-group labeling. These labels are used as targets for the classification-model training.

Then, we train a simple fully connected neural-network model (similar to the one previously used to estimate sunspot-group properties) to predict classes based on latent vectors alone. We reserved 30% of samples for model validation and show the classification metrics in Table 2. We find that the accuracy varies substantially between classes and is 0.75 on average.
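The 30% hold-out can be illustrated with a short NumPy sketch. The text does not say how the validation samples were selected, so the plain random split below is an assumption, and `holdout_split` is a hypothetical helper name. With strongly imbalanced classes, a stratified split would be a reasonable alternative.

```python
import numpy as np

def holdout_split(n_samples, val_fraction=0.3, seed=0):
    """Return shuffled, disjoint index arrays (train, validation)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(val_fraction * n_samples))
    return order[n_val:], order[:n_val]

train_idx, val_idx = holdout_split(1000)
print(len(train_idx), len(val_idx))  # prints "700 300"
```

The returned indices can be used to slice both the array of latent vectors and the array of class labels, so that the two stay aligned.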

Table 2 Validation metrics. Overall accuracy is 0.75.

There are several effects that, in our opinion, limit the accuracy. First, there is a strong class imbalance in the dataset, so we obtain only moderate scores for the rare classes. Second, the shallow neural network we used for the demonstration may be too simple to adequately decode latent vectors, while we find that deeper models easily fall into strong overfitting. For a real application, it seems reasonable to complement the latent vector with some simple sunspot-group properties so that the model can benefit from both simple and deep sunspot-group descriptors. Third, the confusion matrix shown in Figure 18 reveals that the model often confuses close classes (e.g., D and E or E and F). Given that these classes differ only in the elongation of the group, the classification model can easily be improved using explicit sunspot-group properties.
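To make the first and third points concrete, the sketch below builds a confusion matrix and per-class recall from toy labels. The labels are invented for illustration and are not the paper's data; `confusion_matrix` and `per_class_recall` are hypothetical helper names written with plain NumPy.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals, out=np.zeros(len(cm)), where=totals > 0)

# Toy labels: class 0 is common, class 2 is rare and confused with class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 2, 1]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()
recall = per_class_recall(cm)
```

In this toy case the overall accuracy is 0.8 while the recall of the rare class is 0, which shows how a single averaged score can hide poor performance on rare classes, and how off-diagonal entries reveal which neighboring classes are being mixed up.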

Finally, we would like to note that in practice sunspot-group classes reflect the evolutionary stage of the group rather than the instantaneous characteristics. This means that a correct classification model should also rely on the group’s prehistory. In our opinion, latent vectors can be a useful tool for studying the dynamics of sunspot groups, and we leave this study for future work.

About this article

Cite this article

Illarionov, E., Tlatov, A. Parametrization of Sunspot Groups Based on Machine-Learning Approach. Sol Phys 297, 19 (2022). https://doi.org/10.1007/s11207-022-01955-0

Received: 01 December 2021
Accepted: 24 January 2022
Published: 14 February 2022
DOI: https://doi.org/10.1007/s11207-022-01955-0
