Abstract
Sunspot groups observed in white light appear as complex structures. Analysis of these structures is usually based on simple morphological descriptors that only capture generic properties and miss information about fine details. We present a machine-learning approach to introduce a complete yet compact description of sunspot groups. The idea is to map sunspot-group images into an appropriate lower-dimensional (latent) space. We apply a combination of Variational Autoencoder and Principal Component Analysis to obtain a set of 285 latent descriptors. We demonstrate that the standard descriptors are embedded into the latent ones. Thus, latent features can be considered as an extended description of sunspot groups and, in our opinion, can expand the possibilities for research on sunspot groups. In particular, we demonstrate an application for the estimation of the sunspot-group complexity. The proposed parametrization model is generic and can be applied to the investigation of other traces of solar activity observed in various spectral lines.
Data Availability
Key components of this work, which are the parametrization model and the dataset of sunspot groups and latent vectors, are available in the public GitHub repository github.com/observethesun/sunspot_groups and can be used to reproduce the results and for further research.
Notes
In application to neural networks, one- or multidimensional arrays are called tensors.
Strictly speaking, the dimensionality of the output tensor need not be lower than that of the input data.
The model consists of three hidden layers with 128, 64, and 32 neurons and the ELU activation function. The output layer has a single neuron with linear activation. We use the MSE loss function for regression problems and binary cross-entropy for the classification problem.
Acknowledgments
The authors are grateful to the reviewers for valuable comments and suggestions. The research is carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University.
Funding
EI acknowledges the support of RSF grant 20-72-00106.
Ethics declarations
Disclosure of Potential Conflicts of Interest
The authors declare that they have no conflicts of interest.
Appendices
Appendix A: Complexity of Sunspot Groups
Here, we elaborate on an application of the parametrization model to the estimation of sunspot-group complexity. The main idea is that more complex structures should require more components of the latent vector \(Z\) for accurate reconstruction. We convert this idea into the following procedure. First, we consider the latent vector \(\mu \) at the output of the VAE encoder (recall that it has size 4096). Then, we measure the distance between this vector and its projection onto the first principal component (PC) and refer to it as the initial reconstruction error. Next, we measure the reconstruction error given the basis of the first two PCs, the first three PCs, and so on. Clearly, the reconstruction error decreases as the number of PCs increases.
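The procedure above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `reconstruction_errors`, the use of the Euclidean distance, the mean-centering of the vector before projection, and the toy data (50 dimensions and 10 PCs standing in for 4096 and 285) are all assumptions made for the example.

```python
import numpy as np

def reconstruction_errors(mu, mean, components):
    """Distance between mu and its projection onto the first k PCs, k = 1..K.

    mu:         (D,) latent vector (D = 4096 in the text)
    mean:       (D,) mean of the vectors used to fit the PCA (assumed centering)
    components: (K, D) orthonormal principal axes, ordered by explained variance
    """
    centered = mu - mean
    coeffs = components @ centered  # (K,) projection coefficients
    # With orthonormal axes, the squared residual after k PCs equals
    # ||centered||^2 minus the sum of the first k squared coefficients.
    residual_sq = centered @ centered - np.cumsum(coeffs ** 2)
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.clip(residual_sq, 0.0, None))

# Toy data (hypothetical): 200 vectors with decreasing variance per axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50)) * np.linspace(3.0, 0.1, 50)
mean = X.mean(axis=0)

# PCA via SVD of the centered data; rows of vt are the principal axes.
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
components = vt[:10]

# errors[0] is the initial reconstruction error; errors[k-1] uses k PCs.
errors = reconstruction_errors(X[0], mean, components)
```

Because the principal axes are orthonormal, the whole error curve follows from a single projection and a cumulative sum, so there is no need to rebuild the reconstruction for each number of PCs.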
In Figure 15 we show different decreasing patterns that arise from increasing the number of PCs from 1 to 285. Note that the colors in Figure 15 correspond to sunspot-group images shown in the first row of Figure 4. Intuitively, the complexity of sunspot-group structures increases in the first row of Figure 4. This impression is supported by Figure 15 where we observe that the first line drops more rapidly than the second one, the second line drops more rapidly than the third one and so on. Thus, we conclude that the decreasing pattern of the reconstruction error correlates with visual estimation of sunspot-group complexity.
To quantify the complexity, we find the number of PCs at which the reconstruction error falls to half of the initial reconstruction error. Figure 16 shows the distribution of the measured complexity over all sunspot groups, visualized in the space of latent parameters \(Z_{1}\) and \(Z_{2}\). Comparing Figure 16 with Figure 9, we conclude that the proposed complexity measure behaves as expected: it is low for single-spot groups and groups with small areas and increases for large multispot groups.
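As a hedged illustration of this complexity measure, the sketch below takes a reconstruction-error curve (one value per number of PCs) and returns the first number of PCs at which the error is at most half of the initial one. The function name `complexity`, the "less than or equal" threshold, the choice of the first crossing, and the two toy curves are assumptions of the example; the text does not specify these details.

```python
import numpy as np

def complexity(errors, fraction=0.5):
    """Smallest number of PCs whose reconstruction error is at most
    `fraction` of the initial (one-PC) error; None if never reached."""
    errors = np.asarray(errors, dtype=float)
    below = np.nonzero(errors <= fraction * errors[0])[0]
    # Index 0 corresponds to one PC, so add 1 to convert to a PC count.
    return int(below[0]) + 1 if below.size else None

# Toy error curves (hypothetical values), indexed by number of PCs - 1.
fast_drop = [10.0, 4.0, 2.0, 1.0]  # simple structure: error halves quickly
slow_drop = [10.0, 9.0, 7.0, 5.0]  # complex structure: error halves late

print(complexity(fast_drop), complexity(slow_drop))
```

A curve that never reaches the threshold within the available PCs returns `None` here; how such cases should be handled in practice is left open.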
Appendix B: Classification of Sunspot Groups
As suggested in the main text, the latent parameters can be useful for sunspot classification. Proper investigation of this idea requires a verified annotation of sunspot groups, e.g., following the Zurich or McIntosh classification systems. The dataset we use in this research does not contain such labels.
Although there are external datasets with sunspot-group classes (e.g., the NOAA/USAF or Locarno catalogs), we stress that separate research is required to establish a proper correspondence. There are at least two reasons why this process is not trivial. First, there is a certain time lag between observations in different catalogs. Taking into account the rapid evolution of sunspot groups at early stages, this time lag can cause systematic inconsistencies. Second, the difference in the resolution of telescopes (especially between satellite and ground-based instruments) can strongly affect the estimation of the number of small spots and the identification of sunspot cores.
Nevertheless, to demonstrate the possibility of using latent parameters to classify sunspots, in this study we introduce a synthetic classification that mimics the McIntosh one. Specifically, we assign sunspot classes according to Table 1. The distribution of the classes in the space of latent parameters \(Z_{1}\) and \(Z_{2}\) is shown in Figure 17.
Then, we train a simple fully connected neural-network model (similar to the one previously used to estimate sunspot-group properties) to predict classes based on latent vectors alone. We reserved 30% of samples for model validation and show the classification metrics in Table 2. We find that the accuracy varies substantially between classes and is 0.75 on average.
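The evaluation setup can be sketched as follows, assuming a random 30% holdout and per-class accuracy computed as the fraction of correctly predicted samples within each true class. The helper names `holdout_split` and `per_class_accuracy`, the random (non-stratified) split, and the toy labels are assumptions of the example; the metrics actually reported in Table 2 may be defined differently.

```python
import numpy as np

def holdout_split(n_samples, val_fraction=0.3, seed=0):
    """Return (train_idx, val_idx) for a random holdout split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(val_fraction * n_samples))
    return order[n_val:], order[:n_val]

def per_class_accuracy(y_true, y_pred):
    """Fraction of correct predictions within each true class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {c.item(): float(np.mean(y_pred[y_true == c] == c))
            for c in np.unique(y_true)}

train_idx, val_idx = holdout_split(10)

# Toy labels (hypothetical): one frequent class and two rare ones.
y_true = ["A", "A", "A", "A", "D", "D", "E", "E"]
y_pred = ["A", "A", "A", "A", "D", "E", "D", "D"]
scores = per_class_accuracy(y_true, y_pred)
overall = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

In this toy case the frequent class is predicted perfectly while the rare classes score much lower, so the overall accuracy hides a large spread between classes.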
There are several effects that, in our opinion, limit the accuracy. First, there is a strong class imbalance in the dataset, so we obtain only moderate scores for the rare classes. Second, the shallow neural network we used for the demonstration may be too simple to adequately decode latent vectors, while deeper models easily fall into strong overfitting. For a real application, it seems reasonable to complement the latent vector with some simple sunspot-group properties so that the model can benefit from both simple and deep sunspot-group descriptors. Third, the confusion matrix shown in Figure 18 reveals that the model often confuses close classes (e.g., D and E or E and F). Given that the difference between these classes is only in the elongation of the group, the classification model can easily be improved using explicit sunspot-group properties.
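To make the confusion-matrix reading concrete, here is a small sketch that builds a confusion matrix and reports the pair of classes most often mistaken for each other. The function names, the class list, and the toy labels are hypothetical and do not reproduce the data behind Figure 18.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

def most_confused_pair(cm, classes):
    """Pair of distinct classes with the most mutual misclassifications."""
    mutual = cm + cm.T            # count confusions in both directions
    np.fill_diagonal(mutual, 0)   # ignore correct predictions
    i, j = np.unravel_index(np.argmax(mutual), mutual.shape)
    return classes[min(i, j)], classes[max(i, j)]

# Toy labels (hypothetical), using class letters as in the text.
classes = ["C", "D", "E", "F"]
y_true = ["C", "D", "D", "D", "E", "E", "E", "F"]
y_pred = ["C", "D", "E", "E", "E", "D", "F", "F"]

cm = confusion_matrix(y_true, y_pred, classes)
pair = most_confused_pair(cm, classes)
```

Summing the matrix with its transpose treats "D predicted as E" and "E predicted as D" as the same kind of confusion, which matches the way neighboring classes are discussed above.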
Finally, we would like to note that in practice sunspot-group classes reflect the evolutionary stage of the group rather than the instantaneous characteristics. This means that a correct classification model should also rely on the group’s prehistory. In our opinion, latent vectors can be a useful tool for studying the dynamics of sunspot groups, and we leave this study for future work.
Cite this article
Illarionov, E., Tlatov, A. Parametrization of Sunspot Groups Based on Machine-Learning Approach. Sol Phys 297, 19 (2022). https://doi.org/10.1007/s11207-022-01955-0