A Survey of Computational Tools in Solar Physics

The SunPy Project developed a 13-question survey to understand the software and hardware usage of the solar-physics community. A total of 364 community members across 35 countries responded to our survey. We found that 99±0.5% of respondents use software in their research and 66% use the Python scientific-software stack. Students are twice as likely as faculty, staff scientists, and researchers to use Python rather than Interactive Data Language (IDL). In this respect, the astrophysics and solar-physics communities differ widely: 78% of solar-physics faculty, staff scientists, and researchers in our sample use IDL, compared with 44% of astrophysics faculty and scientists sampled by Momcheva and Tollerud (2015). We found that 63±4% of respondents have not taken any computer-science courses at an undergraduate or graduate level. We also found that most respondents use consumer hardware to run software for solar-physics research. Although 82% of respondents work with data from space-based or ground-based missions, some of which (e.g. the Solar Dynamics Observatory and Daniel K. Inouye Solar Telescope) produce terabytes of data a day, only 14% use a regional or national cluster, 5% use a commercial cloud provider, and 29% use exclusively a laptop or desktop. Finally, we found that 73±4% of respondents cite scientific software in their research, although only 42±3% do so routinely.


Introduction
The SunPy Project (The SunPy Community et al., 2020) facilitates and promotes the use and development of community-led, free, and open-source data-analysis software for solar physics based on the scientific Python environment. To better understand the software and hardware preferences of the solar-physics community, the Project developed a 13-question survey (reproduced in Appendix A) and disseminated it internationally over a six-month period between 7 February 2019 and 28 July 2019.
Many of the survey questions were similar (and in some cases, identical) to those posed by Momcheva and Tollerud (2015) in an informal survey of 1142 members of the astrophysics community. The SunPy Project did this deliberately to compare software preferences between the solar and astrophysics communities.
This article presents the survey results, derived from analyzing 364 responses from community members across 35 countries. All of the survey responses, along with the code (Reback et al., 2020; Caswell et al., 2020; Waskom et al., 2020; van der Walt, Colbert, and Varoquaux, 2011; Bobra, Mumford, and Pereira, 2020) to analyze these data and produce the figures in this article, are publicly available at github.com/sunpy/survey.

Demographics
Since the SunPy Project relies largely on volunteer efforts, we chose to construct and disseminate this survey ourselves (instead of going through a formal channel such as the Statistical Research Center at the American Institute of Physics). As a result, we recognize that this survey may suffer from coverage error.
Our survey garnered 368 responses. Most of the survey respondents fit into one of four career stages: 56% (n=205) described themselves as a faculty member, staff scientist, or researcher, 15% (n=53) as a postdoc, 23% (n=84) as an undergraduate or graduate student, and 6% (n=22) as a software or instrument developer. These four groups total n=364. The remaining four respondents did not fit into any of these career stages, and we dropped their responses from our analysis.
Community members across 35 countries responded to our survey. About three-quarters of the respondents came from the US, UK, Germany, India, and Japan. Together, these five countries include about 1150 solar physicists; therefore, our survey sampled roughly a quarter of the solar-physics community. Our results are based on the assumption that our sample is representative of the solar-physics community overall.
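The percentages above follow from simple arithmetic on the reported counts. The short sketch below recomputes the career-stage shares and the rough coverage estimate. The variable names are illustrative, and the number of respondents from the five countries is approximated as exactly three-quarters of the sample, since the text gives only that rounded fraction.

```python
# Career-stage counts as reported in the text.
career_counts = {
    "faculty, staff scientist, or researcher": 205,
    "postdoc": 53,
    "undergraduate or graduate student": 84,
    "software or instrument developer": 22,
}

n_total = sum(career_counts.values())  # respondents retained for analysis

# Percentage share of each career stage, rounded to the nearest percent.
career_shares = {
    stage: round(100 * count / n_total)
    for stage, count in career_counts.items()
}

# Rough coverage estimate. Assumption: "about three-quarters" of the sample
# is taken as exactly 0.75, and 1150 is the quoted number of solar
# physicists in the five countries.
approx_from_five_countries = 0.75 * n_total
coverage_fraction = approx_from_five_countries / 1150

for stage, share in career_shares.items():
    print(f"{stage}: {share}%")
print(f"approximate coverage: {coverage_fraction:.0%}")
```

Under these assumptions the coverage comes out near 24%, in line with the "roughly a quarter" quoted above.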
We asked respondents to identify all of the areas of research relevant to their career. Most respondents identified multiple sub-disciplines of expertise. We found that 76% (n=275) work with space-based observational data, 46% (n=169) work with ground-based observational data, and 26% (n=93) work on building instruments. A vast majority of respondents, 82%, work with ground-based or space-based data. In addition, 29% (n=105) identified theory as a relevant sub-discipline, and 47% (n=171) identified numerical simulations.
Most of the survey respondents (82%) chose to answer an optional question about whether they self-identified as an underrepresented minority; 16% of this subset (13% of the total sample) said yes. Similarly, 79% of respondents chose to answer another optional question about whether they self-identified as an underrepresented gender identity; 11% of this subset (9% of the total sample) said yes.

Software Tools
In our survey of the solar-physics community, we found that 99±0.5% of respondents use software in their research. In a survey of the astrophysics community, Momcheva and Tollerud (2015) found that 100% of respondents use software in their research.
We asked users to list all of the scientific software tools, including programming languages, software development tools, and data-analysis frameworks, that they utilized within the last year. We summarized their responses in Figure 1. We found that 66% of respondents use the Python scientific-software stack and 73% use IDL. Overall, respondents listed 42 different software tools and the average respondent used five tools in the past year.
We observe a stark contrast in usage between the two primary data-analysis languages in solar-physics research, Python and IDL, when viewed by respondent career stage. The earlier the career stage, the greater the percentage of Python users: 59% of faculty, staff scientists, and researchers, 75% of postdocs, and 79% of students use Python. The earlier the career stage, the fewer IDL users: 78% of faculty, staff scientists, and researchers, 75% of postdocs, and 60% of students use IDL.
Of course, these tools are not necessarily used in isolation: about half (45%) of respondents use both Python and IDL. Figure 2 shows that 28% of respondents use IDL exclusively (in other words, they use IDL and do not use Python), while 21% use Python exclusively. The ratio of exclusive IDL users to exclusive Python users is roughly 2:1 for faculty, staff scientists, and researchers and the opposite, 1:2, for students.
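The exclusive-use figures are consistent with the overall percentages by simple subtraction. A minimal sketch, using only the rounded percentages quoted above; the function name is hypothetical, and the published figures were presumably computed from the individual responses rather than in this way.

```python
def exclusive_shares(pct_a, pct_b, pct_both):
    """Return (only A, only B, A or B) from two overlapping percentages."""
    only_a = pct_a - pct_both
    only_b = pct_b - pct_both
    either = pct_a + pct_b - pct_both  # inclusion-exclusion
    return only_a, only_b, either


# Rounded percentages quoted in the text.
python_pct, idl_pct, both_pct = 66, 73, 45

python_only, idl_only, either = exclusive_shares(python_pct, idl_pct, both_pct)
print(f"Python only: {python_only}%")
print(f"IDL only: {idl_only}%")
print(f"Python or IDL: {either}%")
```

The subtraction reproduces the 21% and 28% exclusive shares and implies that about 94% of respondents use at least one of the two languages.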
Figure 10 of Momcheva and Tollerud (2015) shows that Python is not only the most popular programming language within their sample of the astrophysics community, but it is also the most popular within every individual career category. Our survey results show that Python is the most popular programming language only among students; IDL and Python are at parity for postdocs, and IDL is more popular than Python for faculty, staff scientists, researchers, software developers, and instrument developers. In this respect, the astrophysics and solar-physics communities differ widely: 78% of solar-physics faculty, staff scientists, and researchers in our sample use IDL, compared with 44% of astrophysics faculty and scientists sampled by Momcheva and Tollerud (2015). The two communities share the same statistics, however, when it comes to writing software. In both the astrophysics and solar-physics communities, roughly a third of respondents write their own software most of the time (see Figure 3 of this article and Figure 3 of Momcheva and Tollerud, 2015). Furthermore, about 90% of respondents in both communities often or occasionally write their own software (see the same figures).
The use of IDL by the solar-physics community may be explained partly by how instrument teams provide their data. Many instrument teams provide data that have been calibrated to a low level, plus software that allows the data to be further calibrated for scientific use. The advantage of this model of scientific-data provision is that as knowledge of the instrument improves over time, the software can be updated to provide better high-level science-ready data products. A side-effect of this model of scientific-data provision is that scientific use of the data requires use of a particular package/language. Since many instrument teams chose to take advantage of the significant functionality provided by the SolarSoftWare (SSW: Freeland and Handy 1998) package, much of the software required to create higher-level data products is written in the primary language of SSW: IDL. Hence the model of scientific-data provision may explain why IDL is used by a significant proportion of respondents.
Where relevant, we supplied our counting error for non-demographic software- and hardware-related questions. For Question 6, we report √3/364, or 0.5%, as the percentage error in the number of no responses. Since this question required respondents to pick one response from a binary choice, we apply that same uncertainty to the yes responses. For Questions 7, 8, 10, and 11, which required respondents to pick only one response from a list of options, we quantified the percent error in each response simply by applying the square-root rule for counting experiments (Taylor, 1997). For Questions 9 and 12, which allowed respondents to select as many options as they liked, we do not calculate a percent error.
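The square-root rule described above for Question 6 can be checked numerically. The sketch below is a minimal illustration, not the project's analysis code: the function name is hypothetical, and the count of three "no" responses is inferred from the quoted √3/364.

```python
import math


def counting_error_percent(n_responses, n_total):
    """Percent uncertainty on n_responses out of n_total.

    The counting error on n_responses is taken to be sqrt(n_responses),
    following the square-root rule for counting experiments.
    """
    return 100 * math.sqrt(n_responses) / n_total


n_total = 364
n_no = 3  # "no" responses to Question 6, inferred from the quoted sqrt(3)/364

error_q6 = counting_error_percent(n_no, n_total)
print(f"{error_q6:.1f}%")  # prints 0.5%
```

The same function applies to any single-choice question: pass the number of respondents who picked a given option and the total number of respondents.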

Education and Training
Although 99±0.5% of respondents use software in their research and 91±5% often or occasionally write their own software, 63±4% of respondents have not had any formal training (e.g. computer-science courses) at an undergraduate or graduate level. We found that people who write mostly their own software are, within the uncertainties, no better trained than everyone else: 44±6% of people who write their own software reported "a lot (e.g. computer science courses)" of formal training, compared with 37±3% overall. We also found that students today are twice as likely to have a lot of formal training in programming compared with faculty, researchers, and staff scientists (see Figure 4). The amount of training does not vary with area of expertise; each sub-discipline shows roughly the same amount of formal training as the general population (37±3%).
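One way to read the comparison between 44±6% and 37±3% is to express the difference in units of its uncertainty. The sketch below adds the two quoted errors in quadrature. This treats the two groups as independent, which is only an approximation here because the respondents who write mostly their own software are a subset of all respondents. The function name is hypothetical.

```python
import math


def difference_in_sigma(value_a, error_a, value_b, error_b):
    """Difference between two measurements in units of their combined error."""
    combined_error = math.hypot(error_a, error_b)  # quadrature sum
    return (value_a - value_b) / combined_error


# "A lot" of formal training: mostly-own-software writers vs. all respondents.
n_sigma = difference_in_sigma(44, 6, 37, 3)
print(f"{n_sigma:.1f} sigma")
```

Under this approximation the difference is about one combined uncertainty, which is compatible with the two groups having similar levels of formal training.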

Hardware Tools
We also found that most respondents utilize consumer hardware to run software for solar-physics research. Although 82% of respondents work with space-based or ground-based data, and some of these missions (e.g. the Solar Dynamics Observatory and Daniel K. Inouye Solar Telescope) produce terabytes of data per day, only 14% use a regional or national cluster and 5% use a commercial cloud provider (see Figure 5). Moreover, 29% use exclusively a laptop or desktop. The community puts considerable effort into maintaining clusters and workstations, with 40% of respondents using a shared workstation, 51% using a local cluster, and 96% using a laptop or desktop.
These percentages vary significantly by sub-discipline. A larger percentage of respondents in the numerical simulations and theory sub-disciplines use local clusters (63% and 60%, respectively, compared with 51% overall) and regional or national clusters (26% and 26%, respectively, compared with 14% overall).

Citing Scientific Software
Figure 7 shows that 73±4% of respondents cite scientific software in their research, although only 42±3% do so routinely. Roughly a quarter (27±3%) never cite scientific software in their research. When asked why, about half (53±8%) responded that they do not know how to appropriately cite scientific software (see Figure 8); we note that only 4±1% of respondents do not think software belongs in citations.

Discussion
Scientific software is an indispensable component of the modern scientific research workflow (Rde et al., 2018). Virtually all of the solar-physics community uses software in their research. Based on this fact, we find three of the statistics presented in this article worrisome. First, similar to the astrophysics community, a significant fraction of the solar-physics community (63±4% of respondents) has not taken any computer-science courses at an undergraduate or graduate level. Second, most of the solar-physics community (82% of respondents) works with space-based or ground-based facilities, several of which produce terabyte- or petabyte-sized data sets, yet nearly a third of the community (29% of respondents) uses exclusively a laptop or desktop to run software for solar-physics research. It is unclear whether the computing power offered by laptops and desktops limits the type of scientific endeavors in solar physics. Finally, less than half of the community (42±3% of respondents) routinely cites scientific software in their research.
The Ford Foundation's report, entitled Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure (Eghbal, 2016), also suggests "expanding the pool of contributors so that more people, and more types of people, can build and sustain public software together." Increasing the diversity of the talent pool, which is still lacking in the solar-physics community, will help sustain a long-term future for open source software in solar physics.
However, maximizing the scientific return of large data sets, such as those produced by the Solar Dynamics Observatory and the Daniel K. Inouye Solar Telescope, requires both skill in software development and computational resources. The United States National Academies of Sciences and Medicine (2020) report entitled Progress Toward Implementation of the 2013 Decadal Survey for Solar and Space Physics: A Midterm Assessment and the Kavli Foundation series of workshops called Petabytes To Science (Bauer et al., 2019) recommend adopting science platforms, which co-locate both data and the computational resources required to analyze these data. In this paradigm, users run software in an external computing environment where the data live, instead of moving the data to a desktop or laptop where the software lives. The astrophysics community has already developed several science platforms, such as the ASTRO Data Lab (datalab.noao.edu), run by the NSF's National Optical-Infrared Astronomy Research Laboratory. We encourage the solar-physics community to fund the development of science platforms so that scientists are not restricted by the computational power of consumer hardware for analyses involving terabytes of data.
Finally, we recognize that software development and hardware development take a vast amount of time. This time is rarely recognized by the academic community, which largely rewards publications. Therefore, we encourage the community to publish scientific software (by submitting articles that describe research software to refereed journals and archiving this software in publicly available digital repositories; see guides.github.com/activities/citable-code), cite scientific software (see Appendix B about how to cite scientific software), and count scientific software as a co-equal research artifact when considering career evaluation. This has two benefits: it gives academic credit and career recognition to those who write software and it makes it easier to reproduce studies in solar physics.
Some of the earliest advocates for scientific reproducibility, Claerbout and Karrenbach (1992) and Buckheit and Donoho (1995), suggested that a journal article about computational science in a scientific publication is not the scholarship itself; it is merely advertising of the scholarship. The actual scholarship, they argue, is the code and development environment used to generate the results. Preserving these elements of scholarship requires tools like version control, which create snapshots of software or data as they change over time. At the moment, less than half the community (44% of respondents) uses version control. The United States National Academies of Sciences and Medicine (2019) report entitled Reproducibility and Replicability in Science recommends that "researchers should convey clear, specific, and complete information about any computational methods and data products that support their published results in order to enable other researchers to repeat the analysis," including the data, study methods, and computational environment.
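As one illustration of conveying the computational environment, the sketch below records the Python version, the platform, and the installed versions of a few packages so that they can be stored alongside a result. It uses only the Python standard library. The function name and the package list are illustrative assumptions, not a procedure prescribed by the report.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Collect interpreter, platform, and package versions as a dictionary."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# Hypothetical package list; substitute whatever an analysis actually imports.
environment = describe_environment(["numpy", "pandas", "matplotlib", "sunpy"])
print(json.dumps(environment, indent=2))
```

Saving this output next to the figures or tables it belongs to, and committing it to version control with the analysis code, gives later readers a starting point for repeating the analysis.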
Scientists make a critical choice when selecting a computational environment, because the quality of our tools informs the quality of our research. A large fraction of the community uses the Python scientific-software stack (66% of respondents). This number will only grow over time, since Python is the most popular programming language among students in the solar-physics community (79% of students who took our survey use Python).
There are a number of reasons why the Python scientific-software stack is growing in prominence both in the solar-physics community and in many other scientific disciplines. Interoperability between many packages for numerical methods, plotting, astronomy, statistics, and computing (e.g. Virtanen et al., 2020; van der Walt, Colbert, and Varoquaux, 2011; McKinney, 2010; Hunter, 2007; The Astropy Collaboration et al., 2018; VanderPlas et al., 2012; Pedregosa et al., 2011; Rocklin, 2015) allows researchers to write code with relative speed and ease. The rise of more than fifty packages in heliophysics alone (see heliopython.org) enables interdisciplinary analysis across traditionally isolated fields. The open-development model, adopted by most of the scientific Python ecosystem, improves the longevity of software since anyone can contribute to the codebase and no single institution or person controls the software. For these reasons, the United States National Academies of Sciences and Medicine (2018) report entitled Software Policy Options for NASA Earth and Space Sciences recommends that the NASA Science Mission Directorate should explicitly recognize the scientific value of open source software and incentivize its development and support, with the goal that open source science software becomes routine scientific practice. As the SunPy Advisory Board, we endorse this recommendation not only for the NASA Science Mission Directorate but for scientific funding agencies worldwide.

Figure 1. Summary of results for survey Question 9, "Which of the following [software tools] have you personally utilized in your work within the last year?" Results are grouped by self-identified career stage (Question 2). Respondents listed 42 different software tools; only tools used by 5% or more of respondents are shown.

Figure 2. Comparison of respondents that report using Python or IDL exclusively by reported career role.

Figure 3. Comparison of respondents' software development and use activities by reported career role, with uncertainty estimate.

Figure 4. Comparison of respondents' formal computer-science education activities (at both undergraduate and graduate level) by reported career role, with uncertainty estimate.

Figure 5. Responses to Question 12, related to computer resource and hardware usage, broken down by career role (Question 2).

Figure 6. Responses to Question 12, related to computer resource and hardware usage, broken down by solar-physics research area (Question 1).

Figure 7. Responses to Question 10, "Have you cited software papers in your published research?"

Figure 8. Responses to Question 11, "Why haven't you cited software in your research?", for those who responded "No" to Question 10.

3. What country is your institution in? (Respondents check appropriate country from a list of options.)
4. Do you self-identify as one or more underrepresented minorities in solar physics? This question is optional.
Do you have any comments? (This is a free-form response; comments are not required. Please feel free to give us feedback about topics like: version control, collaborative coding platforms such as GitHub, standard or best practices in coding, operating systems, text editors, or your personal experience with writing code and releasing software, or general thoughts about SunPy.)
Yes, a lot (e.g. CS courses at an undergraduate or graduate level)
Yes, a little (e.g. online classes, books, workshops)
No
8. Which of the following statements is most applicable to you?
I write mostly my own software.
I mostly use software written by others.