1 Introduction

The SunPy Project (The SunPy Community et al., 2020) facilitates and promotes the use and development of community-led, free, and open source [Footnote 1] data-analysis software for solar physics based on the scientific Python environment. To better understand the software and hardware preferences of the solar-physics community, the Project developed a 13-question survey (reproduced in Appendix A) and disseminated it internationally [Footnote 2] over a six-month period between 7 February 2019 and 28 July 2019.

Many of the survey questions were similar (and in some cases, identical) to those posed by Momcheva and Tollerud (2015) in an informal survey of 1142 members of the astrophysics community. The SunPy Project did this deliberately to compare software preferences between the solar and astrophysics communities.

This article presents the survey results, derived from analyzing 364 responses from community members across 35 countries. All of the survey responses, along with the code (Reback et al., 2020; Caswell et al., 2020; Waskom et al., 2020; van der Walt, Colbert, and Varoquaux, 2011; Bobra, Mumford, and Pereira, 2020) to analyze these data and produce the figures in this article, are publicly available at github.com/sunpy/survey.

2 Demographics

Since the SunPy Project relies largely on volunteer efforts, we chose to construct and disseminate this survey ourselves (instead of going through a formal channel such as the Statistical Research Center at the American Institute of Physics). As a result, we recognize that this survey may suffer from coverage error.

Our survey garnered 368 responses. Most of the survey respondents fit into one of four career stages: 56% (\(n=205\)) described themselves as a faculty member, staff scientist, or researcher, 15% (\(n=53\)) as a postdoc, 23% (\(n=84\)) as an undergraduate or graduate student, and 6% (\(n=22\)) as a software or instrument developer. This adds up to \(n=364\). Four respondents did not fit into any career stage, and we dropped their responses from our analysis.
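The quoted percentages follow directly from the counts. A minimal sketch of the arithmetic, assuming the four counts above and the 364 analyzed responses (the variable names and shortened labels are illustrative and are not taken from the survey code):

```python
# Counts per career stage as quoted in the text; labels are shortened for illustration.
career_counts = {
    "faculty, staff scientist, or researcher": 205,
    "postdoc": 53,
    "student": 84,
    "software or instrument developer": 22,
}

total = sum(career_counts.values())  # number of analyzed responses

# Percentage of analyzed respondents in each career stage, rounded to whole percent.
career_percent = {
    stage: round(100 * count / total) for stage, count in career_counts.items()
}

for stage, pct in career_percent.items():
    print(f"{stage}: {pct}% (n={career_counts[stage]})")
```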

Community members across 35 countries [Footnote 3] responded to our survey. About three-quarters of the respondents came from the US, UK, Germany, India, and Japan. Together, these five countries include about 1150 solar physicists [Footnote 4]; therefore, our survey sampled roughly a quarter of the solar-physics community. Our results are based on the assumption that our sample is representative of the solar-physics community overall.
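As a rough check of this estimate, assuming that "about three-quarters" is taken as exactly 0.75 of the 364 analyzed responses:

\[ 0.75 \times 364 \approx 273, \qquad \frac{273}{1150} \approx 0.24 \]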

We asked respondents to identify all of the areas of research relevant to their career. Most respondents identified multiple sub-disciplines of expertise. We found that 76% (\(n=275\)) work with space-based observational data, 46% (\(n=169\)) work with ground-based observational data, and 26% (\(n=93\)) work on building instruments. A vast majority of respondents, 82%, work with ground-based or space-based data. 29% (\(n=105\)) identified theory as a relevant sub-discipline, and 47% (\(n=171\)) identified numerical simulations.

Most of the survey respondents (82%) chose to answer an optional question about whether they self-identified as an underrepresented minority; 16% of this subset (13% of the total sample) said yes. 79% of respondents chose to answer another optional question about whether they self-identified as an underrepresented gender identity; 11% of this subset (9% of the total sample) said yes.
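The conversion from a share of the answering subset to a share of the total sample is a product of the two rounded percentages quoted above:

\[ 0.82 \times 0.16 \approx 0.13, \qquad 0.79 \times 0.11 \approx 0.09 \]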

3 Software Tools

In our survey of the solar-physics community, we found that \(99\pm 0.5\)% of respondents use software in their research [Footnote 5]. In a survey of the astrophysics community, Momcheva and Tollerud (2015) found that 100% of respondents use software in their research.

We asked users to list all of the scientific-software tools, including programming languages, software development tools, and data-analysis frameworks, that they utilized within the last year. We summarized their responses in Figure 1. We found that 66% of respondents use the Python scientific-software stack and 73% use IDL [Footnote 6]. Overall, respondents listed 42 different software tools and the average respondent used five tools in the past year.

Figure 1

Summary of results for survey Question 9 “Which of the following [software tools] have you personally utilized in your work within the last year?” Results are grouped by self-identified career stage (Question 2). Respondents listed 42 different software tools; only tools used by 5% or more of respondents are shown.

We observe a stark contrast in usage between the two primary data-analysis languages in solar-physics research, Python and IDL, when viewed by respondent career stage. The earlier the career stage, the greater the percentage of Python users: 59% of faculty, staff scientists, and researchers, 75% of postdocs, and 79% of students use Python. The earlier the career stage, the fewer IDL users: 78% of faculty, staff scientists, and researchers, 75% of postdocs, and 60% of students use IDL.
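A breakdown of this kind can be tabulated from multi-select responses. The following is a minimal sketch on hypothetical toy data, not the survey's actual analysis code; the stage labels, tool names, and the function `usage_by_stage` are illustrative assumptions:

```python
from collections import Counter, defaultdict

# Hypothetical toy responses: (career stage, set of tools used in the last year).
responses = [
    ("student", {"Python", "Git"}),
    ("student", {"Python", "IDL"}),
    ("postdoc", {"IDL"}),
    ("postdoc", {"Python", "IDL"}),
    ("faculty", {"IDL", "Fortran"}),
]

def usage_by_stage(responses):
    """Return {stage: {tool: percent of that stage's respondents using it}}."""
    totals = Counter(stage for stage, _ in responses)
    counts = defaultdict(Counter)
    for stage, tools in responses:
        # Each respondent contributes at most once per tool.
        counts[stage].update(tools)
    return {
        stage: {tool: 100 * c / totals[stage] for tool, c in tool_counts.items()}
        for stage, tool_counts in counts.items()
    }

table = usage_by_stage(responses)
for stage, tools in table.items():
    print(stage, dict(sorted(tools.items())))
```

Because each respondent may select several tools, the percentages within one career stage can sum to more than 100%.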

Of course, these tools are not necessarily used in isolation – about half (45%) of respondents use both Python and IDL. Figure 2 shows that 28% of respondents use IDL exclusively (in other words, they use IDL and do not use Python), while 21% use Python exclusively. The ratio of exclusive IDL users to exclusive Python users is roughly 2:1 for faculty, staff, and research scientists and the opposite, 1:2, for students.
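The exclusive-use figures are consistent with the overall percentages by inclusion-exclusion. A minimal sketch, assuming the rounded values quoted in the text (66% Python, 73% IDL, 45% both); the function name is hypothetical, and because the inputs are rounded the derived shares are approximate:

```python
def exclusive_shares(pct_a, pct_b, pct_both):
    """Split two overlapping usage percentages into exclusive and neither shares."""
    only_a = pct_a - pct_both
    only_b = pct_b - pct_both
    either = pct_a + pct_b - pct_both  # inclusion-exclusion
    return {"only_a": only_a, "only_b": only_b, "neither": 100 - either}

# Rounded percentages quoted in the text: a = Python, b = IDL.
shares = exclusive_shares(pct_a=66, pct_b=73, pct_both=45)
print(shares)
```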

Figure 2

Comparison of respondents who report using Python or IDL exclusively, by reported career role.

Figure 10 of Momcheva and Tollerud (2015) shows that Python is not only the most popular programming language within their sample of the astrophysics community, but it is also the most popular within every individual career category. Our survey results show that Python is the most popular programming language only among students; IDL and Python are at parity for postdocs, and IDL is more popular than Python for faculty, staff scientists, researchers, software developers, and instrument developers. In this respect, the astrophysics and solar-physics communities differ widely: 78% of solar-physics faculty, staff scientists, and researchers in our sample use IDL [Footnote 7], compared with 44% of astrophysics faculty and scientists sampled by Momcheva and Tollerud (2015).

The two groups of respondents share the same statistics, however, when it comes to writing software. In both the astrophysics and solar-physics communities, roughly a third of respondents write their own software most of the time (see Figure 3 of this article and Figure 3 of Momcheva and Tollerud, 2015). Furthermore, about 90% of respondents in both communities often or occasionally write their own software (see the same figures).

Figure 3

Comparison of respondents’ software development and use activities by reported career role, with uncertainty estimate.

4 Education and Training

Although \(99\pm 0.5\)% of respondents use software in their research and \(91\pm 5\)% often or occasionally write their own software, \(63\pm 4\)% of respondents have not had any formal training (e.g. computer-science courses) at an undergraduate or graduate level. We found that people who mostly write their own software are not significantly better trained than everyone else: \(44\pm 6\)% of people who write their own software reported “a lot (e.g. computer-science courses)” of formal training, compared with \(37\pm 3\)% overall. We also found that students today are twice as likely to have a lot of formal training in programming compared with faculty, researchers, and staff scientists (see Figure 4). The amount of training does not vary with area of expertise; each sub-discipline shows roughly the same amount of formal training as the general population (\(37\pm 3\)%).
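The ± values quoted above are uncertainty estimates on percentages. The sketch below is not necessarily the method used in this article; it only illustrates two common ways to attach an uncertainty to a survey proportion, the binomial standard error and a Poisson counting error. The count `k = 229` is an illustrative value back-computed from 63% of 364 respondents, and both function names are hypothetical:

```python
import math

def binomial_standard_error(k, n):
    """Standard error of the proportion k/n under a binomial model."""
    p = k / n
    return math.sqrt(p * (1 - p) / n)

def poisson_counting_error(k, n):
    """Uncertainty on k/n when the count k is treated as Poisson: sqrt(k)/n."""
    return math.sqrt(k) / n

n = 364  # analyzed responses
k = 229  # illustrative count, roughly 63% of n (an assumption, not survey data)

p = k / n
se_binomial = binomial_standard_error(k, n)
se_poisson = poisson_counting_error(k, n)

print(f"proportion: {100 * p:.1f}%")
print(f"binomial standard error: {100 * se_binomial:.1f}%")
print(f"Poisson counting error:  {100 * se_poisson:.1f}%")
```

The two estimates differ most when the proportion is large, since the binomial form shrinks as the proportion approaches 100% while the Poisson form does not.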

Figure 4

Comparison of respondents’ formal computer-science education activities (at both undergraduate and graduate level) by reported career role, with uncertainty estimate.

5 Hardware Tools

We also found that most respondents utilize consumer hardware to run software for solar-physics research. Although 82% of respondents work with space-based or ground-based data, and some of these facilities (e.g. the Solar Dynamics Observatory and Daniel K. Inouye Solar Telescope) produce terabytes of data per day, only 14% use a regional or national cluster [Footnote 8] and only 5% use a commercial cloud provider (see Figure 5). Nearly a third (29%) use a laptop or desktop exclusively. The community puts considerable effort into maintaining clusters and workstations, with 40% of respondents using a shared workstation, 51% using a local cluster, and 96% using a laptop or desktop.

Figure 5

Responses to Question 12, related to computer resource and hardware usage, broken down by career role (Question 2).

These percentages vary significantly by sub-discipline. A larger percentage of respondents in the numerical simulations and theory sub-disciplines use local clusters (63% and 60%, respectively, compared with 51% overall) and regional or national clusters (26% and 26%, respectively, compared with 14% overall) (Figure 6).

Figure 6

Responses to Question 12, related to computer resource and hardware usage, broken down by solar-physics research area (Question 1).

6 Citing Scientific Software

Figure 7 shows that \(73\pm 4\)% of respondents cite scientific software in their research, although only \(42\pm 3\)% do so routinely. Roughly a quarter (\(27\pm 3\)%) never cite scientific software in their research. When asked why, about half (\(53\pm 8\)%) responded that they do not know how to appropriately cite scientific software (see Figure 8); we note that only \(4\pm 1\)% of respondents do not think software belongs in citations.

Figure 7

Responses to Question 10, “Have you cited software papers in your published research?”

Figure 8

Responses to Question 11, “Why haven’t you cited software in your research?”, for those that responded “No” to Question 10.

7 Discussion

Scientific software is an indispensable component of the modern scientific research workflow (Rüde et al., 2018). Virtually all of the solar-physics community uses software in their research. Based on this fact, we find three of the statistics presented in this article worrisome. First, similar to the astrophysics community [Footnote 9], a significant fraction of the solar-physics community (\(63\pm 4\)% of respondents) has not taken any computer-science courses at an undergraduate or graduate level. Second, most of the solar-physics community (82% of respondents) works with space-based or ground-based facilities, several of which produce terabyte- or petabyte-sized data sets, yet nearly a third of the community (29% of respondents) uses exclusively a laptop or desktop to run software for solar-physics research. It is unclear whether the computing power offered by laptops and desktops limits the type of scientific endeavors in solar physics. Finally, less than half of the community (\(42\pm 3\)% of respondents) routinely cites scientific software in their research.

The United States National Academies of Sciences, Engineering, and Medicine (2018) report entitled Software Policy Options for NASA Earth and Space Sciences recognizes the lack of education in software development among scientists. The report recommends initiating and sponsoring “programs to educate and train researchers in open source best practices,” suggesting topics such as “export controls, licensing and intellectual property, workflows, and software development.” This includes sponsoring community members to attend conferences about open-source software development, such as Python in Astronomy (openastronomy.org/pyastro) or Scientific Computing with Python (conference.scipy.org), take online courses about software development, available on learning platforms such as Coursera (coursera.org) and edX (edx.org), join workshops like those led by The Carpentries (carpentries.org), and develop training programs, such as the Large Synoptic Survey Telescope’s Data Science Fellowship program (astrodatascience.org). Our findings in Section 4 show that the solar-physics community could benefit immensely from education and training in open-source software.

The Ford Foundation’s report, entitled Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure (Eghbal, 2016), also suggests “expanding the pool of contributors so that more people, and more types of people, can build and sustain public software together.” Increasing the diversity of the talent pool, which is still lacking in the solar-physics community, will help sustain a long-term future for open source software in solar physics.

However, maximizing the scientific return of large data sets, such as those produced by the Solar Dynamics Observatory and the Daniel K. Inouye Solar Telescope, requires both skill in software development and computational resources. The United States National Academies of Sciences, Engineering, and Medicine (2020) report entitled Progress Toward Implementation of the 2013 Decadal Survey for Solar and Space Physics: A Midterm Assessment and the Kavli Foundation series of workshops called Petabytes To Science (Bauer et al., 2019) recommend adopting science platforms that co-locate both data and the computational resources required to analyze these data. In this paradigm, users run software in an external computing environment where the data lives, instead of moving the data to a desktop or laptop where the software lives. The astrophysics community has already developed several science platforms, such as the ASTRO Data Lab (datalab.noao.edu), run by the NSF’s National Optical-Infrared Astronomy Research Laboratory. We encourage the solar-physics community to fund the development of science platforms so that scientists are not restricted by the computational power of consumer hardware for analyses involving terabytes of data.

Finally, we recognize that software development, like hardware development, takes a vast amount of time. This time is rarely recognized by the academic community, which largely rewards publications. Therefore, we encourage the community to publish scientific software (by submitting articles that describe research software to refereed journals and archiving this software in publicly available digital repositories; see guides.github.com/activities/citable-code), cite scientific software (see Appendix B about how to cite scientific software), and count scientific software as a co-equal research artifact when considering career evaluation. This has two benefits: it gives academic credit and career recognition to those who write software, and it makes it easier to reproduce studies in solar physics.

Some of the earliest advocates for scientific reproducibility, Claerbout and Karrenbach (1992) and Buckheit and Donoho (1995), suggested that a journal article “about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship.” The actual scholarship, they argue, is the code and development environment used to generate the results. Preserving these elements of scholarship requires tools like version control, which create snapshots of software or data as they change over time. At the moment, less than half the community (44% of respondents) uses version control [Footnote 10]. The United States National Academies of Sciences, Engineering, and Medicine (2019) report entitled Reproducibility and Replicability in Science recommends that “researchers should convey clear, specific, and complete information about any computational methods and data products that support their published results in order to enable other researchers to repeat the analysis,” including the data, study methods, and computational environment.

Scientists make a critical choice when selecting a computational environment, because the quality of our tools informs the quality of our research. A large fraction of the community uses the Python scientific-software stack (66% of respondents). This number will only grow over time, since Python is the most popular programming language among students in the solar-physics community (79% of students who took our survey use Python).

There are a number of reasons why the Python scientific-software stack is growing in prominence both in the solar-physics community and many other scientific disciplines [Footnote 11]. Interoperability between many packages for plotting, numerical methods, astronomy, statistics, and computing (e.g. Hunter, 2007; McKinney, 2010; Pedregosa et al., 2011; van der Walt, Colbert, and Varoquaux, 2011; VanderPlas et al., 2012; Rocklin, 2015; The Astropy Collaboration et al., 2018; Virtanen et al., 2020) allows researchers to write code with relative speed and ease. The rise of more than 50 packages in heliophysics alone (see heliopython.org) enables interdisciplinary analysis across traditionally isolated fields. The open-development model [Footnote 12], adopted by most of the scientific Python ecosystem, improves the longevity of software since anyone can contribute to the codebase and no single institution or person controls the software.

For these reasons, the United States National Academies of Sciences, Engineering, and Medicine (2018) report entitled Software Policy Options for NASA Earth and Space Sciences recommends that the “NASA Science Mission Directorate should explicitly recognize the scientific value of open source software and incentivize its development and support, with the goal that open source science software becomes routine scientific practice.” As the SunPy Advisory Board, we endorse this recommendation not only for the NASA Science Mission Directorate but for scientific funding agencies worldwide.