Introduction

Particle physics in the coming decades will continue to explore the fundamental workings of the universe. This requires upgrading existing major facilities such as the Large Hadron Collider (LHC) to the High Luminosity LHC [1] and building new facilities like the Long-Baseline Neutrino Facility (LBNF) [2] and Deep Underground Neutrino Experiment (DUNE) [3], among many others. To realize the full physics potential of this work, an equivalent investment must be made in the software required to collect, process, and analyze the deluge of recorded data. Recent efforts in high-energy physics (HEP) such as the HEP Software Foundation (HSF) [4] and IRIS-HEP [5] are facilitating cooperation and common efforts in software and computing worldwide to develop the state-of-the-art software cyberinfrastructure required to meet the data-intensive challenges of upcoming HEP experiments. The rapid evolution of computing technology, together with the growing complexity of analysis algorithms, requires developers to acquire a broad portfolio of programming skills to enable future discoveries.

It is critical that all stakeholders across HEP make a major effort to provide a strong foundation for new researchers entering the field. The researchers must be brought up to date with new software technologies, concurrent programming, and artificial intelligence, and must maintain, improve, and sustain the existing HEP software. However, young researchers graduating from universities worldwide currently do not receive adequate preparation in modern computing practices to respond to the growing needs related to the above experimental challenges. A community white paper [6] outlined the initiatives to address training needs and issues that need to be taken into account for these to be successful. In the last two years, the HSF Training working group, together with IRIS-HEP and FIRST-HEP [7] and partnering with The Carpentries [8], has begun development of a software training program. The efforts of this group have been focused on two specific goals: (1) developing material for an introductory HEP software curriculum, and (2) teaching this curriculum to HEP scientists. Thus far, over 1000 people in HEP and related computing areas have been trained. This paper describes the activities, the curriculum, and future directions of HEP software training.

Organization

The HSF Training working group, which is led by three co-conveners, engages with different experimental collaborations and initiatives such as IRIS-HEP, FIRST-HEP, and The Carpentries. The training group holds weekly public meetings [9] to discuss ideas and proposals, plan events, and assess progress. These meetings are held remotely using Zoom, and live notes are maintained for anyone unable to join. Training events are announced via several email lists, with registration and timetables organized using Indico [10].

The style and pedagogy of the training is heavily inspired by The Carpentries. The training is student-centric, suitable for self-study, and experiment agnostic, with reusable study material that is open source and open access, and hosted in the HSF’s Training repositories on GitHub [11]. We encourage participants to provide feedback and suggestions for improvement by opening issues in these repositories or to directly help with the development by opening pull requests. In most cases the training material is in the form of a website that is built from files written in the easy-to-learn Markdown language. The website is automatically built using the static site generator Jekyll [12] via GitHub Pages [13] and an adapted and extended template from The Carpentries [14, 15]. Thus, the entry barrier to contribute to the material is fairly low, as only basic knowledge of git is required (and in most cases, all necessary steps can be performed via the GitHub web interface). All lessons are listed in the HSF Training Center [16], which provides an overview of the available training modules and serves as an entry point for anyone wishing to learn.

Based on our experiences, we have also formalized the procedure used to organize a training event and have compiled our knowledge in a compact guide [17]. As organization is all about dividing work, we distinguish between three relevant roles at our events:

  • Instructors are subject-matter experts who develop training material and then teach it, either in person, in recorded live sessions, or by recording videos before the event. Instructors are the primary academic drivers of the program at large and provide guidance to mentors and students alike. They gain experience in curriculum design with a focus on optimizing pedagogy for all learning styles.

  • Mentors work closely with participants, for example, by conducting small group mentoring sessions with ideally only five students per mentor. They optimize the learning environment for individual participants and help them persevere. They are critical to the success of any event and through participation as a mentor not only serve the community, but develop pedagogical communication skills that are transferable to other aspects of their research/teaching portfolio.

  • Facilitators take care of organizational aspects. They are responsible for putting together all of the pieces of the puzzle to successfully execute the full event while serving as the primary point of reference for participants to communicate. They take on a dynamic responsibility beyond the “core content” of the training event itself, and they also learn the essential “soft skills” necessary to be a leader in the academic community and beyond.

All three groups are collectively referred to as educators. While some of our members are allowed to use a fraction of their regular working hours for our teaching activities, most of the work is done on a voluntary basis. Because creating training material and teaching require considerable commitment and time, it is of great importance to acknowledge the efforts of everyone involved. Currently this is mostly achieved by listing contributing community members on the pages of the relevant training and on a central community page [18].

Finally, Blueprint workshops [19] and hackathons [20] are organized to brainstorm new training events, develop content, and discuss improvements. The travel cost for educators and video captioning of training material have been supported by IRIS-HEP and FIRST-HEP.

Curriculum

An initial survey of the software and training needs of the HEP community was conducted in February of 2019 [21]. This was followed by the development of “prototype” course modules and pilot training events from which feedback from participants was solicited.

Based on the surveys and the experiences gathered at the events, the course structure was extended into a full curriculum consisting of a variety of training modules. Each training module is independent from the others (except for some clearly marked requirements), so that students can prioritize certain skills before others. This is especially important in academia because students are often expected to work directly towards scientific results with minimal time given for acquiring software knowledge or best practices.

The most basic skill set (Unix shell, Python, and git) is covered by modules developed directly by Software Carpentry [22]. A large module that covers the basics of modern C++ is currently in development, and other modules focusing on development in C++, such as CMake, have already been taught with great success.

This is complemented by a series of broader software engineering topics, such as continuous integration and deployment using both GitHub Actions and GitLab CI as examples. These modules are also particularly relevant for analysis preservation, for which modules covering domain-specific software such as REANA [23] are in development.

A lesson on machine learning and a lesson specifically targeting machine learning with GPUs form the start of a curriculum section on data analysis techniques. Similarly important are HEP-specific tools, especially the ROOT data analysis framework [24] and packages such as uproot [25] that enhance its interoperability with non-HEP-specific packages.

Finally, modules are under development that cover advanced topics important for students striving to become core developers, such as code documentation, performance optimization, and parallel programming.

The module list [16] and the material evolve continuously depending on input from participants and the person-power available; as the material is open source, any interested stakeholder can contribute.

Training Events

During the initial period of training, 150 people received “introductory” software skills training at Fermilab (FNAL), Argonne National Lab (ANL), Lawrence Berkeley Lab (LBNL), and CERN [26,27,28,29]. National labs are the hub of the HEP community and provide an environment where it is easier to reach a diverse population of participants with good infrastructure for in-person training. At the CoDaS-HEP school [30], over 50 people participated in the advanced “computing bootcamp” software training. These training events were held in person.

However, the COVID-19 pandemic necessitated a rapid adjustment to virtual platforms, and our approach evolved throughout 2020 as we gained experience. The events that we had to move to a virtual format include training on continuous integration and deployment [31, 32], Docker [33], machine learning on GPUs [34], and C++ [35] (organized together with SIDIS [36]).

To date, nearly 100 educators have taught over 1000 participants in about a dozen training events. Valuable lessons have been learned regarding in-person and virtual training. Clear and detailed guidance is available for anyone willing to host, request, or organize a training while staying aligned with the approach, philosophy, and code of conduct of the HSF Training group, so that the tools and techniques that are developed remain persistent, reusable, and broadly accessible [17].

While in-person events offer more opportunities for active and efficient engagement of participants and community building, they are generally more exclusive: participants need sufficient funding and extra preparation time to arrange travel to the venue. Hosts have to book specially equipped rooms with multiple projectors and screens to show teaching materials and slides simultaneously. Space constraints typically limit the number of participants to a few dozen, and a long lead time is required for the logistics. Our in-person events have been managed by about five educators, which is necessary for the “hands-on” aspect to be successful. These educators also need to make a large time commitment; they cannot just present their material and leave.

Virtual events reach far more participants than in-person events and enable a considerably more equitable service to the community. Since the teaching materials are fully preserved beforehand as written lessons and YouTube videos, an inability to attend during the scheduled time does not considerably degrade learning. Finally, these video materials are captioned to be inclusive of those with hearing impairments. Captioning videos for a week-long event (~$50/day) is considerably more economical than hiring a sign language interpreter (~$1000/day).
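As a rough worked example of the cost comparison above, the sketch below multiplies the quoted daily rates by the length of an event. The daily rates are the approximate figures given in the text; the five-day length assumed for a “week-long” event and the function name `accessibility_cost` are assumptions made purely for illustration.

```python
# Approximate daily rates quoted in the text (US dollars).
CAPTIONING_PER_DAY = 50
INTERPRETER_PER_DAY = 1000


def accessibility_cost(rate_per_day: int, days: int) -> int:
    """Return the total cost of a per-day service over an event of `days` days."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return rate_per_day * days


# Assumption: a "week-long" event runs for five working days.
EVENT_DAYS = 5

captioning_total = accessibility_cost(CAPTIONING_PER_DAY, EVENT_DAYS)
interpreter_total = accessibility_cost(INTERPRETER_PER_DAY, EVENT_DAYS)

print(f"Captioning:  ${captioning_total}")
print(f"Interpreter: ${interpreter_total}")
print(f"Ratio:       {interpreter_total // captioning_total}x")
```

Under these assumptions, captioning comes to about $250 against about $5000 for an interpreter. The factor of 20 follows from the daily rates alone and does not depend on the assumed event length.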

The disadvantage of virtual events, however, is that it is difficult for educators and participants to interact closely; the in-person environment simply cannot be recreated on Zoom. Educators and participants also have to work around being spread across time zones. It is also challenging to keep everyone engaged and on the same page due to the pervasive culture of “multi-tasking” within HEP: online participants are more easily distracted by other professional duties. As a result, although initial registrations for these events are very high, actual attendance is typically only 50% of those who have registered. However, it should be noted that this does not mean that less learning occurs at the training event. Tools such as Mattermost, Discord, and Slack have been effectively deployed for asynchronous communication, both during and after the event.
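The attendance figure above, combined with the target of about five students per mentor described in the Organization section, suggests a simple planning estimate. The following is a minimal sketch under those two numbers, not a procedure documented by the HSF Training group; the function name `mentors_needed` and the registration count of 120 are hypothetical.

```python
import math


def mentors_needed(registered: int,
                   attendance_rate: float = 0.5,
                   students_per_mentor: int = 5) -> int:
    """Estimate the number of mentors for an event from its registrations.

    attendance_rate: fraction of registrants expected to attend
        (the text reports typically about 50% for virtual events).
    students_per_mentor: target mentoring group size (ideally five).
    """
    if registered < 0:
        raise ValueError("registered must be non-negative")
    if not 0.0 <= attendance_rate <= 1.0:
        raise ValueError("attendance_rate must be between 0 and 1")
    if students_per_mentor <= 0:
        raise ValueError("students_per_mentor must be positive")

    # Round up at both steps so that no attendee is left without a mentor.
    expected_attendees = math.ceil(registered * attendance_rate)
    return math.ceil(expected_attendees / students_per_mentor)


# Hypothetical example: 120 registrations for a virtual event.
print(mentors_needed(120))
```

Rounding up at both steps errs on the side of recruiting one mentor too many rather than one too few, which seems the safer choice given that attendance is only known on the day.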

In general, devoting full time to training is always challenging. Though there is widespread desire to engage in training, there is an institutional culture that prioritizes immediate research activity over dedicated professional development, even though the latter will lead to higher productivity in the long term.

Feedback

Feedback is required for us to evaluate whether we are effectively facilitating learning and to ensure the success of future training. Every training event has a pre- and a post-event survey to collect feedback from the participants. These include a set of baseline questions pertaining to demographics and questions to assess the quality and method of training. The questions can be adapted to the nature and topic of each training event. In addition, we organize a “post-mortem” session among the educators to discuss the successes and failures internally. This typically takes place once the results of the pre- and post-workshop surveys are available, as they guide the discussion. Finally, a short presentation about the training experience is given at the HSF Training weekly meeting and/or at the HSF all-working-groups planning meeting.

Figure 1 shows feedback on a training event involving containerization with Docker [37]: clearly the training made a difference. However, we are aware that this type of “learning evaluation” does not fully encompass the impact of our training. It only probes the perceived and self-reported learning of a skill. Instead, what is needed is a survey that is conducted sufficiently later to understand how well the learned skill is being applied in the context of research.

Fig. 1: The self-reported pre- and post-training level of knowledge on the topic of Docker (a software container technology)
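A comparison of self-reported knowledge before and after a training, as in Fig. 1, can be summarized with a few lines of code. The sketch below is illustrative only: the 1 to 5 scale, the pairing of each participant's pre- and post-training answers, the function name `summarize`, and the responses themselves are all placeholders, not the survey data or the analysis behind Fig. 1.

```python
from collections import Counter
from statistics import mean

# Placeholder responses on a hypothetical 1-5 scale, one (pre, post) pair
# per participant. These are NOT the survey data behind Fig. 1.
responses = [(1, 3), (2, 4), (1, 4), (3, 4), (2, 2), (4, 5)]


def summarize(pairs):
    """Return level counts before and after, and the mean per-participant change."""
    pre = Counter(before for before, _ in pairs)
    post = Counter(after for _, after in pairs)
    mean_change = mean(after - before for before, after in pairs)
    return pre, post, mean_change


pre_counts, post_counts, mean_change = summarize(responses)
print("pre :", sorted(pre_counts.items()))
print("post:", sorted(post_counts.items()))
print(f"mean change: {mean_change:+.2f}")
```

Such a summary only captures perceived learning at the time of the event; the same function could be applied to a later follow-up survey to probe how well the skill is retained and applied.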

Community

The solutions to future computing challenges require a large workforce trained in a wide range of software skills. To train this workforce, we rely on an active community whose members are enthusiastic and motivated to teach. Our members include people with various roles and backgrounds in HEP, such as experimental physicists from different collaborations, as well as software engineers from different institutes. As we scale our training activities, we have also gained members from nuclear physics and computer science. Members of The Carpentries teach part of our most basic curriculum under an agreement via membership subscription through IRIS-HEP. The overall diversity of the backgrounds of the instructors and mentors adds great value to the training. Each educator brings their own flavor of experience from a different computing environment, with a common goal of creating, teaching, and sustaining a common set of software skills.

As the success of our mission depends crucially on the motivation and participation of the community, we cultivate a strong sense of community ownership and pay special attention to acknowledge contributions of all kinds. We also encourage the participants in our training events to remain active or become more active, share feedback, and in particular, to sign up to be a mentor in one of the next iterations of the same training module. If former participants do not yet feel confident about their mentoring skills, we offer to match them with a more senior mentor. In the same way, we encourage mentors to become instructors or facilitators and to become more and more active in our organization. By actively engaging participants and educators throughout the training community, we can sustain and nurture a culture of intentional learning and grow our community in an organic fashion [38].

Educators not only provide an invaluable service to the HEP community, but they also get the opportunity to develop and sharpen their pedagogical skills and enhance their professional portfolio. About two-thirds of the HEP workforce eventually work outside of HEP, such as in the software industry and in data science. The training makes a meaningful difference in the preparation for such careers in terms of software knowledge and experience, and enhances the employability for both the educators and participants. The skills taught and learned, like Python, machine learning, and data analysis, align with the needs of the software industry and strengthen the job profile of a physicist to work in industry. At the same time, recognizing the importance of software skills within HEP may hopefully help to provide more incentives and clearer career paths in academia to those who want to pursue their career within HEP, or in other scientific research fields. In particular, strengthening the research software engineer career path [39, 40] could significantly help retain the expertise within the HEP community.

Sustainability

Sustainable software [41] is essential for HEP. A sustainable training program [42] is key to pursuing this goal. While continuing the existing work, it will be essential to spread the training events and training expertise geographically to keep the costs low and move to an online training model to reduce financial burdens that accompany in-person training. In parallel, it is important that as the curriculum grows, it begins to include material specifically aimed at making software sustainable.

Training should be structured so that a minimal set of people are needed for maintenance and costs per event are minimized. Growing the community is an important aspect of sustaining the workforce. Providing recognition and possible financial incentives can keep the community vibrant and motivated. The community should recognize and appreciate the broader value of our software training, which prepares a workforce to solve computing challenges that are essential to advance our field and society at large.

To lead software training across HEP and related communities over the long run, we need a core team whose main focus is to support the overall mission of HEP software training. To scale up training efforts, we need to build mentorship and leadership at the local and regional level supported by the core team. Specifically, while we have started the following set of activities, we need to scale up by:

  • Engaging more HEP labs, institutes, and universities in this endeavor.

  • Promoting equity, diversity, inclusion, and accessibility in participation across HEP communities and being mindful of under-resourced institutions in different geographical regions.

  • Establishing a mechanism to get feedback from our communities and improve the training.

  • Ensuring that our core team and volunteers are afforded opportunities to grow professionally and have career paths.

  • Exploring ways to manage a financial support model to share costs in the long term.

Broader Impacts

HSF-led training is multilayered, with a basic HEP software curriculum progressing to HEP-specific physics tools. Integrated with this is a growing outreach program that is essential for building a future software workforce by reaching young minds early in their educational development. For example, several outreach events introducing Python programming to K-12 teachers [43] are organized under IRIS-HEP and FIRST-HEP. The teachers can turn this into a classroom experience for their students, in which physics, astronomy, and math courses have problem-solving components that integrate programming with Python. In outreach events, the teachers analyze and interpret physics data with Python using Google Colab [44], which allows them to work directly in the web browser without requiring any additional setup. Workshops teaching the basics of machine learning to school teachers are also organized [45]. We plan to scale this experience by partnering with other stakeholders in HEP outreach, for example, QuarkNet [46], which already has a well-developed network of teachers and schools taking part in HEP outreach programs.

Summary

HSF and IRIS-HEP are creating software training and ensuring sustainability of software in HEP for years to come. The training material is open source and open access, shared publicly via GitHub. This allows anyone to join the discussion and make contributions by proposing changes, thereby continuously improving the available material. This process is guided by continual feedback solicited from the participants of the training events. Finally, we have established a growing community of educators to broadly promote a culture within HEP that not only values software skills but also values the teaching of those skills to others. In doing so, we aim to foster a more active, inclusive, and diverse scientific community. By leading software training across HEP and related communities, we will be able to meet the challenges in the field and beyond.