Outline
We gave talks to two groups, each including a general introduction to HPC, details of the actual process of using an HPC system, and examples of practical cases where it can be used. One session (Group A) was held in September 2019; following the talk, participants were offered a hands-on training, held over the course of 2 weeks starting one week after the talk and focusing heavily on the use of a domain-specific tool. The other session (Group B) was held online in June 2020; organising a hands-on training was not possible due to the COVID-19 pandemic.
Three questionnaires were used to analyse subjects’ demography, reactions and expectations regarding supercomputing and the specific learning activities. The questionnaires were provided to participants via Google Forms. The first one, \(Q_1\), gathered demographics, established a baseline of attitude and previous knowledge, and was filled out before the introductory talk. The second questionnaire, \(Q_2\), was filled out right after the introductory talk, while \(Q_3\) was filled out after the hands-on course by those participating in it. Questionnaires were anonymous, but participants were asked to generate a unique, 5-character-long ID so that we could connect individuals across questionnaires. To generate the characters of the ID, the participants received the following instructions: “(1) The first letter of your birth month written in English. (2) The last digit of the day of your birth. (3) The first letter of your mother’s maiden last name. (4) The second letter of your mother’s first name. (5) The last letter of your father’s first name.” Although this ID is not guaranteed to be unique in a mathematical sense, it is highly unlikely to produce duplicates within small groups of people, and participants can regenerate it at any time without needing to remember it; it is thus an effective way to generate anonymous IDs. For further details, we provide the entire dataset, along with the original questions received by participants, in the Supplementary Materials.
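The five instructions above amount to a simple deterministic function, which can be sketched as follows (the function name and example inputs are purely illustrative, not taken from the study materials):

```python
def participant_id(birth_month, birth_day, mother_maiden, mother_first, father_first):
    """Build the 5-character pseudonymous ID from the questionnaire instructions.

    Inputs: birth month written in English, day of birth, mother's maiden
    last name, mother's first name and father's first name.
    """
    return (
        birth_month[0]        # (1) first letter of the birth month
        + str(birth_day)[-1]  # (2) last digit of the day of birth
        + mother_maiden[0]    # (3) first letter of mother's maiden last name
        + mother_first[1]     # (4) second letter of mother's first name
        + father_first[-1]    # (5) last letter of father's first name
    ).upper()

# Hypothetical participant: born 23 September, mother "Anna Smith", father "John"
print(participant_id("September", 23, "Smith", "Anna", "John"))  # → S3SNN
```

Because the ID is derived from stable personal facts rather than memorised, the same participant always regenerates the same string across questionnaires.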
Demography
Group A subjects were recruited from the Department of Ethology, Eötvös Loránd University, Hungary, and closely associated research groups operating in the same location. Participation in the introductory talk was semi-compulsory: it was held during the usual weekly meeting of the Department of Ethology, which is not explicitly compulsory, but people are expected to attend regularly. The hands-on course was entirely voluntary and was scheduled according to the availability of volunteers over a 2-week period. Both were advertised beforehand through the Department’s e-mailing list.
In total 29 people attended the introductory talk, but only 25 filled out both \(Q_1\) and \(Q_2\), so only these 25 were kept for later analysis (16 women, 9 men; 14 Ph.D. students, 6 postdocs and 5 senior researchers; age: 35.28 ± SD 8.49).
In total 5 people attended the hands-on course, but one of the participants did not attend the introductory talk, and was thus excluded (2 women, 2 men; 2 Ph.D. students, 2 postdocs; age: 31.00 ± SD 3.92).
Group B subjects were recruited from the University of León, Spain mainly from the faculties of Veterinary Sciences, Biological and Environmental Sciences, and Economics. Participants were reached through snowball emails and participation was voluntary and was completely online. As mentioned before, due to COVID-19, it was not possible to hold a hands-on training.
In total 26 people attended the introductory talk, but only 19 filled out both \(Q_1\) and \(Q_2\), so only these 19 were kept for later analysis (10 women, 9 men; 1 student, 7 Ph.D. students, 1 postdoc and 10 senior researchers; age: 39.10 ± SD 11.45).
Background of participants
Participants’ background was assessed in \(Q_1\) with self-evaluation questions and the question “Describe what supercomputing (high-performance computing) is.”, which was later scored by 5 experts (researchers at the University of León, Spain, with considerable experience with HPC) on a 1 to 4 scale to quantify how accurate the response was (the median of the experts’ scores was used in further analysis). There is a moderate correlation (Spearman’s \(r = 0.61, p < 0.001\)) between this score and participants’ confidence level in using supercomputing. See Fig. 1 for the questions and the scoring. The questions could be answered on an ordinal scale from 1 (lowest skill or confidence) to 4 (highest skill or confidence), thus we only report the median values.
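Aggregating the expert ratings by their median is a one-liner; the sketch below uses made-up scores for three hypothetical participants, only to illustrate the aggregation step (the actual data are in the Supplementary Materials):

```python
from statistics import median

# Hypothetical 1-4 accuracy ratings: each inner list holds one
# participant's scores from the five experts (illustrative values).
expert_scores = [
    [1, 2, 1, 1, 2],
    [3, 3, 4, 3, 2],
    [2, 1, 2, 2, 2],
]

# Per-participant accuracy score = median of the experts' ratings
accuracy = [median(scores) for scores in expert_scores]
print(accuracy)  # → [1, 3, 2]
```

The median is preferred over the mean here because the ratings are ordinal, so averaging them would assume equal spacing between scale points.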
Participants reported good (median of 3) computer skills, but little knowledge in programming (median of 3) and no knowledge in supercomputing (median of 1).
Overall, participants’ level of expertise is very low with regards to supercomputing, and based on their self-assessments, their programming skills are also low, but their ability to handle computers is adequate on average, thus they fit the scope of the current study.
Training evaluation: Kirkpatrick’s four levels of evaluation
Evaluation is key to knowing the impact of learning programs, courses, trainings, etc. Workplace learning opportunities can have a dramatic impact on business performance through changes in specific knowledge, skills, attitudes or behaviours of employees. In higher education, this can be described as academics’ growing efficiency in teaching and research performance. These learning opportunities have the potential to be truly transformative, especially when they are based on evidence and data-driven decisions. Training evaluation is the systematic collection of information and data, planned alongside the training itself and based on the objectives and goals the organisation wants to achieve [13].
Kirkpatrick’s model [13, 14] is one of the best-known learning evaluation models and is widely implemented in practice. Its four levels are:
1. reaction
2. learning
3. behaviour
4. results
Reaction level is focused on the participants’ thoughts, feelings, and satisfaction about the training. It describes whether the information and the process of knowledge sharing were effective and appreciated. It helps to improve the training, to identify topics and areas missing from it, and to assess its perceived value and transferability to the workplace. It is captured by surveys following the training.
Learning level is focused on what the participants have learned, i.e. the resulting increase in knowledge or capability. It is captured by assessments or tests before and after the training to describe a difference.
Behaviour level is focused on how the participants change their behaviour based on the training received, so the main focus is training effectiveness rather than training evaluation. Evaluation of implementation and application is vital and challenging at the same time. It is captured by surveying learners after the training, once they have returned to their work. We have to emphasise that behaviour can only change if the conditions are favourable, support is available from the organisation, and change is encouraged by leaders.
Results level describes the final results of training, and is focused on the outcomes that the organisation has determined to be good for the business (teaching and/or research), and good for the participants. It can only really be measured by looking at business data (or other measures of relevant output, e.g. number and quality of research papers) relating to the training.
In Table 1, we show how the above levels correspond to the setting and aims of our learning activities.
Table 1 Kirkpatrick’s four levels of evaluation model applied to study goals [13]
HPC environment
An HPC environment has three basic components: (1) an HPC facility, (2) a resource manager to manage the accesses to the HPC facility, and (3) one or more parallel frameworks to work with.
For this work, researchers were introduced to a specific HPC environment described below. During the introductory talk this facility was used as an example, and during the hands-on course, participants were granted access to this facility.
Caléndula is the cluster of Supercomputación Castilla y León (SCAyLE). SCAyLE has several calculation clusters with different computer technology architectures. Participants accessed a cluster dedicated to teaching [1].
Caléndula uses Slurm for resource management. Slurm is a free and open-source job scheduler for Linux and Unix-like kernels [25], used by many of the world’s supercomputers and computer clusters. It provides three basic services. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job such as MPI) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending jobs. Slurm uses a best-fit algorithm to optimise the locality of task assignments on parallel computers [18].
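In practice, users interact with these services through a batch script submitted to the scheduler. The fragment below is a minimal, generic sketch; the partition name, resource values and application command are illustrative placeholders, not the actual course material:

```bash
#!/bin/bash
#SBATCH --job-name=video-analysis   # name shown in the queue
#SBATCH --partition=gpu             # hypothetical partition name
#SBATCH --gres=gpu:1                # request one GPU on the node
#SBATCH --time=02:00:00             # wall-clock time limit
#SBATCH --output=analysis_%j.log    # %j expands to the job ID

# Launch the work on the allocated node (illustrative command)
srun ./analyse_video input.mp4
```

The script is submitted with `sbatch job.sh`, the queue of pending and running jobs can be inspected with `squeue`, and a job can be cancelled with `scancel`, covering the three services described above from the user’s perspective.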
Domain-specific tool for hands-on training
The hands-on training was held only for Group A due to COVID-19; thus, the domain-specific tool was aimed at ethologists.
Ethology studies animal behaviour by observing animals in various contexts and coding the observed behaviour according to the relevant study questions. Initially, this was done in situ during observation, but in modern times the typical routine is to make video recordings of the behaviour, which are analysed later to obtain quantitative results (examples with various taxa are, e.g. experiments with dogs [3], capuchin monkeys [4], cleaner fish [6], and zebra finches [15]). Video recording opened the possibility of obtaining a wealth of data, but due to a lack of tools, analysis is still mostly done with human effort.
In order to ease the burden on human analysts, the application LabDogTracker has been developed to track the movement of dogs and humans within the lab of the Department of Ethology, as these are the two most common subjects at the department. Until this study, none of the staff had actually seen or used it before (except for the developer, who is also a co-author of this paper). The application relies on a pre-trained neural network to find the location of dogs on the images of five cameras mounted on the ceiling of the lab, with multiple cameras’ fields of view covering any given area in the lab. The coordinates measured on the images are then mapped to the physical space of the lab and later merged into paths. The paths are exported into text files, which can later be used to answer simple ethological questions. For example, many experiments require answers to questions such as: how much time did the dog spend around their owner, how much time did the dog spend in a specific place, or how fast did the dog move around.
The computationally most expensive part of the application is the video analysis. On a PC with an average GPU, analysing a 6-minute-long experiment takes on the order of a full day. A typical study at the Department of Ethology involves around 30–50 measurements, which can easily be longer than 6 minutes, thus analysing an entire study on one’s own PC could take several months. As such, running LabDogTracker in an HPC environment would be highly useful for the researchers.
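The months-long estimate follows from simple arithmetic, and the same arithmetic shows why an HPC cluster helps: the videos are independent, so they can be analysed in parallel. The sketch below uses round illustrative numbers (40 videos, 20 hypothetical GPU nodes) consistent with the figures in the text:

```python
# Back-of-envelope estimate using illustrative numbers from the text:
# one ~6-minute recording takes roughly a day of GPU time on an average PC.
days_per_video = 1
n_videos = 40                      # a typical study has 30-50 measurements

serial_days = n_videos * days_per_video   # analysing everything on one PC

# On a cluster the independent videos run concurrently; with e.g. 20 GPU
# nodes the wall-clock time shrinks by that factor.
nodes = 20
parallel_days = serial_days / nodes

print(serial_days, parallel_days)  # → 40 2.0
```

This kind of workload (many independent, identical tasks) is often called embarrassingly parallel, and is exactly the case where a job scheduler such as Slurm pays off with minimal code changes.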
Participants’ reaction to the learning activities
Some questions of \(Q_2\) and \(Q_3\) were aimed at evaluating the learning activities themselves, in order to control for the quality of the talk and hands-on training in our results. The evaluation was based on level 1 of Kirkpatrick’s model, the reaction. See Figs. 2 and 3 for the list of questions and results for both learning activities. The questions could be answered on an ordinal scale from 1 (strongly disagree) to 4 (strongly agree), thus we only report the median values.
For both groups, the majority of the medians of the answers were 4, and the lowest median was 3. Across all questions regarding reactions, only one participant replied with a score of 1, and only once. We used Mann–Whitney U tests to check for differences between the two groups. The only difference we found was for the question “The presenter was responsive to the participants”: Group B, where the talk was delivered online, reported scores on average 0.42 lower (\(p < 0.001\)), which is most likely due to the online nature of the talk.
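The Mann–Whitney U statistic itself has a direct pair-counting definition that is easy to sketch; the function below computes only the statistic (in practice a statistics package is used to also obtain the p-value), and the two answer lists are made-up ordinal responses, not the study data:

```python
def mann_whitney_u(a, b):
    """U statistic of sample a vs sample b via the pair-count definition:
    count pairs (x, y) with x > y, counting ties as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical ordinal answers (1-4) to one reaction question
group_a = [4, 4, 3, 4, 2]
group_b = [3, 2, 3, 4, 2]
print(mann_whitney_u(group_a, group_b))  # → 17.5
```

A rank-based test is appropriate here because, as noted above, the answers are ordinal (1–4), so differences between scale points cannot be treated as equal intervals.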
Overall, the participants were happy with the learning activities.