1 Introduction

Name that Neutrino [1] is a citizen science project that seeks input from the public to aid in the classification of neutrino events for the IceCube Neutrino Observatory. Citizen science provides a powerful tool for advancing both science and public engagement: motivated volunteers learn about cutting-edge research and then perform analyses based on visual scans of data. The ultimate goal is to design, develop, and implement an online experience that allows novices to contribute to ongoing research that may lead to new insights. This paper summarizes IceCube's promising first attempt to engage people outside the collaboration in analyzing data using the citizen science approach.

Name that Neutrino is hosted on Zooniverse [2], the largest web-based research platform of its kind, with 2.7 million volunteers worldwide. Zooniverse has established the power of the citizen science approach. For example, Galaxy Zoo, a Zooniverse project devoted to galaxy shape classification, has attracted more than 10,000 volunteers, resulting in over 60 publications [3]. That work, much like IceCube's discussed here, relies on pattern recognition and capitalizes on the keen ability of humans to see things that remain difficult to identify with computers, even with advances in machine-learning algorithms. However, Name that Neutrino is one of the few projects on Zooniverse to include videos rather than static images.

Identifying a research question and developing and implementing the tools needed for Name that Neutrino was a long process. The first attempt at an IceCube citizen science project began in June 2016 as part of a six-week program for high school students [4]. This group produced much of the background material that introduces the IceCube project to citizen users, but at that time the effort was limited to displaying data as static images. As described in more detail in the next section, the IceCube Neutrino Observatory is a cubic-kilometer array of 5160 light sensors (Digital Optical Modules, or DOMs) embedded in the South Pole ice. From the amount and time sequence of recorded light, the energy and direction of the incident particle can be reconstructed. A video of the time sequence of the data for each event is much more informative than a static image, especially for novices.

The ability to include videos was eventually implemented by Zooniverse, and we decided to use this opportunity to compare people classifying IceCube events to current state-of-the-art machine-learning algorithms. The formal Zooniverse launch in 2023 followed a rigorous approval process required for the project to be featured to the Zooniverse community. Videos for Name that Neutrino were produced from Monte Carlo simulations of trigger-level IceCube data, which included significant noise and many ambiguous events. Trigger-level refers to the collected data before any quality cuts or noise reduction techniques have been applied. The primary motivation for using trigger-level data was to inspect and compare the performance of both citizen users and a deep neural network (DNN) machine-learning algorithm at the most challenging level.

Fig. 1 The IceCube neutrino observatory

2 IceCube neutrino observatory

The IceCube Neutrino Observatory [5], located at the geographic South Pole, has a multifaceted and growing science scope. Research topics include astrophysics, particle and fundamental physics, glaciology, and more, with exciting results identifying the first high-energy neutrino sources [6,7,8], leading limits on certain classes of dark matter candidates [9], and measurements of neutrino oscillations at energies beyond those reachable in dedicated reactor and accelerator experiments [10]. IceCube consists of a hexagonal array of DOMs that instrument a cubic kilometer of ice at depths between 1.5 and 2.5 km below the surface, as shown in Fig. 1.

When a neutrino interacts with the rock below or the ice near or within the IceCube array, the resulting high-energy secondary particles emit Cherenkov light, some of which is detected by the DOMs. The light pattern (or topology) depends on the neutrino characteristics (flavor, direction, and energy) and the type of interaction [11]. In the work presented here, IceCube cannot distinguish between neutrinos and anti-neutrinos. Since there are three neutrino flavors and two channels of neutrino interaction relevant to this work, there are in principle six different options. In practice, most of the recorded events fall into two broad categories depending on the neutrino flavor and whether the interaction proceeded through the charged or neutral current. The two main types of topologies seen in IceCube are tracks and cascades. Tracks are produced by muons originating from muon neutrino charged current interactions or cosmic-ray induced air showers. Muons with sufficient energy can propagate over large distances in the ice, resulting in a linear light pattern. Cascades are produced by particle showers induced by all neutral current neutrino interactions, as well as by charged current interactions of electron and tau neutrinos. These particle showers evolve over distances of approximately 10 m in the ice, resulting in a roughly spherical, outward-going light pattern.

Unfortunately, at least for those interested only in neutrinos, there is an overwhelming background: a steady rain of muons produced by cosmic-ray interactions in the Earth's atmosphere above the detector, i.e., over the southern hemisphere. Cosmic-ray interactions in the atmosphere also produce neutrinos, which are identified at a rate of about one per million background events. Roughly one in a few hundred of the neutrinos that interact in the detector is from an astrophysical source rather than a cosmic-ray interaction in the Earth's atmosphere. IceCube records about 3000 events per second, almost entirely background events from cosmic-ray induced muon tracks [12].
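
For scale, a back-of-the-envelope calculation with the rates quoted above gives approximate daily counts; reading "one in a few hundred" as 1 in 300 is our assumption:

```python
# Back-of-the-envelope rates implied by the numbers quoted in the text.
trigger_rate_hz = 3000     # recorded events per second, almost all background
nu_per_background = 1e-6   # about one atmospheric neutrino per million events
astro_per_nu = 1 / 300     # "one in a few hundred" neutrinos; 300 is an assumption

seconds_per_day = 86400
nu_per_day = trigger_rate_hz * nu_per_background * seconds_per_day  # ~260
astro_per_day = nu_per_day * astro_per_nu                           # ~0.9
print(f"~{nu_per_day:.0f} atmospheric neutrinos/day, "
      f"~{astro_per_day:.1f} astrophysical candidates/day")
```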

Fig. 2 Examples of the five signal topologies used in this work. The color indicates the arrival time of light at individual DOMs, with red happening first, then yellow, green, and blue last. The size of each bubble is related to the light detected by the DOM

IceCube data can be separated into the two broad categories, tracks or cascades. The track events can be further subdivided into groups, as shown in Fig. 2. Each colored "bubble" in these event displays represents a DOM that detected light. The size of each bubble is related to the total Cherenkov light detected. The colors indicate the relative time the light was recorded, with red earlier and blue later. Through-going tracks start and end outside of the detector, and therefore the bubbles traverse the volume. Starting tracks begin inside the detector and move outward, identified by redder bubbles on the interior and bluer bubbles toward the edge. Starting tracks and cascades are some of the most interesting topologies for IceCube since they are produced only by neutrino signal events. However, classification is further complicated because starting tracks are usually accompanied by a cascade from the hadronic part of the charged current interaction, and starting tracks turn into cascades as the track length approaches zero. Stopping tracks start outside of the detector and stop inside, with redder bubbles near the edge and bluer bubbles on the interior. Skimming tracks are events where nearly all of the energy loss occurs outside of the detector volume; the detected light will be near the outer volume of the detector, making it difficult to reconstruct the energy and direction of the incident particle.

IceCube has developed a DNN that classifies events by returning the probability of an event being each of the five event topologies: a cascade, or a through-going, starting, stopping, or skimming track. In general, DNNs are a class of machine-learning algorithms that attempt to mimic the brain by learning from known examples. The specific architecture of the DNN used in this work is described in [13] and was inspired by Google's InceptionResnet architecture [14]. Each event is interpreted as a 4D image with three spatial detector dimensions and one feature dimension that contains information about the time and charge recorded at each DOM.
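
To illustrate this input format, the sketch below shows a toy 3D convolutional classifier in PyTorch that maps such a 4D event tensor to probabilities for the five topology classes. The grid size, feature count, and architecture are illustrative assumptions, not the network of [13]:

```python
import torch
import torch.nn as nn

N_FEATURES = 9   # hypothetical number of per-DOM features (e.g., charge, times)
N_CLASSES = 5    # cascade, through-going, starting, stopping, skimming

class ToyEventClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(N_FEATURES, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # pool over the detector volume
        )
        self.head = nn.Linear(64, N_CLASSES)

    def forward(self, x):              # x: (batch, features, nx, ny, nz)
        z = self.conv(x).flatten(1)
        return torch.softmax(self.head(z), dim=1)  # per-class probabilities

# One fake event on an illustrative 10x10x60 DOM grid:
probs = ToyEventClassifier()(torch.randn(1, N_FEATURES, 10, 10, 60))
```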

Training samples for DNNs usually have a majority of examples that are clear enough for the algorithm to learn. For this particular DNN, the training used Monte Carlo events with cuts applied to the trigger-level data that suppressed noise, background, and ambiguous events. The Monte Carlo truth value is used to provide the DNN with the correct characterization of the training samples; however, the Monte Carlo truth value for trigger-level events in IceCube is complex. Trigger-level events may contain two or more separate signals, in which case the Monte Carlo truth is not single valued. Furthermore, the aforementioned event topologies have been found by IceCube to be useful classes in later stages of our data processing pipeline. They are based on knowledge about the physical processes, but they also carry some ambiguity (e.g., how close an event has to be to the detector boundary to count as skimming). At trigger level, these categories may not be optimal, and it was thus a goal of this project to compare the intuition of citizen users against the DNN.

3 Name that Neutrino

Name that Neutrino is available in English, Spanish, and German to anyone in the world with internet access. After selecting "classify," first-time visitors are prompted with a brief tutorial on how to perform the task and a field guide with frequently asked questions. Once the tutorial is completed, users are shown a random event from the sample of 4273 videos. The "classify" section shown to users can be seen in Fig. 3. Users can replay the 7-second video as many times as desired and adjust the playback speed. They must then choose one of the topologies described previously and shown in Fig. 2.

The 4273 simulated trigger-level events chosen for Name that Neutrino were randomly selected to produce a uniform distribution in the log of the energy, since the number of DOMs detecting light scales with energy. With trigger-level events, it is expected that some events will be difficult to classify, especially those with energies close to the detector threshold. An artificial enhancement of electron neutrino events was also implemented to ensure a variety of topologies. The enhancement was needed because electron neutrinos trigger the detector at a lower rate than muon neutrinos: cascades produced by electron neutrino interactions cannot travel far, so the electron neutrino must interact inside the detector, whereas muon neutrinos can interact far from the detector and still produce long-lived muons that make it inside the detector.
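
One simple way to realize such a log-uniform selection, shown here as a sketch that may differ from the actual procedure, is to bin the simulated events in log10(E) and draw equal numbers from each occupied bin:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_uniform_subsample(energies, n_per_bin, n_bins=20):
    """Draw a subsample that is roughly flat in log10(energy)."""
    log_e = np.log10(energies)
    edges = np.linspace(log_e.min(), log_e.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(log_e, edges) - 1, 0, n_bins - 1)
    chosen = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_idx == b)
        if members.size:
            chosen.append(rng.choice(members,
                                     size=min(n_per_bin, members.size),
                                     replace=False))
    return np.concatenate(chosen)  # indices of the selected events
```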

Videos (10 frames per second) were produced with the IceCube event display software Steamshovel [15], showing 3D visualizations of each event with the detector rotating by 5.3 degrees per second. The video parameters were chosen to provide a variety of viewpoints with adequate video quality after compression to fit within the Zooniverse limit of 1 MB per event file. The videos were uploaded as a subject set to work with a uniquely designed Zooniverse classification workflow [16].
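
As a simple consistency check on these parameters (using the 7-second video length quoted above):

```python
# Video parameters quoted in the text, per 7-second event video.
fps = 10
duration_s = 7
rotation_deg_per_s = 5.3

n_frames = fps * duration_s                           # 70 frames per video
total_rotation_deg = rotation_deg_per_s * duration_s  # ~37 degrees of rotation
print(n_frames, round(total_rotation_deg, 1))         # 70 37.1
```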

The Zooniverse approval process requires reviews and beta tests; Name that Neutrino completed one internal Zooniverse review and two beta tests that provided feedback from citizen users, who suggested improvements and assessed the feasibility of the project. Zooniverse approved and officially launched Name that Neutrino in March 2023; after 3 months, each of the videos had been classified 15 times (by 15 different users), resulting in 64,095 classifications. This video repetition is standard practice for Zooniverse projects, and the repetition count is called the retirement limit. When an individual user is classifying events for Name that Neutrino, they are shown a randomly chosen video that has not yet met the retirement limit and that they have not previously seen. In June 2023, the retirement limit was increased to 20, and all of the videos were then classified 5 more times. By September 2023, the new retirement limit of 20 was reached for all videos, resulting in 85,460 classifications. As of December 2023, there are over 128,000 classifications and over 1800 registered volunteers for Name that Neutrino, who continue to work toward the higher retirement limits.
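
The serving logic just described reduces to a simple rule; the sketch below is illustrative only and is not the actual Zooniverse implementation:

```python
import random

def next_video_for(user_id, videos, seen_by, retirement_limit=20):
    """Serve a random video that has not reached its retirement limit
    and that this user has not classified before.

    `seen_by` maps video_id -> set of user_ids that classified it.
    """
    eligible = [v for v in videos
                if len(seen_by.get(v, set())) < retirement_limit
                and user_id not in seen_by.get(v, set())]
    return random.choice(eligible) if eligible else None
```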

Fig. 3 The "classify" section of the Name that Neutrino Zooniverse project. Users are prompted to complete a brief tutorial on their first visit and presented with a field guide with FAQs

4 Results

After the classification of all 4273 events, a maximum score is calculated for each event. The maximum score for either the Zooniverse citizen users or the DNN is the probability value of the most-likely category. For the users, the maximum score is the vote fraction of the category that received the largest number of votes. For example, if for a specific video cascade receives 12 of 20 total votes, the vote fraction is 0.6 and cascade is the maximum category. This is a proxy for confidence, as it shows the level of agreement between users. A maximum score of 0.2 is the minimum possible value and indicates no preference for any classification. A maximum score of 1 means that every vote was for the same category. For the DNN, the maximum score is defined to be the highest probability among the five categories provided by the DNN.
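
In code, the user maximum score for a single video reduces to a vote count. A minimal sketch, using the example from the text:

```python
from collections import Counter

def user_max_score(votes):
    """Return the maximum category for one video and its vote fraction.
    `votes` is the list of per-user labels for that video."""
    category, n = Counter(votes).most_common(1)[0]
    return category, n / len(votes)

# The example from the text: 12 of 20 votes for cascade -> 0.6
votes = ["cascade"] * 12 + ["through-going"] * 5 + ["skimming"] * 3
print(user_max_score(votes))   # ('cascade', 0.6)
```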

The distribution of maximum scores is shown in Fig. 4 for a retirement limit of 20 classifications per event for users (left), and for the DNN (right). For users, the distribution is broadly peaked around 0.53 with the majority of events achieving a maximum score less than 0.5. The low user scores signify disagreements between users and are likely related to the difficulty of classifying trigger-level events. There are no events with a maximum score below 0.25, indicating that there is always at least a weak preference for a classification. The distribution of DNN maximum scores peaks sharply at 1, indicating a high confidence in the classification. It is important to point out that high confidence does not necessarily correlate with accuracy. For the work shown here, the DNN was applied to data with more information than the original training set, and could preferentially select the same incorrect classification for events outside its original scope.

Fig. 4 The distribution of maximum scores for the citizen science users (left) and the DNN machine-learning algorithm (right)

The confusion matrix shown in Fig. 5 compares the maximum categories chosen by the DNN to the maximum categories chosen by the citizen science users. To better compare the broad, low-confidence user distribution with the sharply peaked, high-confidence DNN results, we applied a cut that removes user data with a maximum score below 0.55. The value of 0.55 was chosen because it corresponds to agreement on more than half of the votes (11 votes for the same category out of 20 total votes). The confusion matrix counts the number of events in each user maximum category and normalizes the counts to the DNN maximum category: columns correspond to the DNN maximum categories and rows to the user maximum categories. The diagonal values represent agreement between the DNN and the users, with the largest agreement occurring for cascades at 92.1%. Disagreement between the users and the DNN is represented by the off-diagonal values. For example, in the top right corner of Fig. 5, of the 83 events the DNN classified as stopping tracks, users chose skimming for 25 (30.1%). Similarly, users chose another 35 of those 83 events (42.2%) to be through-going tracks instead of stopping tracks. Some of the confusion shown in Fig. 5 comes from differences in the training methods for the users and the DNN. The DNN was trained on around 13 million events with concrete definitions for each classification, while the users were given a qualitative explanation of each classification and one example image per classification. Finally, though this confusion matrix is normalized to the DNN maximum category, this does not imply that the DNN category is the correct characterization of the event. The matrix simply shows levels of agreement between the users and the DNN results.
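
The construction behind Fig. 5 can be sketched as follows; this is an illustration of the procedure described above, not the authors' analysis code, and the category names and function signature are our own:

```python
import numpy as np

CATEGORIES = ["cascade", "through-going", "starting", "stopping", "skimming"]

def user_dnn_confusion(user_max, user_score, dnn_max, score_cut=0.55):
    """Count user vs. DNN maximum categories for events passing the
    user-score cut, then normalize each column to its DNN total."""
    counts = np.zeros((len(CATEGORIES), len(CATEGORIES)))
    for u_cat, u_score, d_cat in zip(user_max, user_score, dnn_max):
        if u_score >= score_cut:          # at least 11 of 20 votes agree
            counts[CATEGORIES.index(u_cat), CATEGORIES.index(d_cat)] += 1
    totals = counts.sum(axis=0, keepdims=True)
    return counts / np.where(totals == 0, 1, totals)  # column-normalized
```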

Fig. 5 A confusion matrix showing the agreement along the diagonal between the user and DNN maximum categories, normalized to the maximum DNN category. Only user events with a maximum score of 0.55 and above are included

Since the events are trigger level, "expert-by-eye" values were produced by members of the Name that Neutrino team as a substitute for the Monte Carlo truth. Specifically, we explored the off-diagonal starting, stopping, and through-going track events where the users and the DNN disagreed on the classification. The comparisons of the user and DNN classifications with the "expert-by-eye" classifications are shown on the left and right sides of Fig. 6, respectively. Again, diagonal values represent agreement and off-diagonal values represent disagreement, but now in comparison to the "expert-by-eye" classification. Note that the columns do not sum to 100% since it is still possible for the selected events to be classified as cascades or skimming tracks. Future work will explore these scenarios.

Though the number of events has been significantly reduced, there are some large, noteworthy differences between the performance of the users and the DNN. For through-going tracks, users agreed with the expert category 78.5% of the time, compared to 19.0% for the DNN. This could indicate that the users are better at identifying through-going tracks, or simply that users were more inclined to pick through-going track, since they chose that option much more often than other categories. Conversely, the DNN agreed with the expert category more often than the users for starting and stopping tracks. The users may have struggled here if they had trouble identifying the exact location of the edges of the detector. More work is needed to understand the results shown in Fig. 6.

Fig. 6 Confusion matrices showing both user (left) and DNN (right) results normalized to the "expert-by-eye" value for through-going tracks, starting tracks, and stopping tracks. Only the off-diagonal values from Fig. 5 were used in order to explore disagreeing scenarios. Note that it is possible for these events to be classified as cascades or skimming tracks

5 Conclusion and future work

After seven months of collecting data, the Name that Neutrino project has provided insights into data classification for both citizen users and an IceCube-specific DNN machine-learning algorithm. There was more agreement between the users and the DNN for events classified as cascades and less consistency for events classified as starting, stopping, or through-going tracks. The comparison of user and "expert-by-eye" classifications indicates that additional training is needed for the users. More work should be done to fully understand the DNN results compared to the "expert-by-eye" classifications. This initial study has demonstrated the feasibility of using the citizen science approach to classify IceCube data, but also that more work is needed to establish the validity of the user results.

Future improvements to Name that Neutrino should include increasing the rotation and sharpening the edges of the detector to help users identify starting and stopping tracks. Since classifying events at the trigger level proved rather challenging, future iterations of Name that Neutrino could include cleaner events, for which the Monte Carlo truth values could be compared directly to the user results. Alternatively, more work could be done to optimize the Monte Carlo truth values at trigger level for comparison to the results presented here. Future citizen science projects could involve the use of real data rather than simulation to separate coincident events, identify new classes, or search for possible biases in DNN performance that come from the differences between simulation and data.

Name that Neutrino demonstrated that citizen science is a powerful tool for public engagement for IceCube. It was not clear at the outset that there would be interest in looking at such abstract data, especially compared to more readily identifiable astronomical optical telescope images. Engaging over 1800 members of the general public with the IceCube project through 128,000 classifications and over 600 discussion board posts certainly counts as a successful start.