1 Introduction

The Martian surface is carved by dendritic valley networks indicative of sustained, widespread flow of liquid water in the planet’s past (e.g. Carr 1996, 2012). Most are thought to have formed during the Noachian period (~ 3.7 billion years ago), suggesting that surface conditions then differed considerably from those of the present (e.g. Carr 1995; Hynek et al. 2010). The well-preserved morphology, planform and orientation of these valleys make them ideal records of the climate and topography at their time of formation, and their distribution provides insight into the ancient global climate (Wordsworth et al. 2015). Previous global maps of Martian valley networks have allowed a great deal of such information to be derived (Hynek et al. 2010). Hynek et al. (2010) identified valleys using daytime and night-time infrared (IR) images from the Thermal Emission Imaging System (THEMIS), at 100 m per pixel. However, the recent availability of higher-resolution images from the High Resolution Stereo Camera (HRSC; 15–25 m per pixel) at near-global coverage (96%) has allowed many more valleys to be identified (Bahia et al. 2018). Using these images, Bahia et al. (2018) identified ~ 45% more valley length than the previous valley map between 90° S–90° N and 20° W–20° E, indicating that a similar increase may be seen in other areas of the Martian surface.

Identifying valley networks in high-resolution images is a very time-consuming process: it took Bahia et al. (2018) 2 years to complete their latitudinal strip, and global coverage would take much longer. Similar issues arise in the identification of craters, whose great abundance on planetary surfaces such as the Moon and Mars means that identifying them requires a significant amount of time. Citizen science projects aim to solve this issue.

Citizen science projects, such as the now-archived Moon Zoo (Joy et al. 2011; Bugiolacchi et al. 2016), use volunteers from the public to aid scientific endeavours. Specifically, Moon Zoo used internet crowd-sourcing: users were asked to examine high-resolution images from NASA’s Lunar Reconnaissance Orbiter Camera (LROC) and to identify impact craters while measuring their sizes and morphological ‘features of interest’ (Bugiolacchi et al. 2016). Because the project was available as a website, it was easily accessible to the public all over the world; in principle, many images could therefore be analysed repeatedly by a number of users over a short period. If a similar website were designed for the analysis of valley networks on Mars, the thousands of HRSC images could be analysed in several weeks rather than years, allowing scientific progress at a much quicker rate. For this to be viable, the public must be able to identify valleys reliably. Although craters are arguably simpler features to identify than valleys, errors still arise in their identification (Tar et al. 2017).

The Moon Zoo crater data (Bugiolacchi et al. 2016) contain many false positives: indistinct features (e.g. shadows, hills) that have been annotated mistakenly. Such false positives may lead to overestimates of valid crater counts. False negatives, valid craters that are not identified, may also occur. Based on the assumption that multiple participants are unlikely to annotate the same feature erroneously (Robbins et al. 2014), requiring a minimum number of participants to identify the same crater is one approach to limiting false positives. This approach, however, risks removing valid craters that are not repeatedly annotated. To avoid this risk, a pattern recognition system could be trained to help validate annotations after they are made. At present, such pattern recognition systems are not effective for identifying Martian valley networks: those previously applied identified far fewer valleys than are visible in the images (Luo and Stepinski 2009; Hynek et al. 2010). Similar false positives and negatives may occur in the identification of Martian valley networks.

This study examines the level of expertise required to identify Martian valley networks effectively and, in doing so, helps determine whether a citizen science project similar to Moon Zoo is worth pursuing for the identification of Martian valley networks. Citizen science is often considered from the perspective of its benefit to scientists; however, the most salient aspect, and one that is often overlooked, is how citizen science affects the citizens themselves. This project therefore also involves elements of pedagogy: determining the most effective methods of both teaching members of the public and sparking their scientific interest.

2 Methodology and Study Area

In this study we are interested not only in how effectively Martian valley networks can be identified in HRSC images, but also in how consistent the measurable parameters of these valleys are. Valley lengths (e.g. Caldarelli et al. 2004; Som et al. 2009; Caprarelli and Wang 2012; Bahia et al. 2019) and the palaeoslope direction inferred from valley orientation (Bahia et al. 2018) are important for hydrological and geomorphological analyses of Martian valley networks. To test how effectively participants identify Martian valley networks, a 150 km × 175 km test area east of Vogel Crater was chosen (Fig. 1). This area was chosen because it contains valleys of varying sizes (Hynek et al. 2010; Bahia et al. 2018) and orientations, as well as tributaries. Tributaries are particularly significant for this study, as they are used to infer valley orientation and hence palaeoslope direction.

Fig. 1 Valley mapping test area, east of Vogel Crater (36.1° S, 10.2° W). HRSC images (west to east): h8599_0000, h8592_0000 and h2643_0009

2.1 Participant Group’s Level of Expertise

To assess the effects of expertise on the consistency and efficacy of valley network identification, we recruited participants with varying levels of involvement in scientific research. These are categorised into three levels of expertise: low, medium and high. The Low Group consists of 22 physics A-level students (high school students aged 16–18, with little to no background in geology or geography) and 2 high school teachers who volunteered to take part in the research. These participants are representative of members of the public with an interest in scientific research, i.e. those who are likely to take part in citizen science projects but are unlikely to have any prior experience in fluvial geomorphology or valley identification. The students were invited to an open day at the University of Manchester, where they were given an hour-long overview talk on Martian science led by Bahia, a research associate at the University of Manchester whose PhD in Planetary Science focused on using Martian valley networks to make inferences about the ancient Martian climate and topography. This talk was followed by a practical session in which the mapping study took place. The Medium Group is made up of 45 undergraduate geologists with a moderate level of experience in fluvial geomorphology but little or no prior involvement in identifying valley networks in orbital imagery. The High Group contains 1 Martian fluvial geomorphologist (Bahia), who has over 4 years of experience in fluvial geomorphology and valley identification.

2.2 Valley Mapping and Length, Palaeoslope and Drainage Density Calculations

The groups identified the valleys using ArcMap 10.2.1. Neither the Low nor the Medium Group had any experience with this software, so they were given a 40-min lesson on how to use it to perform the task, analogous to the tutorial one would expect to receive before participating in a citizen science project. The participants were given identifying characteristics similar to those used in previous Martian valley mapping studies (Carr 1995; Scott et al. 1995; Hynek et al. 2010), i.e. valleys are “sublinear, erosional channels that form branching networks, slightly increasing in size downstream and dividing into smaller branches upslope” (e.g. Fig. 2). Based on this definition, participants were asked to map as many valleys as they believed were present within the imagery, manually tracing them as vector-based lines from upslope to downslope (inferred from valley orientation) using the polyline function within ArcMap.

Fig. 2 Nirgal Vallis (27.4° S, 44.4° W), a prime example of a Martian valley network displaying a sublinear profile, with branching tributaries joining a larger main channel (HRSC image h6442_0000)

Valley length and palaeoslope direction were calculated for the mapped polylines. Polylines were created along the valleys using the Editor tool, and their lengths were calculated using the Calculate Geometry function. Because participants were asked to trace polylines in the downslope direction, i.e. from upslope to downslope, the direction of tracing records the inferred palaeoslope. To determine whether this direction was consistent between repeats, the palaeoslope direction of each valley was derived using the ArcMap Linear Directional Mean tool (Spatial Statistics toolbox), which converts the polylines into compass directions representing the direction in which they were traced (e.g. Fig. 3).
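The two quantities derived from each polyline can be illustrated with a short Python sketch. It assumes vertices are given in a projected, metre-based coordinate system (so distances are planar rather than geodesic), takes the direction of a polyline as the compass bearing from its first to its last vertex, and averages several bearings as unit vectors (a circular mean). The function names are hypothetical, and this is not Esri’s implementation of Calculate Geometry or Linear Directional Mean.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x = easting, y = northing) in projected metres (assumed)


def polyline_length(vertices: Sequence[Point]) -> float:
    """Sum of straight-line segment lengths, in the units of the input coordinates."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def trace_bearing(vertices: Sequence[Point]) -> float:
    """Compass bearing (degrees clockwise from north) from the first to the last vertex.

    Because participants traced from upslope to downslope, this bearing is
    read as the inferred palaeoslope direction of the polyline.
    """
    (x0, y0), (x1, y1) = vertices[0], vertices[-1]
    # atan2(dx, dy) measures the angle from the +y (north) axis towards +x (east).
    return math.degrees(math.atan2(x1 - x0, y1 - y0)) % 360.0


def mean_direction(bearings: Sequence[float]) -> float:
    """Circular mean of bearings: average the unit vectors, then convert back to degrees."""
    s = sum(math.sin(math.radians(b)) for b in bearings)
    c = sum(math.cos(math.radians(b)) for b in bearings)
    return math.degrees(math.atan2(s, c)) % 360.0
```

For example, a valley traced from (0, 1000) through (0, 0) to (1000, 0) has a length of 2000 m and a start-to-end bearing of 135° (towards the southeast). A circular mean is used because a simple arithmetic mean of, say, 350° and 10° would give 180°, the opposite of the intended northward direction.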

Fig. 3 Example of palaeoslope direction (vector arrows overlain on networks) derived from valleys (blue lines) using the ArcMap Linear Directional Mean tool

2.2.1 Comparative Data Sets: Hynek et al. (2010)

To be worth pursuing, a citizen science project must improve on the data sets currently available. We therefore compare the average valley length mapped by each expertise group within the study area with that of the previous global map of Mars, which used images with a spatial resolution of ~ 100 m per pixel (Hynek et al. 2010). Valley lengths for the Hynek et al. (2010) data set were obtained using the same technique as described in Sect. 2.2.

2.3 Repeatability and Reproducibility Test

To assess the identification abilities and consistency of participants within the different groups, we conducted repeatability and reproducibility tests. To avoid being influenced by one another’s results, participants were asked not to confer or compare.

2.3.1 Repeatability Test

In the repeatability test, an individual identifies all the valley networks they believe are present within the area of interest. They then repeat this activity after a period long enough for them to have forgotten where they previously identified valleys. By comparing each individual’s initial and repeat results, we determined the consistency of individuals within each group. For logistical reasons, the period between repeats varied for each group. For the Low Group the repeat was immediate; however, the second iteration was performed on a mirrored image, effectively making it appear to be a different study area. For the Medium Group there was a gap of 10 days between repeats, and for the High Group a gap of 6 months. This longer gap was chosen for the High Group because an expert in the field is likely to retain a memory of the study area for longer.

For each individual’s initial and repeated data sets, we compared the mapped polylines and matched those that were representative of the same “valley”. The lengths of these matched polylines and the orientation in which they were traced were then compared.
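Once polylines from the initial and repeat attempts have been matched, the comparisons of length and tracing direction reduce to a few calculations. The Python sketch below is hypothetical: it reflects a Low Group repeat tracing back into the original frame (assuming the image was mirrored left to right about a known vertical line `x_axis`), measures the absolute length difference and the smallest angular difference between start-to-end bearings, and, as an assumed convention not stated in this study, flags a pair as traced in opposite directions when the bearings differ by more than 90°.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # projected (x, y) coordinates, assumed in metres


def unmirror(vertices: Sequence[Point], x_axis: float) -> List[Point]:
    """Hypothetical helper: reflect x about the vertical line x = x_axis.

    Assumes the repeat image was mirrored left-right about x_axis, so applying
    the same reflection maps the repeat tracing back into the original frame.
    """
    return [(2.0 * x_axis - x, y) for x, y in vertices]


def length(vertices: Sequence[Point]) -> float:
    """Planar polyline length: sum of segment lengths."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def bearing(vertices: Sequence[Point]) -> float:
    """Compass bearing (degrees clockwise from north), first to last vertex."""
    (x0, y0), (x1, y1) = vertices[0], vertices[-1]
    return math.degrees(math.atan2(x1 - x0, y1 - y0)) % 360.0


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in [0, 180] degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def compare_repeat(first: Sequence[Point], repeat: Sequence[Point],
                   reversal_threshold: float = 90.0) -> dict:
    """Compare one matched pair of polylines from the initial and repeat attempts."""
    d_theta = angular_difference(bearing(first), bearing(repeat))
    return {
        "length_difference": abs(length(first) - length(repeat)),
        "direction_difference_deg": d_theta,
        # Assumed convention: above the threshold, the pair counts as reversed.
        "reversed": d_theta > reversal_threshold,
    }
```

For instance, a 100 m polyline traced due south in the first attempt and due north in the repeat gives a direction difference of 180° and is flagged as reversed, while the length difference is zero.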

2.3.2 Reproducibility Test

The reproducibility test involved comparing the results of individuals within each expertise group. This was performed only for the Low and Medium Groups, as there was only one participant in the High Group. As each group performed the mapping task twice, this resulted in two reproducibility tests for each group. We counted the number of polylines repeatedly identified by individuals within these groups and, in doing so, determined the consistency of valley mapping for each level of expertise. A polyline was classed as repeatedly identified if 2 or more individuals traced the same polyline. False positives, i.e. polylines that were traced but do not correspond to a valley, were identified after the mapping task had been completed by producing cross-sections of each reproduced polyline to determine whether a trough, indicative of a valley, was present. This allowed us to determine whether repeated tracing of a valley validates its identification. Additionally, we compared the cumulative reproduced valley data sets for each group with that of the High Group to identify potential false negatives, i.e. valleys that are present but were not identified by the Low or Medium Group.
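The criterion above (a polyline traced by 2 or more individuals) needs a rule for deciding when two tracings represent the same feature, which is not specified here. The Python sketch below assumes one such rule: two polylines match if their symmetric discrete Hausdorff distance, computed over their vertices, is within a hypothetical `tolerance` in map units. Both the rule and the tolerance are illustrative assumptions rather than the procedure used in this study.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Polyline = Sequence[Point]


def hausdorff(a: Polyline, b: Polyline) -> float:
    """Symmetric discrete Hausdorff distance between two vertex lists."""
    def directed(p: Polyline, q: Polyline) -> float:
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(a, b), directed(b, a))


def reproduced(tracings: Dict[str, List[Polyline]], tolerance: float,
               min_participants: int = 2) -> List[Tuple[str, int]]:
    """Return (participant, index) for every polyline traced by enough people.

    A polyline is kept when it, together with matching polylines from other
    participants (Hausdorff distance <= tolerance), was traced by at least
    `min_participants` distinct individuals.
    """
    kept = []
    for who, lines in tracings.items():
        for i, line in enumerate(lines):
            supporters = {who}
            for other, other_lines in tracings.items():
                if other != who and any(hausdorff(line, o) <= tolerance
                                        for o in other_lines):
                    supporters.add(other)
            if len(supporters) >= min_participants:
                kept.append((who, i))
    return kept
```

Two design caveats apply to this sketch: a vertex-based Hausdorff distance is sensitive to how densely participants placed vertices, so densifying polylines first would make matching more robust; and each matching tracing is returned separately, so a valley traced by two people appears twice and would need merging before distinct valleys are counted.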

2.4 Pedagogical Analysis and Effective Scientific Outreach

To assess how to deliver scientific information most effectively and engage the public in research, we asked participants in the Low Group to complete an evaluation form. This form asked what they found most engaging about the day, which teaching methods they found effective for learning, and which aspect(s) increased their interest in being involved in scientific research. Additionally, we asked their teachers: (1) What aspect(s) of the day had the most impact on the pupils who took part? (2) Did you see any change in the students’ views about science in general? (3) Of the students who decided to apply to the University of Manchester, what did they say made them want to apply? (4) From a pedagogical standpoint, what do you think worked and did not work about the day?

3 Results

In this section, sketched features are referred to as ‘polylines’ rather than ‘valleys’, as they do not necessarily represent valid valleys.

3.1 Repeatability Test Results

The repeatability test results are summarised in Fig. 4.

Fig. 4 Summary of repeatability test results for all participant groups: a average number of polylines mapped by each group in each repeat; b average total length of the polylines traced in each iteration; c average proportion of polylines (by number) repeatedly traced by each group; d average difference in the total length of polylines mapped between repeats; e average difference in vector direction of repeatedly traced polylines; f average proportion of repeatedly traced polylines that were traced in the opposite direction (e.g. left to right rather than right to left) for each group

Figure 4a shows that, on average, the Low and Medium Groups consistently traced the same number of polylines between repeats (~ 30), with comparable total valley lengths (Fig. 4b). However, consistency between repetitions is greater in the Medium Group, whose individuals on average repeatedly traced a greater proportion of polylines than those of the Low Group (Fig. 4c, d). Furthermore, the directions in which repeated polylines were traced were more consistent for the Medium Group than for the Low Group (Fig. 4e, f).

The High Group mapped a consistent number of polylines between repeats (Fig. 4a), tracing a greater number and total length of polylines than the Low and Medium Groups (Fig. 4a, b). Furthermore, all repeatedly mapped polylines were consistently traced in the same direction (Fig. 4f), with little variation in vector direction (Fig. 4e). Although only 60% of the polylines were matched between repetitions (Fig. 4c), the difference in total length between repetitions was only 26.4 km (Fig. 4d), because the polylines that were not repeated were short.

3.2 Reproducibility Test Results

The reproducibility test results are summarised in Fig. 5.

Fig. 5 Summary of reproducibility test results for the Low and Medium participant groups, with the results for this study area from the High Group and Hynek et al. (2010)

Figure 5 shows that participants in the Medium Group reproduced more polylines on their first attempt than the Low Group did on either attempt. The Medium Group’s first attempt also produced the largest total length of reproduced polylines. This contrasts strongly with their second attempt, which produced a similar number of reproduced polylines to the Low Group’s first attempt but a much lower total length.

Although the Medium Group’s first attempt reproduced the greatest number and total length of polylines, the validity of these polylines was low: 80 of the 105 polylines were false positives, resulting in only 35% of polylines being valid, with a total length of 708.7 km. This validity was lower than that of the Low Group’s first attempt, which produced a greater number of valid polylines: 65 of the 96 traced polylines were false positives, resulting in 44% of polylines being valid, with a total length of 767.6 km. Additionally, the number of validated valleys that were missed (i.e. false negatives), determined by comparison with the averaged results of the High Group’s two iterations, was consistent for the Low and Medium Groups.

Figure 6 shows the locations of the reproduced polylines for the different groups; for comparison, the High Group’s repeatability test data are also shown (Fig. 6e).

Fig. 6 a–d Results of the reproducibility test: a the Low Group’s first reproducibility test; b the Low Group’s second reproducibility test; c the Medium Group’s first reproducibility test; d the Medium Group’s second reproducibility test. e The High Group’s repeatability results. HRSC images (west to east): h8599_0000, h8592_0000 and h2643_0009

3.2.1 Comparative Data Set Results

Hynek et al. (2010) identified a total of 20 valleys within the study area, with a cumulative length of 546.4 km. Figure 7 shows that these valleys are generally long, with the smallest identified having a length of ~ 3.5 km.

Fig. 7 Valleys mapped within the study area by Hynek et al. (2010). HRSC images (west to east): h8599_0000, h8592_0000 and h2643_0009

The cumulative length of valleys identified by Hynek et al. (2010) within the study area is lower than that of the validated results of the reproducibility test for both the Low and Medium Groups, and less than half of that identified by the High Group. Although most of the valleys identified in this study but not by Hynek et al. (2010) are small (< 3.5 km), one major valley (~ 100 km long) was also missed (highlighted in Fig. 7).

3.2.2 False Positives: Features Misidentified as Valleys

The majority of reproduced polylines for both the Low and Medium Groups were false positives. False positives mostly arose from the misidentification of several surface features (Fig. 8): cliff/hill ranges (Fig. 9a), some of which cast shadows; scree slopes (Fig. 9b); slumped crater ridges (Fig. 9c); and valley ridges (Fig. 9d). A few other features, such as areas of differing surface albedo and splitting slope streaks, were also misidentified, but these were rare (< 1%).
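The cross-section check described in Sect. 2.3.2 separates these features by shape: a valley appears as a trough, a cliff or scree slope as a step, and a valley ridge as a local high. The Python sketch below classifies an elevation profile sampled across a traced polyline. The split into thirds and the `min_relief` threshold (in metres) are illustrative assumptions, not the criteria used in this study.

```python
from typing import Sequence


def classify_profile(elev: Sequence[float], min_relief: float = 5.0) -> str:
    """Classify an elevation cross-section centred on a traced polyline.

    Hypothetical rule comparing the central third of the profile with the
    outer thirds on either side:
      'trough' - centre lower than both flanks by at least min_relief (valley-like)
      'ridge'  - centre higher than both flanks by at least min_relief
      'step'   - anything else, e.g. a monotonic slope across a cliff or scree
    """
    n = len(elev)
    if n < 3:
        raise ValueError("need at least three elevation samples")
    third = n // 3
    left, centre, right = elev[:third], elev[third:n - third], elev[n - third:]
    c_min, c_max = min(centre), max(centre)
    if max(left) - c_min >= min_relief and max(right) - c_min >= min_relief:
        return "trough"
    if c_max - min(left) >= min_relief and c_max - min(right) >= min_relief:
        return "ridge"
    return "step"
```

Under this rule, a profile such as [100, 100, 90, 80, 90, 100, 100] is a trough, whereas a profile that falls steadily from one side to the other, as across a cliff, is classed as a step and would be rejected as a false positive.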

Fig. 8 Proportion of total false positives by surface feature for the Low and Medium Groups

Fig. 9 Features misidentified as valleys. a Hills/cliffs that were traced. b Areas of albedo difference associated with scree slopes, which were often traced. c Slumped areas of crater rims, which were often misidentified as valley networks. d A valley in which both the centre and the ridge were often traced. HRSC images: h8599_0000, h8592_0000 and h2643_0009

There are minor differences between the two groups in the causes of false positives. Notably, the Low Group less frequently misidentified cliff/hill ranges as valleys, whereas the Medium Group less frequently misidentified slumped crater rims as valleys. Both groups equally often misidentified valley ridges as valleys separate from the main trough. Combining the results of the Low and Medium Groups’ reproducibility tests, we find that the largest share of false positives results from tracing cliff/hill ranges (34%). Slumped crater ridges and valley ridges are responsible for 27% and 21% of false positives, respectively, and scree slopes for 10%.

3.3 Results of Evaluation Forms

A recurring message from the evaluation forms, and from the teachers’ responses to the question “What aspect(s) of the day had the most impact on the pupils who took part?”, was that the students were engaged by their involvement in “real science”. That is, they were not merely repeating an exercise with no merit beyond learning from the task; they were engaging in applicable scientific research, with the opportunity to work with material not often shared with the general public. Additionally, the students who were most inspired by the day attributed this to their actual involvement in scientific research, as it indicated that they too could pursue a career in science.

From a pedagogical standpoint, providing introductory information through the general overview talk on Martian science was an important factor in scaffolding students’ knowledge from a basic level. As suggested by the teacher and students, outreach teaching could be made more effective still by embedding formative assessments such as knowledge checks and worksheets.

4 Discussion

4.1 Effects of Expertise on Validity of Valley Identification: Is a Citizen Science Project in Valley Network Identification Worth Pursuing?

The results of the repeatability test show that participants in the Low and Medium Groups generally traced a similar number of polylines in both attempts; however, the placement of these polylines was inconsistent. The consistent number of polylines between repeats might suggest that participants remembered approximately how many polylines they had traced in their first attempt and mapped the same number in their second. However, the large variation in the length of the polylines mapped between repeats by the Low and Medium Groups, and the inconsistent positioning of the polylines, indicate that memory of the study area was not an influencing factor. Furthermore, for the Low and Medium Groups the discrepancy in valley orientation between iterations calls the validity of the orientation data into question, particularly when considering the application of such a data set to geomorphological studies such as that of Bahia et al. (2018). The repeatability results therefore show that the Low Group is the least consistent between repeats and the Medium Group slightly more consistent. The Medium Group’s greater consistency in tracing valley orientations is likely due to their greater understanding of valley network geometry. However, both groups’ results are much less consistent than those of the High Group, indicating that experience in the discipline is required for high consistency.

The results of the reproducibility test were comparable for the Low and Medium Groups. For both groups, a large proportion of the reproduced polylines were false positives (56% for the Low Group and 65% for the Medium Group), and many of the valleys identified by the High Group were not identified (false negatives). False positives generally resulted from the misidentification of cliffs/hills, scree slopes, slumped crater rims and valley ridges. If a citizen science project involving the tracing of valley networks in high-resolution images were pursued, warning participants to be wary of tracing such features may improve their ability to avoid false positives.

Although a large proportion of the reproduced polylines traced by the Low and Medium Groups are false positives, once these are removed both groups mapped a greater number of valleys, and a greater valley length, than the previous valley map (Hynek et al. 2010). This indicates that such tracing improves on the previous data set, even for the Low Group, who represent the public. This improvement is likely a result of the higher-resolution imagery rather than the mapping technique, as emphasised by the fact that the High Group used the same mapping technique as previous mapping studies (e.g. Hynek et al. 2010) but identified approximately double the valley length for this area. One large valley missed by Hynek et al. (2010) cannot be explained by a lack of image resolution. The reason it was missed is unclear, but it may be a result of poor lighting geometry in the images used by Hynek et al. (2010) in this area, obscuring the valley.

Given that the improvement in image resolution from THEMIS daytime and night-time IR images (100 m per pixel) to HRSC images (15–25 m per pixel) makes a considerable difference to the observed valley networks, increasing the resolution further may reveal an even greater number. Specifically, conducting valley mapping studies with Context Camera images (CTX; 5 m per pixel), which are now available at a global scale, would likely result in more valleys being identified. Using images of higher resolution than CTX (e.g. High Resolution Imaging Science Experiment (HiRISE) images; 0.5 m per pixel) would likely not reveal more valleys, because any valley too small to be observed in CTX images has likely been obscured by dust cover or removed by aeolian erosion or micrometeorite bombardment.
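The effect of pixel size on which valleys can be seen can be made concrete with simple arithmetic. The Python sketch below assumes, purely for illustration, that a valley must be at least three pixels wide to be recognised; this rule of thumb is an assumption, not a result of this study, and the pixel sizes are the nominal values quoted above.

```python
# Hypothetical rule of thumb (an assumption, not a result of this study):
# a valley must span at least PIXELS_ACROSS pixels to be recognisable.
PIXELS_ACROSS = 3

# Nominal pixel sizes (m per pixel) as quoted in the text.
PIXEL_SIZE_M = {
    "THEMIS IR": 100.0,
    "HRSC (coarsest)": 25.0,
    "HRSC (finest)": 15.0,
    "CTX": 5.0,
    "HiRISE": 0.5,
}


def min_resolvable_width(pixel_size_m: float, pixels_across: int = PIXELS_ACROSS) -> float:
    """Narrowest valley width (m) that spans `pixels_across` pixels."""
    return pixel_size_m * pixels_across


if __name__ == "__main__":
    for instrument, px in PIXEL_SIZE_M.items():
        print(f"{instrument:16s} {px:6.1f} m/px -> narrowest valley ~{min_resolvable_width(px):6.1f} m")
```

Under this assumption the narrowest recognisable valley shrinks from ~300 m in THEMIS IR images to ~45–75 m in HRSC, ~15 m in CTX and ~1.5 m in HiRISE images; whether valleys narrower than the CTX limit survive at the surface is the separate preservation question raised above.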

The improvement on the previous valley data set observed in this study may justify pursuing a citizen science project; in practice, however, the data would require analysis by an expert to remove false positives, which adds time and potentially negates the project’s purpose. It is worth noting, though, that removing false positives is quicker than manually mapping valley networks.

A citizen science project of this kind may therefore reduce the time scientists need to trace valley networks in high-resolution satellite images. Although it may improve on previous valley data sets (Hynek et al. 2010), the resulting data set will not identify all of the valley networks present in high-resolution imagery; this would require the input of an expert in the field.

4.2 How to Effectively Engage Participants in Scientific Research

This study found that participants were fully engaged when they were aware of the scientific context of their task: inclusion in a ‘real’ research project authenticated their work. In this study, the exercise taught them about Martian valley systems and how to use GIS software; this ‘real-life’ scientific task, together with the knowledge that they were participating in scientific research, promoted engagement.

This involvement not only inspired participants through the delivery of scientific information but also removed the barrier between members of the public and scientists, demonstrating that a career in scientific research is accessible.

Additionally, from a pedagogical perspective, this study highlighted the importance of providing participants at outreach events with base knowledge, as contextual awareness of the task promotes engagement.

5 Conclusion

In this study, we conducted repeatability and reproducibility tests with three groups of participants with differing levels of expertise in valley mapping (low, medium and high). Participants were asked to map all of the valleys they believed were present within a selected study area and to trace them in the downslope direction. The repeatability test showed that members of the public (the Low Group) were less consistent in tracing valley networks than individuals with some experience of valley networks (the Medium Group) or an expert in the field (the High Group). However, the reproducibility test showed that members of the public were just as effective as the Medium Group at reproducing valid tracings of valley networks. These validated tracings improve upon the number and total length of valleys mapped by previous studies (Hynek et al. 2010). Although removing false positives requires the input of an expert, this indicates that a citizen science project involving the tracing of valley networks in high-resolution images may be worth pursuing. However, these valid tracings do not identify all valleys; identifying the maximum number of valleys requires an expert. From a pedagogical standpoint, the most effective method of engaging the public through scientific outreach was direct involvement in scientific research.