We conducted qualitative studies to learn more about user behaviour when dealing with VGI point data. First, we performed a preliminary study to identify relevant tasks typically performed on point data and to test the suitability of the chosen point data sets. In the main part of the study, we conducted a novel form of think-aloud interview and examined user behaviour while participants solved different synoptic interpretation tasks.
Task Definition and VGI Datasets
In our studies, we used the task typology defined by Andrienko and Andrienko (2006). The authors divide tasks into elementary ones, which refer to individual elements of the data, and synoptic tasks, which take a subset or the whole dataset into account. Tasks in both categories can be further subdivided into direct and inverse lookups, direct and inverse comparisons, and relation-seeking.
To examine the different types of tasks, we created various maps using VGI point data sets from different sources:
Italian restaurants in the inner city of Hamburg, Germany, retrieved from OpenStreetMap via the Overpass API.
Pictures tagged either with “Cristo Redentor” or “sugarloaf” and provided with a photo location in Rio de Janeiro, Brazil, retrieved via the Flickr API.
Pictures provided with a photo location in the Lüneburger Heide, Germany, retrieved via the Flickr API.
Bird sightings in the Nature Park Lüneburger Heide, Germany, retrieved from (1) the eBird Basic Dataset and from (2) the iNaturalist database.
Antelope sightings at Kruger National Park, South Africa, retrieved from the iNaturalist database.
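The OpenStreetMap retrieval can be sketched with a minimal Overpass QL query. This is an illustrative sketch only: the bounding box coordinates and helper names below are assumptions, not the exact extent or code used in the study.

```python
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def build_restaurant_query(bbox, cuisine="italian"):
    """Build an Overpass QL query for restaurant nodes with a given cuisine.

    bbox = (south, west, north, east) in decimal degrees.
    """
    s, w, n, e = bbox
    return (
        "[out:json][timeout:25];"
        f'node["amenity"="restaurant"]["cuisine"="{cuisine}"]'
        f"({s},{w},{n},{e});"
        "out body;"
    )

# Rough bounding box around the inner city of Hamburg (illustrative values)
query = build_restaurant_query((53.54, 9.97, 53.56, 10.01))
# The query could then be sent to OVERPASS_URL, e.g. via an HTTP POST
# with the form field "data" set to the query string.
```

Point coordinates of the matching nodes are returned in the JSON response and can be plotted directly.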
For all datasets retrieved from the Flickr API, we created smaller subsets to examine the influence of point cardinality on user behaviour. To preserve the initial clusters, we reduced the number of points incrementally and checked after each step whether all clusters were still identified by the HDBSCAN algorithm.
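The reduction loop can be sketched as follows. This is a minimal sketch, not the study's actual code: scikit-learn's DBSCAN serves as a stand-in where no HDBSCAN implementation is available, and the step size and clustering parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN  # stand-in for HDBSCAN in this sketch


def n_clusters(points, eps=1.0, min_samples=5):
    """Number of clusters found (noise label -1 excluded)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return len(set(labels)) - (1 if -1 in labels else 0)


def reduce_preserving_clusters(points, step=0.1, rng=None):
    """Drop a fraction of points per step; stop just before the
    cluster count would change. Parameters are illustrative."""
    rng = np.random.default_rng(rng)
    target = n_clusters(points)
    current = np.asarray(points)
    while True:
        keep = rng.choice(len(current),
                          size=int(len(current) * (1 - step)),
                          replace=False)
        candidate = current[keep]
        if n_clusters(candidate) != target:
            return current  # last subset still showing all clusters
        current = candidate
```

The returned subset is, by construction, the smallest sampled set for which the clustering still detects every initial cluster.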
The goals of the preliminary study were to identify and select relevant interpretation tasks for our datasets, to get a first impression of user behaviour and confidence while solving these tasks, and to evaluate the suitability of the datasets for the overall study objectives. Beyond that, we wanted to manually identify patterns in our datasets, which we later used during the encoding process of the subsequent interviews.
The preliminary study was conducted as a postal questionnaire with 25 participants, all of whom work professionally with maps. For the first part of the study (Part A), a printed map of the Italian restaurants in the inner city of Hamburg, Germany, was provided. The same dataset was used for the second part (Part B), but extended with randomly generated information about the price level of each restaurant, symbolized with different colours. The map for the third part (Part C) showed the distribution of sightings of four different antelope species in the Kruger National Park, indicated with different colours. Based on the aforementioned task typology, we generated three elementary and nine synoptic tasks, and increased task difficulty (elementary vs. synoptic tasks) and complexity (mono- vs. multi-categorical data) from part to part (see Table 1). After the last question of each part, the participants rated how certain they had been while answering the questions about the map.
Elementary lookup and comparison tasks were well understood and solved by the participants. In synoptic pattern identification tasks, participants were able to characterize patterns in their own words, often using additional information presented on the map. Furthermore, they drew on local prior knowledge such as street names or landmarks and marked reasonable clusters on the map. As shown in Fig. 1, most of these clusters were also identified by the HDBSCAN clustering algorithm. Synoptic comparison tasks had the best success rate within the whole questionnaire, with correct answers ranging between 88% and 95%. The answers for relation-seeking tasks showed greater variability, and the participants' confidence in their solutions was lower than before. Because most of the participants answered all questions, we argue that the variety of answers was caused by different solution strategies.
To limit the scope of our study, we decided to focus solely on synoptic tasks in the subsequent main study. We wanted to further investigate user behaviour, and thereby identify potentially different solution strategies, with tasks covering each of the synoptic subtypes (pattern identification, pattern comparison, relation-seeking).
The main study focused on the overall objective—the analysis of user behaviour while solving synoptic interpretation tasks. We also wanted to find out whether the cardinality of the point data set, as well as the graphical map complexity of the background map, have an influence on the task-solving strategy of the participants. Figure 2 gives an overview of the design of the main study.
Originally, we planned an eye-tracking study, but due to the ongoing COVID-19 pandemic, the study design was changed to an online study. First, we sent a questionnaire to the participants and asked for their gender, age, education, their experience and skills with maps, and their geographic knowledge (see left column in Fig. 2). Furthermore, we invited the participants to perform a Navon test to determine their cognitive style via the open-source software library PsyToolkit. After a short introduction, they had to answer 50 questions within a short time using a provided web interface and enter their results in a table at the end of our document.
The second part of the study was held via a Zoom meeting and can be described as a think-aloud interview, a qualitative research method in which users are asked to vocalize their thoughts while answering a question or solving a task (Eccles and Arsal 2017). After a short introduction to the method, participants shared their screen during the interview. We recorded their screens and voices, with only the maps and their mouse cursor visible to us (see the centre of Fig. 2). After the test, we anonymized all related data files, so that interviews could no longer be traced back to individual participants.
We chose a between-subject design for our study, in which each participant had to solve five tasks, and examined user behaviour through a set of ordered task-solving actions. Every task was introduced with an explanation of the map, the underlying data and the task, and participants could decide when to start. While solving the task, each participant described her or his behaviour and answered questions from the interviewer. The method is thus a mixture of concurrent and retrospective think-aloud as described by Häder (2006). Although we guided the interviews with predefined questions, we waited with our first question until the participants considered the respective task finished (concurrent think-aloud). The interview guidelines included a list of questions regarding the different actions a participant took to solve the task, their order, the decision-making process, and whether special characteristics of the map had influenced their strategies. The guidelines made the interviews comparable and ensured that each question was answered by every participant, either directly while thinking aloud or on request afterwards (retrospective think-aloud).
Twenty-one people (7 female and 14 male) participated voluntarily in the main study, none of whom had taken part in the preliminary study. All participants were either students or postgraduates from the HafenCity University Hamburg, Germany. Regarding geographical knowledge, two participants rated themselves as experts, 15 as advanced, and four as laymen. All participants rated their experience in using maps as either average or expert. The questionnaire further revealed that two of the participants use satellite data every day, eleven often, and eight rarely.
Nineteen of the twenty-one participants provided their results of the Navon test. The Navon test is a method to determine the cognitive style of a person, which can be holistic or analytic. Holistic thinkers tend to focus on large-scale patterns (“forest before trees”; Navon 1977), while analytic thinkers tend to examine individual parts and their connections. Because the perception of global features is a crucial part of solving synoptic tasks, our hypothesis was that user behaviour differs significantly between analytic and holistic cognitive types. Using the results provided by PsyToolkit, we followed the same practice as Opach et al. (2019). In the end, we identified seven participants with an analytic cognitive style, seven with a holistic one, and five participants who were neutral according to this definition.
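The classification step can be illustrated with a simple reaction-time rule. Note that the exact thresholds of Opach et al. (2019) are not reproduced here; the function and its neutral band are a hypothetical scheme for illustration only.

```python
def cognitive_style(rt_global_ms, rt_local_ms, neutral_band_ms=50):
    """Classify cognitive style from mean Navon-test reaction times.

    rt_global_ms: mean reaction time for identifying the global (large) letter
    rt_local_ms:  mean reaction time for identifying the local (small) letters
    The neutral band width is an illustrative assumption, not the threshold
    actually used in the study.
    """
    diff = rt_local_ms - rt_global_ms
    if diff > neutral_band_ms:
        return "holistic"   # markedly faster on global features
    if diff < -neutral_band_ms:
        return "analytic"   # markedly faster on local features
    return "neutral"
```

A participant who answers global-letter trials much faster than local-letter trials would thus be labelled holistic, and vice versa.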
We used five different maps of four locations for our study (see Table 2), and randomly assigned the map variants to the participants. Point colour was used to differentiate between data sources (task 1), tags (task 2), antelope species (tasks 3 and 4) and the price levels of restaurants (task 5). We used the same scale within variants of the same location, but different scales for each location, to ensure that the user behaviour identified in this study is not bound to a certain scale. All but the last map were provided without any information or labels hinting at the location of the map. When asked, none of the participants identified the locations correctly, so we can state that all locations were unknown to them. We varied the cardinality of the point data sets and the background map, and calculated the respective map load with the graphic map load measuring tool (GMLMT), which uses edge detection to measure graphic map load as the amount of visible structure in a map (Barvir and Vit 2021). We used this map load measure as the indicator of graphical complexity in our study. As shown in Table 2, we decided to use background maps with rather low map loads to make sure that the confounding impact of even the most complex background map remained small enough to allow a straightforward interpretation of the respective point data sets.
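The idea behind an edge-based map load measure can be sketched as follows. This is a simplified stand-in for the actual GMLMT tool: it scores a greyscale image by the share of pixels whose gradient magnitude exceeds a fraction of the maximum, with an illustrative threshold.

```python
import numpy as np


def graphic_map_load(gray, threshold=0.1):
    """Approximate graphic map load as the share of edge pixels.

    gray: 2-D array of greyscale intensities.
    A pixel counts as "edge" if its gradient magnitude exceeds
    `threshold` times the maximum magnitude. Simplified stand-in
    for the GMLMT measure (Barvir and Vit 2021).
    """
    gy, gx = np.gradient(np.asarray(gray, dtype=float))
    mag = np.hypot(gx, gy)
    if mag.max() == 0:
        return 0.0  # perfectly uniform image: no visible structure
    return float((mag > threshold * mag.max()).mean())
```

A blank background thus scores 0, while maps with more visible structure yield higher values, up to 1.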
As a result of the preliminary study, we decided to focus on synoptic tasks of increasing complexity (task numbers correspond to the map identifiers used in Table 2):
Pattern identification: Describe the distribution of points representing bird sightings and photos taken at the Nature Reserve.
Direct comparison: Compare the distribution of points representing photos with tags of two landmarks.
Inverse comparison: Describe the relative positions of sightings of two given antelope species.
Relation-seeking between different attributes within the same areas: Identify other antelope species which have similar patterns of sightings compared to the two species of task 3.
Relation-seeking between the same attribute within different areas: Identify districts which have a similar price level to that of a given district.
First, we cut the interview recordings into smaller clips covering single tasks and sorted out all other parts (e.g. the introduction to the method and discussions between tasks). Then we encoded each clip twice (see right column of Fig. 2): for the content analysis, we wrote down the participants’ reasoning while they solved the task and answered the predefined interview questions. For the visual analysis, we implemented the encoding system described in Knura and Schiewe (2021). It allowed us to encode the content and the focus location on the map for every second of the clip, using the statements of the participants and their mouse cursor as indicators. We defined seven content categories: (1) task description, (2) interviewer’s question, (3) participant’s question, (4) non-task-related discussion, (5) legend, (6) background map and (7) data. Data-related content referred either to the whole dataset or to one of its attributes, which we encoded as subclasses of category (7); e.g. we had a subclass for each of the antelope species in task 3. For encoding the location, we used the manually marked clusters from the results of the preliminary study (see Fig. 1) for tasks that use the same data (tasks 3 and 4). Otherwise, we defined clusters either by applying the HDBSCAN algorithm (tasks 1 and 2) or based on the dataset itself (city districts for task 5).
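A per-second encoding of this kind can be represented with a simple record structure. The field and category names below are illustrative assumptions, not the exact coding scheme of the study.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """One second of an encoded clip (hypothetical structure)."""
    second: int
    content: str              # one of the seven content categories, e.g. "data"
    cluster: Optional[str]    # focus location (marked cluster), if identifiable


def dwell_per_category(frames):
    """Aggregate seconds spent per content category,
    the kind of summary a focus map is built from."""
    return Counter(f.content for f in frames)
```

For example, three frames encoded as "task description", "data", "data" would aggregate to one second on the task description and two seconds on the data.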
From the results of the encoding process, we created visualizations using techniques from eye-tracking analysis (Andrienko et al. 2012), e.g. flow maps, attention maps or map displays of trajectories. We are aware that our technique cannot reveal the actual eye movements consisting of fixations and saccades, but it allows us to reconstruct the task-solving approach of each participant. To distinguish them from actual eye-tracking visualizations, we use the terms focus map and focus trajectory when referring to the results of our encoding process. To further analyse and cluster the different strategies across participants, we used the MultiMatch method proposed by Jarodzka et al. (2010), a vector-based approach originally developed to compute scan path similarity for eye-tracking data.
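The vector-based idea can be illustrated with a drastically simplified shape comparison. The full MultiMatch method aligns scanpaths and compares them along five dimensions (shape, direction, length, position, duration); the sketch below covers only a shape-like score and all names are illustrative.

```python
import numpy as np


def resample(path, n):
    """Linearly resample a 2-D trajectory to n points."""
    path = np.asarray(path, dtype=float)
    t = np.linspace(0, 1, len(path))
    tn = np.linspace(0, 1, n)
    return np.column_stack([np.interp(tn, t, path[:, 0]),
                            np.interp(tn, t, path[:, 1])])


def shape_similarity(path_a, path_b, n=50):
    """Simplified MultiMatch-style shape similarity in [0, 1]:
    1 minus the normalized mean difference between step vectors."""
    a, b = resample(path_a, n), resample(path_b, n)
    va, vb = np.diff(a, axis=0), np.diff(b, axis=0)
    diff = np.linalg.norm(va - vb, axis=1).mean()
    scale = max(np.linalg.norm(va, axis=1).mean()
                + np.linalg.norm(vb, axis=1).mean(), 1e-9)
    return float(max(0.0, 1.0 - diff / scale))
```

Identical focus trajectories score 1.0, while trajectories moving through the map in different directions score lower.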
In general, we did not use viewing time as a dependent variable in our study, as this factor depends heavily on the users’ ability to express their thoughts comprehensively, which is of course an essential part of think-aloud interviews.