1 Introduction

Both the demand for and the availability of multi-temporal geodata are increasing: On the one hand, there is a growing demand due to current and time-critical applications—such as epidemiology, climate, meteorology, mobility or energy supply. On the other hand, the provision of large multi-temporal data sets is generally no longer a problem due to the technical developments of satellite-, drone- or terrestrial-borne sensors and advanced distribution via the web. In this context, cartographic visualization (and associated generalization steps for reducing complexity) plays an important role in providing effective and efficient insights into such data sets for experts or laypersons.

If such multi-temporal data sets consist of cardinally scaled, relative and area-related data, series of choropleth maps are a very frequently used form of communication, especially in the media (Mooney and Juhász 2020), either as dynamic cartographic animation or as static map series (in particular, small multiples).

In addition to the inherent disadvantages that static choropleth maps already bring with them (such as the 'area size bias'; Schiewe 2023a), there are further problems with multi-temporal representations. These problems can essentially be traced back to the limited capacities of the human memory levels (sensory, working and long-term memory), which allow only incomplete capture and further processing of the extensive information presented in a short period of time (Harrower 2007). In other words, the number of visual stimuli normally exceeds the working memory capacity, which can be caused by a large number of enumeration units, class values (or related colours), epochs or change events. These effects are amplified by simultaneous and opposing changes in different parts of the map.

To enhance visual contrast and improve readability, data are often classified for display in choropleth maps. This article addresses two problems that arise specifically with multi-temporal data classification and that hinder effective change detection and analysis:

  • Missing task orientation: The existing methods of data classification operate in a purely data-driven manner; there is no explicit consideration of different, more specifically described types of change (or of related questions or tasks). Yet it is obvious that questions such as 'show significant value changes', 'show trend changes' or 'show maximum positive changes' require different pre-processing and sometimes also different visual representations.

  • Missing change preservation: The preservation of value changes ∆x between two epochs after data classification is the essential prerequisite for them to be visible in classified maps (then as class changes ∆c). However, when determining class boundaries, the typical methods of data classification (such as equidistant, quantiles or natural breaks) take into account neither the preservation of 'important' or statistically significant changes nor the desire to avoid placing very small changes into different classes. This results in various cases of correct and incorrect reproduction of changes (Fig. 1).

Fig. 1: Cases of desired and undesired class differences ∆c (and the respective grey coding) between two epochs—in this example, an 'important' class change is defined for value differences of ∆x ≥ 3

The goals of this article arise from these problems, which relate to the multi-temporal representation of cardinal, area-related data:

  • To integrate different change tasks (expressed through related change metrics) into the data classification method and to visually and numerically demonstrate the differences between the respective outcomes.

  • To integrate suitable change preservation metrics into the classification process and to visually and numerically demonstrate the improvement over a conventional method (such as the equidistant one).

The remainder of this paper is structured as follows: Sect. 2 describes selected work on the fundamental problems of multi-temporal choropleth maps and on task-oriented data classification. Section 3 covers the methodology—this includes the description of selected change tasks and change metrics as well as a measure that defines the preservation of changes after data classification. This measure is then also used in a customized data classification method—the application of this method to real data sets is demonstrated in Sect. 4 and discussed in Sect. 5. Section 6 summarizes the results and provides an outlook for future work.

2 Previous Work

With regard to solving general perceptual problems with multi-temporal map displays, two general approaches can be identified in previous work. These are mostly related to cartographic animations, but can also be partially transferred to map series. The first approach deals with different graphical or interactive design options. These include, in particular, the integration of interactive elements (e.g. Harrower and Fabrikant 2008), placement of legends (e.g. Kraak et al. 1997), use of speech or sound (e.g. Muehlenhaus 2013) or the interpolation (‘tweening’) between map frames (e.g. Fish et al. 2011).

The second general approach is related to data or map generalization methods, both for the spatial and the temporal dimension (Panopoulos et al. 2003). They follow the basic recommendation that the multi-temporal information to be displayed should be reduced to the necessary minimum. Temporal generalizations include the selection of certain epochs, aggregation (e.g. from daily values to monthly averages) or smoothing (e.g. by averaging an epoch and a certain number of previous epochs; Traun et al. 2021).

If one focuses on data classification as a specific form of necessary generalization, there is extensive treatment in the literature—relevant overviews are provided by Cromley and Cromley (1996) or Coulson (1987). However, these works usually discuss data-driven methods for static representations—the multi-temporal and possibly dynamic representation considered in this paper, by contrast, receives little attention.

An exception is the contribution by Monmonier (1994), who (unsuccessfully) tried to avoid small or unwanted class changes. Harrower (2003) advocated aggregation into two or three classes, which of course represents a high loss of information. Also, Beconytė et al. (2022) emphasized the loss of information in the case of aggregation in the course of pre-processing for choropleth maps.

Brewer and Pickle (2002) found in their studies that, among existing classification methods, the quantile method produces the best results for map comparisons. However, it must be mentioned that classifications were only made into three classes and that a posteriori processing was necessary due to numerous false positives. In addition, it was empirically confirmed that identical classifications and legends for all epochs significantly increase the accuracy of map comparison (by 28%).

There are also only a few publications regarding the consideration of specific change tasks (as well as associated change metrics) in multi-temporal (choropleth) maps. For example, Brewer and Pickle (2002) did not consider different change tasks. The quantile method, which they found to be optimal (see above), is certainly well suited to represent only a few, large changes—but a classification based on significant change values has not been considered.

Schiewe (2018) and Schiewe (2022) used the triad model of Peuquet (1994) and the task typology of Andrienko and Andrienko (2006) to categorize cases of change and possible forms of visualization. Halpern et al. (2021) pointed out (using the example of communicating COVID-19 data) that data classifications should in principle be individually adapted for each application.

There are indeed empirical findings for the choice of a classification method for mono-temporal representations, which particularly take the given data distribution into account (e.g. highly skewed distributions should usually not be treated with an equidistant classification, as this leads to many weakly and a few strongly populated classes). However, transferring such findings to multi-temporal data sets does not make sense, as pooling all epochs creates a 'new', large data set whose distribution no longer has any connection to the individual epochs.

In summary, it can be stated that the undesired loss of changes (and, conversely, their preservation) in the context of data classification has not yet been adequately addressed in the literature and that, consequently, no implementations in standard software can be observed.

3 Methods

3.1 Change Tasks and Metrics

The focus in this article is on area-related, cardinal data, which limits the scope of possible tasks or questions regarding changes in the data sets. According to the triad model by Peuquet (1994), a related, possible general change task description consists of the input parameters space ('where') and time ('when') and the change information as output ('what').

A rough categorization of the spatial component leads to local tasks (i.e. related to a single enumeration unit) and regional tasks (i.e. related to more than one, up to all enumeration units). The temporal component defines the number of epochs, the overall duration and the temporal resolution of the given data. Depending on specific applications, the change information as such (based on area-related, cardinal data) can be expressed as value differences, value quotients (or percentages), trends or derived metrics.

Value differences or quotients are typically of a bi-temporal nature with a certain time lag. For multi-temporal data sets (with three or more epochs), one obtains a series of differences or quotients.

Trends might be expressed qualitatively (e.g. increase, decrease or no change) or with quantitative metrics (such as numerical trend functions).

Based on these basic metrics, several derived metrics might be necessary for specific applications, in particular:

  • Extreme values (maximum, minimum)—e.g. to search for outliers;

  • Other statistical values (mean, median, standard deviation, etc.)—e.g. to get a summary or overview of the time series;

  • Magnitude of absolute (positive or negative) change values only—e.g. to single out locations that show increases only;

  • Deviations from given (e.g. legal) thresholds—e.g. to identify critical locations in time that exceed certain thresholds;

  • Deviation of values from trend functions—e.g. to identify strong deviations that also take the general development of values over time into account.

In principle, all combinations of the aforementioned spatial, temporal and change information components can be used to describe concrete change tasks. As the central idea of this paper is to integrate different change tasks (expressed through related change metrics) into the data classification method, it follows that in principle all numerical values mentioned before can serve as input. The experiments (Sect. 4.2) will tackle a small subset.
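To make these metric definitions concrete, the following minimal sketch (in Python with NumPy; an illustration under stated assumptions, not the authors' implementation) computes differences, quotients and trend deviations for a small, hypothetical units-by-epochs array:

```python
import numpy as np

# Hypothetical data: one row per enumeration unit, one column per epoch.
data = np.array([
    [120.0, 150.0, 90.0, 180.0],   # enumeration unit 1, epochs 1..4
    [ 60.0,  65.0, 70.0,  62.0],   # enumeration unit 2
])

# Value differences between consecutive epochs (bi-temporal, lag 1)
differences = np.diff(data, axis=1)

# Value quotients between consecutive epochs
quotients = data[:, 1:] / data[:, :-1]

# Deviation from a simple trend: here the trend of an epoch is taken as the
# mean of the current and the three previous epochs (cf. Sect. 4.2);
# shorter windows are used at the start of the series.
def trend_deviation(series, window=4):
    dev = np.empty_like(series)
    for t in range(series.shape[0]):
        lo = max(0, t - window + 1)
        dev[t] = series[t] - series[lo:t + 1].mean()
    return dev

deviations = np.apply_along_axis(trend_deviation, 1, data)
```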

3.2 Change Preservation Metrics

Based on the core problem that ‘important’ changes (which of course require a prior definition of ‘important’) can be lost through data classification, the following approach is proposed to improve the preservation of changes (see also Schiewe 2023b):

  • Defining which value changes should be represented by which class differences (i.e. numbers of class changes),

  • Carrying out data classification and

  • Comparing the actual class changes achieved with the required changes using a change preservation metric.

The first and third steps are described in more detail below. The intermediate data classification can be carried out using any method—the new data classification will be discussed in Sect. 3.3 in more detail.

The definition of the required class differences ∆c_required can be done in different ways (Schiewe 2023b). In general, an upper threshold value ∆x_upper is set, which should lead to a class difference of one (in the case of a binary representation) or to a maximum class difference of k-1 (with k being the number of classes) (Fig. 2). In the second case, a proportional assignment between value and class difference can be made below this threshold:

Fig. 2: Transformation of value differences ∆x to class differences ∆c (a: binary assignment; b: proportional assignment; each based on the upper threshold value ∆x_upper)

$$\Delta c_{\text{required}}=\begin{cases}\text{INT}\left(\dfrac{k-1}{\Delta x_{\text{upper}}}\,\Delta x\right) & \text{for } \Delta x < \Delta x_{\text{upper}}\\[4pt] k-1 & \text{else}\end{cases}$$
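As a minimal sketch (assuming non-negative absolute differences ∆x and reading INT as truncation), this mapping could be implemented as follows:

```python
import numpy as np

def required_class_difference(dx, dx_upper, k):
    """Proportional assignment of (non-negative) value differences dx to
    required class differences, capped at k-1 above the threshold."""
    dc = np.floor((k - 1) / dx_upper * dx).astype(int)
    return np.where(dx < dx_upper, dc, k - 1)

# e.g. required_class_difference(np.array([0.5, 2.0, 4.0]), dx_upper=3.0, k=5)
# yields array([0, 2, 4])
```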

The assessment of how well these required class differences are reached after applying a data classification procedure makes a change preservation metric necessary. This metric can also be used to drive a new classification method that explicitly takes the preservation of changes into account (Sect. 3.3). The POCC (preservation of class changes) metric is used below, which considers the difference between the required (∆c_required) and the actually achieved class difference (∆c_achieved) as follows:

$$\text{POCC}=1-\frac{\sum_{i} w_{i}\cdot \left|\Delta c_{\text{required},i}-\Delta c_{\text{achieved},i}\right|}{\sum_{i} w_{i}\cdot \Delta c_{\text{required},i}}$$

The measure is normalized so that POCC ranges between 0 and 1. The larger the POCC, the better the preservation. The index i runs over all enumeration units with a specified time difference (e.g. in direct succession of epochs). The weights w can be used to emphasize the preservation of certain class differences (e.g. very high value differences; see also Sect. 4.3).
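A direct translation into code might look as follows (a sketch, not the authors' implementation; it assumes at least one non-zero required class difference, so that the denominator does not vanish):

```python
import numpy as np

def pocc(dc_required, dc_achieved, weights=None):
    """Preservation of class changes: 1 means all required class
    differences are reproduced exactly after classification."""
    w = np.ones_like(dc_required, dtype=float) if weights is None else weights
    return 1.0 - (w * np.abs(dc_required - dc_achieved)).sum() / (w * dc_required).sum()
```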

3.3 POCC Data Classification

The change preservation metric (POCC) described above can also be used to control a data classification process, which aims to improve the preservation of the desired class differences ∆c_required.

It is assumed that a consistent classification across all time epochs is used, since separate classifications for each individual epoch would hinder visual change detection (Brewer and Pickle 2002)—as Fig. 5 also shows.

The class boundaries are determined using a sweep line method (Fig. 3). The attribute values x are shown on a number line. Below it, all value differences (alternatively, all value quotients) between all epochs are plotted as intervals. The sweep line is pushed from left to right over these intervals. The option pursued here is that the stop points are defined by the start values of all intervals. Each intersection or point of contact between the sweep line and an interval represents a possible class boundary. For each sweep line, the number of intersection points is counted. Then, all combinations of sweep lines for a given number of classes are taken into account (leading to a brute force approach).

Fig. 3: Sweep line procedure: horizontal black lines below the number line show the placement and width of all value intervals ∆x; dashed vertical lines are sweep lines with stop points at the start of each interval

Based on this, different required class differences ∆c_required depending on the interval width can also be considered, which allows the computation of the POCC metric (as introduced in Sect. 3.2). Figure 4 shows an example of the POCC calculation for one possible class break setting. The maximum POCC value over all combinations for the given number of classes determines the best solution.

Fig. 4: Example of a POCC calculation: for demonstration purposes, the given values ∆c_required are simply proportional to the interval widths; ∆c_achieved is derived from the number of intersections between class breaks and intervals
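The complete search could be sketched as follows (assumptions: candidate breaks are the interval start values, an interval's achieved class difference equals the number of class breaks it intersects, and the pocc function from the sketch in Sect. 3.2 is reused):

```python
from itertools import combinations
import numpy as np

def pocc_classification(data, dc_required, k, weights=None):
    """Brute-force search for the k-1 interior class breaks that maximize
    POCC. data: units x epochs; dc_required: required class differences
    for all (unit, consecutive-epoch) pairs, flattened row-major."""
    lo = np.minimum(data[:, :-1], data[:, 1:]).ravel()  # interval starts
    hi = np.maximum(data[:, :-1], data[:, 1:]).ravel()  # interval ends
    candidates = np.unique(lo)                          # sweep-line stop points
    best_breaks, best_pocc = None, -np.inf
    for breaks in combinations(candidates, k - 1):
        b = np.asarray(breaks)
        # Achieved class difference: number of breaks lying within the
        # interval (one convention; lo < b <= hi is assumed here).
        dc_achieved = ((lo[:, None] < b) & (b <= hi[:, None])).sum(axis=1)
        score = pocc(dc_required, dc_achieved, weights)
        if score > best_pocc:
            best_breaks, best_pocc = b, score
    return best_breaks, best_pocc
```

Enumerating all combinations of candidate breaks makes the combinatorial cost of the brute force approach explicit (cf. the runtimes reported in Sect. 6.2).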

4 Experiments

4.1 Data

This study uses a prominent data set from the Robert Koch Institute (Berlin, Germany)—daily COVID-19 incidences for the 16 federal states of Germany between September 11, 2021, and April 17, 2023. To reduce the data set, temporal aggregation was carried out by averaging to monthly values. Two subsets were generated from the total number of epochs—one with four epochs (months 5–8; Fig. 5) and one with six epochs (months 10–15).

Fig. 5: Four-epoch data set (months 5–8) of absolute COVID-19 incidences in the German federal states, also demonstrating the almost impossible detection of changes when using an individual classification (here: equidistant) and colouring for each single epoch (source: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Inzidenz-Tabellen.html)

4.2 Experimental Design

The experiments consider the following independent variables:

  • Number of epochs: As described above, two levels (four and six epochs) will be treated.

  • Number of classes: Two levels with typical numbers (four and five classes) will be used.

  • Change tasks and related measures: The following three levels will be introduced:

    1. Show absolute change of COVID-19 cases: absolute differences between two consecutive epochs;

    2. Show relative change of COVID-19 cases: absolute quotients between two consecutive epochs;

    3. Show deviations from the temporal trend in order to consider the general temporal behaviour of COVID-19 cases in the entire area of interest: absolute deviations from the trend (the trend is defined by the average of the current and the three previous epochs).

In all cases, statistical significance is defined by those values that lie above the 80th percentile (an arbitrary setting) of the processed values (absolute differences, etc.) for the entire data set.

  • Data classification method: Two levels are considered—the POCC method (Sect. 3.3) and the equidistant method (probably the most frequently used standard method).

All in all, this leads to a factorial design of 2 × 2 × 3 × 2 = 24 scenarios.

The dependent variables are the numerical class breaks (which can be used for the subsequent visualization as multi-temporal choropleth maps) and the change preservation measure POCC (Sect. 3.2).
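Under the assumptions of the sketches in Sect. 3, one experimental scenario could be wired together roughly as follows (the 80th percentile threshold and the proportional ∆c_required assignment mirror the settings above; the data array and all variable names are illustrative):

```python
import numpy as np

# Illustrative stand-in for the monthly averaged incidences
# (16 federal states x 4 epochs); the real values come from the RKI data.
data = np.random.default_rng(0).uniform(50, 400, size=(16, 4))
k = 5  # number of classes

# Change metric for the 'absolute change' task
dx = np.abs(np.diff(data, axis=1)).ravel()

# 'Significant' changes: above the 80th percentile of all processed values
dx_upper = np.percentile(dx, 80)

# Required class differences and proportional weights (zero for no change)
dc_req = required_class_difference(dx, dx_upper, k)   # sketch from Sect. 3.2
weights = dc_req.astype(float)

# Brute-force POCC classification (sketch from Sect. 3.3; slow by design)
breaks, score = pocc_classification(data, dc_req, k, weights)
```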

4.3 Results

The parameters of the above factorial design and the associated results (i.e. class breaks and POCC values) are summarized in Table 1.

Table 1 Results of POCC vs. equidistant data classification for different change tasks

When calculating the POCC metric, the weights w were set proportional to the desired class differences. This means that small value differences for which no class difference is desired are given a weight of zero, so that they play no role in the evaluation of preservation.
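In terms of the earlier sketches, this weighting (and the non-zero variant explored later in Table 3) might be set up as follows; dc_req denotes the required class differences from the previous sketch, and the value 0.5 is purely illustrative:

```python
import numpy as np

# Weights proportional to the required class differences:
# no-change cases (dc_req == 0) receive zero weight.
weights = dc_req.astype(float)

# Variant (cf. Table 3): a small non-zero weight for no-change cases,
# so that undesired class changes (false positives) are penalized too.
w0 = 0.5  # illustrative value, not taken from the paper
weights_variant = np.where(dc_req == 0, w0, dc_req).astype(float)
```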

Figure 6 shows selected results of this study—for the case of four epochs and five classes. The incorrect assignments are also marked here—on the one hand, the false positives (i.e. class differences greater than or equal to 1 that, due to the small change in values or quotients, should not show a class change), and on the other hand, the false negatives (i.e. class differences of 0 that, according to ∆c_required, should show a class change larger than 0).

Fig. 6: Choropleth maps for the data set with four epochs and five classes: top: equidistant grouping (here, the definition of false positives and negatives is done according to thresholds for absolute differences); rows below: maps derived from POCC classifications for different tasks; last row: the deviation-from-trend task uses a different colour scheme to express the fact that not the original value space but differences from the trend are mapped

The occurrence of false positives in particular severely hinders change detection—these cases are listed in Table 2. It shows that the equidistant method has a slightly (but not significantly) lower number of false positives.

Table 2 Relative number of false positives (four epochs and five classes) in relation to all change cases with ∆c_required = 0 (no change required)
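The relative rates in Table 2 (and the corresponding false negatives) could be computed from the earlier sketches roughly as follows, assuming dc_achieved holds the class differences actually obtained after classification (cf. Sect. 3.3):

```python
import numpy as np

no_change = dc_req == 0  # cases where no class change is required

# False positives relative to all no-change cases (as in Table 2)
rel_false_pos = (dc_achieved[no_change] >= 1).mean()

# False negatives relative to all cases that require a class change
rel_false_neg = (dc_achieved[~no_change] == 0).mean()
```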

One approach to reducing false positives is to introduce non-zero weights (w) for the class of changes that should not result in a class change. Table 3 shows an exemplary comparison: the introduction of such weights yields slight improvements; however, these are not yet statistically proven and require further investigation.

Table 3 Different weighting (w) of no-change classes (four epochs and five classes)

5 Discussion

5.1 Consideration of Change Tasks

The first goal of this article was to present the added value that arises from considering explicit change tasks and associated change metrics in the data classification process for multi-temporal data sets.

A simple visual analysis of the choropleth maps (Fig. 6) and the examination of the class boundaries (Table 1) show that different classifications are created depending on the change task selected. In contrast, conventional methods such as the equidistant one always produce exactly one solution regardless of the change task (Fig. 6, top row), since the division is defined purely mathematically along the number line.

Of course, the different definitions of the change metrics associated with the tasks are responsible for these effects. This becomes clear when comparing the change tasks in Fig. 6: for example, between the first and second epochs, the two southern federal states (Baden-Württemberg, Bavaria) show increases for the 'absolute value difference' task and decreases for the 'deviations from trend' task. In the first case, the pure changes in values are considered; in the second case, the focus is on 'outliers' that cannot be traced back to the general trend.

A numerical analysis confirms this visual impression: looking at the pairwise correlations between the required class differences of the change tasks (Table 4), there is a strong positive linear (though not perfect) correlation (r = 0.84) between absolute differences and quotients. The correlations of these two tasks with the 'deviation from trend' task, by contrast, are weak (r = 0.34 and r = 0.38, respectively), which reflects the clear diversity of these change tasks.

Table 4 Correlation coefficients between required class differences for different change tasks (case: four epochs, five classes; i.e. 3 × 16 class differences per change task)

5.2 Consideration of Change Preservation Metrics

The second goal of the article dealt with the better preservation of desired changes by using the POCC data classification. The POCC metric demonstrates the superiority of this method over the equidistant classification in all variants of the experiment (Table 1). This is plausible, since the POCC measure itself controlled the classification process.

It is noticeable that the POCC metrics for a given task (e.g. ‘absolute difference’) are very similar regardless of the number of epochs and classes, while there are significant differences between tasks.

If the POCC classification is compared with the equidistant classification regarding false positives and false negatives (Fig. 6, Table 2), comparable values occur. Beyond this 'binary' consideration (class change correct or incorrect), the consistently higher POCC measure for the POCC classifications indicates a better representation of the magnitude of the class changes (regarding the different values of ∆c_required).

Undesirable false positives occur in almost all variants, so that small value differences lead to undesired class changes and thus make it more difficult to recognize the important 'real' changes. This effect was not absorbed by the present choice of parameters—the weighting w only considered the preservation of larger differences, but not the preservation of identical classes for small value differences. Alternative weights have already been experimented with in Table 3—this approach needs to be expanded upon.

6 Summary and Future Work

6.1 Summary

The first goal of this article was to show the added value of explicitly considering different change tasks in data classification methods. This could be demonstrated both visually (through the differences in the map results) and numerically (through the pairwise correlation coefficients) using three selected tasks.

The second aim dealt with the added value of a data classification that is directly controlled by a measure of the preservation of class differences. In a comparison with an equidistant classification, there were no significant differences in a purely 'binary' view (class change correct or incorrect). However, the significantly better POCC measures, which explicitly introduce weights proportional to the desired class difference (and not just a grouping into change vs. no change), show the superiority of the POCC over the equidistant grouping.

All in all, this work has presented a promising approach for task-oriented data classifications that reduces the effect of information loss through classification, which is relatively little discussed in the literature. The information to be preserved (here: statistically significant changes) must be specified in advance—this step can be performed either manually or by a priori statistical analyses. The approach can also be applied to the preservation of other information (e.g. spatial patterns; Schiewe 2018) and offers added value—especially compared to the common application of standard procedures such as equidistant, quantiles or natural breaks in GIS software packages, which lead to unpredictable information preservation.

6.2 Future Work

The experiments carried out in this article were mostly based on fixed parameter settings. This reveals potential for experimental extensions and algorithmic improvements, e.g.:

  • A key point is the explicit consideration (or weighting) of unwanted class changes. A first approach (see Table 3) already suggests a reduction of false positives through such weighting, but still needs more testing.

  • The current model for calculating the desired class changes ∆c_required used an upper threshold ∆x_upper. In this article, a fixed rule (the 80th percentile) was used to determine it; other values should be explored.

  • In addition, a maximum class difference of k – 1 was used in this article for changes above the threshold value ∆x_upper, with a proportional distribution of the smaller value differences below this value. Tests that use different settings are still missing.

  • A general problem arises from the high computing time of the POCC data classification that results from the brute force approach (Sect. 3.3)—leading to calculation times of 5 min and more for the most complex cases in this study (six epochs and five classes) on a standard laptop. Heuristic procedures must be designed and implemented here to enable better usability.

Very importantly, empirical studies must verify or falsify the hypothesis that—depending on change tasks and parameter settings—a correspondence actually exists between the POCC measures and the level of human change perception. Certainly, here a distinction must be made between cartographic animations and map series.

This paper has put the focus on the classical setup of a series of epochs. Obviously, the disadvantage of this approach is that changes are not displayed explicitly, so that users need to invest additional mental resources to detect them. Difference maps are a typical solution to this problem; however, they remove the original absolute values, which might be an important loss of information for some applications. It is also conceivable that the classification of difference maps can be improved on the basis of the method presented above.