1 Introduction

Traditional cartography is a technique for creating geographical maps that allow humans to intuitively understand the spatial or geological characteristics of countries, oceans, or celestial objects. In recent years, however, several non-geospatial cartographies have been proposed. The most successful is antigenic cartography, proposed in Refs. [1, 2]. This new cartography maps the antigenic properties of influenza viruses: viruses with similar antigenic reactions are mapped closely together, while viruses that generate significantly different responses are placed far apart on the map. Such a map helps in designing vaccination strategies quickly and efficiently. Antigenic cartography proved instrumental during the COVID-19 pandemic [3,4,5], when it mapped the Omicron variant of the coronavirus far apart from other variants, prompting the development of new vaccines specialized for this variant. Antigenic cartography is based on traditional multidimensional scaling (MDS) [6], which requires a matrix of pairwise distances between all instances. This study proposes a new, context-flexible cartography that does not require the distance matrix, thus allowing more efficient cartography.

Similar to antigenic cartography, which builds a map through a dimensionality reduction algorithm, the proposed study also depends on a dimensionality reduction technique. This study proposes a new algorithm for cartography using a novel Siamese topological neural network (STN) that differs significantly from traditional MDS. The STN utilizes a hierarchical neural network containing a low-dimensional topological layer [7] as its base network, mapping high-dimensional input into a two-dimensional topological map that can be visualized. Most dimensionality reduction methods for visualization, for example, t-SNE [8], Isomap [9], and UMAP [10], reduce the dimensionality while preserving a criterion of similarity between the high-dimensional data in their low-dimensional representations. Unlike these unsupervised dimensionality reduction algorithms, which do not take the context of the data, for example, their labels, into account, STN forms a low-dimensional embedding that reflects the context-similarity and thus expands the flexibility of visualizing high-dimensional data. However, STN is not strictly a supervised algorithm, as it requires neither data labels nor a difference matrix for all the data points.

In the past, many supervised dimensionality reduction methods [11, 12] also took data labels into account, linear discriminant analysis (LDA) [13, 14] being the most traditional example. While the idea of supervised dimensionality reduction has been known for a long time, it is still actively studied, resulting in many exciting algorithms [15,16,17,18,19]. These methods form low-dimensional representations in the context of the data labels and hence require the availability of the labels. However, for many real-world problems, data labels are not always available and can be expensive to obtain. The proposed study requires pairwise similarity measures but not data labels. Here, the STN is trained to learn the similarity measure while executing dimensionality reduction at the same time.

The proposed cartography generates maps of high-dimensional data using the data’s topological characteristics as well as the contexts assigned to them. Self-organizing maps (SOM) [20, 21] introduced by Kohonen visualized the topological structure of the data. Visually induced SOM (ViSOM) [22] further modified SOM so that the maps visualize not only the topological structure of the data but also the distances between their low dimensional representations. The relation between ViSOM and MDS was further elaborated in [23]. The proposed STN shares a similarity with ViSOM in that it also aims at preserving the distance structure of the data. However, it significantly differs in the context of the preserved distance, as in STN, the distance is associated with an arbitrarily assigned context.

The proposed STN also shares a similarity with the Elastic SOM [24] in that both attempt to induce an arbitrarily given metric of distance into the generated maps. However, STN significantly differs in the training process and the ability to project out-of-sample data not used during the learning process onto the map.

The proposed study shares some resemblances with the past research of dimensionality reduction through kernel-based similarity learning for high-dimensional data [25], in that the networks execute similarity learning that can be related to some given contexts, for example, rank. However, this method needs to depend on a separate dimensionality reduction method for visualization, while in the proposed STN, the dimensionality reduction is embedded.

The novelty of the STN is two-fold: (1) the proposed method allows a topological representation of high-dimensional data to be learned from a set of paired data with arbitrarily designed similarity measures, resulting in a context-oriented representation that depends not only on the feature-similarity of the inputs but also the similarity-context assigned to them. (2) the proposed methods seamlessly integrate similarity learning with dimensionality reduction. This is a strength of the proposed model because many similarity learning methods require separate visualization algorithms, for example, t-SNE [8] being one of the most popular methods for visualizing their representations. In this study, the dimensionality reduction mechanism that allows visualization is an inherent part of the network. Further, as the topological representations here are context-oriented, in that different similarity criteria result in other topological structures, a flexible visual analysis is made possible. The novelties are instrumental in building a new flexible cartography for visual data analysis.

The rest of the paper is structured as follows. Section 2 explains the structure of the STN, its learning dynamics, and the framework of the proposed STN as similarity learning. The subsequent section describes the experiments, while the conclusions will be elaborated in the final section.

2 Siamese topological networks

The proposed Siamese Topological Network (STN) structure is outlined in Fig. 1. The base network for the STN is a hierarchical topological network, proposed in [7, 26, 27]. The structure of the base network is similar to a standard multilayer perceptron (MLP), except that the hidden layer is a two-dimensional topological map, as in many cases of Self-Organizing Maps (SOM) [20, 21]. The low dimensionality allows the network to form topological representations of a given data set that humans can visualize. To some extent, visualizing the internal representations enables humans to intuitively understand how the network converts its inputs into outputs. In the past, it was utilized for visualizing high-dimensional data [16]. It can also be trained as an autoencoder, a classifier, or a mix of both [26], and hence can visualize high-dimensional data constrained by various contexts.

This study pairs two topological networks to form the STN for similarity learning. The objective is to train the network with high-dimensional data in which a similarity criterion between pairwise data instances is given. The similarity measure may be defined qualitatively, quantitatively, or subjectively and does not need to satisfy the mathematical definition of a distance. The low-dimensional topological map can be utilized to visualize the data structure in the context of the defined similarity. Here, different similarity measures for the same data will generate different maps, allowing context-flexible cartography.

Fig. 1 Siamese topological networks

2.1 Learning process

The learning process of STN is explained as follows.

For each high-dimensional input of the pair, \(\textbf{X}^{(k)}\ (k \in \{1,2\})\), the best matching unit (BMU), \(win^{(k)}\), is selected as in Eq. (1).

$$\begin{aligned} win^{(k)}(t) = argmin_{j} \Vert \textbf{W}_j(t) - \textbf{X}^{(k)}(t) \Vert \end{aligned}$$
(1)
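As a minimal sketch of the BMU selection in Eq. (1), assuming the reference vectors \(\textbf{W}_j\) are stored as the rows of an `(N_hid, D)` array (toy values below are illustrative):

```python
import numpy as np

def best_matching_unit(W: np.ndarray, x: np.ndarray) -> int:
    """Return the index of the reference vector closest to input x (Eq. 1)."""
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

# Toy 2-D example: three reference vectors, one input
W = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
x = np.array([0.9, 1.1])
print(best_matching_unit(W, x))  # → 1
```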

The activation of the j-th hidden neuron, \(h_j^{(k)}(t)\), incurred by one input of the pair, \(\textbf{X}^{(k)}\), at epoch t is calculated as in Eq. (2). Here, \(\textbf{h}^{(k)}\) denotes the activation vector incurred by input \(\textbf{X}^{(k)}\).

$$\begin{aligned} h_j^{(k)}(t) &= \sigma (win^{(k)},j,t)\, e^{-\Vert \textbf{X}^{(k)}(t)-\textbf{W}_j(t) \Vert ^2 } \nonumber \\ \textbf{h}^{(k)}(t) &= (h_1^{(k)}(t), h_2^{(k)}(t), \ldots , h_{N_{hid}}^{(k)}(t) )^{\textrm{T}}, \quad k \in \{1,2\} \end{aligned}$$
(2)

Here, \(N_{hid}\) is the number of neurons in the topological layer, while the neighborhood function \(\sigma (win^{(k)},j,t)\) is defined as follows.

$$\begin{aligned} \sigma (win^{(k)},j,t) &= \exp \left( -\frac{dist(win^{(k)}(t),j)}{S(t)} \right) \nonumber \\ S(t) &= \sigma _{\infty } + \frac{1}{2} (\sigma _0 - \sigma _{\infty }) \left( 1+\cos \frac{\pi t}{t_{\infty }} \right) \end{aligned}$$
(3)

In Eq. (3), \(\sigma _0\) and \(\sigma _{\infty }\) are the initial and terminal annealing constants, and \(t_\infty\) is the termination epoch, while \(dist(win^{(k)},j)\) is the distance from the BMU to the j-th neuron on the topological layer.
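A sketch of the neighborhood function and cosine annealing schedule of Eq. (3). The grid distance is taken here as the Euclidean distance between neuron coordinates on the 2-D map; the values of \(\sigma_0\), \(\sigma_\infty\), and \(t_\infty\) below are illustrative, not the paper's settings:

```python
import numpy as np

def annealed_radius(t, sigma_0=2.0, sigma_inf=0.1, t_inf=1000):
    """S(t): cosine-annealed radius shrinking from sigma_0 to sigma_inf (Eq. 3)."""
    return sigma_inf + 0.5 * (sigma_0 - sigma_inf) * (1 + np.cos(np.pi * t / t_inf))

def neighborhood(win_idx, grid_pos, t, **kw):
    """sigma(win, j, t) = exp(-dist(win, j) / S(t)) for every neuron j."""
    d = np.linalg.norm(grid_pos - grid_pos[win_idx], axis=1)
    return np.exp(-d / annealed_radius(t, **kw))

# 3x3 map: coordinates of the 9 neurons
grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
h = neighborhood(win_idx=4, grid_pos=grid, t=0)  # winner at the centre
print(h[4])  # → 1.0 (the winner itself)
```

At \(t=0\) the radius equals \(\sigma_0\), and at \(t=t_\infty\) it has shrunk to \(\sigma_\infty\), so the neighborhood tightens smoothly over training.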

The output of the ReLU layer, \(\textbf{O}^{(k)} = (O^{(k)}_1, O^{(k)}_2, \ldots , O^{(k)}_{N_{out}})\), where \(N_{out}\) denotes the number of neurons in the ReLU layer, is as follows.

$$\begin{aligned} O^{(k)}_m &= ReLU(\textbf{V}_m \cdot \textbf{h}^{(k)}) \nonumber \\ ReLU(x) &= {\left\{ \begin{array}{ll} x &{} (x \ge 0) \\ 0 &{} (otherwise) \end{array}\right. } \end{aligned}$$
(4)

In Eq. (4), \(\textbf{V}_m\) is the weight vector connecting the hidden layer with the m-th output neuron.
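Equations (2) and (4) together define the forward pass through the base network. A minimal numpy sketch, assuming the neighborhood values \(\sigma_j\) for the current winner are already computed (the toy dimensions below are illustrative):

```python
import numpy as np

def hidden_activation(W, x, sigma):
    """h_j = sigma_j * exp(-||x - W_j||^2)  (Eq. 2)."""
    sq = np.sum((W - x) ** 2, axis=1)
    return sigma * np.exp(-sq)

def relu_output(V, h):
    """O_m = ReLU(V_m . h)  (Eq. 4); V has shape (N_out, N_hid)."""
    return np.maximum(V @ h, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(9, 4))   # 3x3 topological layer, 4-D input
V = rng.normal(size=(5, 9))   # 5 ReLU output neurons
x = rng.normal(size=4)
h = hidden_activation(W, x, sigma=np.ones(9))
O = relu_output(V, h)
print(O.shape, bool(np.all(O >= 0)))  # → (5,) True
```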

The ground truth of the similarity between inputs \(\textbf{X}^{(1)}\) and \(\textbf{X}^{(2)}\) is defined as follows.

$$\begin{aligned} T(\textbf{X}^{(1)},\textbf{X}^{(2)})= {\left\{ \begin{array}{ll} 1 &{} {\textbf{X}}^{(1)}, {\textbf{X}}^{(2)}: similar \\ 0 &{} (otherwise) \end{array}\right. } \end{aligned}$$
(5)

Here, it should be noted that the similarity itself can be arbitrarily defined. For example, it can be defined that two inputs having the same labels are similar or two inputs with close ranks are similar. The definition can also be subjective, for example, human impressions or preferences. The flexibility in assigning the definition of similarities is the strong point of the proposed STN in visualizing high-dimensional data in different contexts.

The cosine similarity between the output vectors incurred by the input pair is defined in Eq. (6).

$$\begin{aligned} S(\textbf{X}^{(1)},\textbf{X}^{(2)}) = \frac{\textbf{O}^{(1)} \cdot \textbf{O}^{(2)}}{ \Vert \textbf{O}^{(1)} \Vert \Vert \textbf{O}^{(2)} \Vert } \end{aligned}$$
(6)

Because \(\forall j, k \in \{1,2\},\ 0<h_j^{(k)} \le 1\), it follows that in Eq. (6), \(0 < S(\textbf{X}^{(1)},\textbf{X}^{(2)}) \le 1\).

The loss function is cross entropy defined as follows.

$$\begin{aligned} {\mathcal {L}}(\textbf{X}^{(1)},\textbf{X}^{(2)}) = - (T \log S + (1-T) \log (1-S)) \end{aligned}$$
(7)
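A sketch of the similarity score and cross-entropy loss of Eqs. (6) and (7). The small epsilon guarding the logarithms is an implementation detail added here, not part of the paper, since in practice \(S\) can come arbitrarily close to 0 or 1:

```python
import numpy as np

def cosine_similarity(O1, O2):
    """S(X1, X2) of Eq. (6), computed on the ReLU output vectors."""
    return float(O1 @ O2 / (np.linalg.norm(O1) * np.linalg.norm(O2)))

def loss(T, S, eps=1e-12):
    """Cross-entropy loss of Eq. (7); eps is a numerical guard (assumption)."""
    S = np.clip(S, eps, 1.0 - eps)
    return float(-(T * np.log(S) + (1 - T) * np.log(1 - S)))

O1 = np.array([1.0, 2.0, 0.5])
O2 = np.array([2.0, 4.0, 1.0])   # parallel to O1, so S = 1
S = cosine_similarity(O1, O2)
print(round(S, 6), round(loss(1, S), 6))  # → 1.0 0.0
```

For a similar pair (\(T=1\)) the loss vanishes as \(S \rightarrow 1\); for a dissimilar pair (\(T=0\)) it vanishes as \(S \rightarrow 0\).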

The gradients of the loss function are calculated as follows.

$$\begin{aligned} \frac{\partial {\mathcal {L}}}{ \partial \textbf{V}_m} &= - \frac{(T-S)}{S (1-S)} \frac{\partial S}{\partial \textbf{V}_m} \nonumber \\ &= - \frac{(T-S)}{(1-S)} \sum _{k=1}^{2} sgn_m^{(k)} \delta ^{(k)}_m \textbf{h}^{(k)}, \quad m \in \{1,2, \ldots , N_{out} \} \end{aligned}$$
(8)

In Eq. (8),

$$\begin{aligned} \delta ^{(k)}_m = \frac{O^{({\bar{k}})}_m}{\textbf{O}^{(1)} \cdot \textbf{O}^{(2)}} - \frac{O^{(k)}_m}{\Vert \textbf{O}^{(k)} \Vert ^2}, \quad {\bar{k}} = {\left\{ \begin{array}{ll} 2 &{} (k=1) \\ 1 &{} (k=2) \end{array}\right. } \end{aligned}$$

and,

$$\begin{aligned} sgn^{(k)}_m = {\left\{ \begin{array}{ll} 1 &{} (\textbf{V}_m \cdot \textbf{h}^{(k)} > 0) \\ 0 &{} (otherwise) \end{array}\right. } \end{aligned}$$

Hence, the weight vector from the topological layer to the ReLU layer is modified as follows.

$$\begin{aligned} \textbf{V}_m(t+1) = \textbf{V}_m(t) + \eta \frac{(T-S)}{(1-S)} \sum _{k=1}^{2} sgn_m^{(k)} \delta ^{(k)}_m \textbf{h}^{(k)} \end{aligned}$$
(9)

While the modification rule is executed as in standard backpropagation, the interpretation of the rule in Eq. (9) is interesting. When the teacher signal \(T=1\), indicating that the two inputs are similar, the direction of the modification depends only on \(\delta ^{(k)}_m\), which measures the difference between the influence of the m-th element from the \({\bar{k}}\)-th input on the cosine similarity and that of the k-th input on the norm of \(\textbf{O}^{(k)}\). Here, for \(\delta ^{(k)}_m > 0\), the weight vector is corrected toward \(\textbf{h}^{(k)}\), which in turn brings \(\textbf{O}^{(k)}\) closer to \(\textbf{O}^{({\bar{k}})}\), increasing their cosine similarity. For \(\delta ^{(k)}_m < 0\), the weight is modified in the opposite direction of \(\textbf{h}^{(k)}\), decreasing \(O^{(k)}_m\) and likewise bringing \(\textbf{O}^{(k)}\) closer to \(\textbf{O}^{({\bar{k}})}\). By the same rationale, when the teacher signal \(T=0\), the weights are modified so that the cosine similarity between \(\textbf{O}^{(k)}\) and \(\textbf{O}^{({\bar{k}})}\) decreases.
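The update of Eq. (9) can be sketched in numpy as follows. The definitions of \(\delta^{(k)}_m\) and \(sgn^{(k)}_m\) follow Eq. (8); the ReLU gate is taken as \(O^{(k)}_m > 0\); the learning rate and toy dimensions are illustrative:

```python
import numpy as np

def update_V(V, h1, h2, O1, O2, T, S, eta=0.001):
    """Output-weight update of Eq. (9) for one input pair."""
    H, O = [h1, h2], [O1, O2]
    dot = O1 @ O2
    for m in range(V.shape[0]):
        grad = np.zeros_like(V[m])
        for k, kbar in ((0, 1), (1, 0)):
            delta = O[kbar][m] / dot - O[k][m] / (O[k] @ O[k])  # Eq. (8)
            sgn = 1.0 if O[k][m] > 0 else 0.0                   # ReLU gate
            grad += sgn * delta * H[k]
        V[m] += eta * (T - S) / (1.0 - S) * grad
    return V

# Toy pair: positive weights and activations keep O strictly positive
rng = np.random.default_rng(1)
V = rng.uniform(0.1, 1.0, (5, 9))
h1, h2 = rng.uniform(0.1, 1.0, 9), rng.uniform(0.1, 1.0, 9)
O1, O2 = np.maximum(V @ h1, 0), np.maximum(V @ h2, 0)
S = O1 @ O2 / (np.linalg.norm(O1) * np.linalg.norm(O2))
V_new = update_V(V.copy(), h1, h2, O1, O2, T=1, S=S)
print(V_new.shape)  # → (5, 9)
```

Note that for \(T=1\) the factor \((T-S)/(1-S)\) reduces to 1, so the step size is governed entirely by \(\eta\) and \(\delta^{(k)}_m\).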

The gradient of the loss function with respect to the reference vector associated with the n-th neuron in the topological layer is calculated as follows.

$$\begin{aligned} \frac{\partial {\mathcal {L}}}{ \partial \textbf{W}_n} &= - \frac{(T-S)}{S (1-S)} \frac{\partial S}{\partial \textbf{W}_n} \nonumber \\ &= -2\frac{(T-S)}{(1-S)}\sum _{k=1}^{2} \epsilon ^{(k)}_n h^{(k)}_n (\textbf{X}^{(k)}-\textbf{W}_n ) \end{aligned}$$
(10)

In Eq. (10), the term \(\epsilon ^{(k)}_n\) is the weighted correction signal from the ReLU layer that is calculated as follows.

$$\begin{aligned} \epsilon ^{(k)}_n = \sum _{m=1}^{N_{out}} sgn_{m}^{(k)} v_{mn} \delta ^{(k)}_m, \quad k \in \{1,2\} \end{aligned}$$
(11)

In Eq. (11), \(v_{mn}\) is the weight from the n-th neuron in the topological layer to the m-th neuron in the ReLU layer, while \(N_{out}\) denotes the number of neurons in the ReLU layer.

Hence, the reference vector modification is as follows.

$$\begin{aligned} \textbf{W}_n(t+1) = \textbf{W}_n(t) + \eta \frac{(T-S)}{(1-S)}\sum _{k=1}^{2} \epsilon ^{(k)}_n h^{(k)}_n (\textbf{X}^{(k)}-\textbf{W}_n ) \end{aligned}$$
(12)

Equation (12) differs significantly from the update rule of conventional SOM: in SOM, the reference vectors are always modified toward the input vector, whereas in STN, the direction of modification is influenced by the sign of \((T-S)\) and the sign of \(\epsilon ^{(k)}_n\). The former implies that during the learning process, the formation of the topological map is influenced by both the input's topological structure and its context, in that the same set of inputs produces different topological maps when different similarity rules are assigned to them. The latter is the weighted sum of \(\delta ^{(k)}_m\) and can be regarded as the result of a vote among all the output neurons to decide the direction of the reference vector modification, which is influenced by the similarity-contexts of the inputs. Hence, changing the data's context changes the appearance of the map. This means that a new cartography can be built for flexibly visualizing data from various perspectives.
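The reference-vector update of Eq. (12), with \(\epsilon^{(k)}_n\) computed as in Eq. (11), can be sketched as below. The ReLU gates and learning rate are assumptions consistent with the derivation above; all toy dimensions are illustrative:

```python
import numpy as np

def update_W(W, V, X, H, O, T, S, eta=0.001):
    """Reference-vector update of Eq. (12).
    W: (N_hid, D), V: (N_out, N_hid); X, H, O: lists holding the pair's vectors."""
    dot = O[0] @ O[1]
    coeff = eta * (T - S) / (1.0 - S)
    for n in range(W.shape[0]):
        step = np.zeros_like(W[n])
        for k, kbar in ((0, 1), (1, 0)):
            delta = O[kbar] / dot - O[k] / (O[k] @ O[k])  # delta_m for all m (Eq. 8)
            sgn = (O[k] > 0).astype(float)                # ReLU gates
            eps_kn = np.sum(sgn * V[:, n] * delta)        # Eq. (11)
            step += eps_kn * H[k][n] * (X[k] - W[n])
        W[n] += coeff * step
    return W

rng = np.random.default_rng(2)
W = rng.uniform(size=(9, 4))
V = rng.uniform(0.1, 1.0, (5, 9))
X = [rng.uniform(size=4), rng.uniform(size=4)]
H = [np.exp(-np.sum((W - x) ** 2, axis=1)) for x in X]   # Eq. (2), sigma = 1
O = [np.maximum(V @ h, 0.0) for h in H]                  # Eq. (4)
S = O[0] @ O[1] / (np.linalg.norm(O[0]) * np.linalg.norm(O[1]))
W_new = update_W(W.copy(), V, X, H, O, T=1, S=S)
print(W_new.shape)  # → (9, 4)
```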

3 Experiment

3.1 Preliminary experiments

To demonstrate the cartographic properties of the STN, it is first tested on some simple labeled data sets from the UCI machine learning repository [28], creating maps in the context of their label differences. Although the data are labeled, the labels are not explicitly utilized during the learning process. Here, when an arbitrarily chosen pair shares the same label, the pair is defined as similar; otherwise, it is defined as dissimilar. The resulting topological map is then visually compared with that of the standard SOM.

In the following experiments, the ground truth of similarity in Eq. (5) is defined to be similar when two inputs have an identical label, while they are dissimilar when they have different labels.

Fig. 2 Maps of Iris Data

The first example is the well-known Iris problem, a four-dimensional, three-class data set. Here, during the learning process, the class labels of the data are not explicitly utilized but only used to define the similarity between two samples. In Fig. 2a, the topological structure of the data is visualized using the standard SOM. For visualization clarity, the samples are colored according to their class labels, although the class labels played no role in the self-organizing process. The figure shows a well-known profile of this problem: one of the classes is linearly separable from the other two classes, which are not linearly separable from each other.

Figure 2b shows the topological map generated by STN for these data. It can be observed from this figure that STN generates sparser representations of the high-dimensional input, in which the three classes are equally separable except for some anomalies. The better separability is due to the pairwise-similarity information that is unavailable to SOM. It is also interesting to note that although the class labels are not explicitly provided, STN forms distinctive clusters according to the data labels. The map also successfully locates anomalies in these data.

Fig. 3 Maps of breast cancer data

The second example is the Breast Cancer problem, a nine-dimensional, three-class problem. The SOM representation in Fig. 3a indicates that while some overlapping samples exist, most instances can be easily classified. As shown in Fig. 3b, the STN generates a sparser map and nicely visualizes the potentially hard-to-classify instances due to their similarity with samples belonging to contrasting classes.

The third problem is the Wine problem, a 13-dimensional, three-class problem. Figure 4a shows the SOM representations. It can be observed that one of the classes forms two distinct clusters, which are also well represented by the STN as in Fig. 4b.

The maps in the three illustrated examples reflect the given context of similarity, which in these examples is class similarity.

Fig. 4 Maps of wine data

To quantitatively analyze the resulting maps, the Neighborhood Consistency Index, \(C_{idx}\), is defined as follows.

$$\begin{aligned} C_{idx} = \frac{1}{N} \sum _i \Phi (BMU(i), label(i)) \end{aligned}$$
(13)

In Eq. (13), BMU(i) and label(i) denote the BMU and the label for input i, while \(\Phi (x,y)\) denotes the ratio of inputs having label y associated with the nearest BMU to BMU x, and N denotes the number of inputs. Here, it should be noted that a BMU may be associated with several inputs, each with different labels. For example, if the nearest BMU to the BMU of the input i, where \(label(i)=\alpha\), is associated with five inputs, and four have labels \(\alpha\), then \(\Phi (BMU(i),label(i))=0.8\).

Hence, \(0 \le C_{idx} \le 1\) measures the neighborhood consistency of the resulting maps. In the experiments above, STN was trained so that the topological characteristics of the inputs in the context of their labels should be preserved on the map. Here, \(C_{idx} =0\) indicates that the topological similarity in the context of the input labels is not well presented on the map, while \(C_{idx} =1\) indicates the opposite.
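A sketch of \(C_{idx}\) as defined in Eq. (13), assuming for simplicity that each input occupies a distinct map node (the BMU positions and labels below are illustrative):

```python
import numpy as np

def c_idx(bmu_pos, labels):
    """Neighborhood Consistency Index (Eq. 13).
    bmu_pos: (N, 2) map coordinates of each input's BMU; labels: (N,)."""
    N = len(labels)
    total = 0.0
    for i in range(N):
        d = np.linalg.norm(bmu_pos - bmu_pos[i], axis=1)
        d[d == 0] = np.inf                     # exclude the BMU of input i itself
        nearest = bmu_pos[np.argmin(d)]        # nearest other BMU
        attached = np.all(bmu_pos == nearest, axis=1)
        total += np.mean(labels[attached] == labels[i])  # Phi(BMU(i), label(i))
    return total / N

# Two tight, pure clusters => perfect consistency
pos = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)
lab = np.array([0, 0, 1, 1])
print(c_idx(pos, lab))  # → 1.0
```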

Executing 10-fold cross-validation tests, the average \(C_{idx}\) values for the Iris, Breast Cancer, and Wine data are 0.935, 0.949, and 0.982, respectively. The resulting consistency indexes indicate successful learning by the proposed STN.

For these three preliminary experiments, the number of neurons in the ReLU layer was set to 80, the learning rate \(\eta\) in Eqs. (9) and (12) was set to 0.001, while the map size was set to \(\lceil \sqrt{\frac{N}{2}} \rceil \times \lceil \sqrt{\frac{N}{2}} \rceil\), where N is the data size.

3.2 Maps of demographic data

In the previous preliminary experiments, STN was trained with labeled data whose similarities could be naturally defined from their original labels. Here, the STN is trained using demographic data with no natural labels, but whose similarities can be arbitrarily designated. For example, countries are often ranked based on various criteria: their economic power, the educational level of their populations, their richness in natural resources, and so on. However, such rankings usually reflect only the criterion used to define them while ignoring other profiles of the countries. Hence, for a clearer understanding of the similarity structure, both the rank and the features that characterize the countries should be considered. For example, it is intuitive that countries with similar profiles and ranks should be visualized close to each other. In contrast, it is interesting to observe, for example, how countries that share similar profiles but have wide rank gaps, or countries that have dissimilar profiles but similar ranks, are aligned on the map. This kind of context-oriented visualization is essential for understanding the deeper inherent structures of their relative similarities.

For the subsequent experiments, 32 Asian and Pacific countries were chosen. Each country has 11 economic and demographic profiles: 1. Life expectancy at birth, 2. Fertility rate (general population), 3. Under-5 mortality rate, 4. Immunization (measles) rate, 5. \(CO_2\) emissions per capita, 6. Start-up procedures to register a business, 7. Foreign direct investment, 8. Age dependency ratio, 9. Access to electricity, 10. Access to clean fuels and technology for cooking, 11. GDP per capita. These data are obtained from the World Bank's DataBank [29] and are standardized. While a country can be characterized using many more profiles, the chosen variables are sufficient to show at least the economic prowess of those countries, which may correlate with many factors, such as their education level, population welfare, military strength, etc.

Fig. 5 Country Map: SOM

Figure 5 is the SOM representation of the 32 countries, in which the positions on the map reflect the similarities of their profiles. It can be observed that the economic powers in this region, such as Japan, NZ, Singapore, Australia, and Rep. of Korea, are aligned closely with each other. In contrast, PNG, Pakistan, and Afghanistan are aligned far from those countries. Figure 6 is the t-SNE representation of the same data utilized for generating Fig. 5. It should be noted that these two maps are generated solely based on the high-dimensional profiles of the 32 countries without assigning any contexts to them.

Fig. 6 Country Map: t-SNE

These data are then ranked according to a criterion not included in their original profiles. The objective is to observe how STN generates different topological maps for rank criteria and generalizes them to out-of-sample data.

The first rank criterion is Health Expenditure per capita, for which the top six countries are Australia, Japan, NZ, Singapore, Rep. Korea, and Maldives, while the bottom six countries are Bangladesh, Pakistan, Nepal, PNG, Lao, and Afghanistan.

In addition to SOM and t-SNE, which execute unsupervised self-organization, a Sammon Map [6, 30] was generated in Fig. 7, in which the pairwise distance between two countries i and j is defined in Eq. (14). For visualization clarity, the top six countries, the bottom six countries, and the remaining countries are plotted with different markers. It can be observed that the Sammon Map nicely assigns rank-wise similar countries close to each other while distancing them from dissimilar countries.

$$\begin{aligned} dist(\textbf{X}^{(i)},\textbf{X}^{(j)}) = \frac{\alpha }{11} \Vert \textbf{X}^{(i)} - \textbf{X}^{(j)} \Vert + (1 - \alpha ) |r(i) - r(j)| \end{aligned}$$
(14)

In Eq. (14), the distance between two countries is defined as the weighted difference of their features and ranks. Here, \(\textbf{X}^{(i)}\) and r(i) are the profile vector and the standardized rank for country i, while \(0 \le \alpha \le 1\) is the weighting coefficient that is set to 0.5.
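The blended profile/rank distance of Eq. (14) can be sketched as below, with \(\alpha = 0.5\) as in the paper; the zero/one profile vectors are purely illustrative:

```python
import numpy as np

def blended_distance(Xi, Xj, ri, rj, alpha=0.5, n_features=11):
    """Weighted mix of profile distance and rank difference (Eq. 14)."""
    return alpha / n_features * np.linalg.norm(Xi - Xj) + (1 - alpha) * abs(ri - rj)

Xi, Xj = np.zeros(11), np.ones(11)
print(round(blended_distance(Xi, Xj, ri=0.0, rj=1.0), 4))  # → 0.6508
```

The resulting pairwise distance matrix is what the Sammon Map consumes; the STN, by contrast, needs only the binary similarity labels of Eq. (15) rather than a full distance matrix.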

Fig. 7 Health Expenditure-ranked Sammon Map

Fig. 8 Health Expenditure-ranked Map

The map produced by STN for the same data set is shown in Fig. 8. For this experiment, in training the STN, two countries whose rank difference is three or less are designated as similar and dissimilar otherwise. Hence, the ground truth similarity in Eq. (5) is as follows.

$$\begin{aligned} T(\textbf{X}^{(1)},\textbf{X}^{(2)})= {\left\{ \begin{array}{ll} 1 &{} abs(rank({\textbf{X}}^{(1)})-rank({\textbf{X}}^{(2)})) \le 3 \\ 0 &{} (otherwise) \end{array}\right. } \end{aligned}$$
(15)

In Eq. (15), abs denotes the absolute value, while \(rank(\textbf{X}^{(i)})\) is the rank of input \(\textbf{X}^{(i)}\).
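The rank-based ground truth of Eq. (15) is a one-liner; a sketch with the threshold of three used in this experiment:

```python
def rank_similarity(rank_i: int, rank_j: int, threshold: int = 3) -> int:
    """Eq. (15): similar (1) when ranks differ by at most `threshold`, else 0."""
    return 1 if abs(rank_i - rank_j) <= threshold else 0

print(rank_similarity(1, 4), rank_similarity(1, 5))  # → 1 0
```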

In this figure, it is visually apparent that the STN generated a more clustered topological map. For example, the top four countries are concentrated in one cluster, while the fifth- and sixth-ranked countries (Rep. Korea and Maldives) form a different cluster due to their profile differences from the top four. It is also interesting to observe that the bottom countries are more spread out than the leading countries, indicating that their profiles are more diverse. Furthermore, while the Sammon Map cannot project out-of-sample data onto the generated map, this can be quickly done in STN by calculating the BMU for an out-of-sample country without retraining the STN.
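Out-of-sample projection reduces to the BMU computation of Eq. (1) on the trained reference vectors. A sketch, assuming a square map whose neurons are indexed row by row (the toy weights below are illustrative):

```python
import numpy as np

def project(W, x):
    """Return the 2-D map coordinate of the BMU for a new input x."""
    j = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    side = int(np.sqrt(W.shape[0]))   # assuming a square map
    return divmod(j, side)            # (row, col) on the map

W = np.eye(9)[:, :4]                  # toy 3x3 map with 4-D reference vectors
print(project(W, np.array([0.0, 0.0, 1.0, 0.0])))  # → (0, 2)
```

No gradient step or retraining is involved, which is what makes the projection of new countries onto an already-built map inexpensive.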

Fig. 9 Health Expenditure-ranked out of samples

To test the out-of-sample data projection of STN, some European and Central Asian countries are projected as shown in Fig. 9. From this figure, it can be inferred that if Austria were in Asia, it would have a similar rank to Japan in health expenditure due to its profile similarity; similarly, Luxembourg with Rep. Korea and Azerbaijan with Malaysia, while no European or Central Asian country shares similarities with Afghanistan and Nepal. Here, for visualization clarity, only a few in-sample Asian countries are plotted; they are plotted in red to differentiate them from the out-of-sample countries.

Fig. 10 Adolescence Fertility-ranked Sammon Map

In the next experiment, the Asian countries in the previous experiment are ranked using a different criterion: Adolescence Fertility (births per 1000 women ages 15–19). The top six countries are Bangladesh, Afghanistan, Solomon Island, Nepal, Lao, and the Philippines, while the bottom six are Korea, Singapore, Japan, China, Maldives, and Brunei.

The Sammon Map is given in Fig. 10, while the rank-oriented topological map generated by the STN is shown in Fig. 11. This figure shows that among the top six countries, Solomon Island, Nepal, and Bangladesh form a cluster that also includes Cambodia (ranked 9) due to their profile similarities, while no country shares a significant similarity with Afghanistan. Among the bottom six countries, Singapore, Korea, Japan, and China form a cluster, while Brunei is solitary and Maldives is close to Malaysia (ranked 25).

Fig. 11 Adolescence Fertility-ranked Map

The out-of-sample projection is shown in Fig. 12. Here, it can be predicted that Austria would rank similarly to Japan, Luxembourg to Australia, Greece to Malaysia, Sweden and Finland to NZ, and Kyrgyzstan to Tonga. Within the context of this ranking, no country shares similarities with Nepal.

Fig. 12 Adolescence Fertility-ranked Out-sample

For all the experiments on demographic data, the number of neurons in the ReLU layer was set to 80, the learning rate \(\eta\) was set to 0.003, and the map size was set to \(15 \times 15\) to accommodate a larger map for visualization clarity.

4 Conclusions

This study proposes a novel cartography utilizing Siamese topological networks. This new cartography allows the generation of maps of high-dimensional data from the perspective of their context. As the context can be assigned arbitrarily, the proposed cartography is useful for the visual analysis of high-dimensional data, for example, discovering the data's structure within a given context and intuitively locating anomalies.

In this paper, the mathematical characteristics of the proposed STN are elaborated, and hence the formation of the map can be clearly understood. In the experiment, the properties of the new cartography are demonstrated using real-world demographic data.

The primary strength of the new cartography is its ability to generate maps based on different criteria. This property is essential for visually analyzing data from different perspectives. The other strength is its ability to add out-of-sample data into an already-built map, which is challenging to execute in most traditional cartography.

The immediate future work for this study is to utilize this new cartography for antigenic analysis and mapping drug responses for designing new drugs or personalized medicines. Developing an intuitive interface that allows the applications for this new cartography in various fields is also of interest.