1 Introduction

Traditional cartography is a technique for creating geographical maps that allow humans to intuitively understand the spatial or geological characteristics of countries, oceans, or celestial objects. In recent years, however, several non-geospatial cartographies have been proposed. The most successful is antigenic cartography, proposed in Refs. [1, 2]. This new cartography maps the antigenic properties of influenza viruses: viruses with similar antigenic reactions are mapped closely together, while viruses that generate significantly different responses are placed far apart on the map. Such a map helps in designing vaccination strategies quickly and efficiently. Antigenic cartography proved instrumental during the COVID-19 pandemic [3,4,5], when it mapped the Omicron variant of the coronavirus far apart from other variants, prompting the development of new vaccines specialized for this variant. Antigenic cartography is based on traditional multidimensional scaling (MDS) [6], which requires a matrix of pairwise distances between all instances. This study proposes a new, context-flexible cartography that does not require the distance matrix, thus allowing more efficient cartography.

Similar to antigenic cartography, which builds a map through a dimensionality reduction algorithm, the proposed study also depends on a dimensionality reduction technique. This study proposes a new algorithm for cartography using a novel Siamese topological neural network (STN) that differs significantly from traditional MDS. The STN utilizes a hierarchical neural network containing a low-dimensional topological layer [7] as its base network, mapping high-dimensional input into a two-dimensional topological map that can be visualized. Most dimensionality reduction methods for visualization, for example, t-SNE [8], Isomap [9], and UMAP [10], reduce the dimensionality while preserving a criterion of similarity between the high-dimensional data in their low-dimensional representations. Unlike these unsupervised dimensionality reduction algorithms, which do not take the context of the data, for example, their labels, into account, STN forms a low-dimensional embedding that reflects the context-similarity and thus expands the flexibility of visualizing high-dimensional data. However, STN is not strictly a supervised algorithm, as it requires neither data labels nor a difference matrix for all the data points.

In the past, many supervised dimensionality reduction methods [11, 12] also took data labels into account, linear discriminant analysis (LDA) [13, 14] being the most traditional example. While the idea of supervised dimensionality reduction has been known for a long time, it is still actively studied, resulting in many exciting algorithms [15,16,17,18,19]. These methods form low-dimensional representations in the context of the data labels and hence require the availability of the labels. However, for many real-world problems, data labels are not always available and can be expensive to obtain. The proposed study requires pairwise similarity measures but not data labels. Here, the STN is trained to learn the similarity measure while executing dimensionality reduction at the same time.

The proposed cartography generates maps of high-dimensional data using the data’s topological characteristics as well as the contexts assigned to them. Self-organizing maps (SOM) [20, 21] introduced by Kohonen visualized the topological structure of the data. Visually induced SOM (ViSOM) [22] further modified SOM so that the maps visualize not only the topological structure of the data but also the distances between their low dimensional representations. The relation between ViSOM and MDS was further elaborated in [23]. The proposed STN shares a similarity with ViSOM in that it also aims at preserving the distance structure of the data. However, it significantly differs in the context of the preserved distance, as in STN, the distance is associated with an arbitrarily assigned context.

The proposed STN also shares a similarity with the Elastic SOM [24] in that both attempt to induce an arbitrarily given metric of distance into the generated maps. However, STN significantly differs in the training process and the ability to project out-of-sample data not used during the learning process onto the map.

The proposed study shares some resemblances with the past research of dimensionality reduction through kernel-based similarity learning for high-dimensional data [25], in that the networks execute similarity learning that can be related to some given contexts, for example, rank. However, this method needs to depend on a separate dimensionality reduction method for visualization, while in the proposed STN, the dimensionality reduction is embedded.

The novelty of the STN is two-fold: (1) the proposed method allows a topological representation of high-dimensional data to be learned from a set of paired data with arbitrarily designed similarity measures, resulting in a context-oriented representation that depends not only on the feature-similarity of the inputs but also the similarity-context assigned to them. (2) the proposed methods seamlessly integrate similarity learning with dimensionality reduction. This is a strength of the proposed model because many similarity learning methods require separate visualization algorithms, for example, t-SNE [8] being one of the most popular methods for visualizing their representations. In this study, the dimensionality reduction mechanism that allows visualization is an inherent part of the network. Further, as the topological representations here are context-oriented, in that different similarity criteria result in other topological structures, a flexible visual analysis is made possible. The novelties are instrumental in building a new flexible cartography for visual data analysis.

The rest of the paper is structured as follows. Section 2 explains the structure of the STN, its learning dynamics, and the framework of the proposed STN as similarity learning. The subsequent section describes the experiments, while the conclusions will be elaborated in the final section.

2 Siamese topological networks

The proposed Siamese Topological Network (STN) structure is outlined in Fig. 1. The base network for the STN is a hierarchical topological network, proposed in [7, 26, 27]. The structure of the base network is similar to a standard multilayer perceptron (MLP), except that the hidden layer is a two-dimensional topological map, as in many cases of Self-Organizing Maps (SOM) [20, 21]. The low dimensionality allows the network to form topological representations of a given data set that humans can visualize. To some extent, visualizing the internal representations enables humans to intuitively understand how the network converts its inputs into outputs. In the past, it was utilized for visualizing high-dimensional data [16]. It can also be trained as an autoencoder, a classifier, or a mix of both [26], and hence can visualize high-dimensional data constrained by various contexts.

This study pairs two topological networks to form the STN for similarity learning. The objective is to train the network with high-dimensional data in which a similarity criterion between pairwise data instances is given. The similarity measure may be defined qualitatively, quantitatively, or subjectively and does not need to satisfy the mathematical definition of a distance. The low-dimensional topological map can be utilized to visualize the data structure in the context of the defined similarity. Here, different similarity measures for the same data will generate different maps, allowing context-flexible cartography.

Fig. 1 Siamese topological networks

2.1 Learning process

The learning process of STN is explained as follows.

For each high-dimensional input of the pair, \(\textbf{X}^{(k)}\ (k \in \{1,2\})\), the best matching unit (BMU), \(win^{(k)}\), is selected as in Eq. (1).

$$\begin{aligned} win^{(k)}(t) = argmin_{j} \Vert \textbf{W}_j(t) - \textbf{X}^{(k)}(t) \Vert \end{aligned}$$
(1)
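As a minimal sketch of the BMU selection in Eq. (1), assuming the reference vectors \(\textbf{W}_j\) are stored as the rows of an `(N_hid, D)` array (toy values below are illustrative):

```python
import numpy as np

def best_matching_unit(W: np.ndarray, x: np.ndarray) -> int:
    """Return the index of the reference vector closest to input x (Eq. 1)."""
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

# Toy 2-D example: three reference vectors, one input
W = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
x = np.array([0.9, 1.1])
print(best_matching_unit(W, x))  # → 1
```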

The activation of the j-th hidden neuron, \(h_j^{(k)}(t)\), incurred by one input of the pair, \(\textbf{X}^{(k)}\), at epoch t is calculated as in Eq. (2). Here, \(\textbf{h}^{(k)}\) denotes the activation vector incurred by input \(\textbf{X}^{(k)}\).

$$\begin{aligned} h_j^{(k)}(t) &= \sigma (win^{(k)},j,t)\, e^{-\Vert \textbf{X}^{(k)}(t)-\textbf{W}_j(t) \Vert ^2 } \nonumber \\ \textbf{h}^{(k)}(t) &= (h_1^{(k)}(t), h_2^{(k)}(t), \ldots , h_{N_{hid}}^{(k)}(t) )^{\textrm{T}}, \quad k \in \{1,2\} \end{aligned}$$
(2)

Here, \(N_{hid}\) is the number of neurons in the topological layer, while the neighborhood function \(\sigma (win^{(k)},j,t)\) is defined as follows.

$$\begin{aligned} \sigma (win^{(k)},j,t) &= \exp \left( -\frac{dist(win^{(k)}(t),j)}{S(t)} \right) \nonumber \\ S(t) &= \sigma _{\infty } + \frac{1}{2} (\sigma _0 - \sigma _{\infty }) \left( 1+\cos \frac{\pi t}{t_{\infty }} \right) \end{aligned}$$
(3)

In Eq. (3), \(\sigma _0\) and \(\sigma _{\infty }\) are the initial and terminal annealing constants, and \(t_\infty\) is the termination epoch, while \(dist(win^{(k)},j)\) is the distance from the BMU to the j-th neuron on the topological layer.
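A sketch of the neighborhood function and cosine annealing schedule of Eq. (3). The grid distance is taken here as the Euclidean distance between neuron coordinates on the 2-D map; the values of \(\sigma_0\), \(\sigma_\infty\), and \(t_\infty\) below are illustrative, not the paper's settings:

```python
import numpy as np

def annealed_radius(t, sigma_0=2.0, sigma_inf=0.1, t_inf=1000):
    """S(t): cosine-annealed radius shrinking from sigma_0 to sigma_inf (Eq. 3)."""
    return sigma_inf + 0.5 * (sigma_0 - sigma_inf) * (1 + np.cos(np.pi * t / t_inf))

def neighborhood(win_idx, grid_pos, t, **kw):
    """sigma(win, j, t) = exp(-dist(win, j) / S(t)) for every neuron j."""
    d = np.linalg.norm(grid_pos - grid_pos[win_idx], axis=1)
    return np.exp(-d / annealed_radius(t, **kw))

# 3x3 map: coordinates of the 9 neurons
grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
h = neighborhood(win_idx=4, grid_pos=grid, t=0)  # winner at the centre
print(h[4])  # → 1.0 (the winner itself)
```

At \(t=0\) the radius equals \(\sigma_0\), and at \(t=t_\infty\) it has shrunk to \(\sigma_\infty\), so the neighborhood tightens smoothly over training.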

The output of the ReLU layer, \(\textbf{O}^{(k)} = (O^{(k)}_1, O^{(k)}_2, \ldots , O^{(k)}_{N_{out}})\), where \(N_{out}\) denotes the number of neurons in the ReLU layer, is as follows.

$$\begin{aligned} O^{(k)}_m &= ReLU(\textbf{V}_m \cdot \textbf{h}^{(k)}) \nonumber \\ ReLU(x) &= {\left\{ \begin{array}{ll} x &{} (x \ge 0) \\ 0 &{} (otherwise) \end{array}\right. } \end{aligned}$$
(4)

In Eq. (4), \(\textbf{V}_m\) is the weight vector connecting the hidden layer with the m-th output neuron.
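Equations (2) and (4) together define the forward pass through the base network. A minimal numpy sketch, assuming the neighborhood values \(\sigma_j\) for the current winner are already computed (the toy dimensions below are illustrative):

```python
import numpy as np

def hidden_activation(W, x, sigma):
    """h_j = sigma_j * exp(-||x - W_j||^2)  (Eq. 2)."""
    sq = np.sum((W - x) ** 2, axis=1)
    return sigma * np.exp(-sq)

def relu_output(V, h):
    """O_m = ReLU(V_m . h)  (Eq. 4); V has shape (N_out, N_hid)."""
    return np.maximum(V @ h, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(9, 4))   # 3x3 topological layer, 4-D input
V = rng.normal(size=(5, 9))   # 5 ReLU output neurons
x = rng.normal(size=4)
h = hidden_activation(W, x, sigma=np.ones(9))
O = relu_output(V, h)
print(O.shape, bool(np.all(O >= 0)))  # → (5,) True
```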

The ground truth of the similarity between inputs \(\textbf{X}^{(1)}\) and \(\textbf{X}^{(2)}\) is defined as follows.

$$\begin{aligned} T(\textbf{X}^{(1)},\textbf{X}^{(2)})= {\left\{ \begin{array}{ll} 1 &{} {\textbf{X}}^{(1)}, {\textbf{X}}^{(2)}: similar \\ 0 &{} (otherwise) \end{array}\right. } \end{aligned}$$
(5)

Here, it should be noted that the similarity itself can be arbitrarily defined. For example, it can be defined that two inputs having the same labels are similar or two inputs with close ranks are similar. The definition can also be subjective, for example, human impressions or preferences. The flexibility in assigning the definition of similarities is the strong point of the proposed STN in visualizing high-dimensional data in different contexts.

The cosine similarity between the output vectors incurred by the input pair is defined in Eq. (6).

$$\begin{aligned} S(\textbf{X}^{(1)},\textbf{X}^{(2)}) = \frac{\textbf{O}^{(1)} \cdot \textbf{O}^{(2)}}{ \Vert \textbf{O}^{(1)} \Vert \Vert \textbf{O}^{(2)} \Vert } \end{aligned}$$
(6)

Because \(\forall j, k \in \{1,2\},\ 0<h_j^{(k)} \le 1\), it follows that in Eq. (6), \(0 < S(\textbf{X}^{(1)},\textbf{X}^{(2)}) \le 1\).

The loss function is cross entropy defined as follows.

$$\begin{aligned} {\mathcal {L}}(\textbf{X}^{(1)},\textbf{X}^{(2)}) = - (T \log S + (1-T) \log (1-S)) \end{aligned}$$
(7)
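A sketch of the similarity score and cross-entropy loss of Eqs. (6) and (7). The small epsilon guarding the logarithms is an implementation detail added here, not part of the paper, since in practice \(S\) can come arbitrarily close to 0 or 1:

```python
import numpy as np

def cosine_similarity(O1, O2):
    """S(X1, X2) of Eq. (6), computed on the ReLU output vectors."""
    return float(O1 @ O2 / (np.linalg.norm(O1) * np.linalg.norm(O2)))

def loss(T, S, eps=1e-12):
    """Cross-entropy loss of Eq. (7); eps is a numerical guard (assumption)."""
    S = np.clip(S, eps, 1.0 - eps)
    return float(-(T * np.log(S) + (1 - T) * np.log(1 - S)))

O1 = np.array([1.0, 2.0, 0.5])
O2 = np.array([2.0, 4.0, 1.0])   # parallel to O1, so S = 1
S = cosine_similarity(O1, O2)
print(round(S, 6), round(loss(1, S), 6))  # → 1.0 0.0
```

For a similar pair (\(T=1\)) the loss vanishes as \(S \rightarrow 1\); for a dissimilar pair (\(T=0\)) it vanishes as \(S \rightarrow 0\).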

The gradients of the loss function are calculated as follows.

$$\begin{aligned} \frac{\partial {\mathcal {L}}}{ \partial \textbf{V}_m} &= - \frac{(T-S)}{S (1-S)} \frac{\partial S}{\partial \textbf{V}_m} \nonumber \\ &= - \frac{(T-S)}{(1-S)} \sum _{k=1}^{2} sgn_m^{(k)} \delta ^{(k)}_m \textbf{h}^{(k)}, \quad m \in \{1,2, \ldots , N_{out} \} \end{aligned}$$
(8)

In Eq. (8),

$$\begin{aligned} \delta ^{(k)}_m = \frac{O^{({\bar{k}})}_m}{\textbf{O}^{(1)} \cdot \textbf{O}^{(2)}} - \frac{O^{(k)}_m}{\Vert \textbf{O}^{(k)} \Vert ^2}, \quad {\bar{k}} = {\left\{ \begin{array}{ll} 2 &{} (k=1) \\ 1 &{} (k=2) \end{array}\right. } \end{aligned}$$

and,

$$\begin{aligned} sgn^{(k)}_m = {\left\{ \begin{array}{ll} 1 &{} (\textbf{V}_m \cdot \textbf{h}^{(k)} > 0) \\ 0 &{} (otherwise) \end{array}\right. } \end{aligned}$$

Hence, the weight vector from the topological layer to the ReLU layer is modified as follows.

$$\begin{aligned} \textbf{V}_m(t+1) = \textbf{V}_m(t) + \eta \frac{(T-S)}{(1-S)} \sum _{k=1}^{2} sgn_m^{(k)} \delta ^{(k)}_m \textbf{h}^{(k)} \end{aligned}$$
(9)

While the modification rule is executed as in standard backpropagation, the interpretation of the rule in Eq. (9) is interesting. When the teacher signal \(T=1\), indicating that the two inputs are similar, the direction of the modification depends only on \(\delta ^{(k)}_m\), which measures the difference between the influence of the m-th element from the \({\bar{k}}\)-th input on the cosine similarity and that of the k-th input on the norm of \(\textbf{O}^{(k)}\). Here, for \(\delta ^{(k)}_m > 0\), the weight vector is corrected toward \(\textbf{h}^{(k)}\), which in turn brings \(\textbf{O}^{(k)}\) closer to \(\textbf{O}^{({\bar{k}})}\), increasing their cosine similarity. For \(\delta ^{(k)}_m < 0\), the weight is modified in the opposite direction of \(\textbf{h}^{(k)}\), decreasing \(O^{(k)}_m\) and likewise bringing \(\textbf{O}^{(k)}\) closer to \(\textbf{O}^{({\bar{k}})}\). By the same rationale, when the teacher signal \(T=0\), the weights are modified so that the cosine similarity between \(\textbf{O}^{(k)}\) and \(\textbf{O}^{({\bar{k}})}\) decreases.
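The update of Eq. (9) can be sketched in numpy as follows. The definitions of \(\delta^{(k)}_m\) and \(sgn^{(k)}_m\) follow Eq. (8); the ReLU gate is taken as \(O^{(k)}_m > 0\); the learning rate and toy dimensions are illustrative:

```python
import numpy as np

def update_V(V, h1, h2, O1, O2, T, S, eta=0.001):
    """Output-weight update of Eq. (9) for one input pair."""
    H, O = [h1, h2], [O1, O2]
    dot = O1 @ O2
    for m in range(V.shape[0]):
        grad = np.zeros_like(V[m])
        for k, kbar in ((0, 1), (1, 0)):
            delta = O[kbar][m] / dot - O[k][m] / (O[k] @ O[k])  # Eq. (8)
            sgn = 1.0 if O[k][m] > 0 else 0.0                   # ReLU gate
            grad += sgn * delta * H[k]
        V[m] += eta * (T - S) / (1.0 - S) * grad
    return V

# Toy pair: positive weights and activations keep O strictly positive
rng = np.random.default_rng(1)
V = rng.uniform(0.1, 1.0, (5, 9))
h1, h2 = rng.uniform(0.1, 1.0, 9), rng.uniform(0.1, 1.0, 9)
O1, O2 = np.maximum(V @ h1, 0), np.maximum(V @ h2, 0)
S = O1 @ O2 / (np.linalg.norm(O1) * np.linalg.norm(O2))
V_new = update_V(V.copy(), h1, h2, O1, O2, T=1, S=S)
print(V_new.shape)  # → (5, 9)
```

Note that for \(T=1\) the factor \((T-S)/(1-S)\) reduces to 1, so the step size is governed entirely by \(\eta\) and \(\delta^{(k)}_m\).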

The gradient of the loss function with respect to the reference vector associated with the n-th neuron in the topological layer is calculated as follows.

$$\begin{aligned} \frac{\partial {\mathcal {L}}}{ \partial \textbf{W}_n} &= - \frac{(T-S)}{S (1-S)} \frac{\partial S}{\partial \textbf{W}_n} \nonumber \\ &= -2\frac{(T-S)}{(1-S)}\sum _{k=1}^{2} \epsilon ^{(k)}_n h^{(k)}_n (\textbf{X}^{(k)}-\textbf{W}_n ) \end{aligned}$$
(10)

In Eq. (10), the term \(\epsilon ^{(k)}_n\) is the weighted correction signal from the ReLU layer that is calculated as follows.

$$\begin{aligned} \epsilon ^{(k)}_n = \sum _{m=1}^{N_{out}} sgn_{m}^{(k)} v_{mn} \delta ^{(k)}_m, \quad k \in \{1,2\} \end{aligned}$$
(11)

In Eq. (11), \(v_{mn}\) is the weight from the n-th neuron in the topological layer to the m-th neuron in the ReLU layer, while \(N_{out}\) denotes the number of neurons in the ReLU layer.

Hence, the reference vector modification is as follows.

$$\begin{aligned} \textbf{W}_n(t+1) = \textbf{W}_n(t) + \eta \frac{(T-S)}{(1-S)}\sum _{k=1}^{2} \epsilon ^{(k)}_n h^{(k)}_n (\textbf{X}^{(k)}-\textbf{W}_n ) \end{aligned}$$
(12)

Equation (12) differs significantly from the update rule of conventional SOM: in SOM, the reference vectors are always modified toward the input vector, whereas in STN, the direction of modification is influenced by the sign of \((T-S)\) and the sign of \(\epsilon ^{(k)}_n\). The former implies that during the learning process, the formation of the topological map is influenced by both the input's topological structure and its context, in that the same set of inputs produces different topological maps when different similarity rules are assigned to them. The latter is the weighted sum of \(\delta ^{(k)}_m\) and can be regarded as the result of a vote among all the output neurons to decide the direction of the reference vector modification, which is influenced by the similarity-contexts of the inputs. Hence, changing the data's context changes the appearance of the map. This means that a new cartography can be built for flexibly visualizing data from various perspectives.
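The reference-vector update of Eq. (12), with \(\epsilon^{(k)}_n\) computed as in Eq. (11), can be sketched as below. The ReLU gates and learning rate are assumptions consistent with the derivation above; all toy dimensions are illustrative:

```python
import numpy as np

def update_W(W, V, X, H, O, T, S, eta=0.001):
    """Reference-vector update of Eq. (12).
    W: (N_hid, D), V: (N_out, N_hid); X, H, O: lists holding the pair's vectors."""
    dot = O[0] @ O[1]
    coeff = eta * (T - S) / (1.0 - S)
    for n in range(W.shape[0]):
        step = np.zeros_like(W[n])
        for k, kbar in ((0, 1), (1, 0)):
            delta = O[kbar] / dot - O[k] / (O[k] @ O[k])  # delta_m for all m (Eq. 8)
            sgn = (O[k] > 0).astype(float)                # ReLU gates
            eps_kn = np.sum(sgn * V[:, n] * delta)        # Eq. (11)
            step += eps_kn * H[k][n] * (X[k] - W[n])
        W[n] += coeff * step
    return W

rng = np.random.default_rng(2)
W = rng.uniform(size=(9, 4))
V = rng.uniform(0.1, 1.0, (5, 9))
X = [rng.uniform(size=4), rng.uniform(size=4)]
H = [np.exp(-np.sum((W - x) ** 2, axis=1)) for x in X]   # Eq. (2), sigma = 1
O = [np.maximum(V @ h, 0.0) for h in H]                  # Eq. (4)
S = O[0] @ O[1] / (np.linalg.norm(O[0]) * np.linalg.norm(O[1]))
W_new = update_W(W.copy(), V, X, H, O, T=1, S=S)
print(W_new.shape)  # → (9, 4)
```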

3 Experiment

3.1 Preliminary experiments

To demonstrate the cartographic properties of the STN, it is first tested on some simple labeled data sets from the UCI machine learning repository [28], creating maps in the context of their label differences. Although the data are labeled, the labels are not explicitly utilized during the learning process. Here, when an arbitrarily chosen pair shares the same label, the pair is defined as similar; otherwise, it is defined as dissimilar. The resulting topological map is then visually compared with that of the standard SOM.

In the following experiments, the ground truth of similarity in Eq. (5) is defined to be similar when two inputs have an identical label, while they are dissimilar when they have different labels.

Fig. 2 Maps of Iris Data

The first example is the well-known Iris problem, a four-dimensional, three-class data set. Here, during the learning process, the class labels of the data are not explicitly utilized but only used to define the similarity between two samples. In Fig. 2a, the topological structure of the data is visualized using the standard SOM. For visualization clarity, the samples are colored according to their class labels, although the class labels played no role in the self-organizing process. The figure shows a well-known profile of this problem: one of the classes is linearly separable from the other two classes, which are not linearly separable from each other.

Figure 2b shows the topological map generated by STN for these data. It can be observed from this figure that STN generates sparser representations of the high-dimensional input, in which the three classes are equally separable except for some anomalies. The better separability is due to the pairwise-similarity information that is unavailable to SOM. It is also interesting to note that although the class labels are not explicitly provided, STN forms distinctive clusters according to the data labels. The map also successfully locates anomalies in these data.

Fig. 3 Maps of breast cancer data

The second example is the Breast Cancer problem, a nine-dimensional, three-class problem. The SOM representation in Fig. 3a indicates that while some overlapping samples exist, most instances can be easily classified. As shown in Fig. 3b, the STN generates a sparser map and nicely visualizes the potentially hard-to-classify instances due to their similarity with samples belonging to contrasting classes.

The third problem is the Wine problem, a 13-dimensional, three-class problem. Figure 4a shows the SOM representations. It can be observed that one of the classes forms two distinct clusters, which are also well represented by the STN as in Fig. 4b.

The maps in the three illustrated examples reflect the given context of similarity, which in these examples is class similarity.

Fig. 4 Maps of wine data

To quantitatively analyze the resulting maps, the Neighborhood Consistency Index, \(C_{idx}\), is defined as follows.

$$\begin{aligned} C_{idx} = \frac{1}{N} \sum _i \Phi (BMU(i), label(i)) \end{aligned}$$
(13)

In Eq. (13), BMU(i) and label(i) denote the BMU and the label for input i, while \(\Phi (x,y)\) denotes the ratio of inputs having label y associated with the nearest BMU to BMU x, and N denotes the number of inputs. Here, it should be noted that a BMU may be associated with several inputs, each with different labels. For example, if the nearest BMU to the BMU of the input i, where \(label(i)=\alpha\), is associated with five inputs, and four have labels \(\alpha\), then \(\Phi (BMU(i),label(i))=0.8\).

Hence, \(0 \le C_{idx} \le 1\) measures the neighborhood consistency of the resulting maps. In the experiments above, STN was trained so that the topological characteristics of the inputs in the context of their labels should be preserved on the map. Here, \(C_{idx} =0\) indicates that the topological similarity in the context of the input labels is not well presented on the map, while \(C_{idx} =1\) indicates the opposite.
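A sketch of \(C_{idx}\) as defined in Eq. (13), assuming for simplicity that each input occupies a distinct map node (the BMU positions and labels below are illustrative):

```python
import numpy as np

def c_idx(bmu_pos, labels):
    """Neighborhood Consistency Index (Eq. 13).
    bmu_pos: (N, 2) map coordinates of each input's BMU; labels: (N,)."""
    N = len(labels)
    total = 0.0
    for i in range(N):
        d = np.linalg.norm(bmu_pos - bmu_pos[i], axis=1)
        d[d == 0] = np.inf                     # exclude the BMU of input i itself
        nearest = bmu_pos[np.argmin(d)]        # nearest other BMU
        attached = np.all(bmu_pos == nearest, axis=1)
        total += np.mean(labels[attached] == labels[i])  # Phi(BMU(i), label(i))
    return total / N

# Two tight, pure clusters => perfect consistency
pos = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)
lab = np.array([0, 0, 1, 1])
print(c_idx(pos, lab))  # → 1.0
```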

Executing 10-fold cross-validation tests, the average \(C_{idx}\) values for the Iris, Breast Cancer, and Wine data are 0.935, 0.949, and 0.982, respectively. The resulting consistency indexes indicate successful learning by the proposed STN.

For these three preliminary experiments, the number of neurons in the ReLU layer was set to 80, the learning rate \(\eta\) in Eqs. (9) and (12) was set to 0.001, while the map size was set to \(\lceil \sqrt{\frac{N}{2}} \rceil \times \lceil \sqrt{\frac{N}{2}} \rceil\), where N is the data size.

3.2 Maps of demographic data

In the previous preliminary experiments, STN was trained with labeled data whose similarities could be naturally defined from their original labels. Here, the STN is trained using demographic data with no natural labels, but whose similarities can be arbitrarily designated. For example, countries are often ranked based on various criteria: their economic power, the educational level of their populations, their richness in natural resources, and so on. However, such rankings usually reflect only the criterion used to define them while ignoring other profiles of the countries. Hence, for a clearer understanding of the similarity structure, both the rank and the features that characterize the countries should be considered. For example, it is intuitive that countries with similar profiles and ranks should be visualized close to each other. In contrast, it is interesting to observe, for example, how countries that share similar profiles but have wide rank gaps, or countries that have dissimilar profiles but similar ranks, are aligned on the map. This kind of context-oriented visualization is essential for understanding the deeper inherent structures of their relative similarities.

For the subsequent experiments, 32 Asian and Pacific countries were chosen. Each country has 11 economic and demographic profiles: 1. Life expectancy at birth, 2. Fertility rate (general population), 3. Under-5 mortality rate, 4. Immunization (measles) rate, 5. \(CO_2\) emissions per capita, 6. Start-up procedures to register a business, 7. Foreign direct investment, 8. Age dependency ratio, 9. Access to electricity, 10. Access to clean fuels and technology for cooking, 11. GDP per capita. These data are obtained from the World Bank's DataBank [29] and are standardized. While a country can be characterized using many more profiles, the chosen variables are sufficient to show at least the economic prowess of those countries, which may correlate with many factors, such as their education level, population welfare, military strength, etc.

Fig. 5 Country Map: SOM

Figure 5 is the SOM representation of the 32 countries, in which the positions on the map reflect the similarities of their profiles. It can be observed that the economic powers in this region, such as Japan, NZ, Singapore, Australia, and Rep. of Korea, are aligned closely with each other. In contrast, PNG, Pakistan, and Afghanistan are aligned far from those countries. Figure 6 is the t-SNE representation of the same data utilized for generating Fig. 5. It should be noted that these two maps are generated solely based on the high-dimensional profiles of the 32 countries without assigning any contexts to them.

Fig. 6 Country Map: t-SNE

These data are then ranked according to a criterion not included in their original profiles. The objective is to observe how STN generates different topological maps for rank criteria and generalizes them to out-of-sample data.

The first rank criterion is Health Expenditure per capita, for which the top six countries are Australia, Japan, NZ, Singapore, Rep. Korea, and Maldives, while the bottom six countries are Bangladesh, Pakistan, Nepal, PNG, Lao, and Afghanistan.

In addition to SOM and t-SNE, which execute unsupervised self-organization, a Sammon Map [6, 30] was generated in Fig. 7, in which the pairwise distance between two countries i and j is defined in Eq. (14). For visualization clarity, the top six countries, the bottom six countries, and the remaining countries are plotted with different markers. It can be observed that the Sammon Map nicely assigns rank-wise similar countries close to each other while distancing them from dissimilar countries.

$$\begin{aligned} dist(\textbf{X}^{(i)},\textbf{X}^{(j)}) = \frac{\alpha }{11} \Vert \textbf{X}^{(i)} - \textbf{X}^{(j)} \Vert + (1 - \alpha ) |r(i) - r(j)| \end{aligned}$$
(14)

In Eq. (14), the distance between two countries is defined as the weighted difference of their features and ranks. Here, \(\textbf{X}^{(i)}\) and r(i) are the profile vector and the standardized rank for country i, while \(0 \le \alpha \le 1\) is the weighting coefficient that is set to 0.5.
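The blended profile/rank distance of Eq. (14) can be sketched as below, with \(\alpha = 0.5\) as in the paper; the zero/one profile vectors are purely illustrative:

```python
import numpy as np

def blended_distance(Xi, Xj, ri, rj, alpha=0.5, n_features=11):
    """Weighted mix of profile distance and rank difference (Eq. 14)."""
    return alpha / n_features * np.linalg.norm(Xi - Xj) + (1 - alpha) * abs(ri - rj)

Xi, Xj = np.zeros(11), np.ones(11)
print(round(blended_distance(Xi, Xj, ri=0.0, rj=1.0), 4))  # → 0.6508
```

The resulting pairwise distance matrix is what the Sammon Map consumes; the STN, by contrast, needs only the binary similarity labels of Eq. (15) rather than a full distance matrix.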

Fig. 7 Health Expenditure-ranked Sammon Map

Fig. 8 Health Expenditure-ranked Map

The map produced by STN for the same data set is shown in Fig. 8. For this experiment, in training the STN, two countries whose rank difference is three or less are designated as similar and dissimilar otherwise. Hence, the ground truth similarity in Eq. (5) is as follows.

$$\begin{aligned} T(\textbf{X}^{(1)},\textbf{X}^{(2)})= {\left\{ \begin{array}{ll} 1 &{} abs(rank({\textbf{X}}^{(1)})-rank({\textbf{X}}^{(2)})) \le 3 \\ 0 &{} (otherwise) \end{array}\right. } \end{aligned}$$
(15)

In Eq. (15), abs denotes the absolute value, while \(rank(\textbf{X}^{(i)})\) is the rank of input \(\textbf{X}^{(i)}\).
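The rank-based ground truth of Eq. (15) is a one-liner; a sketch with the threshold of three used in this experiment:

```python
def rank_similarity(rank_i: int, rank_j: int, threshold: int = 3) -> int:
    """Eq. (15): similar (1) when ranks differ by at most `threshold`, else 0."""
    return 1 if abs(rank_i - rank_j) <= threshold else 0

print(rank_similarity(1, 4), rank_similarity(1, 5))  # → 1 0
```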

In this figure, it is visually apparent that the STN generated a more clustered topological map. For example, the top four countries are concentrated in one cluster, while the fifth- and sixth-ranked countries (Rep. Korea and Maldives) form a different cluster due to their profile differences from the top four. It is also interesting to observe that the bottom countries are more spread out than the leading countries, indicating that their profiles are more diverse. Furthermore, while the Sammon Map cannot project out-of-sample data onto the generated map, this can be quickly done in STN by calculating the BMU for an out-of-sample country without retraining the STN.
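Out-of-sample projection reduces to the BMU computation of Eq. (1) on the trained reference vectors. A sketch, assuming a square map whose neurons are indexed row by row (the toy weights below are illustrative):

```python
import numpy as np

def project(W, x):
    """Return the 2-D map coordinate of the BMU for a new input x."""
    j = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    side = int(np.sqrt(W.shape[0]))   # assuming a square map
    return divmod(j, side)            # (row, col) on the map

W = np.eye(9)[:, :4]                  # toy 3x3 map with 4-D reference vectors
print(project(W, np.array([0.0, 0.0, 1.0, 0.0])))  # → (0, 2)
```

No gradient step or retraining is involved, which is what makes the projection of new countries onto an already-built map inexpensive.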

Fig. 9 Health Expenditure-ranked out of samples

To test the out-of-sample data projection of STN, some European and Central Asian countries are projected as shown in Fig. 9. From this figure, it can be inferred that if Austria were in Asia, it would have a similar rank to Japan in health expenditure due to its profile similarity; similarly, Luxembourg with Rep. Korea and Azerbaijan with Malaysia, while no European or Central Asian country shares similarities with Afghanistan and Nepal. Here, for visualization clarity, only a few in-sample Asian countries are plotted; they are plotted in red to differentiate them from the out-of-sample countries.

Fig. 10 Adolescence Fertility-ranked Sammon Map

In the next experiment, the Asian countries in the previous experiment are ranked using a different criterion: Adolescence Fertility (births per 1000 women ages 15–19). The top six countries are Bangladesh, Afghanistan, Solomon Island, Nepal, Lao, and the Philippines, while the bottom six are Korea, Singapore, Japan, China, Maldives, and Brunei.

The Sammon Map is given in Fig. 10, while the rank-oriented topological map generated by the STN is shown in Fig. 11. This figure shows that among the top six countries, Solomon Island, Nepal, and Bangladesh form a cluster that also includes Cambodia (ranked 9) due to their profile similarities, while no country shares a significant similarity with Afghanistan. Among the bottom six countries, Singapore, Korea, Japan, and China form a cluster, while Brunei is solitary and Maldives is close to Malaysia (ranked 25).

Fig. 11 Adolescence Fertility-ranked Map

The out-of-sample projection is shown in Fig. 12. Here, it can be predicted that Austria would rank similarly to Japan, Luxembourg to Australia, Greece to Malaysia, Sweden and Finland to NZ, and Kyrgyzstan to Tonga. Within the context of this ranking, no country shares similarities with Nepal.

Fig. 12 Adolescence Fertility-ranked Out-sample

For all the experiments on demographic data, the number of neurons in the ReLU layer was set to 80, the learning rate \(\eta\) was set to 0.003, and the map size was set to \(15 \times 15\) to accommodate a larger map for visualization clarity.

4 Conclusions

This study proposes a novel cartography utilizing Siamese topological networks. This new cartography allows the generation of maps of high-dimensional data from the perspective of their context. As the context can be assigned arbitrarily, the proposed cartography is useful for the visual analysis of high-dimensional data, for example, discovering the data's structure within a given context and intuitively locating anomalies.

In this paper, the mathematical characteristics of the proposed STN are elaborated, and hence the formation of the map can be clearly understood. In the experiment, the properties of the new cartography are demonstrated using real-world demographic data.

The primary strength of the new cartography is its ability to generate maps based on different criteria. This property is essential for visually analyzing data from different perspectives. The other strength is its ability to add out-of-sample data into an already-built map, which is challenging to execute in most traditional cartography.

The immediate future work for this study is to utilize this new cartography for antigenic analysis and mapping drug responses for designing new drugs or personalized medicines. Developing an intuitive interface that allows the applications for this new cartography in various fields is also of interest.