
Data-Dimension Reductions: A Comparison

Part of the Computational Risk Management book series (Comp. Risk Mgmt)

Abstract

Data and dimension reduction techniques, and particularly their combination for Data-Dimension Reductions (DDR), have in many fields and tasks held promise for representing data in an easily understandable format. However, comparing methods and finding the most suitable one is a challenging task. In the previous chapter, we discussed the aim of dimension reduction in terms of three tasks. This chapter compares DDR combinations for financial performance analysis. To this end, after a general review of the literature on comparisons of data and dimension reduction methods, we discuss the aims and needs of DDR combinations in general and for the task at hand in particular.

Keywords

Dimension Reduction · Neighborhood Relation · Dimension Reduction Method · Financial Entity · Structure Preservation

Data and dimension reduction techniques, and particularly their combination for DDR, have in many fields and tasks held promise for representing data in an easily understandable format. However, comparing methods and finding the most suitable one is a challenging task. Above, we discussed the aim of dimension reduction in terms of three tasks. For the third task of visualization, the most popular method has been the SOM, whose popularity is oftentimes asserted to be an artifact of its simplicity and intuitive formulation [e.g., Lee and Verleysen (2007), Trosset (2008)]. Yet, being well-known or simple, while an asset, is not a proper validation of relative goodness. The focus of this chapter is to challenge the superiority of the SOM by comparing it to alternative methods.

To capture the most suitable methods for visual financial performance analysis according to the needs of the task, this chapter assesses the suitability of three classical, or so-called first-generation, dimension reduction methods: metric MDS (Torgerson 1952), Sammon’s mapping (Sammon 1969) and the SOM (Kohonen 1982). Rather than being the most recent methods, the rationale for comparing these is to capture the suitability of well-known dimension reduction methods with inherently different aims: global distance preservation, local distance preservation and topology preservation, respectively. For DDR, and due to access to overabundant amounts of data, we test serial and parallel combinations of the projections with three data reduction or compression methods: VQ (Linde et al. 1980), k-means clustering (MacQueen 1967) and Ward’s (1963) hierarchical clustering. While conceptually similar, the functioning of the SOM differs from the other DDR combinations in that the two tasks of data and dimension reduction are treated as concurrent subtasks. In serial combinations, the dimension reduction is always subordinate to the data reduction, whereas parallel combinations deal separately with the initial dataset.

This chapter compares DDR combinations for financial performance analysis as follows. After a general review of the literature on comparisons of data and dimension reduction methods, we discuss the aims and needs of DDR combinations in general and for the task at hand in particular. That is, building low-dimensional mappings from high-volume and high-dimensional data that function as displays for additional information, be it individual data (e.g., time series of entities) or general structural properties of data (e.g., qualities, distance structures and densities). The relative goodness of the methods for financial performance analysis is then discussed from a qualitative perspective. Further, experiments on a dataset of annual financial ratios for European banks are used to illustrate the general applicability of the DDR combinations for the task. After illustrating some approaches to linking information to the visualization displays, the results of these comparisons are projected onto the second generation of dimension reduction methods for a final discussion on the superiority of methods for overall visual financial performance analysis, including tasks for macroprudential oversight, as well as on the general applicability of this comparison. These discussions also include an information visualization perspective on dimension reductions.

5.1 The Optimal Method: A Literature Review

When reviewing the literature on method comparisons, we first focus on dimension reduction methods and then on data reduction methods. The focus is on neutral evaluations of methods rather than on evaluations in papers presenting novel methods. While papers presenting new methods generally include an evaluation and conclude at least partial superiority of the proposed method, such as some of those found in Sect. 4.3, they may be biased to a lesser or greater extent towards data and evaluation measures suitable for that particular approach.

5.1.1 A Comparison of Dimension Reductions

The large number of methods has obviously also stimulated a large number of performance comparisons between them. The comparisons mainly vary in terms of the used data and evaluation measures, whereas there may still be some variation in the precise utilization of the methods. For instance, Flexer (1997, 2001) used Pearson correlation, Duch and Naud (1996) hypercubes in 3–5 dimensions, and Bezdek and Pal (1995) the metric topology preserving index to show that MDS outperforms the SOM. Trosset (2008) argues that a serial combination of clustering and MDS is superior to the SOM. Venna and Kaski (2001) and Nikkilä et al. (2002) show the superiority of the SOM and GTM in terms of the trustworthiness of neighborhood relationships, while later Himberg (2004) and Venna and Kaski (2007) show the superiority of CCA in terms of the same measure. Not surprisingly, de Vel et al. (1996) show, using Procrustes analysis and Spearman rank correlation coefficients on various datasets, that the superiority of a method depends on the used evaluation measures and data. Hence, despite many attempts, the inconsistent comparisons do not indicate the superiority of any one method.

Lately, Lee and Verleysen (2009) proposed a unified measure based upon a co-ranking matrix for evaluating dimension reductions, an adequate ground for generic evaluations. Lueks et al. (2011) further developed the measure by letting the user specify the properties that are more important to preserve. While being useful aids in comparing methods, these measures neither show nor propose the existence of one method that is superior for every type of data and every preference of similarity preservation.

5.1.2 A Comparison of Data Reductions

When reviewing the literature on methods for data reduction, one can easily observe that there is no unanimity on the best available method either. Herein, the focus is on comparisons between the SOM and stand-alone data reduction methods. Bação et al. (2005) show that the SOM outperforms k-means clustering on 3 evaluation measures and 4 datasets. Flexer (1997, 2001) shows that k-means clustering outperforms the SOM using a Rand index on 36 datasets. Waller et al. (1998) show on 2,580 datasets that the SOM performs equally well as k-means clustering and better than other methods. Balakrishnan et al. (1994) show that k-means outperforms the SOM on 108 datasets, but do not decrease the SOM neighborhood to zero at the end of learning [as, e.g., Kohonen (2001) proposes]. Vesanto and Alhoniemi (2000) showed on 3 datasets that two-level clustering of the SOM is equally accurate as agglomerative and partitive methods, while being computationally cheaper and having merits in visualizing relations in data. Ultsch and Vetter (1994) compare the SOM with hierarchical and k-means clustering and conclude that the SOM not only provides an equally accurate result, but also an easily interpretable output. Despite no unanimity on superiority, the literature still indicates that the SOM, and its adaptations, are alternatives for data reduction as worthy of consideration as other methods, such as centroid-based and hierarchical clustering.

5.1.3 Why is the Literature so Divided?

While the quality of data reductions can be quantified by common evaluation measures, such as the quantization error, assessing the superiority of one dimension reduction method over others with a quantitative measure is more difficult. Still, there is no unanimity on the superiority of one data reduction method over others either. What varies in the above discussed studies is mainly the underlying data, which indicates that methods perform differently on different types of data. One reason might be that clusters in the SOM topology learn from, and are guided by, neighboring data as well, which aids the analysis of noisy data, whereas accuracy suffers on well-behaved toy data. This is supported by the findings of de Bodt et al. (1999) and Bação et al. (2005), who propose that the SOM better spans the search space as neighborhood relations force units to follow each other. This is, however, only speculative reasoning about the above lack of unanimity.

Since the mid-20th century, the overload of available data has stimulated a surge in the development of dimension reduction methods with inherent differences (as reviewed in Chap. 4). However, as all structural information cannot possibly be preserved in a lower dimension, most differences in the quality of dimension reductions derive from variations in the preserved similarity relations, such as pairwise distances or topological relationships. The performance, and the choice of model specification, of one method can generally be motivated by its own quantitative quality measure. However, the relative goodness of different methods depends strongly on the correspondence between the particular quality measure and the objective function.

Despite the fact that the large number of dimension reduction methods has stimulated quality comparisons along different measures, the inconsistency of the comparisons has led to no unanimity on the superiority of one method [see, e.g., Flexer (1997, 2001) and Venna and Kaski (2001)]. This also indicates that the goodness of methods depends to a large extent on the correspondence between the measure and the objective function, and confirms that the quality measure is a user-specified parameter depending on the task at hand. While recent advances in unified measures for evaluating dimension reductions have included a parameter for the user to specify the properties that are more important to preserve (Lee and Verleysen 2009; Lueks et al. 2011), quantitative measures still have difficulties in capturing qualitative differences in the properties of methods, such as differences in flexibility for difficult data and in the shape of the low-dimensional output. This motivates assessing the suitability of data and dimension reduction methods for a specific task from a qualitative perspective.

5.2 DDR Combinations for the Task at Hand

This section discusses specific aims, needs and restrictions of DDR combinations for visual financial performance analysis. Based upon this discussion, we look into dimensions of DDR combinations relevant for measuring the suitability of methods for the task herein.

5.2.1 Aims and Needs for the Task

So, what is the so-called task at hand? The aim of models for visual financial performance analysis, including tasks for macroprudential oversight, is to represent high-volume and high-dimensional data of financial entities on low-dimensional displays. The data for such a task are derived from a data cube, like the one represented in Fig. 3.1 (see Sect. 3.3). Data and dimension reductions hold promise for the task, but the form of the models still sets some specific needs and restrictions. While recent advances in information technology have enabled access to databases with nearly endless amounts of macroeconomic and financial information (e.g., Bankscope, Bloomberg, Standard & Poor’s and Capital IQ), as well as the provision and integration of multiple sources (e.g., Haver Analytics), data are oftentimes problematic in being incomplete and non-normal (e.g., Deakin 1976). For instance, when representing a financial entity with its balance-sheet information, it is more common than not that some items of the balance sheet are missing. Due to changes in reporting rules and financial innovation, data might be missing or start only in the latter part of a time series. Examples of skewed distributions are the commonly appearing power-law distribution and Benford’s law, as well as the particularly fat tails of market-based data. While there exists a multitude of preprocessing methods for transforming, normalizing and trimming data, the tails of financial ratio distributions are oftentimes of high interest. This gives rise to two necessities: the method needs to have a low computational cost and to scale well, and it needs to be flexible for problematic data.

The main aim of the low-dimensional mappings is to use them as displays for additional information, in particular for: (i) individual data, (ii) structural properties of data, and (iii) qualities of the models. This is due to three respective reasons:
  (i) the two-dimensional plane should function as a basis or display for visual performance comparisons of financial entities (i.e., observation-level data) and their time series;

  (ii) for the human visual system to recognize patterns in data, we need to provide guidance for interpreting general data structures, and we oftentimes also possess this type of linkable information; and

  (iii) qualities of a dimension reduction may vary across mappings and locations in mappings, as all information cannot be correctly preserved in a lower dimension.
The main aim of these mappings is hence not to be an ending point, but rather to function as a basis for a wide range of additional visualizations.

5.2.2 Aims and Needs of DDR Combinations

When evaluating or comparing the performance of data and dimension reduction methods, particularly DDR combinations, quantitative measures have difficulties in accounting for qualitative differences in the properties of the methods. Hence, as the performed comparison is qualitative, the needs for visual financial performance analysis are condensed into four qualitative criteria for evaluating DDR combinations: form of structure preservation, computational cost, flexibility for problematic data and shape of the output. Next, we discuss these criteria in more detail.

Form of Structure Preservation

As all relations in a high-dimensional space obviously cannot be preserved in a lower dimension, there are differences in which locations are stressed when preserving the structure. Given these differences, the main characteristics of structure preservation should match the important desires of the particular task at hand. The key question is thus: Which relations are of central importance for visual financial performance analysis? With a main focus on visualizing individual financial entities on a low-dimensional display, correctly locating neighboring data becomes essential. This makes the trustworthiness of neighborhood relationships more important than precision in the exact distances to data far away. Noise and erroneous data, as well as comparability issues related to reporting differences, for instance, also motivate attempting this type of local order-preserving mapping rather than focusing on global detail.

Computational Cost

We oftentimes have access to vast amounts of macro-financial data in today’s databases, including high-dimensional data for a large number of entities at a high frequency over long periods (i.e., a data cube that is large along all three dimensions), not least if the used data are based upon market sources. This obviously sets some restrictions on the computational cost and scalability of methods. While computation time is not entirely a qualitative property, it has still not been incorporated in quantified evaluation measures. As also noted by van der Maaten and Hinton (2008), the practical applicability of a dimension reduction method relies upon its computational complexity, as application becomes infeasible if the computational resources needed are too large. In addition to the properties of data, the computational cost of a method is set by the dimensionality of the output, the definition of a neighborhood in the case of neighborhood preservation and, for iterative techniques, the number of iterations, not to mention the form of the input data (e.g., pairwise distance matrices or high-dimensional data points). It is also worth considering that computational expense is not only a one-off cost when creating a dimension reduction, but also recurs when updating it. Combinations with data reduction methods may also affect the computational cost of a dimension reduction. Still, it is important to acknowledge that a cut-off between computationally costly and non-costly methods is difficult to draw. Yet, the differences between methods oftentimes tend to be significant.

Flexibility for Problematic Data

Methods differ in their flexibility for non-normal and incomplete data, something more common than not in real-world macro-financial settings. Hence, desired properties of dimension reduction methods are flexibility for incomplete and non-normal data. While the former can be defined in terms of the treatment of missing values, the latter depends largely on the task at hand. Most often, data are preprocessed for ideal results, including the treatment of skewed distributions. Yet, preprocessing seldom does, and is most often not desired to, compress the data into a uniform density. Oftentimes, the most extreme values of the data are among the most interesting states of financial performance. Hence, one type of tolerance towards outliers can be derived from the output of methods. A method is judged to be tolerant towards outliers and skewed distributions if problematic data do not significantly impair the intelligibility of an output or display (e.g., by stretching it towards outliers).

Shape of the Output

One of the main aims is to use a dimension reduction as a display to which additional information is linked. In particular, the low-dimensional mappings are used as displays for individual data, structural properties and qualities. This turns the focus to the shape of the outputs of dimension reduction mappings, which can take a wide range of forms. The interrelated properties of the shape can be considered to be the following: continuous versus discrete mappings, optional versus mandatory data reductions and predefined versus data-driven grid shapes. While a mandatory data reduction is generally not desirable, it is not considered a significant disadvantage here; rather the opposite, due to the large amounts of available data. This also leads to restricting mappings to discrete rather than continuous ones, whereas continuous mappings would obviously be desirable from the perspective of detail and accuracy. The largest difference for interpretation, especially in terms of linking visualizations, is between predefined and data-driven grid shapes. While methods with data-driven grid shapes may better adapt to data, methods with predefined regular shapes are superior in functioning as a regularly formed display for additional information. This is a key property, as the mappings are starting points rather than ending points of the analysis, where the additional information may be individual data, structural properties of data and qualities of the models.

5.3 A Qualitative Comparison

This section presents a qualitative discussion of DDR combinations for visual performance analysis and relates it to the four identified criteria: form of structure preservation, computational cost, flexibility for problematic data and shape of the output. Below, we discuss MDS, Sammon’s mapping and the SOM from the viewpoint of the task at hand and the four criteria.

Form of Structure Preservation

The main difference between DDR combinations is how the dimension reduction methods differ in the properties of data they attempt to preserve. For the task of visual financial performance analysis, the focus is on one question: Which methods better assure trustworthy neighbors? MDS-based methods with objective functions attempting distance preservation, while potentially being better at approximating distance structures, may end up with skewed errors across the projection. To this end, Venna and Kaski (2001) and Nikkilä et al. (2002) have shown that the SOM, which stresses neighborhood relations, better assures trustworthy neighbors. That is, data found close to each other on a SOM display are more likely to be similar in terms of the original data space as well. The conceptual difference in structure preservation between distance- and topology-preserving methods is illustratively described by Kaski (1997) with an experiment on a curved two-dimensional surface in a three-dimensional space: the latter methods may follow the surface in data with two dimensions, whereas the former require three dimensions to describe the structure.
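This difference can also be quantified. As a minimal sketch, assuming scikit-learn’s implementation of the trustworthiness measure of Venna and Kaski (2001), the snippet below scores a metric MDS embedding of a curved two-dimensional surface in a three-dimensional space (a swiss roll standing in for Kaski’s example; data and parameters are illustrative):

```python
# A hedged sketch: score neighborhood trustworthiness (Venna and Kaski 2001)
# of a distance-preserving embedding; scikit-learn ships the measure as
# sklearn.manifold.trustworthiness. Data and parameters are illustrative.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS, trustworthiness

X, _ = make_swiss_roll(n_samples=500, random_state=0)  # curved 2D surface in 3D

# Metric MDS: a distance-preserving projection into two dimensions.
X_mds = MDS(n_components=2, random_state=0).fit_transform(X)

# Trustworthiness in [0, 1]: are points that appear as neighbors on the
# display also neighbors in the original space? Higher is better.
print(trustworthiness(X, X_mds, n_neighbors=10))
```

A SOM-based embedding (e.g., data mapped to the grid coordinates of their BMUs) could be scored with the same function for a direct comparison.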

Computational Cost

Expensive computations are obviously an issue when dealing with large-volume financial data. Generally, computing pairwise distances between data is costly, with an order of magnitude of \(N^{2}\). The topology preservation of the SOM relates instead to the grid size \(M\), with an order of magnitude of \(M^{2}\) (Kaski 1997). This implies that the complexity of the methods is similar if the grid size \(M\) equals the number of data \(N\), but more importantly that the SOM allows adjusting \(M\) for cheaper complexity. Further, parallel DDR combinations suffer from an additional computational cost as the clustering is performed on the initial dataset rather than on a reduced number of units. The computational cost of MDS-based methods motivates serial DDR combinations. Another issue related to computational cost is the lack of an explicit mapping function for the MDS-based methods. Hence, when including new samples, the projection needs to be recomputed. While new samples can be visualized via projection to their best-matching data, each update requires recomputing the projection.¹ In contrast, the SOM can cheaply be updated with individual data using the sequential algorithm (i.e., an online version of the batch SOM).
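As a rough, illustrative calculation (the grid size here is a hypothetical choice, not the parametrization of the experiments below): for the \(N = 9{,}655\) rows of the dataset in Sect. 5.4 and a SOM grid of \(M = 200\) units, the respective orders of magnitude are \(N^{2} \approx 9.3 \times 10^{7}\) versus \(M^{2} = 4.0 \times 10^{4}\), a difference of more than three orders of magnitude in favor of the SOM.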

Flexibility for Problematic Data

The methods differ significantly in their flexibility for problematic data. Methods dealing with distance preservation have obvious difficulties with incomplete data. The SOM, and its self-organization, can in contrast be seen as tolerant to missing values by only considering the available ones in matching (Samad and Harp 1992). In practice, the SOM has been shown to be robust when up to approximately one third of the variables in a row (i.e., data vector \(x_{j}\)) are missing (Kaski and Kohonen 1996; Kohonen 2001; Denny and Squire 2005; Sarlin 2012b). Indeed, the SOM has even been shown to be effective for imputing missing values (e.g., Cottrell and Letrémy 2005). Tolerance towards outliers is measured in terms of the representation of skewed distributions. An MDS-based mapping becomes difficult to interpret if it is stretched towards the directions of outliers and extreme tails. While the processing of the SOM does not per se treat outliers, its regularly shaped grid of units facilitates visualizing data with non-uniform density functions. This provides a hint of the final criterion.
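To make the matching principle concrete, the following numpy sketch performs a best-matching-unit search on the available variables only, in the spirit of Samad and Harp (1992); the grid size, dimensionality and data are hypothetical:

```python
# A hedged sketch of SOM best-matching-unit (BMU) search that tolerates
# missing values by matching on available variables only (in the spirit
# of Samad and Harp 1992). Shapes and data are illustrative.
import numpy as np

def bmu_with_missing(x, weights):
    """x: (d,) data vector with np.nan marking missing entries;
    weights: (m, d) SOM reference vectors. Returns the BMU index."""
    observed = ~np.isnan(x)                    # mask of available variables
    diff = weights[:, observed] - x[observed]  # compare observed dimensions only
    return int(np.argmin((diff ** 2).sum(axis=1)))

weights = np.random.rand(100, 24)              # e.g., a 10x10 grid, 24 ratios
x = np.concatenate([[0.2, np.nan, 0.7], np.full(21, np.nan)])  # mostly missing
print(bmu_with_missing(x, weights))
```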

Shape of the Output

A key to using a dimension reduction as a display, and to linking information to it, is the shape of its output. Whereas the SOM has a discrete mapping, a mandatory data reduction and a predefined grid shape, MDS-based methods are its contrast in having continuous mappings, optional data reductions and data-driven lattices (if combined with data reduction). The predefined SOM grid, while also having drawbacks for representing structural properties of data, facilitates the interpretation of linked information. Today, the SOM comes as standard with a wide set of linked extensions for visual analytics, such as the so-called feature planes, the U-matrix and frequency plots (Vesanto 1999). Even though visual aids for showing distance structure and density compensate for the constraints set by the grid shape, there is a large group of other aids that enhance the representation of the available information in data. These visual aids, while not even always being applicable, have generally not been explored in the context of MDS-based projections. Feature planes (see Sect. 5.5), for instance, are difficult to visualize due to the lack of a reduced number of units. Even DDR combinations with serial VQ, i.e., processing similar to that of the SOM, would still lack the neighborhood relations of a regularly shaped grid.

5.4 Illustrative Experiments

The qualitative discussion of the properties of DDR combinations for financial performance analysis still lacks illustrations of the discussed properties of the methods. This section presents experiments with these methods. Dimension reduction is performed with the SOM, metric MDS and Sammon’s mapping, and data reduction with Ward’s hierarchical clustering, k-means clustering and VQ. We explore various combinations for DDR with the aim of achieving easily interpretable models for visual financial performance analysis. The methods are chosen and combined according to their suitability for data reduction of dimension reductions, and vice versa.

Data

The dataset used in these examples consists of annual financial ratios for banks from the EU, including all provided financial ratios in the Bankscope database from Bureau van Dijk. Initially, the dataset consisted of 38 annual financial ratios for 1,236 banks spanning 1992:12–2008:12. A large concern in the dataset is the share of missing values, due to which 24 ratios were chosen by dropping those with more than 25 % missing data. Observations with missing values for more than one third of the ratios were removed. Finally, we are left with 9,655 rows of data for a total of 855 banks. Yet, the dataset still includes missing values. Although the SOM is tolerant to missing data, we need to impute them in this work as distance-preserving methods require complete data. For simplicity, the SOM is used for imputing the missing values. A SOM allows mapping incomplete data to their best-matching units (BMUs) by only considering the available variables. Hence, complete data were used for training a SOM, incomplete data were mapped to their BMUs and the missing values were imputed from their BMUs. Moreover, although outliers are not a problem per se, they may still affect the interpretability of the models, in particular MDS-based models. So as not to lose significant amounts of data, modified boxplots are used for trimming with replacement. The modified boxplot is preferred over Winsorizing, for instance, as it accounts for variable-specific distributions, resulting in the replacement of a total of 7.39 % of the data, distributed as needed per variable and tail. In the following experiments, we use the entire dataset, in particular when creating displays with data and dimension reduction methods. Further, a sample of trajectories is used to illustrate the visualization of individual data on the created displays. The trajectories consist of all input variables spanning from 2002 to 2008 for Deutsche Bank, ABN Amro and Société Générale.
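To illustrate the trimming step, the sketch below clips each variable to its boxplot fences; the conventional \(1.5 \times\) IQR fence multiplier is an assumption here, as the exact fence rule of the modified boxplot used in this book may differ. The imputation step described above can reuse the masked BMU search sketched in Sect. 5.3, filling missing entries from the BMU’s reference vector.

```python
# A hedged sketch of per-variable trimming with replacement using boxplot
# fences, as an approximation of the modified-boxplot treatment; the
# conventional 1.5 * IQR fence multiplier is an assumption.
import numpy as np

def trim_with_replacement(X, k=1.5):
    """Clip each column of X to its boxplot fences; NaNs are ignored."""
    q1 = np.nanpercentile(X, 25, axis=0)
    q3 = np.nanpercentile(X, 75, axis=0)
    iqr = q3 - q1
    return np.clip(X, q1 - k * iqr, q3 + k * iqr)  # broadcasts per column

X = np.random.standard_t(df=3, size=(1000, 24))  # fat-tailed stand-in ratios
X_trimmed = trim_with_replacement(X)
print((X != X_trimmed).mean())                   # share of replaced values
```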

Parallel DDR

Figure 5.1 shows parallel DDR combinations on the entire dataset. Sammon’s mapping is combined with k-means clustering, and MDS and the SOM are combined with Ward’s clustering.² Ward’s clustering of the SOM is, however, performed on its units rather than on the dataset and is restricted to agglomerating only adjacent clusters in the SOM topology. This option is not considered for MDS-based projections as there is no natural definition of adjacency. On top of all three mappings, we can observe a superimposed cluster color coding and a performance comparison of trajectories from 2002 to 2008 for three large European banks. Cluster memberships are visualized through a qualitative color scheme from ColorBrewer (Harrower and Brewer 2003), where groups are differentiated in hue contrast with nearly constant saturation and lightness. The projections of MDS and Sammon’s mapping on this large dataset are very similar, whereas k-means clustering has less overlapping cluster memberships in the mapping than Ward’s clustering. The trajectories as well as the underlying variables confirm that, while the orientations of the two MDS-based projections are somewhat different from those of the SOM model, their structure is still inherently similar. Yet, the computational cost differs significantly. While training SOM-based models on these data takes only a few seconds on an ordinary personal computer, the MDS-based projections require several hours on a dedicated server.
Fig. 5.1

Parallel DDR combinations. Notes The figures show parallel DDR combinations on the entire financial dataset; Sammon’s mapping is combined with \(k\)-means clustering, and MDS and the SOM are combined with Ward’s clustering. Color codes on each mapping correspond to clusters and the superimposed trajectories to a performance comparison of three large European banks from 2002 to 2008
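As a minimal sketch of the parallel setup (scikit-learn has no Sammon’s mapping, so metric MDS stands in for the MDS-based methods; data, cluster count and color scheme are illustrative):

```python
# A hedged sketch of a parallel DDR combination in the spirit of Fig. 5.1:
# projection and clustering are computed separately on the same data and
# then overlaid. Metric MDS stands in for Sammon's mapping.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from sklearn.manifold import MDS

X = np.random.rand(500, 24)                      # stand-in financial ratios

coords = MDS(n_components=2, random_state=0).fit_transform(X)  # projection
labels = AgglomerativeClustering(n_clusters=6, linkage="ward").fit_predict(X)

# Color the projection by the independently computed cluster memberships;
# "Set2" is a qualitative ColorBrewer scheme shipped with matplotlib.
plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="Set2", s=10)
plt.show()
```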

Serial DDR

For cheaper complexity, we further explore the possibilities of MDS by testing serial combinations. Figure 5.2 shows a Sammon’s mapping of the k-means cluster centroids as well as of the second-level centroids of the SOM, where size represents the number of data in each cluster. This type of usage of MDS-based methods was already proposed by Sammon (1969) due to their high computational cost, and later applied by Flexer (2001), for instance. It is, indeed, a cheap way to illustrate relations between the cluster centroids, but it lacks detail for structural as well as individual analysis.
Fig. 5.2

Serial DDR combinations. Notes The figure shows serial DDR combinations on the entire financial dataset; Sammon’s mapping is combined with \(k\)-means clustering, and the SOM with second-level Ward’s clustering. Color codes on each mapping correspond to clusters. So as not to clutter the display, trajectories are not shown in this figure
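A minimal sketch of the serial setup, where only the centroids are projected and marker size encodes cluster mass (again with metric MDS as a stand-in for Sammon’s mapping; all parameters are illustrative):

```python
# A hedged sketch of a serial DDR combination in the spirit of Fig. 5.2:
# cluster first, then project only the cluster centroids.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

X = np.random.rand(5000, 24)
km = KMeans(n_clusters=12, n_init=10, random_state=0).fit(X)

coords = MDS(n_components=2, random_state=0).fit_transform(km.cluster_centers_)
sizes = np.bincount(km.labels_)                   # number of data per cluster
plt.scatter(coords[:, 0], coords[:, 1], s=sizes)  # size encodes cluster mass
plt.show()
```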

Serial and Parallel DDR

The costly, yet detailed, MDS-based projections in Fig. 5.1 and the cheap, yet crude, projections in Fig. 5.2 motivate finding a compromise solution. To reduce computational expense, it is still necessary to rely on a serial DDR combination. For more detail, however, the initial dataset is reduced to a smaller but representative dataset. This type of data compression can, for instance, be achieved with standard VQ, which approximates the probability density functions of data. The compressed reference vectors can then be used as an input for a parallel DDR. Conceptually, while still lacking the interaction between the tasks as well as the regular grid shape, we come close to what is achieved using a SOM in Fig. 5.1 by relying on both serial and parallel DDR combinations. The left plot in Fig. 5.3 shows a VQ of the initial dataset and a subsequent Sammon’s mapping and k-means clustering on the VQ reference vectors. The right plot in Fig. 5.3 shows a corresponding Sammon’s mapping of SOM units with a superimposed cluster color coding. The figure illustrates two issues: that the ordered SOM units have less overlap of cluster memberships, and the importance of naturally defined topological relations. The former issue is partly a result of the interaction between the tasks of data and dimension reduction and partly of the inclusion of neighborhood relations when agglomerating clusters. The latter issue of a regularly shaped grid is particularly useful when attempting to visualize as much of the available information as possible through linked visualizations.
Fig. 5.3

Serial and parallel DDR combinations. Notes The figures show serial and parallel combinations on the entire financial dataset; Sammon’s mapping is combined with VQ and \(k\)-means clustering, and the SOM with Ward’s clustering and Sammon’s mapping. Color codes on each mapping correspond to clusters and the net-like representation illustrates neighborhood relations
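A minimal sketch of this compromise (scipy’s k-means-based vector quantizer and metric MDS serve as stand-ins; the codebook size is illustrative):

```python
# A hedged sketch of combining serial and parallel DDR as in Fig. 5.3:
# VQ compresses the data into a codebook, after which projection and
# clustering are both run on the reference vectors.
import numpy as np
from scipy.cluster.vq import kmeans, whiten
from sklearn.cluster import AgglomerativeClustering
from sklearn.manifold import MDS

X = np.random.rand(5000, 24)
codebook, _ = kmeans(whiten(X), 200)   # ~200 reference vectors approximate X

coords = MDS(n_components=2, random_state=0).fit_transform(codebook)
labels = AgglomerativeClustering(n_clusters=6, linkage="ward").fit_predict(codebook)
# coords and labels describe the compressed data: cheaper than Fig. 5.1,
# more detailed than projecting a handful of centroids as in Fig. 5.2.
```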

5.5 The SOM and Its Visualization Aids

This section first briefly reviews visualization aids for the SOM and then illustrates the use of the regularly shaped SOM grid and its visualization aids. Figure 5.1 showed the two-dimensional SOM grid, and trajectories for three large European banks from 2002 to 2008, but a central question remains: How should we interpret the map? The possibility of linking additional information to the SOM grid has stimulated the development of a wide scope of visualization aids [see Vesanto (1999) for an early overview]. These can be classified into three groups:
  (i) those compensating for structural properties inherent in data that the regular grid shape eliminates;

  (ii) those extending the visualization of properties inherent in data but not normally accessible in dimension reductions; and

  (iii) those linking the SOM grid with other methods or data to further enhance the understanding of the task.
The first group includes means to represent the distance structure and density on a SOM, something missing due to the VQ and grid shape. Densities on the SOM are generally assessed with frequency plots and the Pareto density estimation matrix (P-matrix) (Ultsch 2003a). Examples of aids for assessing distance structures are Sammon’s mapping, the Unified distance matrix (U-matrix) (Ultsch and Siemon 1990) and cluster connections (Merkl and Rauber 1997). Moreover, some methods attempt to account for both structures and densities, such as the U*-matrix (Ultsch 2003b), the sky metaphor visualization (Latif and Mayer 2007), the neighborhood graph (Pölzlbauer et al. 2005), smoothed data histograms (Pampalk et al. 2002), and cluster coloring (Kaski et al. 2001; Sarlin and Rönnqvist 2013).
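As a minimal numpy sketch of the first group, the U-matrix of Ultsch and Siemon (1990) can be computed as the average distance from each unit’s reference vector to those of its grid neighbors (grid size and weights are illustrative):

```python
# A hedged sketch of a U-matrix: for each unit of a rectangular SOM grid,
# the average distance to the reference vectors of its grid neighbors.
import numpy as np

def u_matrix(weights):
    """weights: (rows, cols, d) reference vectors on a rectangular grid."""
    rows, cols, _ = weights.shape
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = [np.linalg.norm(weights[i, j] - weights[a, b])
                     for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                     if 0 <= a < rows and 0 <= b < cols]
            U[i, j] = np.mean(dists)  # high values mark cluster borders
    return U

print(u_matrix(np.random.rand(10, 10, 24)).round(2))
```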

The second group consists of visualizations that enhance the representation of the high-dimensional information. Feature planes are a standard method for visualizing the spread of values of individual dimensions on the SOM, but they have been further enhanced in several aspects. For instance, Vesanto and Ahola (1999) use a SOM for reorganizing the feature planes according to correlations and Neumayer et al. (2007) introduced the metro map discretization to summarize all feature planes onto one plane. Kaski et al. (2001) have developed a visualization of the contribution of each variable to distances between units, that is, the cluster structure. Another extension, while partly also belonging to the other groups, is visualization of vector fields (Pölzlbauer et al. 2006) for assessing contributions to the cluster structure and for finding correlations and dependencies in the underlying data.
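For the second group, a feature plane is simply one variable’s slice of the reference vectors rendered on the grid, here with a light-to-dark luminance scale in a blue hue (weights and variable indices are illustrative):

```python
# A hedged sketch of feature planes: the spread of individual input
# variables over the SOM grid, one slice of the reference vectors each.
import numpy as np
import matplotlib.pyplot as plt

weights = np.random.rand(10, 10, 24)        # (rows, cols, variables)
fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for ax, var in zip(axes, (0, 1, 2)):        # e.g., three selected ratios
    ax.imshow(weights[:, :, var], cmap="Blues")  # luminance encodes value
    ax.set_title(f"feature plane {var}")
    ax.set_axis_off()
plt.show()
```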

The third group uses other methods or data for further enhancing the understanding of the task. One common way to represent cluster structures in a SOM is applying a second-level clustering on the units, and visualizing it through color coding (Vesanto and Alhoniemi 2000). The reference vectors have been used as an input for other predictive methods, such as a neural network in Serrano-Cinca (1996), whereafter the prediction may be visualized on the SOM grid.
Fig. 5.4

An exemplification of information linked to a SOM. a SOM and Ward’s clustering, b U-matrix, c Frequency plot, d Quantization error, e Capital ratios, f Loan ratios, g Profitability ratios. Notes The figures link additional information to the regularly shaped SOM grid. Charts (a–c) illustrate structural properties of the model: (a) shows cluster memberships of the second-level clustering, (b) shows average distances between units, or the so-called U-matrix, and (c) shows the frequency distribution on the SOM grid. Chart (d) shows qualities of the model, whereas charts (e–g) show the spread of three subdimensions of financial performance on the SOM grid: capital, loan and profitability ratios

Next, we look at some examples of how visualizations from the above three groups can be linked to the SOM. The previously presented SOM in Fig. 5.1 already showed a financial performance comparison over time of three large European banks using labels and trajectories. Figure 5.4 uses the regular shape of the SOM grid as a basis for seven different representations of additional information. Whereas cluster memberships are visualized through a qualitative color scheme, the rest of the visualizations are shown through variation in luminance (light to dark representing low to high values) in a blue hue. It is worth noting that a complicating factor in using luminance is that perceived lightness depends on context (Purves et al. 2004), namely the lightness of surrounding colors. For this reason, color scales ought to be presented with a consistent reference color to be comparable in lightness. The units of the SOM are in this book represented with circles rather than hexagons to leave space for reference coloring.

First, Fig. 5.4a–c illustrate structural properties of the model: (a) shows crisp cluster memberships of the second-level clustering, (b) shows distance structures using a U-matrix visualization, and (c) shows the frequency distribution on the SOM grid. While Fig. 5.4a, b show similar characteristics of cluster structures, Fig. 5.4c shows no specific patterns in density, except for the borders being comparatively less dense. Second, Fig. 5.4d shows qualities of the model, where larger quantization errors cluster around the lower right corner. Third, Fig. 5.4e–g enable assessing correlations and distributions by showing the spread of three financial performance measures on the SOM grid: capital, loan and profitability ratios. Here, one can observe that, generally, the right part represents well-performing banks and the left part poorly performing ones, which gives a direct interpretation to the trajectories in Fig. 5.1.

So, how does the SOM relate to information visualization? Following the discussion about data graphics in Sect. 4.1, the SOM can be related to Bertin’s (1983) framework. The plane, and its two dimensions \((x,y)\), are described as the richest variables, which can be perceived at all levels of organization. On the SOM, they represent discrete neighborhood relations. This corresponds also to the key aim of the SOM, that is, to preserve neighborhood relations, whereas global distance structures are of secondary importance. The retinal variables, and their three types of implantation (point, line and area), are thus positioned on the grid. The six retinal variables may be used to represent properties of the SOM grid, particularly properties of the units. To refresh the memory, they are as follows (where the parentheses refer to Bertin’s levels of organization): size (ordered, selective and quantitative), value (ordered and selective), texture (ordered, selective and associative), color (selective and associative), orientation (associative, and selective only in the cases of points and lines), and shape (associative). The choice of retinal variable should be based upon the purpose of the visualization and the type of data to be displayed. For instance, variation in size has been used to represent the frequency of data in units [see, e.g., Resta (2009)]. Value, or brightness, has been used to visualize the spread of univariate variable values (i.e., feature planes) on the SOM (see, e.g., Fig. 5.4). Likewise, texture has been used for representing cluster memberships [see, e.g., Sarlin (2012a)]. Orientation is commonly applied to represent high-dimensional reference vectors by means of arrows [see, e.g., Kohonen (2001, p. 117)]. Variation in color (or hue) has been used for illustrating crisp cluster memberships (see, e.g., Fig. 5.4) and for colorings that reveal multivariate cluster structures [see, e.g., Kaski et al. (2001) and Sarlin and Rönnqvist (2013)]. Variation in shape is commonly used on the SOM by means of labels, such as phoneme strings and phonemic symbols (Kohonen 2001, pp. 208–210).

5.6 Discussion

This chapter has considered data and dimension reduction methods, as well as their combination, for visual financial performance analysis. The discussions and illustrations in this chapter, while at times somewhat trivial, are motivated by the inconsistency of the argumentation for and application of various methods. The main conclusion of the comparison is that the SOM has several useful properties for financial performance analysis. In particular, this chapter has noted the following advantages of the SOM over alternative distance-preserving methods:
  (i) trustworthy neighbors,

  (ii) low computational cost,

  (iii) flexibility for problematic data, and

  (iv) a regularly shaped grid.
So, is the superiority of the SOM supported by information visualization theories? Indeed, the SOM representation can be related to Tufte’s (1983) advice and principles on graphical clarity and precision. Due to the potential loss of information when projecting from a high-dimensional space to one of a lower dimension, the property of trustworthy neighbors clearly relates to Tufte’s advice on avoiding distortions of data (given some losses in detail). Furthermore, the regular, predefined grid shape of the SOM enables and facilitates many types of information linking to the same grid structure. This functions as an aid in thinking about the information rather than the design and encourages the eye to compare data. The SOM’s property of approximating the probability density functions of data also facilitates presenting vast amounts of data in a small space, as units will be located in dense areas of the data space, which can also be thought of as an aid in making large datasets coherent. On the SOM, data may be revealed at multiple levels of detail, ranging from an overview of multivariate structures on the grid to the illustration of individual data on the grid (e.g., trajectories located in their BMUs), which also integrates statistical and verbal descriptions. Along these lines, Tufte’s six guidelines on telling the truth about data are also supported. For instance, showing data variation, not design variation, and not showing data out of context relate to, and are supported by, the use of a regular grid shape. Likewise, an example of visuals being directly proportional to the quantities they represent is the adjustment of the color scales used for the linked visualizations, such as normalizations of feature plane scales in order for all variables to be comparable (see, e.g., Sect. 6.2.2), and the use of perceptually uniform color scales, such as CIELab (1986).

It is, however, worth noting that the relative goodness of a method always depends on the task in question. That said, the SOM is obviously far from a panacea for all sorts of data and dimension reduction. When attempting only stand-alone tasks, it is indeed very likely that there exist better methods than the SOM. Similarly, when attempting DDR, the superiority of one method over others depends entirely on the aims of the task in question.

Even though the SOM has been assessed as advantageous for visual financial performance analysis, it is worth carefully considering its limitations:
  (i) The SOM performs a crude mapping. Rather than data points, the SOM attempts to embed the reference vectors, a significant constraint if detail is of central importance and/or if only a few data points are projected.

  (ii) The regular grid shape sets some restrictions on the SOM. For instance, it may interpolate sparse locations with idle units, it may lead an analyst to overinterpret the regular-like x and y axes, and it leads to the need for additional visual aids to fully represent structures.

  (iii) Mathematical treatment of the SOM has proven to be problematic. The lack of an objective function, as well as of a general training schedule or a proof of convergence, complicates parametrizing a SOM.
The comparison in this section has covered classical first-generation dimension reduction methods. This leads to one key question: Can the results of this comparison be generalized to all available methods? As reviewed in Sect. 5.1, CCA has been shown to outperform the SOM in terms of the trustworthiness of neighborhood relations (Himberg 2004; Venna and Kaski 2007). Likewise, two more recent local versions of MDS, denoted LMDS, by Venna and Kaski (2006) and Chen and Buja (2009), adapt the functioning of standard MDS to preserve local relations. These methods, while holding promise for one criterion, fall short on others, not least in the shape of the output. It is thus important to consider methods from the second generation with the key properties of the SOM. There are two conceptually similar topology-preserving methods that possess the capabilities of the SOM and a predefined grid shape: the GTM and the XOM. The GTM mainly differs from the SOM by relying on well-founded statistical properties. It is based upon Bayesian learning with an objective function, namely the log-likelihood, which is optimized by the expectation-maximization algorithm. The objective function directly facilitates assessing the convergence of the GTM. Even though Bishop et al. (1998) originally stated that the GTM is computationally comparable to the SOM, it has later been shown that the SOM is cheaper (e.g., Rauber et al. 2000). This may result from the number of algorithmic shortcuts developed for computing SOMs, such as the fast winner search (Kaski 1999). Both methods are flexible for problematic data, i.e., outliers and missing values, through a similar predefined grid shape and, in the case of the GTM, an extension for treating missing values (Carreira-Perpiñán 2000; Sun et al. 2001). Moreover, while choosing parameters for the SOM may be a tedious task, given adequate initializations and parametrization, convergence has seldom appeared to be a problem in practice (see, e.g., Yin 2008). A decade after the introduction of the GTM, neither it nor its variants, such as the S-Map (Kiviluoto and Oja 1997), have displaced the standard SOM.

The XOM is a computational framework for data and dimension reduction. By inverting the functioning of the SOM, the XOM systematically exchanges the functional and structural components of topology-preserving mappings through self-organized model adaptation to the input data. It has two main advantages compared to the SOM: (i) a reduced computational cost, and (ii) applicability to non-metric data, as there is no restriction on the distance measures. Even though non-metric dissimilarity measures are of little use on the data in these particular examples, while still having potential for other pairwise financial data, the reduced computational cost is particularly beneficial for large financial datasets in general. The XOM has, however, been introduced only recently and is thus still lacking thorough tests in relation to other methods, such as comparisons to SOMs with algorithmic shortcuts. Yet, the XOM should be considered a valid alternative to the SOM paradigm.

The key message is thus that all four criteria are fulfilled by three methods that perform a topology-preserving mapping to a regularly shaped grid: the SOM, the GTM and the XOM. It is worth noting, as widely suggested (e.g., Lee and Verleysen 2007; Trosset 2008), that one of the main reasons for the SOM being very popular for a broad range of tasks, such as classification, clustering, visualization, prediction, missing value imputation, etc., might be that it produces an intuitive output using a simple and easily understandable principle. This simplicity, while being beneficial for a method to be widely accepted, applied and understood, should still not be used for assessing relative goodness. One should, nevertheless, note that when introducing dimension reductions to the general public, such as policy- or decision-makers, simplicity is definitely an asset. To this end, the most suitable method for financial performance analysis is one from the family of methods that perform a topology-preserving mapping to a regularly shaped and predefined grid. In the work in this book, out of the above described family of methods, the choice of the SOM is motivated by its simplicity and the large number of extensions provided for it.

5.7 Concluding Summary

The literature shows a lack of unanimity on the superiority of one dimension reduction method over others. Yet, every task has its own needs. Data and dimension reduction for financial performance analysis should thus be performed with methods that have the best overall suitability for the performed task, not the best processing capabilities for some other objective. To this end, this chapter has addressed the choice of method for visual financial performance analysis from a qualitative perspective. We first discussed the properties of three inherently different classical first-generation dimension reduction methods, and their combination with data reduction, and illustrated their performance in a real-world financial application to benchmarking European banks. The conclusions drawn from the comparison of classical methods were then extended to second-generation methods. The qualitative discussion and experiments showed the superiority of the SOM for financial performance analysis in terms of four criteria: form of structure preservation, computational cost, flexibility for problematic data and shape of the output. When considering second-generation methods, the more recently introduced GTM and XOM have clear potential for similar tasks. The GTM improves the SOM paradigm with its well-defined objective function, but is computationally more costly, whereas the XOM is a recently introduced, promising method that still lacks thorough comparisons.

From the discussions in this chapter, an obvious conclusion is that the family of methods that perform a topology-preserving mapping to a regularly shaped and predefined grid provides means for visual financial performance analysis. The aims and needs of the task at hand, where the main focus lies on using the output as a display for additional information in general and individual data in particular, are not rare objectives in other fields either. While not being generalizable to their full extent, parts of the conclusions herein will also apply in other fields, domains and tasks. The methods advocated in this book obviously do not provide a panacea for visual financial performance analysis. They should be paired with other methods, not least visualizations of different kinds, that compensate for missing properties when having, for instance, a regularly shaped grid. To this end, the chapter also motivates exploring the information commonly linked to the SOM not only in the same family of methods with predefined grid shapes, but also in other dimension reduction paradigms in general. Figure 5.5 exemplifies how “feature planes” for a Sammon’s mapping visualize the spread of individual variables on the Sammon’s mapping coordinates.
Fig. 5.5

An exemplification of linking information to a Sammon’s mapping. Notes The figures link additional information to the coordinates of the Sammon’s mapping. All three plots show the spread of three individual variables measuring financial performance (i.e., feature planes): capital, loan and profitability ratios. They are comparable to the feature planes of the SOM grid shown in Fig. 5.4e–g. The reader is referred to those figures for an interpretation of the color scale

To sum up, the SOM was found to hold the most promise for the task performed in this book, which also sets the direction for the sequel of this book. Yet, the standard SOM as such is not always enough for the task at hand. In the following chapter, we will discuss how the SOM can be extended to better meet the aims and needs of the tasks and data at hand.

Footnotes

  1. While Relative MDS (Naud and Duch 2000) allows adding new data to the basis of an old MDS, it still does not update all distances within the mapping.

  2. When training SOMs, one has to set a number of free parameters. A set of quality measures is used to track the topographic and quantization accuracy, as well as the clustering, of the map. Given the purpose herein, details about the parametrization of the models in the experiments are not presented in depth.

References

  1. Bação, F., & Sousa Lobo, V. (2005). Self-organizing maps as substitutes for k-means clustering. Proceedings of the International Conference on Computational Science (ICCS 02) (pp. 476–483). Amsterdam: The Netherlands.Google Scholar
  2. Balakrishnan, P., Martha, C., Varghese, S., & Phillip, A. (1994). A study of the classification capabilities of neural networks using unsupervised learning: a comparison with k-means clustering. Psychometrika, 59, 509–525.CrossRefGoogle Scholar
  3. Bertin, J. (1983). Semiology of graphics. WI: The University of Wisconsin Press.Google Scholar
  4. Bezdek, J. C., & Pal, N. R. (1995). An index of topological preservation for feature extraction. Pattern Recognition, 28(3), 381–391.CrossRefGoogle Scholar
  5. Bishop, C., Svensson, M., & Williams, C. (1998). Developments of the generative topographic mapping. Neurocomputing, 21(1–3), 203–224.CrossRefGoogle Scholar
  6. Carreira-Perpiñan, M. (2000). Reconstruction of sequential data with probabilistic models and continuity constraints. In S. Solla, T. Leen, & K. Müller (Eds.), Advances in neural information processing systems (Vol. 12, pp. 414–420)., MIT Press MA: Cambridge.Google Scholar
  7. Chen, L., & Buja, A. (2009). Local multidimensional scaling for nonlinear dimension reduction, graph drawing, and proximity analysis. Journal of the American Statistical Association, 104, 209–219.CrossRefGoogle Scholar
  8. CIELab. (1986). Colorimetry. CIE Publication, No. , 15, 2.Google Scholar
  9. Cottrell, M., & Letrémy, P. (2005). Missing values: processing with the kohonen algorithm. Proceedings of Applied Stochastic Models and Data Analysis (ASMDA 05) (pp. 489–496). France: Brest.Google Scholar
  10. de Bodt, E., Cottrell, M., & Verleysen, M., (1999). Using the Kohonen algorithm for quick initialization of simple competitive learning algorithms. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN 99). Bruges, Belgium.Google Scholar
  11. de Vel, O., Lee, S., & Coomans, D. (1996). Comparative performance analysis of non-linear dimensionality reduction methods. In D. Fischer & L. H-J. (Eds.), Learning from data: Artificial intelligence and statistics (pp. 320–345). Heidelberg, Germany: Springer.Google Scholar
  12. Deakin, E. (1976). Distributions of financial accounting ratios: some empirical evidence. The Accounting Review, 51, 90–96.Google Scholar
  13. Denny Squire, D., 2005. Visualization of cluster changes by comparing self-organizing maps. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 05). Hanoi, Vietnam, pp. 410–419.Google Scholar
  14. Duch, W., & Naud, A. (1996). Multidimensional scaling and kohonen’s self-organizing maps. Proceedings of the Conference on Neural Networks and their Applications (CNNA 16) (pp. 138–143). Poland: Szczyrk.Google Scholar
  15. Flexer, A. (1997). Limitations of self-organizing maps for vector quantization and multidimensional scaling. In M. Mozer (Ed.), Advances in Neural Information Processing Systems (Vol. 9, pp. 445–451). Cambridge, MA: MIT Press.Google Scholar
  16. Flexer, A. (2001). On the use of self-organizing maps for clustering and visualization. Intelligent Data Analysis, 5(5), 373–384.Google Scholar
  17. Harrower, M., & Brewer, C. (2003). Colorbrewer.org: an online tool for selecting color schemes for maps. The Cartographic Journal, 40(1), 27–37.CrossRefGoogle Scholar
  18. Himberg, J. (2004). From insights to innovations: data mining, visualization, and user interfaces. Ph.D. thesis, Helsinki University of Technology, Espoo, Finland.Google Scholar
  19. Kaski, S. (1997). Data exploration using self-organizing maps. Ph.D. thesis, Helsinki University of Technology, Espoo, Finland.Google Scholar
  20. Kaski, S., & Kohonen, T. (1996). Exploratory data analysis by the self-organizing map: structures of welfare and poverty in the world. Proceedings of the International Conference on Neural Networks in the Capital Markets (pp. 498–507). London: World Scientific.Google Scholar
  21. Kaski, S. (1999). Fast winner search for som based monitoring and retrieval of high dimensional data. Proceedings of the IEEE International Conference on Artificial Neural Networks (ICANN 99) (pp. 940–945). London, UK: IEEE Press.CrossRefGoogle Scholar
  22. Kaski, S., Venna, J., & Kohonen, T. (2001). Coloring that reveals cluster structures in multivariate data. Australian Journal of Intelligent Information Processing Systems, 6, 82–88.
  23. Kiviluoto, K., & Oja, E. (1997). S-Map: A network with a simple self-organization algorithm for generative topographic mappings. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in Neural Information Processing Systems (Vol. 10, pp. 549–555). Cambridge, MA: MIT Press.
  24. Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59–69.
  25. Kohonen, T. (2001). Self-organizing maps (3rd ed.). Berlin: Springer.
  26. Latif, K., & Mayer, R. (2007). Sky-metaphor visualisation for self-organising maps. In Proceedings of the International Conference on Knowledge Management (I-KNOW 07). Graz, Austria.
  27. Lee, J., & Verleysen, M. (2007). Nonlinear dimensionality reduction. Information Science and Statistics Series. Heidelberg, Germany: Springer.
  28. Lee, J., & Verleysen, M. (2009). Quality assessment of dimensionality reduction: Rank-based criteria. Neurocomputing, 72(7–9), 1431–1443.
  29. Linde, Y., Buzo, A., & Gray, R. (1980). An algorithm for vector quantizer design. IEEE Transactions on Communications, 28(1), 702–710.
  30. Lueks, W., Mokbel, B., Biehl, M., & Hammer, B. (2011). How to evaluate dimensionality reduction? In B. Hammer & T. Villmann (Eds.), Proceedings of the Workshop on New Challenges in Neural Computation, Frankfurt, Germany. Machine Learning Reports, Department of Technology, University of Bielefeld.
  31. van der Maaten, L., & Hinton, G. (2008). Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
  32. MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281–297). Berkeley, CA: University of California Press.
  33. Merkl, D., & Rauber, A. (1997). Alternative ways for cluster visualization in self-organizing maps. In Proceedings of the Workshop on Self-Organizing Maps (WSOM 97). Helsinki, Finland.
  34. Naud, A., & Duch, W. (2000). Interactive data exploration using MDS mapping. In Proceedings of the Conference on Neural Networks and Soft Computing (pp. 255–260). Zakopane, Poland.
  35. Neumayer, R., Mayer, R., Pölzlbauer, G., & Rauber, A. (2007). The metro visualisation of component planes for self-organising maps. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 07). Orlando, FL, USA: IEEE Computer Society.
  36. Nikkilä, J., Törönen, P., Kaski, S., Venna, J., Castrén, E., & Wong, G. (2002). Analysis and visualization of gene expression data using self-organizing maps. Neural Networks, 15(8–9), 953–966.
  37. Pampalk, E., Rauber, A., & Merkl, D. (2002). Using smoothed data histograms for cluster visualization in self-organizing maps. In Proceedings of the International Conference on Artificial Neural Networks (ICANN 02) (pp. 871–876). Madrid, Spain.
  38. Pölzlbauer, G., Rauber, A., & Dittenbach, M. (2005). Advanced visualization techniques for self-organizing maps with graph-based methods. In Proceedings of the International Symposium on Neural Networks (ISNN 05) (pp. 75–80). Chongqing, China: Springer.
  39. Pölzlbauer, G., Dittenbach, M., & Rauber, A. (2006). Advanced visualization of self-organizing maps with vector fields. Neural Networks, 19(6–7), 911–922.
  40. Purves, D., Augustine, G., Fitzpatrick, D., Hall, W., LaMantia, A., McNamara, J., et al. (Eds.). (2004). Neuroscience. Sunderland, MA: Sinauer Associates.
  41. Rauber, A., Paralic, J., & Pampalk, E. (2000). Empirical evaluation of clustering algorithms. Journal of Information and Organizational Sciences, 24(2), 195–209.
  42. Resta, M. (2009). Early warning systems: An approach via self organizing maps with applications to emergent markets. In B. Apolloni, S. Bassis, & M. Marinaro (Eds.), Proceedings of the 18th Italian Workshop on Neural Networks (pp. 176–184). Amsterdam: IOS Press.
  43. Samad, T., & Harp, S. (1992). Self-organization with partial data. Network: Computation in Neural Systems, 3, 205–212.
  44. Sammon, J. (1969). A non-linear mapping for data structure analysis. IEEE Transactions on Computers, 18(5), 401–409.
  45. Sarlin, P. (2012a). Chance discovery with self-organizing maps: Discovering imbalances in financial networks. In Y. Ohsawa & A. Abe (Eds.), Advances in Chance Discovery (pp. 49–61). Heidelberg, Germany: Springer.
  46. Sarlin, P. (2012b). Visual tracking of the millennium development goals with a fuzzified self-organizing neural network. International Journal of Machine Learning and Cybernetics, 3, 233–245.
  47. Sarlin, P. (2014). Data and dimension reduction for visual financial performance analysis. Information Visualization (forthcoming). doi: 10.1177/1473871613504102
  48. Sarlin, P., & Rönnqvist, S. (2013). Cluster coloring of the self-organizing map: An information visualization perspective. In Proceedings of the International Conference on Information Visualization (iV 13). London, UK: IEEE Press.
  49. Serrano-Cinca, C. (1996). Self organizing neural networks for financial diagnosis. Decision Support Systems, 17, 227–238.
  50. Sun, Y., Tino, P., & Nabney, I. (2001). GTM-based data visualisation with incomplete data. Technical Report. Birmingham, UK: Neural Computing Research Group.
  51. Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17, 401–419.
  52. Trosset, M. (2008). Representing clusters: K-means clustering, self-organizing maps, and multidimensional scaling. Technical Report 08-03. Department of Statistics, Indiana University.
  53. Tufte, E. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
  54. Ultsch, A. (2003b). U*-matrix: A tool to visualize clusters in high dimensional data. Technical Report No. 36. Department of Mathematics and Computer Science, University of Marburg, Germany.
  55. Ultsch, A., & Siemon, H. (1990). Kohonen's self organizing feature maps for exploratory data analysis. In Proceedings of the International Conference on Neural Networks (ICNN 90) (pp. 305–308). Dordrecht, the Netherlands.
  56. Ultsch, A., & Vetter, C. (1994). Self-organizing feature maps versus statistical clustering methods: A benchmark. Research Report 0994. FG Neuroinformatik & Künstliche Intelligenz, University of Marburg.
  57. Ultsch, A. (2003a). Maps for the visualization of high-dimensional data spaces. In Proceedings of the Workshop on Self-Organizing Maps (WSOM 03) (pp. 225–230). Hibikino, Kitakyushu, Japan.
  58. Venna, J., & Kaski, S. (2001). Neighborhood preservation in nonlinear projection methods: An experimental study. In Proceedings of the International Conference on Artificial Neural Networks (ICANN 01) (pp. 485–491). Vienna, Austria: Springer.
  59. Venna, J., & Kaski, S. (2006). Local multidimensional scaling. Neural Networks, 19, 889–899.
  60. Venna, J., & Kaski, S. (2007). Comparison of visualization methods for an atlas of gene expression data sets. Information Visualization, 6(2), 139–154.
  61. Vesanto, J. (1999). SOM-based data visualization methods. Intelligent Data Analysis, 3(2), 111–126.
  62. Vesanto, J., & Ahola, J. (1999). Hunting for correlations in data using the self-organizing map. In Proceedings of the International ICSC Congress on Computational Intelligence Methods and Applications (CIMA 99) (pp. 279–285). Rochester, NY, USA: ICSC Academic Press.
  63. Vesanto, J., & Alhoniemi, E. (2000). Clustering of the self-organizing map. IEEE Transactions on Neural Networks, 11(3), 586–600.
  64. Waller, N., Kaiser, H., Illian, J., & Manry, M. (1998). A comparison of the classification capabilities of the 1-dimensional Kohonen neural network with two partitioning and three hierarchical cluster analysis algorithms. Psychometrika, 63, 5–22.
  65. Ward, J. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
  66. Yin, H. (2008). The self-organizing maps: Background, theories, extensions and applications. In J. Fulcher & L. Jain (Eds.), Computational intelligence: A compendium (pp. 715–762). Heidelberg, Germany: Springer.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Centre of Excellence SAFE, Goethe University Frankfurt, Frankfurt am Main, Germany
  2. RiskLab Finland, IAMSR, Åbo Akademi University, Turku, Finland
  3. Arcada University of Applied Sciences, Helsinki, Finland