The difficulties with using market comparisons to calculate CMO royalties stem from the problems of collecting accurate data in different countries and of finding comparable markets. In our analysis, the market comparison methodology supported by taxonomic measures was applied.
It is usually possible to find markets with higher and lower levels of fees collected by CMOs. The main difficulty is comparing markets by objective criteria. The objective of the investigations presented herein is the recognition and classification of similarities and differences between European countries related to the hotel market. For this purpose, the formalism and methods of multivariate comparative statistical analysis were used in the quantitative analysis.
The result of a quantitative comparative analysis is, in general, a partition of the objects into groups whose members are similar to one another and significantly different from the objects assigned to other groups. The notion of similarity is connected with the notion of distance between objects. The number of groups and their characteristics are not known in advance. The objective is rather to reveal and classify the existing similarities and dissimilarities. The K-means method (center of gravity method) is described and used in the analysis.
Organization of Data and Clustering Method
In multivariate statistical analysis, the set of data represents, in general, measurements of many variables related to the set of objects considered. Assume that the measurements refer to m variables (m ≥ 2), represented by the vector function:
$$ \boldsymbol{X}=\left[{X}_1,{X}_2,\dots, {X}_m\right]. $$
(1)
Consider now measurements on a set of n objects. The measurement on object i is represented by the point vector xi:
$$ {\mathbf{x}}_i=\left\{{x}_{i1},{x}_{i2},\dots, {x}_{im}\right\},\kern1.25em \left(i=1,2,\dots, n\right). $$
(2)
Measurements on the set of n objects are represented by the n × m matrix:
$$ \mathbf{X}=\left[{x}_{ij}\right], $$
(3)
where xij is the measurement of the j-th variable on the i-th object.
Measurements of different variables are expressed, in general, in different units. Most multivariate comparative analysis methods may be applied if measurements are given in the same units and are of comparable order. To satisfy this requirement, different normalization procedures are applied. The most common is standardization of the variables (Johnson and Wichern 2007):
$$ {Z}_j=\left\{{z}_{1j},{z}_{2j},\dots, {z}_{nj}\right\},\kern0.5em \left(j=1,2,\dots, m\right) $$
(4)
$$ {z}_{ij}=\left(\frac{x_{ij}-{\overline{x}}_j}{s_j}\right),\kern0.75em \left(i=1,2,\dots, n;j=1,2,\dots, m\right), $$
(5)
where zij is the standardized value of the variable Xj on the i-th object, \( {\overline{x}}_j \) is the arithmetic average of the variable Xj, and sj is the standard deviation of the variable Xj.
The mean value of each standardized variable is zero and its standard deviation is 1. Standardized data make it easy to identify objects that are below or above average with respect to specific variables.
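The standardization in Eq. (5) can be illustrated with a minimal Python sketch. The function name `standardize` and the sample data are hypothetical and are not taken from Table 1. The sketch assumes the sample standard deviation (divisor n − 1); the text does not state which divisor was used.

```python
import statistics

def standardize(rows):
    """Column-wise z-scores for a list of equal-length rows (objects x variables)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: sample standard deviation (divisor n - 1).
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        [(x - mu) / s for x, mu, s in zip(row, means, stdevs)]
        for row in rows
    ]

# Hypothetical data: three objects, two variables.
data = [[10.0, 50.0], [20.0, 70.0], [30.0, 90.0]]
z = standardize(data)
```

In this toy example the first object lies one standard deviation below the mean on both variables, the second is exactly average, and the third lies one standard deviation above.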
To compare items described by many variables, the notions of similarity and dissimilarity are necessary and must be formally defined. In the multivariate comparative analysis, the measure of dissimilarity is the distance between objects, represented as points in the space of standardized variables. It is called statistical distance (Johnson and Wichern 2007). The most commonly used is the Euclidean distance between objects:
$$ {d}_{il}=\sqrt{\sum_{\mathrm{j}=1}^m{\left({z}_{ij}-{z}_{lj}\right)}^2},\kern0.5em \left(i,l=1,2,\dots, n\right), $$
(6)
where dil is the distance between objects i and l, and zij is the standardized value of the variable Xj on the i-th object.
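As an illustration of Eq. (6), the following sketch computes the full matrix of pairwise Euclidean distances. The names `euclidean` and `distance_matrix` and the three sample points are hypothetical; the input is assumed to be already standardized.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(z):
    """Symmetric n x n matrix of distances d[i][l] between all pairs of objects."""
    return [[euclidean(a, b) for b in z] for a in z]

# Hypothetical standardized points in the plane.
z = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
d = distance_matrix(z)
```

The matrix is symmetric with zeros on the diagonal, so only the n(n − 1)/2 entries above the diagonal carry information.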
K-means Method
The idea of the K-means method is the partition of a set of objects, represented by points in the space of standardized variables, into subsets (clusters) of similar elements, concentrated around the nearest centroids (means). The term K-means was suggested by MacQueen (Johnson and Wichern 2007).
Partitioning into K clusters Ck is realized by minimization of the function G, which represents overall scattering of points within clusters:
$$ {\displaystyle \begin{array}{c}\left\{{C}_k\right\},\left(k=1,2,\dots, K\right)\to \min G\\ {}G={\sum}_{k=1}^K{\sum}_{i\in {C}_k}{\sum}_{j=1}^m{\left({z}_{ij}-{\gamma}_{kj}\right)}^2\end{array}} $$
(7)
where γk is the point vector representing the mean position of the nk objects assigned to the cluster Ck, called the center of gravity of the cluster:
$$ {\gamma}_{kj}=\frac{1}{n_k}{\sum}_{i\in {C}_k}{z}_{ij}. $$
(8)
The function G may be thus represented in the form:
$$ G={\sum}_{k=1}^K{\sum}_{i\in {C}_k}{\left({g}_{ik}\right)}^2 $$
(9)
where gik is the Euclidean distance between the object i, which belongs to the cluster Ck, and the center of gravity of this cluster.
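Equations (7) to (9) can be illustrated by computing G directly for a given assignment of objects to clusters. This is a sketch only; the names `centroid` and `scattering`, the integer cluster labels, and the sample points are hypothetical.

```python
def centroid(points):
    """Mean position (center of gravity) of a non-empty list of points, Eq. (8)."""
    return [sum(coord) / len(points) for coord in zip(*points)]

def scattering(z, labels):
    """Overall within-cluster scattering G for points z and their cluster labels."""
    clusters = {}
    for point, label in zip(z, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        gamma = centroid(members)
        # Sum of squared distances of the members to their own centroid.
        total += sum(
            (value - g) ** 2
            for point in members
            for value, g in zip(point, gamma)
        )
    return total

# Hypothetical points: two close objects in cluster 0, one isolated object in cluster 1.
z = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
labels = [0, 0, 1]
G = scattering(z, labels)
```

Here the two objects in cluster 0 each lie at distance 1 from their centroid, while the isolated object coincides with its own centroid and adds nothing to the sum.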
One-element clusters do not contribute to the value of G, as each of them coincides with its own centroid. The final number of clusters may be specified in advance or determined as part of the clustering procedure. In what follows, the second approach will be used, as the objective is to identify natural similarities and differences between countries, related to the hotel market.
Finding a partition into clusters that corresponds to the minimum of the overall scattering function is a mathematical task, solved numerically, that usually has a unique solution (although one can easily construct point distributions for which there is no single solution). It is quite another thing to use such a solution in practice. One may also be interested in partitions that are not much worse than the optimal one, keeping in mind additional criteria not taken into account in the calculations.
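One common way to carry out the numerical minimization is a Lloyd-style iteration that alternates between assigning each object to its nearest centroid and recomputing the centroids. The sketch below is an illustration under that assumption; the text does not state which algorithm or software was actually used. The function name `kmeans`, the starting centroids, and the sample points are hypothetical.

```python
import math

def kmeans(points, start_centroids, max_iter=100):
    """Lloyd-style iteration from given starting centroids; returns (labels, centroids)."""
    centroids = [list(c) for c in start_centroids]  # work on a copy
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Hypothetical standardized points forming two visibly separate groups.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels, centroids = kmeans(points, start_centroids=[[0.0, 0.0], [5.0, 5.0]])
```

An iteration of this kind stops at a partition that no single assignment step can improve, which need not be the global minimum of G. Running it from several starting points is one way to collect the near-optimal partitions mentioned above.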
Partitioning European Countries into Groups in View of Their Economic Conditions and Hotel Market
Aggregated data related to the hotel market and main statistics for a group of 21 selected European countries are presented in Table 1. The following basic variables describing the hotel market were considered: gross domestic product (GDP) per capita, average daily rate (ADR), occupancy rate and revenue per available room (RevPAR).
Table 1 Basic Parameters Related to the Hotel Market in 21 European Countries
The variables that most strongly differentiate countries are GDP per capita and ADR and, to a much lesser extent, occupancy rate. GDP per capita and ADR are not as strongly correlated with each other as one might suppose; the correlation coefficient is 0.616. At the same time, the correlation coefficient for GDP per capita and occupancy rate is 0.582, and for ADR and occupancy rate it is only 0.405. This last correlation is weak but positive, which is not obvious a priori. RevPAR was calculated as the product of ADR and occupancy rate. It is thus not an independent variable and was not used in the partition procedure.
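The two calculations referred to here, the correlation coefficient between two variables and RevPAR as the product of ADR and occupancy rate, can be sketched as follows. The function names and all numbers are hypothetical and are not taken from Table 1; occupancy is assumed to be expressed as a fraction between 0 and 1.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def revpar(adr, occupancy):
    """Revenue per available room: ADR times occupancy (occupancy as a fraction)."""
    return adr * occupancy

# Hypothetical values for four countries.
adr = [60.0, 80.0, 100.0, 120.0]
occupancy = [0.60, 0.70, 0.65, 0.75]
r = pearson(adr, occupancy)
rev = [revpar(a, o) for a, o in zip(adr, occupancy)]
```

Because RevPAR is fully determined by ADR and occupancy rate, including it alongside them would add no new information to the distance calculation.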
A number of possible partitions of 21 countries into clusters were examined, using two or three standardized variables. As a preferred solution, discussed later, a partition into four clusters was selected, based only on two variables: GDP per capita and ADR. This solution is distinguished by the fact that any reassignment of items significantly increases the overall scattering of points within clusters.
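The stability criterion described here, that reassigning any single item increases the overall scattering, can be checked mechanically. The sketch below is one possible way to do so and is not necessarily the procedure the authors followed. The names `scattering` and `reassignment_changes` and the sample points are hypothetical.

```python
def scattering(z, labels):
    """Overall within-cluster scattering G (sum of squared distances to centroids)."""
    clusters = {}
    for point, label in zip(z, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        gamma = [sum(c) / len(members) for c in zip(*members)]
        total += sum((v - g) ** 2 for p in members for v, g in zip(p, gamma))
    return total

def reassignment_changes(z, labels):
    """Change in G for every move of a single object to a different existing cluster."""
    base = scattering(z, labels)
    cluster_ids = sorted(set(labels))
    changes = {}
    for i, current in enumerate(labels):
        for target in cluster_ids:
            if target == current:
                continue
            trial = list(labels)
            trial[i] = target  # move object i to the target cluster
            changes[(i, target)] = scattering(z, trial) - base
    return changes

# Hypothetical standardized points in two well-separated clusters.
z = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = [0, 0, 1, 1]
changes = reassignment_changes(z, labels)
stable = all(delta > 0 for delta in changes.values())
```

Moving the only member of a one-element cluster into another cluster is handled by the same function: the emptied cluster simply disappears from the trial partition, and the change in G shows the cost of the merge.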
The results of the partition, expressed in standardized variables, are presented in Table 2. The standardized distances between the members of each cluster and its centroid are in all cases smaller than the distances between centroids, which vary from 1.412 for Clusters 2 and 4 to 4.277 for Clusters 1 and 3. The partition into clusters in terms of the original variables is presented in Table 3. The countries are ordered according to the value of GDP per capita. The distribution of clusters and their centroids on the plane (GDP per capita, ADR) is depicted in Fig. 1.
Table 2 Partition of 21 Countries into Four Clusters Using K-means Method; Standardized Variables
Table 3 Basic Parameters Related to the Hotel Market in Four Clusters of European Countries
The composition of the clusters is as follows. Cluster 1 includes Romania, Hungary, Poland, Lithuania, Slovakia, Estonia, the Czech Republic, and Portugal. Cluster 2 includes Malta, Greece, Spain, and Italy. Cluster 3 is France. Cluster 4 includes the United Kingdom, Germany, Belgium, Finland, Austria, Ireland, the Netherlands, and Denmark.
Cluster 1 comprises seven Central-Eastern European countries and Portugal. All of them show GDP per capita and ADR below the average for the group of 21 European countries. Cluster 2 includes four Mediterranean countries, with a comparatively high ADR but with GDP per capita below or only slightly above the average. Cluster 3 consists of France alone, which has the highest ADR and a comparatively high GDP per capita. Including France in Cluster 4 more than doubles the overall scattering. Cluster 4 comprises eight countries with high GDP per capita and, except for Ireland, a comparatively high ADR.