1 Introduction

This paper develops a new method for analyzing a set of points in relation to a single other point, which we call a reference point. Our aim is to evaluate the spatial relationship between the points and the reference point.

An example is the relationship between a retail store and its customer distribution, which has been discussed in retail geography and marketing science (Davies 2012; Scott 2017). Customers of convenience stores are tightly clustered around the stores, while those of shopping malls are widely spread. Epidemiology studies the relationship between air pollution and respiratory cancers such as lung, bronchus, and larynx cancers (Filippini et al. 2019). Cancer cases are often clustered around air-polluting sites. Geographic profiling is an important topic in criminology (Kent et al. 2006; Trinidad et al. 2021). It analyzes the spatial relationship between crime locations and the offender's residence.

Distance plays an important role in these relationships. The number of customers generally decreases with the distance from a retail store, and cases of respiratory cancers often decrease with the distance from air-polluting sites. Crime locations, on the other hand, exhibit a different pattern. Offenders often avoid the neighborhood of their residence when choosing crime locations; consequently, the number of crimes first increases and then decreases with the distance from the residence.

Direction is another important factor. Customers of retail stores are often more widely spread along highways due to easy accessibility. Air-polluting materials are carried by air currents, which leads to an anisotropic spatial pattern of cancer cases around air-polluting sites (Kurumatani and Kumagai 2008; Nakaya 2010). Directional variation is also found in crime locations since the spatial cognition of criminal offenders varies by direction (Kent and Leitner 2007; Frank et al. 2011; Mohler and Short 2012).

Studying the spatial relationship between points and a reference point helps us understand the properties of points and consider their underlying structures, i.e., how point patterns are formed. The relationship between retail stores and customer distributions tells us how the distance and direction affect the customers’ store choice. Analysis of the relationship between cancer cases and air-polluting sites permits us to find air-polluting sites that are more likely to cause respiratory cancers, which may have to be closed or downsized (Brender et al. 2011; García-Pérez et al. 2015). Using past data to analyze the relationship between crime locations and offenders’ residences permits us to detect offenders’ unknown residences from their crime locations (Kent et al. 2006).

However, analytical methods for the spatial relationship between points and a reference point have not yet been fully established, as discussed in the next section. To fill this research gap, this paper proposes a new method for analyzing the spatial pattern of points in relation to a reference point. We aim to reveal how the number of points varies with the distance from a reference point and with direction. We consider the number of points relative to the number of potential locations of points, i.e., the locations where points can be located. Consider the customers of a retail store in a region. All the residents in the region can be the store's customers; thus, their residence locations serve as potential locations. Consideration of potential locations is important since their number affects the number of points. Potential locations are often referred to as an inhomogeneous population in spatial statistics since their distribution is generally inhomogeneous (Gatrell et al. 1996; Diggle 2013).

The rest of the paper is organized as follows. Section 2 reviews related works. Section 3 describes the proposed method in detail. Section 4 applies the method to analyze the spatial pattern of climbers of Mt. Azuma in Japan. Section 5 summarizes the conclusions with a discussion.

2 Related works

2.1 Analysis of a single set of points

Point pattern analysis has long been discussed extensively in geography, ecology, statistics, and other fields. The nearest-neighbor distance method is a basic but essential tool for distinguishing clustered from dispersed point patterns (Clark and Evans 1954). The method evaluates the degree of point clustering within a statistical framework. The K-function and its standardized version, the L-function, analyze point patterns with a scale parameter given by the radius of circles centered at the points (Ripley 1976). These functions represent the degree of point clustering as a function of the geographic scale of analysis. The two-point correlation function is often used in astronomy (Peebles 1973, 1993). It indicates the excess probability, relative to a random distribution, of finding a galaxy at a given distance from another galaxy.

The above methods assume complete spatial randomness (CSR) under the null hypothesis, i.e., that points follow a uniform distribution in the study region, which does not always hold in the real world. Cuzick and Edwards (1990) resolve this problem with a statistical method for evaluating point patterns where points can be located only at limited locations. Diggle and Chetwynd (1991) also discuss point clusters under inhomogeneous potential locations.

The above methods, unfortunately, do not meet our needs since they consider only a single set of points. We aim to discuss the relationship between a set of points and a reference point, which requires us to consider two different sets simultaneously.

2.2 Analysis of two sets of points

The nearest-neighbor spatial-association measure evaluates the spatial proximity between two sets of points (Lee 1979). The method provides the probability that the observed proximity occurs under CSR. The cross K-function and L-function are more flexible and widely used in various academic fields (Ripley 1977). They can change the geographical scale of analysis to evaluate spatial proximity and can handle cases where the location of one set of points is fixed under the null hypothesis. A drawback is that they also assume CSR under the null hypothesis. The colocation quotient (CLQ) resolves this problem by introducing a randomization test (Leslie and Kronenfeld 2011). Cromley et al. (2014) generalize the CLQ using a spatial weight function and propose a local version of the CLQ. Li et al. (2022) further extend the CLQ into the spatiotemporal dimension.

The above methods analyze the relationship between two sets of multiple points. They are effective for grasping the overall pattern of the relationship between points. Our interest, on the other hand, lies in the detailed relationship between a set of points and a reference point, which existing methods cannot address. In addition, the above methods do not explicitly consider the directional variation in point patterns around reference points.

2.3 Analysis of the directional variation in a single set of points

The anisotropic K-function (Dale 2000; Rosser and Cheng 2019) considers the directional variation in point patterns. Extending the original K-function, it evaluates point clustering as a function of both the geographic scale and the direction of analysis. The method, unfortunately, assumes a single set of points, and thus it does not meet our objective.

Directional statistics is a useful tool for discussing the directional variation of spatial phenomena (Pewsey et al. 2013; Ley and Verdebout 2017). It has been widely used in biology, astronomy, climatology, and other fields. A drawback is that directional statistics does not consider the radial dimension, i.e., the distance from a reference point. Our study requires considering both the directional and radial dimensions, which directional statistics alone does not allow.

3 Method

3.1 Analysis of a single set of points 1: visualization

This subsection proposes a method for visualizing the relationship between a single set of points Ω and a reference point Z in the region Ξ. We first assume that the potential locations of points are uniformly distributed in Ξ. We describe our method under this situation and then proceed to the case where the potential locations of points are not uniformly distributed.

The method first divides the region Ξ into L sectors centered at Z, which are numbered clockwise from north as {Ʌ1, Ʌ2, …, ɅL}, as shown in Fig. 1. The number of points in Ʌi is denoted by Mi. Let Ωi = {Pi1, Pi2, …, PiMi} be the set of points in Ʌi arranged in increasing distance from Z. We use the polar coordinate system originating from Z to indicate the location of points. The location of Pij is given by rij and θij, its distance from Z and its angle measured clockwise from north, respectively. Throughout, the indices i and j denote the ith sector and the jth point in each sector, respectively.
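The sector assignment and per-sector ordering described above can be sketched as follows. This is a minimal sketch, assuming planar (projected) x/y coordinates with north along the +y axis; the function names `polar_from_reference` and `assign_to_sectors` are hypothetical, and sectors are indexed 1 to L as in the text.

```python
import math

def polar_from_reference(z, p):
    """Return (r, theta) of point p relative to reference point z.

    theta is measured clockwise from north (the +y axis), in [0, 2*pi).
    """
    dx, dy = p[0] - z[0], p[1] - z[1]
    r = math.hypot(dx, dy)
    # atan2(dx, dy) is the clockwise angle from north; shift it into [0, 2*pi)
    theta = math.atan2(dx, dy) % (2 * math.pi)
    return r, theta

def assign_to_sectors(z, points, L):
    """Group points into L sectors numbered clockwise from north (1..L).

    Each sector holds (r, theta) pairs sorted by increasing distance from z,
    i.e. P_i1, P_i2, ..., P_iMi in the notation of the text.
    """
    sectors = {i: [] for i in range(1, L + 1)}
    width = 2 * math.pi / L
    for p in points:
        r, theta = polar_from_reference(z, p)
        # min() guards against theta rounding up to exactly 2*pi
        i = min(int(theta // width), L - 1) + 1
        sectors[i].append((r, theta))
    for i in sectors:
        sectors[i].sort()  # tuples sort by r first
    return sectors
```

For example, with L = 4 a point just east of north falls into sector 1 and a point just north of west falls into sector 4.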

Fig. 1

Radial sectors centered at the reference point Z indicated as the red point. Labels of the points in Ʌ1, Ʌ2, and Ʌ3 are indicated as examples (color figure online)

We then determine, in each sector, the area that contains α percent of the points closest to Z. The index of the point at the boundary of this area in sector Ʌi is given by

$$j = \frac{\alpha }{100}M_{i} ,$$
(1)

where Mi is the number of points in Ʌi. If j is an integer, we take Pij as the boundary of the area containing α percent of the points. If j is not an integer, we consider the two adjacent integers j1 and j2 between which j lies, i.e., j1 = ⌊j⌋ and j2 = ⌈j⌉. Using points Pij1 and Pij2, we define Ui, the representative location of the α-covering area in sector Ʌi. Its radial and angular coordinates are given by

$$R_{i} \left( \alpha \right) = \frac{{r_{{ij_{1} }} + r_{{ij_{2} }} }}{2}$$
(2)

and

$$\theta_{i} = \frac{2\pi }{L}\left( {i - \frac{1}{2}} \right),$$
(3)

respectively, where L is the number of sectors. The red points in Fig. 2 indicate the representative locations of the points shown in Fig. 1, where α = 50.
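Equations (1)–(3) translate into a few lines of code. The sketch below assumes the distances of one sector are already sorted; the names `covering_radius` and `sector_angle` are hypothetical, and raising an error when fewer than one point is covered (j < 1) is our own assumption, since the text does not treat that case.

```python
import math

def covering_radius(distances, alpha):
    """Representative radius R_i(alpha) of the alpha-covering area in one sector.

    distances: sorted distances r_i1 <= r_i2 <= ... <= r_iMi (1-based in the text).
    Eq. (1): j = alpha / 100 * M_i. If j is an integer, R = r_ij; otherwise
    Eq. (2) averages r_ij1 and r_ij2 with j1 = floor(j) and j2 = ceil(j).
    """
    m = len(distances)
    j = alpha * m / 100  # multiply first to avoid spurious non-integers
    j1, j2 = math.floor(j), math.ceil(j)
    if j1 < 1:
        # Assumption: the text does not define R_i when j < 1.
        raise ValueError("too few points in this sector for the chosen alpha")
    if j1 == j2:
        return distances[j1 - 1]  # integer j: take P_ij itself
    return (distances[j1 - 1] + distances[j2 - 1]) / 2

def sector_angle(i, L):
    """Angular coordinate theta_i of U_i (Eq. 3): the bisector of sector i."""
    return 2 * math.pi / L * (i - 0.5)
```

For example, with five sorted distances 1 to 5 and α = 50, j = 2.5, so R is the mean of the second and third distances, 2.5.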

Fig. 2

Representative locations where α = 50 (red points) and the 50-covering area (red dotted lines) shown in Fig. 1 (color figure online)

We finally connect the representative points U1 to UL in order to generate a polygon containing approximately α percent of the points. We call it the α-covering area. The red dotted line in Fig. 2 indicates the 50-covering area of the points shown in Fig. 1. Unlike existing methods, this area explicitly allows the point pattern to vary by direction, thus permitting us to grasp the anisotropic pattern of points in relation to reference point Z.
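A sketch of how the polygon vertices could be obtained from the representative radii, assuming Z at the origin of a planar coordinate system with north along +y. The function names are hypothetical, and the shoelace area in `polygon_area` is only an illustrative summary of the polygon's size; the text does not prescribe it.

```python
import math

def covering_polygon(radii):
    """Vertices U_1..U_L of the alpha-covering area in Cartesian coordinates.

    radii[i-1] is R_i(alpha); theta_i follows Eq. (3). Angles are clockwise
    from north, so x = R sin(theta) (east) and y = R cos(theta) (north).
    """
    L = len(radii)
    vertices = []
    for i, r in enumerate(radii, start=1):
        theta = 2 * math.pi / L * (i - 0.5)
        vertices.append((r * math.sin(theta), r * math.cos(theta)))
    return vertices  # the polygon closes by joining U_L back to U_1

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    s = 0.0
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2
```

With four sectors and all radii equal to 1, the vertices form a square inscribed in the unit circle, whose area is 2.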

In the above procedure, we must determine the number of radial sectors L. A large L is desirable if there are enough points since it permits us to discuss the point pattern in detail. A concern is that the α-covering area can fluctuate drastically along the angular axis, preventing us from grasping the overall directional pattern of points. Smoothing along the angular axis, such as a moving average or kernel smoothing, resolves this problem (Silverman 1986; Scott 2015). Smoothing conceals complicated details and increases the interpretability of overall point patterns.
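A moving average along the angular axis must wrap around, because sector L is adjacent to sector 1. The following is a minimal sketch of such a circular moving average applied to the per-sector radii; the function name is hypothetical, and an odd window width (for example the width of five sectors used in Sect. 4) is assumed so that each window is centered on its sector.

```python
def circular_moving_average(values, width=5):
    """Smooth per-sector values (e.g. R_i(alpha)) along the angular axis.

    The window wraps around the circle. width must be odd and should not
    exceed len(values); otherwise some sectors are counted twice.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    n = len(values)
    half = width // 2
    return [
        sum(values[(k + d) % n] for d in range(-half, half + 1)) / width
        for k in range(n)
    ]
```

Kernel smoothing would replace the equal weights with weights that decay with angular distance; the wrap-around indexing stays the same.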

If there are not enough points, we have to reduce the number of sectors. The minimum number of points necessary in each sector depends on the value of α. Let us assume α = 50, where we aim to find the area containing half the points. We consider two cases where each sector contains two points and two hundred points, respectively. We may obtain the same 50-covering area in both cases, but the result obtained in the latter is more reliable since it is based on more information. To evaluate the reliability, we introduce a statistical framework that is used for discussing the necessary sample size under the binomial distribution (Lwanga et al. 1991; Desu 2012). We assume that each point is located inside the 50-covering area with a probability of 50 percent. In the first case, where there are two points in each sector, the probability that one point is located inside the 50-covering area and the other outside is

$$\left( \frac{1}{2} \right)^{2} {}_{2}C_{1} = 0.5.$$
(4)

At a significance level of five percent, this outcome is not significant because it can easily happen by chance. In the second case, where there are two hundred points in each sector, the probability that exactly half of the points are located inside the 50-covering area is

$$\left( \frac{1}{2} \right)^{200} {}_{200}C_{100} = 0.056.$$
(5)

Though this outcome is still not significant at the five percent level, the result is more reliable than in the first case. Increasing the number of points decreases this probability and thus increases the reliability. Table 1 shows Mmin(α, s), the minimum number of points in each sector required at significance level s. As seen in the table, more points are necessary as α moves away from 50. When points are insufficient, we have to reduce the number of sectors or use a value of α close to 50.
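The probabilities in Eqs. (4) and (5) can be checked directly with exact integer arithmetic. This sketch only reproduces those two numbers under the stated assumption that each point falls inside the 50-covering area independently with probability 1/2; it does not reproduce the procedure behind Table 1, and the function name is hypothetical.

```python
from math import comb

def prob_exactly_half_inside(m):
    """Probability that exactly m/2 of m points lie inside the 50-covering area,
    each point being inside independently with probability 1/2 (Eqs. 4-5)."""
    if m % 2:
        raise ValueError("m must be even")
    return comb(m, m // 2) / 2 ** m

print(prob_exactly_half_inside(2))              # 0.5
print(round(prob_exactly_half_inside(200), 3))  # 0.056
```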

Table 1 The minimum number of points in each sector

So far, we have assumed that the potential locations of points are uniformly distributed in the region Ξ. We now proceed to the case where they are not. Let us assume that the region Ξ is divided into Y smaller regions, for each of which the number of potential locations is reported. A typical example is population data aggregated across spatial units such as census districts and zip code areas. The numbers of points and potential locations in these regions are denoted by {p1, p2, …, pY} and {q1, q2, …, qY}, respectively. The average of the latter is

$$\overline{q} = \frac{{\sum\limits_{i} {q_{i} } }}{Y}.$$
(6)

We standardize the number of points in each region to eliminate the effect of the variation in the number of potential locations. The standardized number of points in the ith region is given by

$$p_{i}^{\prime } = \frac{{\overline{q}}}{{q_{i} }}p_{i} .$$
(7)

This value is proportional to pi/qi, the ratio of the number of points to the number of potential locations. If qi is larger than its average, we reduce the number of points by randomly choosing \(p_{i}^{\prime }\) points from the pi points. If qi is smaller than its average, we randomly locate \(p_{i}^{\prime }\) − pi additional points in the ith region. This eliminates the effect of the variation in the number of potential locations. After the standardization, we construct the α-covering area.
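The thinning and augmentation step of Eq. (7) can be sketched as below. Two assumptions are ours and are not stated in the text: p′i is rounded to the nearest integer, and each region is approximated by an axis-aligned bounding box when new points are placed uniformly at random. The names `standardize_region` and `mean_potential` are hypothetical.

```python
import random

def mean_potential(q):
    """q-bar of Eq. (6): the average number of potential locations."""
    return sum(q) / len(q)

def standardize_region(points, q_i, q_bar, bbox, rng):
    """Standardize the points of one region following Eq. (7).

    points: (x, y) points observed in the region, so p_i = len(points).
    bbox: (xmin, ymin, xmax, ymax); ASSUMPTION: the region is approximated
    by this rectangle when additional points are located at random.
    """
    p_i = len(points)
    target = round(q_bar / q_i * p_i)  # p'_i, rounded (assumption)
    if target <= p_i:
        # q_i at or above average: thin by random subsampling without replacement
        return rng.sample(points, target)
    xmin, ymin, xmax, ymax = bbox
    extra = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
             for _ in range(target - p_i)]
    return points + extra  # q_i below average: add p'_i - p_i random points
```

A seeded `random.Random` instance is passed in so that the standardization can be repeated reproducibly.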

The computational complexity of the above processes is as follows. We sort the points in each sector by increasing distance from Z. The time complexity of comparison-based sorting algorithms such as heap sort and merge sort in the ith sector is

$${\text{O}}\left( {M_{i} \log \left( {M_{i} } \right)} \right)$$
(8)

(Tridgell 2005; Cormen et al. 2022). Assuming for simplicity that the points are evenly distributed among sectors, i.e., Mi = M/L, the total complexity is

$$\begin{aligned} \sum\limits_{i} {{\text{O}}\left( {M_{i} \log \left( {M_{i} } \right)} \right)} = & L{\text{O}}\left( {\frac{M}{L}\log \left( \frac{M}{L} \right)} \right) \\ = & {\text{O}}\left( {M\log \left( \frac{M}{L} \right)} \right) \\ = & {\text{O}}\left( {M\log M} \right), \\ \end{aligned}$$
(9)

where

$$M = \sum\limits_{i} {M_{i} } .$$
(10)

Equation (9) shows that the computation time grows only slightly faster than linearly with M, so the computation ends in an acceptable time (Papadimitriou 2003; Arora and Barak 2009).
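Equation (9) is derived under equal sector sizes. As a worked step, the same bound holds for any allocation of points to sectors, because every Mi is at most M:

$$\sum\limits_{i} {M_{i} \log M_{i} } \le \sum\limits_{i} {M_{i} \log M} = M\log M.$$

Hence the sorting cost is O(M log M) however unevenly the points fall into the sectors.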

3.2 Analysis of a single set of points 2: point pattern modeling

Visualization is effective at an early stage of analysis. Visual analysis, however, is inevitably subjective, as mentioned in Sect. 2. This subsection proposes a more objective method to complement the visual analysis.

We introduce mathematical models to describe how the number of points decreases with distance in each sector. The models are applied after the standardization described in Sect. 3.1. The negative power function is used in gravity models (Colwell 1982; Fotheringham and O'Kelly 1989):

$$f_{i} \left( r \right) = \frac{{\beta_{i} }}{{r^{{\gamma_{i} }} }},$$
(11)

where r represents the distance from Z. The parameters βi and γi represent the overall magnitude of the number of points and the speed at which points decrease with the distance from the reference point, respectively.

A drawback of this function is that it is undefined at r = 0. The negative exponential model, which is defined at r = 0, is often used instead (Haggett 1966; Haynes 1975):

$$f_{i} \left( r \right) = \beta_{i} \exp \left[ { - \gamma_{i} r} \right].$$
(12)

The logistic model is another option:

$$f_{i} \left( r \right) = 1 - \frac{{\beta_{i} }}{{1 + \exp \left[ { - \gamma_{i} \left( {r - \mu_{i} } \right)} \right]}},$$
(13)

where µi represents the distance between the inflection point and the reference point.

Criminology often uses more complicated models to describe crime locations since the number of crimes first increases and then decreases with the distance from the offender's residence. Normal and lognormal models have been used to describe the spatial pattern of crimes (Snook et al. 2005; Paulsen 2006; O'Leary 2009, 2011):

$$f_{i} \left( r \right) = \frac{{\beta_{i} }}{{\sqrt {2\pi } }}\exp \left[ { - \frac{1}{2}\left( {\frac{{r - \mu_{i} }}{{\gamma_{i} }}} \right)^{2} } \right]$$
(14)

and

$$f_{i} \left( r \right) = \frac{{\beta_{i} }}{{\sqrt {2\pi } }}\exp \left[ { - \frac{1}{2}\left( {\frac{{\log r - \mu_{i} }}{{\gamma_{i} }}} \right)^{2} } \right].$$
(15)

The functions are maximized at r = µi and r = exp[µi] in Eqs. (14) and (15), respectively.
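For reference, the distance-decay models of Eqs. (11)–(15) can be written as plain Python functions. This is a sketch under our own conventions: Eq. (11) is coded as the decreasing power function β r^(−γ) with γ > 0, the logistic model of Eq. (13) uses a numerically stable sigmoid to avoid overflow for steep curves, and all function names are hypothetical.

```python
import math

def power(r, beta, gamma):
    """Eq. (11): negative power function beta * r**(-gamma); undefined at r = 0."""
    return beta * r ** (-gamma)

def exponential(r, beta, gamma):
    """Eq. (12): negative exponential model; equals beta at r = 0."""
    return beta * math.exp(-gamma * r)

def _sigmoid(z):
    """1 / (1 + exp(-z)) computed without overflow for large |z|."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def logistic(r, beta, gamma, mu):
    """Eq. (13): 1 - beta / (1 + exp[-gamma (r - mu)]), inflection at r = mu."""
    return 1 - beta * _sigmoid(gamma * (r - mu))

def normal(r, beta, gamma, mu):
    """Eq. (14): maximized at r = mu."""
    return beta / math.sqrt(2 * math.pi) * math.exp(-0.5 * ((r - mu) / gamma) ** 2)

def lognormal(r, beta, gamma, mu):
    """Eq. (15): maximized at r = exp(mu); defined for r > 0."""
    z = (math.log(r) - mu) / gamma
    return beta / math.sqrt(2 * math.pi) * math.exp(-0.5 * z ** 2)
```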

Having chosen a mathematical model, we estimate it from the observed data. We consider two cases that differ in where points can be located. The first case is that points can be located anywhere in the region Ξ. In this case, the function fi(r) is the probability density function of points. We estimate the model by the maximum likelihood method, where the likelihood is given by

$$L\left( {f_{i} \left( r \right)} \right) = \prod\limits_{j} {f_{i} \left( {r_{ij} } \right)} .$$
(16)
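As a worked example of maximizing Eq. (16), consider the exponential model of Eq. (12) treated as a density of distances on [0, ∞). Under that assumption (ours, for illustration), normalization forces βi = γi, the log-likelihood is Mi log γ − γ Σj rij, and setting its derivative Mi/γ − Σj rij to zero gives the closed form γ̂ = Mi / Σj rij. The function names are hypothetical; other models generally require numerical optimization.

```python
import math

def exp_log_likelihood(distances, gamma):
    """log of Eq. (16) for f(r) = gamma * exp(-gamma * r).

    ASSUMPTION: Eq. (12) normalized as a density on [0, inf), so beta = gamma.
    """
    return sum(math.log(gamma) - gamma * r for r in distances)

def exp_mle(distances):
    """Closed-form maximizer: gamma_hat = M_i / sum of distances (1 / mean)."""
    return len(distances) / sum(distances)
```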

The second case is that points can be located only at limited discrete locations in Ξ. This happens, for instance, when points represent retail stores that can be located only at sites in commercial areas. Sites are classified into two groups, i.e., inside and outside commercial areas. We denote the latter in sector Ʌi and their distances from Z as Ω′i = {P′i1, P′i2, …, P′iM′i} and {r′i1, r′i2, …, r′iM′i}, respectively, where M′i is the number of such sites. They represent the locations where points cannot be placed. We consider a stochastic model where the probability that a point is located at a given location is given by the function fi(r). The likelihood of this model is

$$L^{\prime } \left( {f_{i} \left( r \right)} \right) = \prod\limits_{j} {f_{i} \left( {r_{ij} } \right)} \prod\limits_{j} {\left\{ {1 - f_{i} \left( {r^{\prime }_{ij} } \right)} \right\}} .$$
(17)

We estimate the model again based on the maximum likelihood method.
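The likelihood of Eq. (17) is a Bernoulli likelihood: each distance rij contributes fi(rij), and each distance r′ij contributes 1 − fi(r′ij). The sketch below evaluates its logarithm and, purely for illustration, fits the exponential model of Eq. (12) by a coarse grid search. The grid search is our stand-in for a proper numerical optimizer, and the function names are hypothetical.

```python
import math

def bernoulli_log_likelihood(occupied, unoccupied, f):
    """log of Eq. (17): sum of log f(r_ij) plus sum of log(1 - f(r'_ij))."""
    ll = 0.0
    for r in occupied:
        ll += math.log(f(r))
    for r in unoccupied:
        ll += math.log(1 - f(r))
    return ll

def fit_exponential_grid(occupied, unoccupied, betas, gammas):
    """Crude maximum-likelihood fit of Eq. (12) under Eq. (17) by grid search.

    Parameter pairs that give probabilities outside (0, 1) are skipped.
    Returns (log-likelihood, beta, gamma) of the best pair, or None.
    """
    best = None
    for beta in betas:
        for gamma in gammas:
            f = lambda r, b=beta, g=gamma: b * math.exp(-g * r)
            if any(not 0 < f(r) < 1 for r in occupied + unoccupied):
                continue
            ll = bernoulli_log_likelihood(occupied, unoccupied, f)
            if best is None or ll > best[0]:
                best = (ll, beta, gamma)
    return best
```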

Having estimated the models, we statistically compare them to choose the best one. If models have the same number of parameters, we can compare them by their likelihoods. If models contain different numbers of parameters, information criteria such as AIC and BIC, which penalize the number of parameters, are effective for evaluating model fit (Kuha 2004; Chakrabarti and Ghosh 2011).
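Both criteria combine the maximized log-likelihood with a penalty on the number of parameters k; BIC also uses the sample size n. The sketch uses the standard definitions, where smaller values are better; the log-likelihood values in the usage line are hypothetical.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical sector: exponential model (k = 2) vs logistic model (k = 3).
print(aic(-10.0, 2), aic(-9.5, 3))  # 24.0 25.0 -> the exponential model is preferred
```

Here the logistic model fits slightly better, but its gain in log-likelihood (0.5) does not offset the AIC penalty of one extra parameter.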

The choice of L depends on the analysis objective and the number of points. A detailed discussion of the spatial pattern of the models requires a large L, while the global pattern can be captured with a small L. Appropriate model estimation requires a certain number of points in each sector. The sample size problem has long been discussed in statistics (Cohen 2013; Kraemer 1974). The necessary sample size depends on the model structure, the number of parameters to be estimated, the sample distribution, and so forth. Therefore, discussing sample size problems in a general setting is difficult. In regression modeling, 100–200 samples are often said to be the minimum size for obtaining significant results (Comrey 1978; Green 1991; Tosteson et al. 2003). Since the models mentioned above are structurally similar to simple regression models, it seems desirable to determine L so that each sector contains at least 100–200 points.

The computational complexity of model estimation is as follows. We estimate the models using an iterative numerical optimization algorithm such as the EM algorithm. Each iteration processes the points of one sector, about M/L points, so its calculation order is O(M/L). We repeat the model estimation in L sectors; thus, assuming a fixed number of iterations, the total complexity of model estimation is

$$L \times {\text{O}}\left( \frac{M}{L} \right) = {\text{O}}\left( M \right).$$
(18)

3.3 Analysis of two sets of points

So far, we have discussed the relationship between a single set of points Ω and a reference point Z. This subsection compares two sets of points that share the same reference point. Consider another set of points Θ in the region Ξ. Let Θi = {Qi1, Qi2, …, QiNi} be the set of points in sector Ʌi arranged in ascending order of the distance from Z. The distance of Qij from Z is denoted by sij.

Our interest lies in the difference in the spatial patterns of points around Z, or more precisely, whether the points of Ω are more widely spread than those of Θ, or vice versa. We first compare the two point sets in each sector. To this end, we adopt a statistical procedure that compares the two sets of distances {ri1, ri2, …, riMi} and {si1, si2, …, siNi}. We combine the two sets and rearrange the distances in ascending order. This is a typical two-sample problem, for which many statistical tests have been proposed, such as the Mann–Whitney and the Kolmogorov–Smirnov tests (Corder and Foreman 2014; Deshpande et al. 2017). The Mann–Whitney test is effective when the two sets share an equal variance (Sokal and Rolf 1981; Siegel and John Jr 1988). The Kolmogorov–Smirnov test is applicable under unequal variances. A drawback of the Kolmogorov–Smirnov test is that it is less statistically powerful than the Mann–Whitney test due to its wide flexibility. Runs tests are also available for this two-sample problem (O'Brien 1976; O'Brien and Dyck 1985; Maritz 1995); such tests have been developed based on the number of runs and the maximum run length.
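The per-sector statistic ωi can be computed from the two distance lists of a sector. The sketch below gives a standardized Mann–Whitney statistic (with mid-ranks for ties and the plain normal approximation, omitting the tie correction of the variance, a simplifying assumption) and the two-sample Kolmogorov–Smirnov statistic D. Note that D as coded here is unsigned; how a sign is attached to it for the sum in Eq. (19) is not specified in the text. The function names are hypothetical.

```python
import math

def mann_whitney_z(r, s):
    """Standardized Mann-Whitney statistic for distances r (Omega) vs s (Theta).

    Positive values mean the r-distances tend to be larger, i.e. Omega is
    more widely spread around Z.
    """
    m, n = len(r), len(s)
    pooled = sorted([(v, 0) for v in r] + [(v, 1) for v in s])
    ranks = [0.0] * len(pooled)
    k = 0
    while k < len(pooled):
        t = k
        while t + 1 < len(pooled) and pooled[t + 1][0] == pooled[k][0]:
            t += 1
        mid = (k + t) / 2 + 1  # 1-based mid-rank of the tied block k..t
        for u in range(k, t + 1):
            ranks[u] = mid
        k = t + 1
    rank_sum_r = sum(rk for rk, (_, g) in zip(ranks, pooled) if g == 0)
    u_stat = rank_sum_r - m * (m + 1) / 2
    mean = m * n / 2
    sd = math.sqrt(m * n * (m + n + 1) / 12)
    return (u_stat - mean) / sd

def ks_statistic(r, s):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_r(x) - F_s(x)|."""
    r, s = sorted(r), sorted(s)
    i = j = 0
    d = 0.0
    while i < len(r) and j < len(s):
        x = min(r[i], s[j])
        while i < len(r) and r[i] == x:
            i += 1
        while j < len(s) and s[j] == x:
            j += 1
        d = max(d, abs(i / len(r) - j / len(s)))
    return d
```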

The above statistics permit us to test the statistical difference between Ωi and Θi in each sector. Let ωi be the statistic calculated in sector Ʌi. Using ωi, we evaluate the overall difference between Ω and Θ, where the null hypothesis is that there is no statistical difference between the two sets. The sum of ωi over all sectors is

$$\omega = \sum\limits_{i} {\omega_{i} } .$$
(19)

The statistic ω takes a large positive value if the points of Ω are more widely distributed around Z than those of Θ. If the points of Θ are more widely distributed, ω takes a negative value with a large absolute value. The statistic ω is close to zero if no significant difference exists between Ω and Θ.

When each sector contains enough points (at least 50, but preferably 100 (Zhang and Wu 2002; Happ et al. 2019; Uttley 2019)), the statistics ωi are independent and identically distributed under the null hypothesis. The standardized Mann–Whitney statistic, for instance, approximately follows the standard normal distribution under the null hypothesis. The null distribution of the Kolmogorov–Smirnov statistic is also known (asymptotically the Kolmogorov distribution), although it is not normal.

If the number of radial sectors L is large enough, the central limit theorem allows us to approximate the probability distribution of ω under the null hypothesis. Let φ and \(\delta^{2}\) be the mean and variance of ωi under the null hypothesis. We define the statistic ω0 as

$$\omega_{0} = \frac{\omega - L\varphi }{{\sqrt L \delta }}.$$
(20)

The central limit theorem assures that ω0 approximately follows the standard normal distribution under the null hypothesis, and thus we can evaluate the statistical significance of ω, i.e., the overall difference between Ω and Θ.
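Equation (20) and the corresponding two-sided p-value under the standard normal distribution can be computed as follows. This is a minimal sketch; for standardized Mann–Whitney statistics one would take φ = 0 and δ = 1, whereas for other per-sector statistics φ and δ must come from their null distribution. The function names are hypothetical.

```python
import math

def omega0(omegas, phi, delta):
    """Eq. (20): (sum of omega_i - L * phi) / (sqrt(L) * delta)."""
    L = len(omegas)
    return (sum(omegas) - L * phi) / (math.sqrt(L) * delta)

def two_sided_p(z):
    """Two-sided p-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, four sectors each with a standardized statistic of 1.0 give ω0 = 4/2 = 2.0, whose two-sided p-value is about 0.046.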

Even if L is not large enough, the probability distribution of ω is approximately normal when ωi approximately follows a normal distribution under the null hypothesis. The standardized Mann–Whitney statistic is a case where this requirement is satisfied. Some statistics used in runs tests also follow normal distributions. The reproductive property of the normal distribution assures that ω, as a sum of independent normal variables, follows a normal distribution with mean Lφ and variance Lδ².

The above discussion assumes that each sector contains enough points. If points are insufficient, we calculate the probability distribution of ω under the null hypothesis by Monte Carlo simulation. We randomize the locations of the two sets of points in each sector, calculate ωi, and sum all the ωi's. We repeat this process many times (typically 10,000 times) to obtain the probability distribution of ω.
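One way to implement this simulation is a per-sector permutation: the pooled distances of each sector are shuffled and re-split into groups of the original sizes, which randomizes set membership while keeping each sector's distances fixed. This is our assumed reading of "randomize the locations", not necessarily the authors' exact procedure. The two-sided p-value below also assumes the null distribution of ω is centered at zero; the function names are hypothetical.

```python
import random

def permutation_null(sectors, stat, n_iter=10000, seed=0):
    """Monte Carlo null distribution of omega = sum_i omega_i.

    sectors: list of (r_i, s_i) pairs of distance lists, one pair per sector.
    stat: per-sector statistic omega_i(r, s), e.g. a standardized Mann-Whitney z.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_iter):
        total = 0.0
        for r, s in sectors:
            pooled = list(r) + list(s)
            rng.shuffle(pooled)  # random relabelling within the sector
            total += stat(pooled[:len(r)], pooled[len(r):])
        sims.append(total)
    return sims

def monte_carlo_p(observed, sims):
    """Two-sided Monte Carlo p-value with the usual +1 correction."""
    extreme = sum(1 for w in sims if abs(w) >= abs(observed))
    return (extreme + 1) / (len(sims) + 1)
```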

The computational complexity of the above processes follows from the same argument as in Sect. 3.1, i.e.,

$$\sum\limits_{i} {{\text{O}}\left( {\left( {M_{i} + N_{i} } \right)\log \left( {M_{i} + N_{i} } \right)} \right)} = {\text{O}}\left( {\left( {M + N} \right)\log \left( {M + N} \right)} \right),$$
(21)

where

$$N = \sum\limits_{i} {N_{i} } .$$
(22)

Similar to Eq. (9), the calculation order shown in Eq. (21) is acceptable.

4 Application

This section applies the method proposed in the previous section to the analysis of the spatial pattern of climbers of Mt. Azuma in Fukushima prefecture, Japan. Mt. Azuma is a popular mountain that attracts many climbers from all over Japan. Its elevation is 2035 m, and a wide variety of climbers, from beginners to experts, enjoy hiking, trekking, and mountaineering there. We aim to analyze the spatial and temporal patterns of climbers and reveal their underlying structure.

YAMAP Inc. developed a smartphone app called YAMAP that records the trajectories of climbers. Thirty percent of all climbers in Japan have installed this app. YAMAP Inc. kindly provided us with the location data of YAMAP users and those of the climbers of Mt. Azuma from January 2019 to December 2021. Mt. Azuma and its climbers serve as the reference point and the set of points in the proposed method, respectively. The location data of YAMAP users serve as the potential locations of points mentioned in Sect. 3.1 and are aggregated at the city level to conceal individual information.

Figure 3 shows the location of Mt. Azuma and its competing mountains, whose elevations are between 1500 and 2500 m. Competing mountains are primarily located to the southwest of Mt. Azuma, while they are sparse in the northern area. Navy shades indicate the number of Mt. Azuma climbs per YAMAP user from 2019 to 2021. The spatial pattern of navy shades reflects the number of competing mountains, i.e., the value is lower in the southwest and higher in the north.

Fig. 3

Mt. Azuma and its competing mountains, whose elevation is between 1500 and 2500 m. Navy shades indicate the number of Mt. Azuma climbs per YAMAP user from 2019 to 2021. The data are aggregated at the city level (color figure online)

We divided the whole region into L = 96 radial sectors centered at Mt. Azuma since this value was large enough to analyze the detailed variation in the spatial pattern of climbers while each sector still contained enough climbers for reliable analysis. We performed the analysis using programs written in C++, which ran on a computer with an i9-9900U CPU (3.60 GHz) and 32 GB of RAM.

4.1 Visualization and comparison of point patterns

The analysis started with visualizing the spatial pattern of the climbers of Mt. Azuma. We calculated the α-covering areas of the climbers defined in Sect. 3.1 and smoothed them with a moving average over a window of five sectors. Figure 4 shows the 50- and 75-covering areas from 2019 to 2021. The areas spread widely from north to southwest, suggesting that Mt. Azuma attracted climbers from all over Japan. The noncircular shape of the areas indicates the directional variation in the spatial pattern of climbers. The areas were largest in 2019, before the outbreak of COVID-19. They shrank drastically in 2020 since climbers avoided traveling to distant places for mountaineering. The areas expanded again in 2021, though they remained smaller than in 2019.

Fig. 4

The 50- and 75-covering areas of climbers of Mt. Azuma from 2019 to 2021. The white triangle indicates Mt. Azuma

Table 2 shows the statistic ω0 of the Kolmogorov–Smirnov test that compares the spatial pattern of climbers between different years. The absolute value of ω0 is large if the patterns of the two point sets differ and small if the difference is insignificant. As seen in the table, the climbers in 2019 are more widely spread than those in 2020 at the five percent significance level, while no significant difference was found between other pairs of years. This suggests that COVID-19 drastically narrowed the catchment area of Mt. Azuma.

Table 2 The statistic ω0 of the Kolmogorov–Smirnov test that evaluates the difference in the patterns of climbers between different years

Figure 5 shows the seasonal variation in the 50- and 75-covering areas. We considered four seasons: spring (March to May), summer (June to August), autumn (September to November), and winter (December to February). The figure indicates that the areas are generally the largest in autumn. This is reasonable since the autumn leaves of Mt. Azuma attract climbers from wider areas. In contrast, the areas are relatively small in summer, especially to the north of Mt. Azuma. Summer is the best season for climbing higher mountains, which require sophisticated skills and techniques that are not necessary for climbing Mt. Azuma. People prefer more difficult mountains in summer, which shrinks the covering areas of Mt. Azuma.

Fig. 5

The 50- and 75-covering areas of climbers of Mt. Azuma calculated for each season. The white triangle indicates Mt. Azuma

Table 3 shows the statistic ω0 that compares the spatial pattern of climbers between different seasons. The large positive values for autumn indicate that its climbers are more widely spread than in other seasons, especially summer and winter. No significant difference was observed between other pairs of seasons. Mt. Azuma attracts its climbers from the widest area in autumn, while the spatial pattern of climbers does not differ much among the other seasons. Autumn thus appears to be regarded as the best season for climbing Mt. Azuma.

Table 3 The statistic ω0 of the Kolmogorov–Smirnov test that evaluates the difference in the patterns of climbers between different seasons

4.2 Modelling point patterns

This subsection builds mathematical models to describe the spatial pattern of climbers of Mt. Azuma in detail. One of our primary interests lies in the directional variation in the distance deterrence of climbers. Figure 6 shows the 24 directions in which we estimated the models. Directions are numbered clockwise from the north. The radial lines connect Mt. Azuma and the farthest YAMAP users in individual directions.

Fig. 6

The 24 directions in which the models were estimated. The radial lines connect Mt. Azuma and the farthest YAMAP users in individual directions. The numbers and colors of radial lines correspond to those in Fig. 7. Navy shades indicate the number of Mt. Azuma climbs per YAMAP user from 2019 to 2021 (color figure online)

We estimated the models represented by Eqs. (12) and (13) by using the likelihood defined by Eq. (17). Comparing the models using AIC, we found that the model represented by Eq. (13) is better in all directions. The following thus focuses on the result obtained based on Eq. (13).

The vertical axis in Fig. 7 indicates the probability density function fi(r). Gray lines correspond to the areas where YAMAP users do not exist. We classified the 24 directions into four groups by the slope of the functions in Fig. 7: G1 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, G2 = {11, 12}, G3 = {13, 14, 15, 16, 17, 18}, and G4 = {19, 20, 21, 22, 23}. Figure 7b shows that the slopes of Group G1 are the gentlest among the four groups. The slopes of Group G4 are also gentle but somewhat steeper than those of Group G1. One reason for this difference is the difference in train accessibility between these groups. Figure 8 shows the railway network around Mt. Azuma. We checked the railway timetables and found that the trains of the Tohoku Shinkansen and Tohoku Lines, indicated by thick white lines, are faster and more frequent than those of other lines. This increases the accessibility of Group G1 compared with that of Group G4.

Fig. 7

The estimated logistic models defined by Eq. (13). The vertical axis indicates the probability density function f_i(r). Gray lines correspond to areas with no YAMAP users. Graphs show the models over (a) [0, 1000] km and (b) [0, 300] km (color figure online)

Fig. 8

Railway lines around Mt. Azuma. Thick white lines indicate the Tohoku Shinkansen and Tohoku Lines, which run relatively frequently in this region. The numbers and colors of radial lines correspond to those in Fig. 7. Navy shades indicate the number of Mt. Azuma climbs per YAMAP user from 2019 to 2021 (color figure online)

The distance at which f_i(r) begins to decrease varies within Group G4 = {19, 20, 21, 22, 23}. The functions fall from their maximum β to half that value at distances r between 60 and 150 km. This is consistent with Figs. 4 and 5, where the 50-covering areas show clear anisotropic patterns. For instance, in Fig. 8 the climbers in direction 19 decrease rapidly with distance from Mt. Azuma, whereas those in direction 23 decrease gradually. The transport network is weak in these directions, and the resulting variation in accessibility causes the variation in the estimated models.
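The "half" distances quoted above can be read off a fitted curve numerically. The sketch below assumes a decreasing logistic form f(r) = β / (1 + exp(γ(r − δ))) with upper level β. This form, and the parameters γ and δ, are assumptions and may differ from the exact parameterization of Eq. (13). The bisection itself works for any decreasing f_i(r), with r in kilometres. For the assumed form the half-value distance is exactly δ, which serves as a check.

```python
import math

def logistic_decay(r, beta, gamma, delta):
    """Hypothetical decreasing logistic curve with upper level beta.

    Assumed form (not necessarily Eq. (13)):
        f(r) = beta / (1 + exp(gamma * (r - delta)))
    Very large gamma * (r - delta) can overflow math.exp.
    """
    return beta / (1.0 + math.exp(gamma * (r - delta)))

def half_value_distance(f, level, r_lo=0.0, r_hi=1000.0, tol=1e-6):
    """Distance r (km) at which a decreasing function f falls to `level`, by bisection."""
    if not (f(r_lo) >= level >= f(r_hi)):
        raise ValueError("level is not bracketed by f(r_lo) and f(r_hi)")
    while r_hi - r_lo > tol:
        mid = 0.5 * (r_lo + r_hi)
        if f(mid) > level:
            r_lo = mid      # still above the target level: move outward
        else:
            r_hi = mid      # at or below the target level: move inward
    return 0.5 * (r_lo + r_hi)

# Hypothetical parameters: beta = 0.02, gamma = 0.05 per km, delta = 100 km.
beta = 0.02
f = lambda r: logistic_decay(r, beta, 0.05, 100.0)
r_half = half_value_distance(f, beta / 2)   # close to 100 km
```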

Groups G2 = {11, 12} and G3 = {13, 14, 15, 16, 17, 18} have many competing mountains in their directions. One difference between their patterns in Fig. 7 is that f_i(r) begins to decrease at a greater distance in Group G2. This is due to the train accessibility of the Tohoku Shinkansen and Tohoku Lines mentioned earlier, which attracts distant climbers in the directions of Group G2. The slopes of Group G3 are the steepest among the four groups. In directions {14, 15, 16, 17}, the probability remains at its maximum up to about 200 km and then drops sharply to zero. This result is consistent with Fig. 8, where the navy shades are dark at least within 200 km of Mt. Azuma. Competing mountains cause the steepest slopes in these directions.

5 Concluding discussion

This paper has developed a new method for analyzing the relationship between points and a reference point. We aimed to reveal how the number of points varies with the distance from a reference point and with direction. The α-covering area visualizes the directional variation in the spatial pattern of points, while mathematical models statistically describe the relationship between points and a reference point. The statistics ω and ω0 permit us to evaluate whether one set of points is more widely spread around a reference point than another. Section 4 applied the proposed method to the spatial pattern of climbers of Mt. Azuma, Japan. The results yielded useful and interesting empirical findings, which support the soundness of the method.

The strengths of the proposed method are summarized as follows. Firstly, the method is effective for analyzing the relationship between a set of points and a reference point, which cannot be fully investigated by the existing methods discussed in Sect. 2. In particular, the α-covering area is a useful tool for an intuitive understanding of the overall spatial pattern of points. Secondly, the method is flexible in that various models are applicable, as shown in Sect. 3.2. Thirdly, the method lets us compare the spatial spread of two sets of points within a statistical framework.

The method, however, has several limitations, which we discuss below together with possible future extensions. Firstly, the method neglects the temporal dimension of point patterns. The empirical application compares different periods, but only by aggregating the data along the temporal dimension, which reduces the temporal resolution of the original information. An analytical method that treats the spatial and temporal dimensions without aggregation should be developed.

Secondly, we should introduce a statistical framework for evaluating the α-covering area, which would permit us to evaluate the statistical significance of the obtained areas. Such a framework, however, needs careful and extensive discussion: we need to consider null hypotheses, test statistics, their probability distributions, and so forth. Though this is an important topic, we leave it for future research.

Thirdly, further applications are necessary to evaluate the effectiveness of the proposed method. The empirical study in this paper yielded new findings that help explain the factors determining climbers' behavior, such as train accessibility and the locations of competing mountains. This, however, does not guarantee that the method works equally well in other academic fields. Applications in epidemiology, criminology, ecology, and other fields of spatial information science are necessary.

Fourthly, the definition of distance needs reconsideration. The proposed method measures the distance between points by the Euclidean distance, which, however, is not always appropriate for describing the trip behavior of individuals (Fortney et al. 2000; Okabe and Sugihara 2012). If time distance were used in Sect. 4, the directional variation observed in the distribution of climbers might be weakened. It would also be better to include the waiting time for trains. Time, network, and mental distances are possible alternatives that could improve the model description.
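To illustrate how a time distance with waiting time could replace Euclidean distance, the sketch below runs Dijkstra's shortest-path algorithm on a small rail network. The stations, ride times and headways are hypothetical. Charging an expected wait of half the headway on every link is a simplifying assumption: it overstates waiting when a traveller stays on the same train.

```python
import heapq
import math

def add_rail_link(graph, u, v, ride_min, headway_min):
    """Add an undirected link whose cost is ride time plus an expected wait.

    Assumption: expected wait = headway / 2 (random arrival at the station).
    """
    w = ride_min + headway_min / 2.0
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

def shortest_times(graph, source):
    """Dijkstra's algorithm: minimum travel time (minutes) from source to each node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue                    # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical network: "A" lies on a fast, frequent line; "B" may be nearer
# in Euclidean terms but is served by an infrequent line.
g = {}
add_rail_link(g, "Azuma", "A", 30, 10)
add_rail_link(g, "Azuma", "B", 40, 60)
add_rail_link(g, "A", "C", 20, 10)
times = shortest_times(g, "Azuma")   # A: 35, B: 70, C: 60 minutes
```

In such a network, a place that is close in Euclidean terms can be far in time terms. This is the mechanism by which time distance might weaken the directional variation noted above.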

Fifthly, the method neglects the attributes of points. Climbers of Mt. Azuma vary in attributes such as age, income, and expertise level. Older climbers may be more likely than younger ones to choose nearby mountains. One option is to classify climbers by age and compare their α-covering areas. This approach, however, depends on the classification scheme, which can be subjective, especially for numerical attributes. Further discussion is necessary on how to incorporate point attributes into the analysis.