1 Introduction

Urban development and new housing investments are changing the characteristics of urban space, primarily through the development of urban infrastructure (Balchin & Rhoden, 2020; Madgin, 2021; Rohani & Ma, 2018). A consequence of urbanisation is an increase in buildings and transport infrastructure. The density of buildings is increasing, while socially desirable green and recreational areas are declining (Conway et al., 2010; Dzhambov & Dimitrova, 2014). This is accompanied by a simultaneous growth in population density. Economic development in urban areas changes social expectations regarding social structure, standard of living, quality of life, and accessibility to urban amenities. The density of urban amenities can be analysed using points of interest (POIs), which, as open-source data, are easily available and free of charge. Points of interest are specific locations or features cartographically mapped in geographical space and uniquely associated with different aspects of human life (Liu et al., 2020; Wu et al., 2021). Particular urban amenities can be extracted and categorised as POIs, which represent the finest spatial resolution of the built environment of urban landscapes (Xiao et al., 2017). POIs can, therefore, be divided according to the functions they perform in the city, including culture (theatres, libraries, museums, cinemas), education (kindergartens, schools, universities), health (clinics, hospitals, outpatient clinics, pharmacies), transportation (roads, bridges, bus stations, car parks), or recreation (parks, forests, fountains, lakes).

Points of interest can be found on the Web from a broad range of sources, including Twitter and Instagram (social media platforms), OpenStreetMap and Google Maps (map applications), Tripadvisor and Airbnb (reservation applications) (Milias & Psyllidis, 2021). POIs thus play a unique role in navigation applications of GPS (Global Positioning System) receivers and in Geographic Information Systems (GIS). The number and density of these points (Liu et al., 2020; Steiniger et al., 2016) may indicate the degree of urbanisation and primarily reflect the spatial structure of a city. The use of points of interest can thus be helpful in housing studies because it allows researchers to better understand the spatial distribution of prices of various types of dwellings and the factors that may influence them. The quality and number of urban amenities, measured by the density of points of interest, may directly affect housing values and prices. Price differentiation in urban areas results not only from the structural (internal) characteristics of real estate properties as physical objects but, above all, from the differentiation of environmental features, including the immediate vicinity (e.g., the popularity of the area, noise intensity or crime level).

In this paper, we assume that the number and types (categories) of POIs reflect the density of urban amenities described above. We propose two main hypotheses, which can improve our understanding of how the real estate market functions and of the factors that influence the diversity of prices in this market:

  • Hypothesis 1 – A higher density of points of interest (POIs) across all categories positively affects housing price growth,

  • Hypothesis 2 – Individual categories of points of interest (POIs) may have stimulating or inhibiting effects on housing prices.

The two hypotheses address, respectively, the collective (aggregate) effect of points of interest (POIs) and the individual effect of each POI category. Hypothesis 1 assumes that the larger the number of urban amenities near a specific flat, the higher the quality of life in that zone, which translates into higher demand for apartments in the area. Consequently, the value of real estate in the neighbourhood increases. Hypothesis 2 assumes that different categories of POIs can act as stimulants or destimulants of prices in the real estate market: although some amenities, attractions, or facilities may raise property prices, others may lower them. Examining the association between particular POI types and property prices sheds light on the potential role of these elements as stimulants or destimulants in determining market conditions.

The research hypotheses represent a dual (complementary) housing market research procedure that does not exclude any research scenario a priori. Such a research procedure, which considers the individual and integrated effects of POI, allows for a more complete understanding of market dynamics and the factors affecting housing values. The analysis of the model of the combined effect of POI density on housing prices proposed in our work can add value to similar studies while analysing single categories. As a result, this dual approach, derived from the concept of two hypotheses, can complement existing research. Much of the research on the housing market and POI analysis (described in detail in Section 2) has limitations related to analysing individual POI effects rather than integrated POI analysis. Some studies focus on single POIs, such as parks or public transportation stops, without considering the simultaneous impact of multiple POI categories in a single model. This type of analysis is also used in an interesting paper (Thackway et al., 2022), where data on points of interest (POIs) such as tourist attractions, stores, restaurants and parks were used (hedonic model, geographically weighted regression model) to determine their impact on property prices. The study found that a 1% increase in Airbnb listings leads to a 0.018% increase in Sydney housing prices. Similar research on Airbnb listings is presented in the following paper (Shabrina et al., 2022), where urban amenities were considered, but in London.

Our research is generally based on classic theories related to the functioning of housing markets, i.e. location theory (Beckmann, 2007; Funck, 2007), externalities theory (Cornes & Sandler, 1996; Tisdell, 1970) and hedonic analysis theory (Lancaster, 1966; Rosen, 1974). Location theory suggests the need to consider spatial aspects, i.e. accessibility to services, level of infrastructure, security, distance to infrastructure points or attractiveness of a location. The theory of externalities refers to the assumption that objects in economic space are influenced by other objects or processes that have no direct connection to them, e.g., individual home sales. However, these effects can affect prices, demand and supply in the housing market. Therefore, both positive and negative effects in the environment of the studied object that affect it are essential. The core principle of the hedonic methodology is based on the belief that the price of a diverse commodity, such as an apartment in our research, is determined by its features. These attributes can be either objective or subjective, but Lancaster argued in favour of the former. These assumptions are the foundation of our research.

The research on using open-source points of interest (POI) data to understand the causes of volatility in residential property prices was conducted according to the procedure shown in Fig. 1.

Fig. 1 Research procedure

This study makes a few major contributions to the extant literature on modelling housing prices. Firstly, we examined the relationship between POI density and housing prices, particularly the influence of specific POI categories. Using information on POI points improves the quality of hedonic models and reduces prediction errors. Secondly, we incorporated spatial quantile regression to assess the impact of POIs on prices in the presence of outlier observations and different price levels. Thirdly, previous studies have generally focused only on selected categories that already constitute amenities by design (schools, restaurants, stores, etc.). We study all categories together and each separately. Fourthly, we pointed out the possibility and usefulness of using open geospatial data to improve price predictions.

The paper is structured as follows. After the introduction (Section 1), Section 2 presents the theoretical basis of the study together with a literature review, focusing on how points of interest can be incorporated into housing market studies. Section 3.1 presents the study area, and Section 3.2 discusses the sources of information used in the research. Section 3.3 briefly describes the methods we used, named in Fig. 1. Apart from classical regression models, we used the quantile regression model and its spatial version (considering spatial autocorrelation). In the results (Section 4) and discussion (Section 5) sections, we present the estimation results and their interpretation. Finally, in the conclusion (Section 6), we discuss the findings and compare them with prior studies in the field.

2 Literature review

Housing market analysis and research at the micro level (individual transactions) require detailed information about a property’s features. The use of statistical methods to determine factors affecting the value of properties or to build price indexes requires the availability of databases characterised by a high degree of detail in the description of each observation (Dittmann, 2013; Głuszak, 2015; Hill & Trojanek, 2022; Tomal, 2019). Information from housing transaction price databases usually does not fully describe the condition states necessary to obtain reliable results using econometric and statistical methods. Hence, there is a need to expand these methods based on other data or create new ones. The process of combining data from different databases requires finding a key that will allow the proper data assignment to the appropriate observation.

Recent developments in technology and improved data-gathering techniques have precipitated many changes in the evaluation of housing from applied and theoretical perspectives (Winson-Geideman et al., 2017). Large volumes of data are being collected, transformed, and analysed to predict market trends. The ease of gathering data from different online sources (e.g., information on properties or sales, GIS data, coordinates, layers shared by municipalities, or data from the StreetView project) offers the possibility to merge data and estimate models with new variables and elaborate new ideas and solutions. Open-source data enhances research by providing fast and free access to information.

Points of interest (POIs) are a type of open-source data that supports research on selected geographical areas. A particularly useful feature of POIs (e.g. parks, schools, restaurants, hotels, cinemas, monuments) is their cartographic mapping (geographic coordinates), which allows these data to be used in numerous research areas, such as Geographic Information Science (GIScience) (Gao et al., 2017; Wu et al., 2016), urban planning (Ganter et al., 2022; Naumzik et al., 2020), socio-economic research (Dudás et al., 2017; Sun et al., 2022), tourism and marketing (Taylor et al., 2018; Yochum et al., 2020), transportation (Jia et al., 2018) or environmental science (Dong et al., 2018).

Points of Interest (POIs) can support research into housing markets by helping to better understand the spatial distribution of different property types and their attributes. There are many examples in the relevant literature of examining the relationship between housing prices and specific types of POI or their spatial density. On the one hand, there are positive urban amenities, i.e. schools, parks, cultural centres or public transport, the accumulation of which can make an area more attractive and thus lead to higher property prices (Rae et al., 2012; Spangenberg, 2013; Yang et al., 2021). On the other hand, features of urban space such as industrial areas, motorways, landfill sites or high crime levels can make an area undesirable, which results in lower property prices because such POIs are perceived as reducing the quality of life in the area (Fu et al., 2019; Jia et al., 2018; Steiniger et al., 2016). Greater accessibility of open-source data has led to increased interest in this source of information on the spatial variation of various facilities. Such data were used in real estate market research in the past, but on a much smaller scale, primarily because of the challenges associated with gathering the information (public data not being available, dispersion of information due to the lack of databases for larger geographic areas). Nowadays, these data cover much broader geographical areas, allowing more intensive investigation. Table 1 presents examples of recent studies using POIs based on open-access sources in housing market investigations. This table details the selected studies in terms of, among other things, the types of POIs used in the research, the methods used, and the main results.

Table 1 The examples of recent studies using POIs based on open-access sources in the housing market

There are also many other examples in the relevant literature examining the relationship between housing prices and specific types of POI or their spatial density. An innovative approach is presented by Fu et al. (2019), who applied so-called open-access-dataset-based hedonic price modelling (OADB-HPM), using road network data and massive street view images in addition to house prices and point of interest (POI) data. The empirical results of this study showed that buyers are more willing to pay for homes that provide the opportunity to see natural elements in the surrounding area. In another study (Xiao et al., 2017), an enhanced hedonic regression model that eliminates the effect of spatial autocorrelation was applied using the eigenvector spatial filtering (ESF) method. The model was built using data on nearly 7,000 housing transactions and more than 70,000 POIs. According to the authors of this study, the method makes it possible to analyse the determinants of housing prices at a satisfactory spatial resolution, which was not possible in previous studies due to the lack of small urban datasets. The results of this research align with the authors' opinion that POIs can provide low-cost and quickly acquired knowledge of urban structure (urban amenities). There is, therefore, a clear need to use POIs in housing price studies. Subsequent publications (Liu et al., 2020; Yue et al., 2017) confirm that information on the spatial organisation of urban infrastructure components is particularly important in spatial assessments of the housing market. POIs have also been used to identify urban functional areas (Chen et al., 2023; Yang et al., 2023), which may indicate the segmentation of local housing markets.
More on urban area research can be found in many studies (Bangura & Lee, 2023; Bieda, 2017; Cellmer et al., 2014; Pradhan & Abdullahi, 2017; Trojanek, 2023; Trojanek et al., 2017; Ventura et al., 2020). Huang et al. (2022) show that their proposed function-intensity index, which integrates the quantitative-density index and the average-nearest-neighbor index (ANNI) of POIs, can balance the impact of the spatial heterogeneity of each type of POI on determining the functional characteristics of urban units. Another study (Ganter et al., 2022) proposes a possible way to explain urban inequalities based on point-of-interest (POI) data from OpenStreetMap, which can be used to identify city neighbourhoods that are more attractive to residents and may therefore have higher housing prices. In that research, Ganter et al. (2022) found that POIs are highly predictive of intra-city inequality, explaining up to 75% of the out-of-sample variance of urban inequality. A further study (Zhang & Pfoser, 2019) confirms the assumption made in our article that POI data, especially its temporal aspect (set of changes), can be used to drive urban research and study urban change.

The influence of points of interest on property prices is not limited to residential properties, as similar relationships have been observed in the commercial property market. For example, Fu and Shan (2018) showed that POI data can be used to determine a reasonable allocation of commercial service facilities, in which the distribution of commercial density optimally matches the distribution of residential density from a macro-meso perspective, using Kernel Density Estimation (KDE) only. Our study substantially extends the methodological approach of that research: in addition to KDE, we also used Ordinary Least Squares regression and quantile regression.

The findings suggest the necessity for further studies exploring POIs and their influence on housing markets. Understanding the relationships between POIs and property values could support planning optimal urban development, determining ideal locations for commercial services, and creating more attractive neighbourhoods, potentially contributing to increased property values. In addition to using several statistical models, our research is a significant development of a number of previous studies, as it is based on the two research hypotheses presented earlier. On the one hand, it assumes that the greater the number of urban amenities, the higher the quality of life in the area, which is reflected in a higher demand for housing in the area. And on the other hand, it classifies the variables into real estate market stimulants or destimulants, which is an added value of the research. We believe that our study can also be critical in better understanding urban development and real estate market trends, helping to make informed decisions and formulate policies in the housing sector.

3 Materials and methods

3.1 Study area

The study was conducted in three Polish cities: Warsaw, Poznań and Olsztyn (Fig. 2). We chose these cities because of the marked differences in their housing prices, size and character, which provide a comprehensive view of the housing market and more varied material for analysing points of interest (detailed information below). Warsaw, the capital of Poland (inhabited by almost two million people), is the political and economic centre of the country. It is also a prominent centre of culture and art (museums, concert halls, theatres), business (seats of numerous branches of multinational corporations), and science (many public and private universities). Poznań, located in western Poland, has a population of 550,000 people. It is an important logistics centre (a railway, road and airport hub). Olsztyn is located in the northeastern part of Poland (one of the least industrialised regions of the country) and has approximately 180,000 inhabitants. This city differs significantly from the other two as it is not a major industrial or commercial centre. Its economy is based on tourist and educational functions (one university). Because of their size and location, the selected cities can represent small, medium and large cities in Poland. They differ not only in size but also in spatial structure and the development level of the real estate market.

Fig. 2 Study area

3.2 Data description

Data on residential property transactions come from property price registers run by counties (in the case of cities with county rights, these registers are run by town halls). In the study, we analysed only market transactions. We also excluded selling prices that differed significantly from the others and had the character of outliers. In the research, we used information from more than 16,000 transactions concerning dwellings in 2019, including more than 13,000 in Warsaw, almost 2,500 in Poznań and approximately 1,200 in Olsztyn. The basic statistics for the housing prices (PLN per square metre) are presented in Table 2, while Fig. 3 shows histograms and box plots of those prices. In the next part of the study, the logarithm of the average unit price of the dwellings was used instead of the unit price itself (see Table 3).

Table 2 Basic descriptive statistics of housing prices in Warsaw, Poznań and Olsztyn
Fig. 3 Histogram and box plot of housing prices in Warsaw, Poznań and Olsztyn

Table 3 Data used in the research

The transactions were then geocoded, based on the addresses of the buildings in which the dwellings were located, using the application made available by the Centre for Spatial Analysis of Public Administration (Pol. Centrum Analiz Przestrzennych Administracji Publicznej, capap.gugik.gov.pl). The resulting spatial distribution of transactions in each city is presented in Fig. 4.

Fig. 4 Spatial distribution of transactions concerning dwellings

The analysis of regularities in terms of links and interdependencies between housing prices and the density of POIs requires considering other spatial and non-spatial factors that shape the prices of residential properties. Taking into account the condition of the market and the availability of data, we adopted some property attributes, including the transaction price, as control variables. Information about independent variables constituting a set of control variables is provided in Table 3.

The information about POIs was taken from OpenStreetMap (openstreetmap.org). The study used QGIS software and the QuickOSM plug-in, which allows the objects on the map to be saved in vector form. The accuracy and timeliness of OSM data can vary depending on the quality and frequency of volunteer contributions and the quality of the data sources used to create it. During the study, the selected areas were checked for consistency between the map's content and the actual state, and no differences were found that could significantly affect the study results. It should be noted that each service which uses POIs, e.g., for car navigation, has its own taxonomy of categories of these points. In OpenStreetMap, general rules exist for categorising POIs in the form of dictionaries, classes and corresponding codes, but they are not always appropriate for analyses that reflect the realities of the property market. Therefore, we proposed our own taxonomy of POI categories for this research, as presented in Table 4.

Table 4 POI categories

Ten categories of POIs connected with the functioning of urban space were proposed: sustenance, education, transportation, healthcare, entertainment, public service and financial, facilities, shops and services, leisure and sport, tourism, and historical. In each category, some specific subcategories were distinguished. Figure 5 shows the spatial distribution of the examined points as a density map produced with kernel density estimation (KDE). A kernel bandwidth of 1 km was adopted for illustrative purposes and preliminary analysis.

Fig. 5 Kernel density analysis of POIs data [number/km²]

Figure 5 shows the distribution of POI densities, which naturally visualises a spatial structure in each city. A clear central area is visible in each of the examined cities, with the highest density of POIs. We can also clearly see the areas with the minimum density of points or the lack thereof. These are usually parts of the city that are impossible to use in urbanisation processes, such as woods, lakes or marshlands. It is particularly evident in Olsztyn, with 17 lakes and large areas of urban woods within the administrative boundaries of the town.

To determine POI density with a kernel function, one must specify the smoothing parameter (bandwidth). The value of this parameter reflects the assumed range over which POIs exert their influence and how they affect the location advantages of a place of residence. Different categories of POIs should also be considered, as they have different ranges of influence. For example, a school will have a substantially wider impact than a local convenience store. However, it can be challenging to specify the bandwidth for each category separately. In previous studies, authors usually approached this problem intuitively. For example, Chen and Clark (2013) indicate that the acceptable walking distance to a food shop is approximately 800 m. Tang et al. (2018), on the other hand, adopt one kilometre as the bandwidth in the kernel function to determine the scope of impact of POIs, while believing that this distance should be 700 m. To specify the bandwidth, Cellmer et al. (2019, 2020) suggest utilising the range of semivariograms of residential property prices, which indicates a distance of about 200 m. Based on a rational assessment of what proximity means for a pedestrian, we assumed that this distance should correspond to a five-minute walk, i.e., approximately 250 m, and adopted it as the bandwidth. Having assessed the density at each point corresponding to a transaction, we read the POI density indicator in total and for each category separately. We then added these values to the set of independent variables, which already included the previously defined control variables. This set of baseline data was used for detailed analysis.
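The density reading at a transaction point can be illustrated with a minimal sketch. It assumes a quartic (biweight) kernel and projected coordinates in metres; neither is stated in the text, which fixes only the bandwidth of about 250 m. The function name `quartic_kernel_density` and the toy coordinates are hypothetical.

```python
import math

def quartic_kernel_density(points, location, bandwidth_m=250.0):
    """Kernel density of `points` at `location`, in points per km^2.

    Assumption: quartic kernel K(u) = 3/pi * (1 - u^2)^2 for u < 1,
    which integrates to one over the unit disc. Coordinates are
    assumed to be projected and expressed in metres.
    """
    h = bandwidth_m
    x0, y0 = location
    total = 0.0
    for x, y in points:
        u2 = ((x - x0) ** 2 + (y - y0) ** 2) / h ** 2
        if u2 < 1.0:  # POIs beyond the bandwidth contribute nothing
            total += 3.0 / math.pi * (1.0 - u2) ** 2
    density_per_m2 = total / h ** 2
    return density_per_m2 * 1_000_000  # convert m^-2 to km^-2

# Hypothetical POIs grouped by category, around one transaction at (0, 0).
pois = {
    "education": [(0.0, 0.0), (0.0, 200.0)],
    "sustenance": [(100.0, 0.0), (400.0, 400.0)],
}
transaction = (0.0, 0.0)

by_category = {c: quartic_kernel_density(p, transaction) for c, p in pois.items()}
all_pois = [p for pts in pois.values() for p in pts]
total_density = quartic_kernel_density(all_pois, transaction)
```

Because the estimator is a sum over points, the category densities add up to the total density, so the aggregate POI variable and the per-category variables can be read from the same surface.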

3.3 Methods

The relationship between the number or density of POIs and transaction prices can be expressed as a classical multiple regression model, the parameters of which are estimated using the least squares method. Ordinary Least Squares (OLS) regression is a common technique for estimating the coefficients of linear regression equations describing the relationship between independent quantitative variables and a dependent variable. This model has the following representation:

$${\text{Y}}=\mathrm{X\beta }+\upvarepsilon$$
(1)

where: Y - vector of the dependent variable, X - matrix of covariates, β - vector of parameters, ε - error terms vector.
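As an illustration of Eq. (1), the following sketch estimates β by least squares with NumPy. The data are hypothetical and noise-free, so the known coefficients are recovered exactly; the variable names (`area`, `poi_density`, `log_price`) are illustrative and do not correspond to the variables in Table 3.

```python
import numpy as np

def ols_fit(X, y):
    """Estimate beta in y = X beta + eps by least squares.

    X must already contain a column of ones for the intercept.
    Returns the coefficient vector and the determination coefficient.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return beta, r_squared

# Hypothetical toy data: log unit price as a function of floor area
# and POI density, generated without noise from known coefficients.
area = np.array([35.0, 48.0, 52.0, 60.0, 75.0, 90.0])
poi_density = np.array([12.0, 30.0, 8.0, 45.0, 20.0, 5.0])
log_price = 9.0 - 0.002 * area + 0.004 * poi_density

X = np.column_stack([np.ones_like(area), area, poi_density])
beta, r2 = ols_fit(X, log_price)
```

With real transaction data the fit would not be exact, and standard errors and p-values would be needed to assess the significance of each parameter.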

These models do not always produce satisfactory results. The LAD (Least Absolute Deviation) estimator is also used, although less frequently; its characteristics are described in detail by, among others, Greene (2000). Because relatively many observations in the real estate market can be classified as outliers, we decided to use not only the classical model but also robust estimation, i.e., quantile regression (QR) and its spatial version, spatial quantile regression (SQR). Quantile regression was proposed by Koenker and Bassett (1978). The special case of quantile regression for the quantile of 0.5 (the median) is equivalent to the LAD estimator, which minimises the sum of absolute errors. By introducing various regression quantiles, we can give a more complete description of conditional probability distributions, especially in the case of asymmetrical or truncated distributions. The starting point for formulating the quantile regression model is the conditional quantile function of the random variable Y (Koenker & Machado, 1999):

$${{\text{Q}}}_{{\text{Y}}}\left(\uptau |{\text{X}}\right)={{\text{F}}}^{-1}\left(\uptau |{\text{X}}\right)$$
(2)

The quantile regression model of order τ has the following representation:

$${{\text{Y}}}_{{\text{i}}}={{\text{X}}}_{{\text{i}}}^{\mathrm{^{\prime}}}{\upbeta }^{\left(\uptau \right)}+{\upvarepsilon }_{{\text{i}}}^{\left(\uptau \right)}$$
(3)

where \({Y}_{i}\equiv {Q}_{\left(\tau \right)}\left({Y}_{i}|{X}_{i}\right)\), \({\beta }^{\left(\tau \right)}={\left({\beta }_{1}^{\tau },{\beta }_{2}^{\tau },...,{\beta }_{k}^{\tau }\right)}^{\mathrm{^{\prime}}}\) is a vector of the sensitivity coefficients of the conditional quantile on the changes in the values of covariates, and \({Q}_{\tau }\left({\varepsilon }_{i}^{\tau }|{x}_{i}\right)=0\).

If εi is a prediction error, the estimation of quantile regression model parameters is based on the assumption of the minimisation of a sum, which gives asymmetrical weights: (1 − τ)| εi| for too big and τ|εi| for too small predictions, i.e., it minimises the function (Koenker, 2017):

$${\text{Q}}\left({\upbeta }^{\left(\uptau \right)}\right)={\sum }_{{\text{i}}:{{\text{y}}}_{{\text{i}}}\ge {{\text{x}}}_{{\text{i}}}^{\mathrm{^{\prime}}}{\upbeta }^{\left(\uptau \right)}}^{{\text{n}}}\uptau \left|{{\text{y}}}_{{\text{i}}}-{{\text{x}}}_{{\text{i}}}^{\mathrm{^{\prime}}}{\upbeta }^{\left(\uptau \right)}\right|+{\sum }_{{\text{i}}:{{\text{y}}}_{{\text{i}}}<{{\text{x}}}_{{\text{i}}}^{\mathrm{^{\prime}}}{\upbeta }^{\left(\uptau \right)}}^{{\text{n}}}\left(1-\uptau \right)\left|{{\text{y}}}_{{\text{i}}}-{{\text{x}}}_{{\text{i}}}^{\mathrm{^{\prime}}}{\upbeta }^{\left(\uptau \right)}\right|$$
(4)

This function is non-differentiable, and its minimum is found through linear programming (Portnoy & Koenker, 1997). Estimators found in this way are asymptotically optimal. Quantile regression is more resistant to outliers, and we avoid assumptions regarding error distributions (Koenker, 2017).
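The asymmetric weighting in Eq. (4) can be illustrated with a minimal sketch restricted to the intercept-only case, where the minimiser is simply a sample quantile. This is a simplification for illustration: a full quantile regression with covariates would use linear programming, as noted above. The prices are hypothetical.

```python
def check_loss(y, predictions, tau):
    """Objective of Eq. (4): tau*|e| when y >= prediction,
    (1 - tau)*|e| when y < prediction."""
    total = 0.0
    for yi, pi in zip(y, predictions):
        e = yi - pi
        total += tau * e if e >= 0 else (tau - 1.0) * e
    return total

def best_constant(y, tau):
    """Intercept-only quantile regression.

    The check loss is piecewise linear in the constant, so a minimiser
    can always be found among the observed values.
    """
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: check_loss(y, [c] * len(y), tau))

# Hypothetical unit prices (PLN per square metre); the last one is an outlier.
prices = [5200.0, 5600.0, 6100.0, 6400.0, 7000.0, 7300.0, 15000.0]

median_fit = best_constant(prices, 0.5)   # tau = 0.5 gives the LAD solution
upper_fit = best_constant(prices, 0.75)   # upper price segment
mean_price = sum(prices) / len(prices)    # pulled upwards by the outlier
```

In this toy sample the τ = 0.5 fit coincides with the sample median and is unaffected by the size of the outlier, whereas the mean is shifted above all but one observation.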

The spatial version of quantile regression is an extension of the SAR (Spatial Autoregressive) model in the form:

$${\text{Y}}=\mathrm{\rho WY}+\mathrm{X\beta }+\upvarepsilon$$
(5)

where: Y - vector of dependent variable realisations, W - spatial weight matrix, ρ - autoregression parameter, X - matrix of covariates realisations, β - vector of parameters, ε - error terms vector.
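To make the spatial lag term WY in Eq. (5) concrete, the sketch below builds a row-standardised k-nearest-neighbour weight matrix and computes the lag of a price vector. The choice of k-nearest-neighbour weights is an assumption made for illustration only; the text does not state how W was constructed, and the coordinates and prices are hypothetical.

```python
import math

def row_standardised_weights(coords, k=2):
    """k-nearest-neighbour spatial weight matrix with rows summing to one.

    Assumption: each observation is linked to its k nearest neighbours
    with equal weight 1/k; the diagonal is zero.
    """
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        others = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in others[:k]:
            W[i][j] = 1.0 / k
    return W

def spatial_lag(W, y):
    """Compute WY: for each observation, the weighted mean of neighbouring values."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]

# Hypothetical transactions: projected coordinates (m) and log unit prices.
coords = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (1000.0, 0.0)]
log_prices = [8.9, 9.0, 9.1, 8.5]

W = row_standardised_weights(coords, k=2)
WY = spatial_lag(W, log_prices)
```

With row-standardised weights, each element of WY is the average log price of the neighbouring transactions, so ρ measures how strongly a price follows its local surroundings.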

The quantile spatial autoregressive model QSAR can be written as follows:

$${\text{Y}}={\uprho }^{\left(\uptau \right)}{\text{WY}}+{\text{X}}{\upbeta }^{\left(\uptau \right)}+{\upvarepsilon }^{\left(\uptau \right)}$$
(6)

where \({Y}_{i}\equiv {Q}_{\left(\tau \right)}\left(Y|X\right)\), \({\uprho }^{\left(\uptau \right)}\) - quantile spatial autoregression parameter of order \(\uptau\), \({\upbeta }^{\left(\uptau \right)}\) - vector of the model’s parameters. Vector \({\upvarepsilon }^{\left(\uptau \right)}\) contains independent and identically distributed random variables whose distribution is not specified.

Due to endogeneity problems in models (5) and (6) (the spatial lag of the dependent variable, ρWY, appears on the right-hand side), their parameters are estimated using instrumental variable procedures (Chernozhukov & Hansen, 2006). In its generalised version, this procedure requires first estimating the ordinary quantile regression model of order τ for WY (Trzpiot, 2012):

$${\text{WY}}={\text{X}}{\upbeta }^{*\left(\uptau \right)}+{\text{WX}}{\upgamma }^{*\left(\uptau \right)}+{\upvarepsilon }^{*\left(\uptau \right)}$$
(7)

and then calculating predicted values:

$$\widehat{{\text{WY}}}={\text{X}}{\widehat{\upbeta }}^{*\left(\uptau \right)}+{\text{WX}}{\widehat{\upgamma }}^{*\left(\uptau \right)}$$
(8)

Subsequently, the predicted values are used as explanatory variables in the original model:

$${\text{Y}}={\uprho }^{\left(\uptau \right)}\widehat{{\text{WY}}}+{\text{X}}{\upbeta }^{\left(\uptau \right)}+{\upvarepsilon }^{\left(\uptau \right)}$$
(9)

where the parameters of the model are estimated by solving the optimisation problem.

In the examined models, in which the price of a flat is a dependent variable, the quantile regression method and the spatial quantile regression method allow us to answer the question of whether the relationship between the number of POIs and the price is different for the distinguished price segments of the housing market. The expected results of the study will let us find out whether the observed relationships will differ depending on the market segment, the determinant of which is price.

4 Results

The study’s starting point was the classical multiple regression model. Besides control variables, we considered the density of POIs without dividing them into specific categories (see Table 3). Then, we built a regression model to assess the particular types of POIs. In the next step, we applied quantile regression, first to all POIs and then to the specific categories. In the final stage of the study, the spatial quantile regression model was applied to all POIs and to their particular categories.

Unfortunately, the absolute values of coefficients will not enable us to directly compare the significance of POI categories due to differences in the average density of points. A more straightforward comparison would be feasible if each category had an equal number of POIs. To ensure the direct comparability of coefficients, we conducted the standardisation of variables before analysis. It should be stressed that standardisation does not influence the study of the significance of model parameters, which is particularly important in the examined models as they have a diagnostic rather than predictive character.
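The effect of standardisation on coefficient comparability can be shown with a short Python sketch. The variable names and values below are hypothetical and are not taken from the study's data. The sketch uses a single-regressor least-squares slope only to keep the example small, and it standardises the explanatory variable only.

```python
from statistics import mean, pstdev

def standardise(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical POI densities measured on very different scales, and unit prices.
shops = [10.0, 40.0, 25.0, 60.0, 15.0]
parks = [0.5, 0.75, 1.0, 0.25, 1.5]
price = [5200.0, 6100.0, 5900.0, 6400.0, 5700.0]

raw_slope = ols_slope(shops, price)               # price change per additional shop
std_slope = ols_slope(standardise(shops), price)  # price change per one standard deviation
```

After standardisation, each slope expresses the price change associated with a one standard deviation change in density, so coefficients of sparse and dense categories are on the same footing. The standardised slope equals the raw slope multiplied by the standard deviation of the regressor, which is why the test statistics are unaffected.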

4.1 Multiple regression (OLS)

A correlation analysis of the explanatory variables was performed before estimating the parameters of the regression model. It gave no grounds to expect that correlated variables would negatively affect the quality of the model. The results of OLS (ordinary least squares) estimation for the three analysed cities, considering all POIs without dividing them into categories, are presented in Table 5. The regression coefficients presented relate to standardised explanatory variables, allowing direct comparison.

Table 5 The results of OLS estimation. The coefficient values refer to standardised variables

Given the realities of the real estate market, the OLS models fitted the market data reasonably well. The determination coefficient was 0.616 for Olsztyn, 0.640 for Poznań, and 0.643 for Warsaw. Table 5 also shows the values of the multiple correlation coefficient and the values of the determination coefficients. The significance of regression model parameters is affected by the number of observations used for analysis. Thus, in the model estimated for Warsaw, all parameters are statistically significant at a significance level lower than 0.001. The coefficients of the control variables and their p-values show that these variables have been properly selected. The most important element from the point of view of the analysis is the significance of the coefficient of the POI variable, which indicates the density of points of interest without dividing them into separate categories. In all the examined cities, we found a statistically significant link between the number of POIs in the immediate vicinity and the transaction prices of residential properties. Regression relationships alone do not answer the question of whether these relationships are of a cause-and-effect nature. In this case, attention should be paid to the results of Lu et al. (2020), which indicate a strong link between POI density and the area's urbanisation level. The statistical relationship between the presence of POIs in the neighbourhood and transaction prices may suggest causality. However, it should be taken into consideration that POIs fall into many categories, the significance of which may differ, and the impact on prices, if there is any, is not necessarily positive in each category.
Therefore, in the next step of the study, we built OLS models in which the independent variables were, apart from control variables, POI densities in the categories distinguished in Table 4. We must note, however, that increasing the number of independent variables entails the risk of collinearity arising from correlation between the variables. This phenomenon has an adverse effect on the accuracy of the model's estimates and may hamper the interpretation of the model as a whole. Figure 6 presents the correlograms of the variables we adopted for analysis in Olsztyn, Poznań and Warsaw.

Fig. 6 Correlograms of variables adopted for analysis

Our analysis reveals some similarities among the examined cities. Where different POI categories are correlated, the correlation tends to be negative. For example, we observed a negative correlation between the sustenance and shop variables in each city under study. The correlograms shown in Fig. 6 may suggest the presence of correlations between some variables, which could pose a significant obstacle to the accurate estimation of model parameters. Addressing the issue of multicollinearity is essential, as it can substantially impact the model's quality. Therefore, before estimating the model parameters, we performed a Variance Inflation Factor (VIF) analysis. The VIF measures how much the variance of an estimated regression coefficient is inflated by the correlation of the corresponding independent variable with the other independent variables (Salmerón et al., 2018). The results do not show a strong correlation between independent variables (Table 6).
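As an illustration of how a VIF can be computed, the following Python sketch uses the identity that the VIF of variable j equals the j-th diagonal element of the inverse correlation matrix of the independent variables, which is equivalent to 1 / (1 − R²) from regressing variable j on the others. The helper names and the toy data are hypothetical, and the sketch is not the routine used in the study.

```python
from statistics import mean

def correlation(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix is non-singular (no perfectly collinear columns).
    """
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * t for v, t in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    corr = [[correlation(a, b) for b in columns] for a in columns]
    inv = invert(corr)
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical densities of two POI categories with a correlation of 0.8.
sustenance = [1.0, 2.0, 3.0, 4.0, 5.0]
shop = [2.0, 1.0, 4.0, 3.0, 5.0]
vifs = vif([sustenance, shop])  # both about 2.78, i.e. 1 / (1 - 0.8 ** 2)
```

A VIF of 1 means that a variable is uncorrelated with the others, and larger values indicate that more of its variation is explained by the remaining independent variables.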

Table 6 Variance inflation factor (VIF) for the data from the examined cities

For the data from Olsztyn, the highest VIF coefficient was 3.54, referring to the sustenance variable. For the data from Poznań and Warsaw, this indicator did not exceed 2.70, which means that the variables adopted for analysis do not reveal significant relationships, which could undermine the statistical sense of the examined models. The results of the OLS estimation considering the specific categories of points of interest are shown in Table 7.

Table 7 The results of the OLS estimation considering specific POI categories

The model built on data from Olsztyn reveals that only three POI categories may be relevant to the formation of housing prices: sustenance, leisure and tourism (p-value lower than 0.05). The puzzle lies in the fact that the collective impact of all categories remains clearly significant, as demonstrated in the preceding model (Table 5). This suggests a synergy effect, i.e., the individual components considered separately do not have much importance, but their combined effect is noticeably significant.

The model built on data from Poznań shows that the regression coefficients of the sustenance, entertainment, shop, leisure, and tourism variables were statistically significant (p-value lower than 0.05). It is worth particular attention that the negative sign of the coefficient of the shop variable means that this variable is a destimulant. Some limitations should be considered when interpreting the results: the model parameters and their accuracy ratings indicate the strength and direction of linear relationships, which are not necessarily cause-and-effect relationships. For example, in the case of new development projects, where prices are relatively high, the retail and service infrastructure is only in the formative stage, and buyers invest more in future rather than current benefits.

The regression analysis based on Warsaw data, as in the other two cities, revealed that not all POI categories are essential for the formation of residential property transaction prices. The significant categories include sustenance, transportation, entertainment, shop, and tourism.

4.1.1 Quantile regression (QR)

At the next stage of the analysis, we used quantile regression to estimate the relationships between POI density and transaction prices. We built quantile regression models for several quantiles of unit prices. We considered both the combined effect of all POIs (see Appendix) and the individual categories. Table 8 presents the regression results for quantile 0.5 (the median).

Table 8 The results of QR (quantile regression) taking into consideration specific POI categories for τ = 0.50

The pseudo-R2 measure suggested by Koenker and Machado (1999) measures the goodness of fit by comparing the sum of weighted deviations for the model of interest with the same sum from a model in which only the intercept appears. It is a local fit measure for the quantile regression model since it depends on τ, unlike the global R2 from OLS.
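The measure described above can be sketched in a few lines of Python. The function names and the toy values are hypothetical. The sketch assumes the usual check (pinball) loss as the weighting of deviations and computes the intercept-only benchmark by searching over the observed values, since the best constant for a given τ is always attained at one of them.

```python
def check_loss(u, tau):
    """Check (pinball) loss of a single residual u at quantile tau."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def intercept_only_loss(y, tau):
    """Smallest total check loss achievable by a constant (the sample tau-quantile)."""
    return min(sum(check_loss(v - c, tau) for v in y) for c in y)

def pseudo_r2(y, fitted, tau):
    """Local goodness of fit: 1 - (model loss / intercept-only loss) at quantile tau."""
    model_loss = sum(check_loss(v - f, tau) for v, f in zip(y, fitted))
    return 1.0 - model_loss / intercept_only_loss(y, tau)

# Hypothetical observed values and fitted median-regression values.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.5, 2.5, 3.0, 3.5, 4.5]
r2_median = pseudo_r2(y, fitted, tau=0.5)  # 1 - 1/3, about 0.667
```

Because both sums depend on τ, the same fitted model can have a different pseudo-R2 at the median than in the tails, which is what makes the measure local.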

The parameters of the quantile regression model for Olsztyn are similar to the parameters of the OLS model. The parameters for the sustenance, entertainment, and leisure variables were significant (at a significance level lower than 0.05), whereas the sustenance, leisure and tourism variables were significant in the OLS model. The statistically insignificant parameters of the remaining variables related to POIs are consistent with the assumption that in smaller cities like Olsztyn, the individual point of interest categories, when considered separately, do not exert any real influence on prices. At the same time, we observe a scale effect, i.e., the coexistence of POIs of all categories may be linked with higher prices.

In the model built on data from Poznań, the regression coefficients of the sustenance, shop, leisure, and tourism variables turned out to be significant (p-value lower than 0.05). These findings indicate slight differences between the QR and OLS results for Poznań. Furthermore, the shop variable appeared to be a destimulant.

The quantile regression analysis based on the data from Warsaw revealed that the POI categories significant for the formation of transaction prices include sustenance, transportation, entertainment, public service, shop, leisure and tourism. However, it is striking that only two categories, i.e., sustenance and leisure, were significant in all the examined cities (in the OLS models, it was sustenance).

The results for the POI categories most closely connected with local amenities are very interesting. The analysis of both models, i.e., OLS and QR, showed that the shop variable is a destimulant in all three cities. The QR result may depend on the adopted quantile τ. Thus, in the course of the study, we examined the results for different quantiles, which, in practice, allows the impact of POIs on prices to differ depending on the relative price level. For example, Fig. 7 shows the relationship between the adopted quantile and the value of parameter β for the shop variable in each of the three cities.

Fig. 7 The relationship between the adopted quantile τ and the value of parameter β for the shop variable in the examined cities

This parameter turned out to be important in Poznań and Warsaw. The analysis conducted for Poznań reveals that the parameter's value grows slightly up to quantile 0.3, maintains the same level up to quantile 0.8, and then drops. In Warsaw, the variation is much more evident in the so-called tails. In the group of the cheapest dwellings, we observe a clear positive link between the density of shops and service outlets and prices. Between quantiles 0.2 and 0.8, this relationship maintains a similar level. On the other hand, this relationship is negative in the group of the most expensive dwellings (above quantile 0.8). The relationships concerning the remaining POI categories are presented in Appendix 1.

4.2 Spatial quantile regression (SQR)

As spatial autocorrelation may appear in the real estate market, we decided to use the quantile regression model in its spatial version, i.e., spatial quantile regression (SQR). Using quantile regression avoids many problems known to researchers using classical regression models. Quantile regression models allow for a more flexible fit in the case of non-linear relationships in the real estate market and are more robust to outlying observations (commonly seen among transaction prices). Moreover, they take into account asymmetry in the distribution of the dependent variable and allow for the study of the relationship between variables in different market segments (defined based on price levels). The occurrence of spatial autocorrelation of transaction prices was confirmed by Moran's I statistics. For Olsztyn, Poznań and Warsaw, these statistics were 0.208, 0.119 and 0.249, respectively, each significant at a level lower than 0.001. Thus, significantly positive spatial autocorrelation was observed in all three analysed cities. The Moran scatterplots (Fig. 8), which present the relationship between the standardised price and its spatial lag, also indicated the occurrence of autocorrelation.
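For readers unfamiliar with Moran's I, a minimal Python sketch of the statistic and of the spatial lag used in a Moran scatterplot is given below. The binary contiguity weights and the toy values are assumptions made for illustration, as the weighting scheme used in the study is not restated here. The sketch does not include the significance test.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    m = sum(values) / n
    z = [v - m for v in values]                    # deviations from the mean
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)

def spatial_lag(values, weights):
    """Average of neighbouring values (assumes every location has a neighbour)."""
    return [sum(wt * v for wt, v in zip(row, values)) / sum(row) for row in weights]

# Hypothetical example: four locations on a line, neighbours share a border.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

smooth = morans_i([1, 2, 3, 4], w)       # similar neighbours: 1/3
alternating = morans_i([1, 4, 1, 4], w)  # dissimilar neighbours: -1
lagged = spatial_lag([1, 2, 3, 4], w)    # [2.0, 2.0, 3.0, 3.0]
```

Positive values indicate that similar prices cluster in space, and negative values indicate that neighbouring prices tend to differ. Plotting each standardised value against its spatial lag gives the Moran scatterplot, in which a positive slope corresponds to positive autocorrelation.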

Fig. 8 Moran scatterplots for standardised prices in Olsztyn, Poznań and Warsaw

The occurrence of spatial autocorrelation provides the basis for spatial modelling. In the course of the study, we estimated the model in which all POI categories were taken into account together (Appendix 2) and the model in which individual categories were distinguished. The results of the spatial quantile regression model for τ = 0.5 for the particular categories of POI are presented in Table 9. Appendix 2 additionally shows the results for τ = 0.25 and τ = 0.75.

Table 9 The results of SQR (spatial quantile regression) taking into consideration specific categories of POI for τ = 0.5

The highest spatial correlation of housing prices (rho) was observed in the SQR model based on the data from Olsztyn. In the other two cities, it also turned out to be statistically significant, which confirms the previously estimated Moran’s I statistics. In Olsztyn, the only statistically significant variable is entertainment (in previous models, sustenance also turned out to be significant). In the spatial quantile regression model built on Poznań data, the regression coefficients of the leisure and tourism variables were significant (p-value lower than 0.05). The set of significant variables is thus slightly different from that in the OLS and QR models. Still, leisure and tourism are significant variables in all three models, and both are stimulants.

The SQR based on the data from Warsaw shows that, out of all POI categories, seven can be considered statistically significant price determinants (sustenance, transportation, entertainment, public service, shop, leisure and tourism). As in the OLS and QR models, transportation, entertainment, and shop were destimulants. During the study, the model was estimated for different values of the quantile τ. In Warsaw, where there is the most data, the education and facilities variables become significant in the cheaper housing segment (τ = 0.25), while the coefficient of the leisure & sport variable is statistically insignificant (see Appendix 2). The models built for the bottom quartile for the other cities indicate that the shop & service (Poznań) and tourism & historical (Olsztyn) variables may be significant in this case. The spatial quantile regression model built for the upper quartile (τ = 0.75) indicates that in the segment of more expensive apartments in Warsaw, such variables as transportation or public service & financial are less important. In contrast, the coefficients of the sustenance and shop & service variables became important in all analysed cities. These results may be of particular importance in segmenting the real estate market, as they may indicate differences in the perception of factors relevant to the formation of housing prices.

5 Discussion

Our findings confirm a close relationship between POI density and housing prices, in line with previous studies. For example, in Beijing (Xiao et al., 2017), the impact of grouped types of POIs (green areas and commercial and business areas), treated as specific hot spots, was investigated using geographically weighted regression and shown to have a high impact on housing prices. Another study (Wu et al., 2016) in Shenzhen, China, found that POIs, considered a measure of urban infrastructure accessibility, can explain housing price heterogeneity. Overall, several studies (Fu et al., 2019; Hidalgo et al., 2020; Taylor et al., 2018; Tschernutter & Feuerriegel, 2021) confirm that points of interest (POIs) have been recognised as an essential element in explaining urban phenomena, including housing market behaviour. Our research also extends many earlier publications because it pays particular attention to the number of POIs and their density in the neighbourhood of the dwellings.

We demonstrated a statistically significant link between the number of POIs nearby and the transaction prices of residential properties in each of the cities under study. These data offer empirical support for the first hypothesis, indicating that higher densities of facilities in all categories could have a favourable impact on dwelling values. This idea that such elements promote increased demand for dwellings in the neighbourhood, ultimately leading to an upward trend in property prices, is further reinforced by the relationship between the number of urban facilities close to a particular dwelling and a higher quality of life in the area.

However, this finding refers to all POIs considered together, without dividing them into separate categories. If we treat each category as an independent variable, these relationships are not so clear. While we can naturally observe a different influence (in terms of strength and significance) on prices, the direction of impact of some categories may be inconsistent with our assumptions and expectations.

A statistically significant relationship between specific kinds of points of interest (POI) and real estate prices has been found, supporting the second hypothesis. Our analysis shows that specific POI categories consistently affect property prices, acting as stimulants or destimulants within the real estate market. The detailed results on the relationship between the different POI categories and property prices are discussed below, highlighting the various factors that can enhance or inhibit property values in an area. The sustenance category in the OLS models is a significant stimulant of prices in all three cities (although in the SQR models for Olsztyn and Poznań, the significance level is higher than 0.05), which can be considered a natural regularity related to local amenities. In the education category, the significance level in all cities and models was higher than 0.05. This possibly arose from schools having a significantly broader range of impact than the adopted scope of the kernel function, which remained the same for all POI categories.

The transportation category was a destimulant, but it significantly impacted prices only in Warsaw. The negative character of the relationship may stem from the fact that the highest density of POIs from this category occurs mainly along the city's main roads, which in turn means inconvenience related to traffic, noise, and pollution. The healthcare variable was found to be insignificant in all models, although experience would suggest that points connected with healthcare positively affect prices.

The entertainment category is quite interesting because, in the analysed models, it is both a stimulant (OLS for Poznań) and a destimulant (in all models for Warsaw and Olsztyn). This may imply that a single category includes POIs that, on the one hand, positively affect the quality of a location but, on the other hand, cause some inconvenience. This means the adopted classification of POIs is imperfect and will require future verification. The public service & financial variable appeared significant only in the QR and SQR models for Warsaw (it was a stimulant).

In the case of the facilities variable, the significance level in each model was higher than 0.05.

What deserves special attention is the category of shop & service. Because of the convenience related to the proximity of these facilities, one might assume that there is a positive relationship between their density and housing prices. However, the examined models show that this relationship is negative. Nevertheless, our study does not justify the statement that many shops and service outlets in the immediate vicinity lead to lower prices. Instead, this relationship may result from the fact that the highest prices are usually observed in new housing estates, where there is not yet sufficient infrastructure. Moreover, information about new POI facilities appears in the OSM service with a delay, which could cause some inaccuracies in interpreting the results if they referred only to intensively developing parts of cities. Thus, it can be assumed that more shops and service outlets exist in newly built housing estates than the OpenStreetMap resources show. The remaining variables, i.e., leisure & sport and tourism & historical, can be seen as categories of a similar character. Therefore, we should consider merging them into a single category (regarding the impact on housing prices).

6 Conclusions

Our study highlights the significant relationship between Points of Interest (POI) density and residential property prices in three Polish cities: Warsaw, Poznań and Olsztyn. We found that while individual categories of POIs may have little significance, their collective density is positively associated with housing prices, demonstrating a synergistic effect. This finding adds value to our research and sets it apart from other similar studies. Thus, we believe that information about POI density could be an important predictor when modelling housing prices, which can improve the quality of hedonic models designed for predicting prices and diagnosing regularities observed in the market.

The study's findings also indicate that the relationship between POI density and residential property prices should be interpreted, above all, as a statistical relationship and not necessarily as a causal one. These relationships may well have a cause-and-effect character (causality was not the subject of the study), but establishing this would require additional research considering the time factor and changes in prices and POI density over time. Addressing these limitations is crucial to enhance the credibility and applicability of the study's findings. Further research endeavours should aim to refine methodologies, validate POI classifications, consider additional variables, and employ diverse analytical approaches to comprehensively understand the intricate relationship between POI density and housing prices.

The study's results reveal that the use of widely available spatial information, such as POIs, not only can improve the quality of modelling but should also constitute an integral element of real estate market analysis, given the fast development of spatial data infrastructure. The research can be helpful for various stakeholders, including real estate developers, urban planners, valuers and policymakers.