1 Introduction

New forms of hospitality have arisen and developed in the last two decades: besides traditional hotel rooms, couches, private and shared rooms, boats, and apartments can now be booked as accommodations. The choice among these new forms of accommodation is driven by different factors, such as a wider range of listings, a favorable price-quality ratio, a lower price, the possibility of choosing between a private or a shared environment, the possibility of meeting new people, or living a more authentic experience (Skalska 2017). These new kinds of accommodation are booked online on different platforms, of which probably the most famous and successful is Airbnb. Airbnb was founded in 2008 in San Francisco and has grown significantly over the years. At the time of writing, Airbnb lists over 6 million accommodations located in 100,000 cities and 220 countries and regions (https://news.airbnb.com/about-us/, accessed on October 31st, 2022).

One of the most important factors influencing booking choices on the Airbnb platform is price, as price is the key to increasing profitability once guests' expectations are satisfied (Chattopadhyay and Mitra 2019). It is a strategic element that should be defined considering different aspects, such as the specific characteristics of the accommodation, the services provided, and the location.

Identifying the determinants of price on Airbnb is such an important topic that numerous articles have been written on the subject. For instance, Dudás et al. (2020) found that Airbnb accommodation prices are influenced by property-related attributes, such as air conditioning, free internet, free parking, number of bedrooms, and the possibility of booking the entire house or only a room. Moreover, Lorde et al. (2018) showed that attributes such as location, reputation, convenience, and personal and amenities attributes significantly affect prices in the Caribbean. Lawani et al. (2019) evaluated the impact of three groups of variables on the booking price: structural variables (such as the number of bedrooms and bathrooms), neighborhood variables (such as the distance in feet to the closest convention center and train stations), and quality-signaling variables (such as cleanliness, number of reviews, and communication). They found that price is influenced by review scores, room characteristics, and neighborhood features. Finally, Teubner et al. (2017) highlighted that host reputation can also be a price determinant.

The studies investigating Airbnb price are carried out using different methods such as ordinary least squares (see for instance Voltes-Dorta and Sánchez-Medina (2020)), panel data analysis (see for instance Neumann and Gutt (2017); Falk et al. (2019)), quantile regression (see for instance Wang and Nicolau (2017); Perez-Sanchez et al. (2018); Lorde et al. (2018)), hedonic price models (see for instance Sainaghi et al. (2021); Voltes-Dorta and Sánchez-Medina (2020); Faye (2021)), price equations with spatial effects (Tang et al. 2019), pricing strategy model (Ye et al. 2018), and machine learning (see for instance Kalehbasti et al. (2019)).

The variables included in the models are usually the price, as the response variable, and independent variables that are normally grouped into categories such as site characteristics, reputation, convenience, and personal and amenities attributes (Faye 2021; Barnes and Kirshner 2021; Chattopadhyay and Mitra 2019). Generally speaking, previous studies investigated the impact of different groups of variables by considering either a whole city (see for instance Aznar et al. (2018); Moreno-Izquierdo et al. (2018); Voltes-Dorta and Sánchez-Medina (2020)) or several cities, comparing the results among them (see for instance Perez-Sanchez et al. (2018); Rodríguez-Pérez et al. (2019); Chattopadhyay and Mitra (2019)). Only a few researchers, such as Faye (2021), Sainaghi et al. (2021), and Falk et al. (2019), have included in their models variables related to neighborhood features in order to evaluate the impact that these aspects might have on price determination.

However, to our knowledge, no study has examined price determinants at the level of the neighborhoods of a city to investigate whether the impact of the variables differs within a specific city. For this reason, we propose two indicators capable of evaluating the impact of different groups of variables, hereafter also called dimensions, on Airbnb prices, which could therefore support hosts in the difficult task of price determination.

The two proposed indicators, which we call the q indicator and the r indicator, aim to estimate, respectively, the importance of the dimensions in price determination for each neighborhood, and the relative importance of a single dimension in a neighborhood compared with the other neighborhoods. In particular, the indicators make it possible to identify possible differences related to the location of the accommodation.

The variables taken into account in the definition of the indicators have been previously used by other researchers investigating Airbnb prices (see for instance Faye (2021); Önder et al. (2019); Cai et al. (2019)). As noted above, the literature groups them into specific categories. We group the variables into six dimensions. Specifically, we have grouped the variables related to the proximity to the center and to subway and bus stops in a dimension called transports; those related to the proximity of the accommodation to monuments and museums in a dimension called culture; the presence of other Airbnb accommodations and hotels in the neighborhood in the dimension crowd; the aspects related to the management of the accommodation, such as the services provided, the Superhost badge, and the number of photos on the Airbnb platform, in the dimension management; the characteristics of the accommodation, such as the number of bedrooms and bathrooms, in the dimension property; and, finally, the seasonality of the reservation in the dimension time. The dimension names reflect the specific aspects that we assume to have a direct impact on price determination. This is the first time that all these dimensions and variables are combined in the estimation of a price indicator.

In this study, we illustrate the application of the two indicators to the Airbnb accommodations located in 61 neighborhoods of Rome, the capital of Italy and one of the most visited cities in Italy and Europe. Generally speaking, the findings show that the six dimensions have a different impact on the city as a whole, and also that there are specific differences among individual neighborhoods.

Five sections, besides the introduction, complete this study. The second and third sections are related to the research design: data and methodology are here described. The results and implications are explained in the fourth and fifth sections. Finally, the sixth and last section focuses on theoretical and practical implications, limitations, and future developments.

2 Data

The study has been carried out using a dataset composed of different groups of variables (Table 1).

The choice of these dimensions and of the corresponding variables has been guided by the literature. Specifically, we have examined the covariates used by other researchers in order to include in the model only the elements that previous studies have shown to actually influence Airbnb price determination. In the literature, the covariates used for price determination are grouped into specific categories such as site characteristics, reputation, and personal and amenities attributes. Analyzing the literature, we found that researchers have included different variables within each category. For instance, in the group site characteristics, Faye (2021) included aspects such as the distance to the city center and the proximity to shopping areas or tourist attraction areas; Önder et al. (2019) the number of points of interest in the surrounding area; and Cai et al. (2019) the number of Airbnb listings in the same district. In the group reputation, Faye (2021) included aspects such as superhost badge, number of reviews, star ratings, host listing counts, host profile picture, response time, and cancellation rate. Instead, Cai et al. (2019) considered only the number of reviews and the overall rating as part of this group, and placed response time, cancellation rate, and Superhost in the last group of variables, called personal and amenities attributes. Chattopadhyay and Mitra (2019) considered bathrooms, beds, number of reviews, review scores rating, and reviews per month as elements of the last group.

Analyzing the similarities and the differences in the variables classification, we have decided to define six specific groups of variables. They are called: transports, culture, crowd, property, management, and time.

More in detail, the variables in transports include the distance from the accommodation to the closest subway stop and to the city center (i.e. the Pantheon), as well as the number of bus stops within 200 m. Some researchers have found that Airbnb accommodations are often close to the main tourist attractions (see for instance Gutiérrez et al. (2017); Romano (2021); Celata and Romano (2022)); concentrated in specific areas of tourist cities (see for instance Dudás et al. (2017)); and located in areas that are attractive and accessible by public transport (Quattrone et al. 2016, p. 1391). Moreover, researchers have investigated the impact of the distance to the nearest bus stop, to the nearest bus/train station, or to the city center (see for instance Voltes-Dorta and Sánchez-Medina (2020)). All these aspects have been summarized in the category transports, with the aim of investigating whether the proximity to the city center or to public transportation facilities can influence the Airbnb price.

The group of variables called crowd measures the number of other accommodations or hotels close to each accommodation. Specifically, for each accommodation we consider the number of Airbnb accommodations and hotels within 50 meters, as well as those within 500 meters. Other researchers have included these variables in their models under labels such as market conditions (Faye 2021), listing location (Cai et al. 2019), and supply and demand dynamics (Ye et al. 2018). In this study, we have chosen the label crowd because we aim to understand whether a greater presence of Airbnb listings and hotels, which can be likened to a crowd, can affect price.

The dimension culture concerns the number of monuments located within 500 and 2000 meters of the accommodation. Different researchers have analyzed the impact of tourist attractions and their distance from Airbnb accommodations (see for instance Voltes-Dorta and Sánchez-Medina (2020)). In this study, we take into consideration tourist attractions such as cultural monuments, museums, and churches. We believe that people choose Rome as a tourist destination also to visit its famous monuments; for this reason, we added these variables to the model and called their group culture, as the goal is to investigate the impact of cultural attractions on price determination.
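The variables of transports, crowd, and culture all reduce to counting points (bus stops, listings, hotels, monuments) within a given radius of each listing. The text does not state how distances were measured; the sketch below assumes great-circle (haversine) distances on (latitude, longitude) pairs, and the function names and coordinates are illustrative, not taken from the study.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def count_within(listings, points, radius_m):
    """For each listing (lat, lon), count the points lying within radius_m meters."""
    listings = np.asarray(listings, dtype=float)
    points = np.asarray(points, dtype=float)
    # broadcast to an (n_listings, n_points) distance matrix
    d = haversine_m(listings[:, None, 0], listings[:, None, 1],
                    points[None, :, 0], points[None, :, 1])
    return (d <= radius_m).sum(axis=1)

# illustrative coordinates in central Rome (not data from the study)
listing = [[41.8986, 12.4769]]
pts = [[41.8986, 12.4769], [41.8996, 12.4769], [41.9086, 12.4769]]
print(count_within(listing, pts, 500))  # -> [2]
```

For the crowd variables, a listing that also appears among the points would be counted in its own total; whether the study excludes it is not stated.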

The group of variables related to the characteristics of the property includes the number of bedrooms and of bathrooms in the accommodation, as well as a dummy variable for the type of accommodation (1 if entire home, 0 if room). Analyzing the literature, we found that several researchers have included these variables in their price determination models (see for instance Chen and Xie (2017); Gibbs et al. (2018)). They have used different labels for this category, such as characteristics of listing (Chen and Xie 2017), listing attributes (Cai et al. 2019), and rental listing features (Kakar et al. 2016). We have also found that the characteristics of the accommodation are usually combined with other variables such as the number of reviews, the services provided, and the Superhost badge. In this paper, we have decided to distinguish between the characteristics of the accommodation and the services provided, in order to understand whether specific accommodation features can affect price determination independently of the services provided and of the host's positive reputation. This decision has also influenced the choice of labels: we have chosen the label property for the specific characteristics of the accommodation, and the label management for the aspects related to both the activities directly handled by the hosts and their reputation.

As just noted, the group of variables called management includes the aspects that can be directly handled by hosts when managing their accommodations, i.e. the flexibility of cancellation policies, the average time guests must wait to get an answer to their questions, the minimum number of nights that must be booked, the number of photos published on the Airbnb online platform, the Business Ready status, and the Superhost status. Before explaining the choice of these variables, a few words are needed on the last two. The Business Ready status is granted to accommodations equipped with services to support business travelers, such as Wi-Fi, a laptop-friendly workspace, an iron, hangers, shampoo, a hairdryer, and all the other essentials such as toilet paper and clean towels. The Superhost status, on the other hand, is awarded by Airbnb to hosts who have met the following criteria over the previous 12 months (https://www.airbnb.com/d/superhost, accessed on October 31st, 2022): hosting at least 10 stays per year; honoring at least 99% of reservations unless there is an extenuating circumstance; answering guests' requests within 24 h at least 90% of the time; and achieving an overall rating of 4.8 or higher.

Let us now return to the choice of variables in the management group. The variables Cancellation policy and Response Time have been chosen for two reasons: first, Airbnb has declared that it uses them in awarding the Superhost badge; second, Gunter (2018) found that they are actually relevant to the attribution of the Superhost badge. Since the literature has found a positive relationship between Superhost status and price (see for instance Ma et al. (2017)), we believe that these two aspects, being involved in the attribution of the Superhost badge, can also affect price determination. Similarly, we believe that the variable Business Ready, which gathers several services of the accommodation, can be relevant to price determination. Finally, the variable Superhost has been chosen as a measure of the host's reputation, whereas the variable Number of Photos measures the host's commitment to promoting the accommodation.

In the literature, different researchers have used the same variables but grouped them differently. In some cases, the five variables are split into two groups: for instance, Ye et al. (2018) included the first three variables in a category called listing features and Lorde et al. (2018) in the group reputation of the product, while the last two variables are grouped together in categories called host attributes by Cai et al. (2019) or reputation of the host by Lorde et al. (2018). In other cases, the five variables are joined in the same group, called reputational variables, as in Faye (2021). We have decided to join all these variables in one group called management, because in this way we can put together aspects that are directly influenced by the host, such as Cancellation policy, Response Time, and Business Ready, with others that can be interpreted as the result of good management of an accommodation, such as the Superhost badge and the number of photos.

The group of variables called time consists of three dummy variables created to represent how low and high seasons influence prices. More specifically, these variables capture the impact that the high season might have on prices, in order to understand whether and how prices change during periods of high tourist flows. We created the variables by assigning the value 1 to each day or month in the high season (such as national holidays, the summer, and so on). Other researchers have created similar variables: for instance, Ye et al. (2018) used specific days of the year and the days of the week to create a group called temporal features.

The final group of variables used in the analysis is called instrumental and includes the response variable, the published nightly rate of the accommodation, as well as the neighborhood where the accommodation is located. The response variable has been transformed into an ordinal variable with five classes according to its quantiles (i.e. \(1 = [0,57], 2 = (57,82], 3 = (82,110], 4 = (110,155], 5 = (155,\infty )\)). We chose five classes because this number is a good trade-off between capturing the variability of the price (from very low to very high) and avoiding an excessive fragmentation of the classes.
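The discretization can be sketched with NumPy, using the cut points reported above and right-closed intervals as written in the class definitions. Deriving the cut points from the data assumes they are the 20th, 40th, 60th, and 80th percentiles (quintiles); the text only says "quantiles", and NumPy's default interpolation may differ from the one the authors used.

```python
import numpy as np

# upper bounds of classes 1-4 as reported in the text; class 5 is open-ended
PRICE_CUTS = np.array([57, 82, 110, 155])

def price_class(prices, cuts=PRICE_CUTS):
    """Map nightly prices to ordinal classes 1..5 with right-closed intervals:
    1 = [0, 57], 2 = (57, 82], 3 = (82, 110], 4 = (110, 155], 5 = (155, inf)."""
    prices = np.asarray(prices, dtype=float)
    # right=True assigns index i when cuts[i-1] < price <= cuts[i]
    return np.digitize(prices, cuts, right=True) + 1

def quintile_cuts(prices):
    """Assumed way of deriving cut points: the four inner quintiles of the data."""
    return np.quantile(np.asarray(prices, dtype=float), [0.2, 0.4, 0.6, 0.8])

print(price_class([0, 57, 58, 82, 155, 156, 400]))  # -> [1 1 2 2 4 5 5]
```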

The city of Rome is divided into 116 non-administrative units: 22 districts in the historic center, called rioni in Italian, 35 quarters surrounding the historic center, 6 suburbs, and 53 zones in the countryside of the city, as shown in Fig. 1. Not all 116 units have Airbnb accommodations; for this reason, the analysis has been carried out only for the areas of Rome where listings are located. Specifically, we have focused on the 61 areas of Rome that host more than 20 Airbnb accommodations, which are represented in Fig. 1 in different colors according to the type of unit. The aim was to obtain a sample large enough to be actually useful for the analysis.
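The selection rule (keep a unit only if it hosts more than 20 listings) can be written as a short filter. This is a sketch: the function name and the list of neighborhood codes are hypothetical, and the codes simply reuse the labeling style of Fig. 1.

```python
from collections import Counter

def neighborhoods_to_analyze(listing_neighborhoods, min_listings=20):
    """Return the codes of units hosting strictly more than `min_listings` listings."""
    counts = Counter(listing_neighborhoods)
    return sorted(code for code, n in counts.items() if n > min_listings)

# hypothetical neighborhood code of each listing
codes = ["R21"] * 25 + ["Q20"] * 20 + ["Z3"] * 21
print(neighborhoods_to_analyze(codes))  # -> ['R21', 'Z3']  (Q20 has exactly 20)
```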

The data used in the analysis have been either purchased or downloaded free of charge. Specifically, the data about transports, crowd, and culture have been downloaded from the website Open Data Roma Capitale (https://dati.comune.roma.it/), an official website of the Municipality of Rome that includes information on different topics, such as public transportation, libraries, local police, and nurseries. On the other hand, the data related to management, property, time, and instrumental have been provided by AirDNA LLC, a company that manages Airbnb data. Our analysis considers data for one year only, since the aim is to carry out a cross-sectional analysis capable of supporting hosts in defining a pricing strategy in line with the most relevant dimensions and the characteristics of the neighborhood where the accommodation is located. For reasons of data consistency, we used the data for 2016.

Table 1 Variables used in the analysis
Fig. 1 Neighborhoods of Rome: the neighborhoods considered in the work are shown in green (Quartieri), red (Rioni), purple (Suburbi) and yellow (Zone), those not analyzed in grey. A label is printed at the centroid of each analyzed neighborhood.

3 Methodology

To define the two price indicators q and r for a specific neighborhood, we proceed in three phases. In the first phase, the Proportional Odds Model, framed within the Vector Generalized Additive Model framework introduced by Yee (2015), is used to model the price through the variables of one dimension at a time and to estimate the probability that an accommodation belongs to each price class. In the second phase, the estimated probabilities are used to evaluate the ability of the model to correctly predict the price class of the accommodations through three indexes taking values in [0, 1]. Finally, in the third phase, these three indexes are combined to obtain the two price indicators.

3.1 Price modeling

Notationally, let \(\textbf{D} = [ \textbf{y}, \textbf{x}_1, \dots , \textbf{x}_p]\) be the dataset made up of an ordinal outcome variable \(\textbf{y}\) taking values in the set \(\{1, 2, \dots , K \}\), where 1 corresponds to the lowest class and K to the highest one, and of p independent variables \(\textbf{x}_i\), each of which is mapped to a set \(M \in {\mathcal {M}}\) gathering the variables of a specific dimension. We define a function \(\nu (\cdot )\) that returns the fitted probabilities of a Proportional Odds Model trained on a bootstrap sample \(\textbf{T}\) and tested on the corresponding out-of-bag sample \(\textbf{V}\), using \(\textbf{y}\) as the dependent variable and the variables belonging to a set M as the independent ones. The bootstrap strategy is adopted to reduce the uncertainty and avoid potential overfitting. Specifically, the Proportional Odds Model is defined through the logit link as

$$\begin{aligned} \text {logit}\, P(y \le k |\textbf{x})=\eta _{k}(\textbf{x}) \end{aligned}$$
(1)

subject to the constraint

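$$\begin{aligned} \eta _{k}(\textbf{x})= \beta _{(k)1}^{*} +\textbf{x}'_{[-1]} \beta ^{*}_{[-(1:(K-1))]} \end{aligned}$$
(2)

with \(k=1,\dots ,K-1\), where k identifies the level of y (no linear predictor is needed for k = K, since \(P(y \le K |\textbf{x}) = 1\)), \(\textbf{x}_{[-1]}\) is \(\textbf{x}\) with its first element (the intercept) deleted, and the asterisk denotes the regression coefficients to be estimated (Yee 2015, p. 11). The intercepts \(\beta _{(k)1}^{*}\) vary with k, whereas the slopes \(\beta ^{*}_{[-(1:(K-1))]}\) are shared by all k, which is the proportional odds assumption.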
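To make Eqs. (1) and (2) concrete, the sketch below maps already-estimated coefficients to the class probabilities that form one row of \(\textbf{P}\). It does not fit the model, and the coefficient values are made up. It uses K-1 increasing intercepts, since \(P(y \le K |\textbf{x}) = 1\); under this \(P(y \le k)\) parameterization, a positive slope shifts probability toward the lower classes.

```python
import numpy as np

def propodds_probs(X, intercepts, beta):
    """Class probabilities P(y = k | x), k = 1..K, of a proportional odds model.

    X          : (N, p) covariates of one dimension M, without intercept column
    intercepts : (K-1,) increasing intercepts beta*_(k)1, one per cut point
    beta       : (p,) slopes shared by all K-1 linear predictors
    """
    eta = intercepts[None, :] + (X @ beta)[:, None]  # (N, K-1): eta_k(x)
    cum = 1.0 / (1.0 + np.exp(-eta))                 # inverse logit: P(y <= k | x)
    n = X.shape[0]
    # pad with P(y <= 0) = 0 and P(y <= K) = 1, then take adjacent differences
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    return np.diff(cum, axis=1)                      # (N, K), each row sums to 1

X = np.array([[0.0], [1.0]])
P = propodds_probs(X, np.array([-1.0, 0.0, 1.0, 2.0]), np.array([0.5]))
print(P.round(3))
```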

The dataset \(\textbf{T}\) is a bootstrap sample of \(\textbf{D}\), i.e. it is created by sampling with replacement from \(\textbf{D}\) as many observations as \(\textbf{D}\) contains. \(\textbf{V}\), instead, is made up of the out-of-bag observations, i.e. the observations of \(\textbf{D}\) that are not included in \(\textbf{T}\).
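A minimal sketch of how \(\textbf{T}\) and \(\textbf{V}\) can be drawn, working with row indices of \(\textbf{D}\) instead of copies of the data. It uses NumPy's `Generator.integers` for sampling with replacement; the function name is hypothetical.

```python
import numpy as np

def bootstrap_split(n, rng):
    """Row indices of a bootstrap sample T (n draws with replacement)
    and of its out-of-bag sample V (rows of D never drawn into T)."""
    t_idx = rng.integers(0, n, size=n)
    v_idx = np.setdiff1d(np.arange(n), t_idx)
    return t_idx, v_idx

rng = np.random.default_rng(42)
t_idx, v_idx = bootstrap_split(1000, rng)
# for large n the out-of-bag share tends to (1 - 1/n)^n -> 1/e, about 0.368
print(len(v_idx) / 1000)
```

On average, therefore, a little over a third of the accommodations form the test set \(\textbf{V}\) at each iteration.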

The output of the function \(\nu (\cdot )\) is an \(N \times K\) matrix \(\textbf{P} = \nu (\textbf{y} \sim M, \textbf{T}, \textbf{V})\), where N is the number of observations in \(\textbf{V}\). Each row \(\textbf{P}_{i\cdot }\) is a vector \((p_{i1}, p_{i2}, \dots , p_{iK})\) of probabilities, with \(\sum _{j=1}^K p_{ij} = 1\), that expresses the ability of the variables belonging to dimension M to explain the target variable \(\textbf{y}\) (i.e. price). If the estimated probabilities are similar to each other, the prediction uncertainty is high and the dimension is not able to explain the price classes; if, on the contrary, one probability is close to one and the others are close to zero, the prediction uncertainty is low and the dimension explains the price classes well.

3.2 Assessment of price class prediction

Referring to the notation above, the first phase consists in estimating a matrix \(\textbf{P}\) for each set of variables M in \({\mathcal {M}}\). In the second phase, for each dimension, the corresponding estimated matrix \(\textbf{P}\) is used to compute three indexes expressing the ability of the model to correctly predict the price class through the variables of that dimension. The first index, called \(\theta \), measures the concentration of the probabilities in \(\textbf{P}_{i\cdot }\) and corresponds to the complement of the normalized Gini index, namely

$$\begin{aligned} \theta _i = 1- \left[ \frac{\left( 1- \textbf{P}_{i\cdot }\textbf{P}_{i\cdot }'\right) \cdot K}{K-1}\right] . \end{aligned}$$
(3)

\(\theta _i = 1\) means that the model assigns probability 1 to a single class and probability 0 to the other \(K-1\) classes as the price class of accommodation i. On the contrary, \(\theta _i = 0\) means that the model assigns the same probability to all classes.
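Eq. (3) applied row by row, as a short NumPy sketch (the function name and the probability rows are illustrative). The factor \(K/(K-1)\) rescales the Gini index so that its maximum, reached by the uniform distribution, equals 1.

```python
import numpy as np

def theta(P):
    """Eq. (3): complement of the normalized Gini index of each row of P."""
    P = np.asarray(P, dtype=float)
    K = P.shape[1]
    gini = 1.0 - np.sum(P ** 2, axis=1)   # 1 - P_i. P_i.'
    return 1.0 - gini * K / (K - 1)       # K/(K-1) maps the maximum Gini to 1

P = [[1.0, 0.0, 0.0, 0.0, 0.0],   # all mass on one class
     [0.2, 0.2, 0.2, 0.2, 0.2],   # uniform
     [0.1, 0.2, 0.4, 0.2, 0.1]]
print(theta(P))  # approximately [1.0, 0.0, 0.075]
```

For the third row, \(\textbf{P}_{i\cdot }\textbf{P}_{i\cdot }' = 0.26\), so \(\theta _i = 1 - 0.74 \cdot 5/4 = 0.075\).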

Another aspect to take into account when assessing the goodness of fit of the model is the correctness of the prediction. Since the dependent variable is ordinal, the predicted class \({\hat{y}}\) is not the class with the highest probability but the median of the estimated distribution \(\textbf{P}_{i\cdot }\). The absolute difference between \({\hat{y}}\) and y measures how wrong the prediction is: if the prediction is correct, \(|{\hat{y}} - y|= 0\); otherwise, the larger the difference, the more wrong the prediction. Consequently, prediction correctness is assessed through the following normalized index \(\gamma \)

$$\begin{aligned} \gamma _i = 1 - \frac{|{\hat{y}}_i - y_i |}{K-1}. \end{aligned}$$
(4)

If \(\gamma _i = 1\), the model has correctly predicted the class of accommodation i; the lower the value, the larger the difference between the true class and the predicted one. The worst situation (\(\gamma _i = 0\)) occurs when the lowest class (i.e. 1) is predicted while the highest one (i.e. K) is the true one, or vice versa.
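A sketch of the median prediction and of Eq. (4). It assumes the median class is the smallest class whose cumulative probability reaches 0.5, which is the usual definition for a discrete distribution; the text does not spell out the tie-breaking rule. The example row shows that the median class can differ from the most probable one.

```python
import numpy as np

def median_class(P):
    """Predicted class y_hat (1-based): smallest class whose cumulative
    probability reaches 0.5 in each row of P."""
    cum = np.cumsum(np.asarray(P, dtype=float), axis=1)
    return np.argmax(cum >= 0.5, axis=1) + 1

def gamma(y_hat, y, K):
    """Eq. (4): one minus the absolute class error normalized by K - 1."""
    return 1.0 - np.abs(np.asarray(y_hat) - np.asarray(y)) / (K - 1)

P = np.array([[0.30, 0.05, 0.05, 0.20, 0.40]])
y_hat = median_class(P)   # class 4, although class 5 has the highest probability
g = gamma(y_hat, [5], K=5)  # true class 5: gamma = 1 - 1/4 = 0.75
```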

The third index, called \(\delta \), measures the gap between the probability assigned to the true class and that assigned to the predicted class. It is the complement of the absolute difference between the probability fitted for the true class (\(\textbf{P}_{i{y}}\)) and that fitted for the predicted one (\(\textbf{P}_{i{{\hat{y}}}}\)). Specifically,

$$\begin{aligned} \delta _i = 1- |\textbf{P}_{i{{\hat{y}}}} - \textbf{P}_{i{y}}|. \end{aligned}$$
(5)

If \(\delta _i = 1\), either the predicted class is the true one or the two classes differ but have identical probabilities. In both situations, the model assigns the true class at least as much probability as the predicted one. The lower \(\delta _i\), the larger the gap between the probability of the true class and that of the predicted class.
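Eq. (5) as a NumPy sketch with 1-based class labels; the function name and the example row are illustrative. With the row below, the median class is 3; if the true class is 5, \(\delta _i = 1 - |0.4 - 0.1| = 0.7\).

```python
import numpy as np

def delta(P, y_hat, y):
    """Eq. (5): 1 - |P[i, y_hat_i] - P[i, y_i]|, classes numbered from 1."""
    P = np.asarray(P, dtype=float)
    rows = np.arange(P.shape[0])
    y_hat = np.asarray(y_hat)
    y = np.asarray(y)
    return 1.0 - np.abs(P[rows, y_hat - 1] - P[rows, y - 1])

P = np.array([[0.1, 0.2, 0.4, 0.2, 0.1]])
d_wrong = delta(P, [3], [5])  # 0.7: true class much less likely than the predicted one
d_right = delta(P, [3], [3])  # 1.0: correct prediction
```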

3.3 Price indicators definition

Finally, the third phase aims to estimate the two price indicators that measure the impact and the relative importance of the dimensions on price determination. Specifically, for each dimension, the three indexes introduced above are multiplied, accommodation by accommodation, into a single index that summarizes the capacity of the corresponding independent variables to explain the price. Then, for each dimension, the values of this single index are both averaged over all accommodations, yielding the first price indicator q (the values of all dimensions are collected in a vector \(\textbf{q}\)), and averaged within the true price class of each accommodation, yielding a descriptive matrix \(\textbf{A}\). The matrix \(\textbf{A}\) has K rows and as many columns as dimensions; it supports an in-depth analysis of each neighborhood, since it shows the capacity of each dimension to explain each specific price class. However, it is inconvenient for comparing neighborhoods. The vector \(\textbf{q}\), instead, contains one single value q per dimension and summarizes the information of \(\textbf{A}\): each q is the average of the K price-class values in \(\textbf{A}\), weighted by the class frequencies. In short, q is a synthetic value expressing the impact of each dimension on price determination that allows an easy comparison between neighborhoods.

Algorithm 1 illustrates all the steps needed to obtain \(\textbf{q}\) and \(\textbf{A}\). Computationally, for each set of independent variables \(M \in {\mathcal {M}}\), B iterations are carried out. At each iteration b, first the bootstrap sample \(\textbf{T}^b\) and the corresponding out-of-bag sample \(\textbf{V}^b\) are generated and used as inputs of the function \(\nu (\cdot )\), setting \(\textbf{y}\) as the dependent variable and the variables belonging to one of the sets M as the independent ones. Then, the matrix of probabilities \(\textbf{P}^{Mb}\) returned by \(\nu (\cdot )\) is used to compute the \(\theta \) index for each observation. The true classes \(\textbf{y}^b\) of the dataset \(\textbf{V}^b\) are compared with the predicted ones \(\mathbf {{\hat{y}}}^b\) to compute the \(\gamma \) index. Next, the probabilities associated with the true classes and with the predicted ones are used to compute the \(\delta \) index. Subsequently, a matrix \(\textbf{U}\) stores the sums of the products of the three indexes, by true class of the observations and by iteration. For computational reasons, the algorithm uses an indicator function \(I_k(\textbf{y}^b)\) that returns a vector with elements equal to 1 if \(y^b = k\) and 0 otherwise. The matrix \(\textbf{Y}\), instead, stores the frequencies of the true classes in each iteration. Finally, the results collected in \(\textbf{U}\) are summed over the iterations and then divided by the corresponding frequencies in \(\textbf{Y}\) to obtain the matrix \(\textbf{A}\) and the vector \(\textbf{q}\), which store the indicator values of all dimensions for, respectively, the K price classes and their weighted combination.

Algorithm 1
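The loop of Algorithm 1 for a single dimension M can be sketched in Python as follows. This is not the authors' code. `nu` is a hypothetical callable standing in for \(\nu (\cdot )\): it receives the training labels and covariates and the out-of-bag covariates and must return an \(N \times K\) probability matrix (in the paper, the fitted Proportional Odds Model). The predicted class is assumed to be the median class, and a class that never appears out of bag yields NaN in \(\textbf{A}\).

```python
import numpy as np

def dimension_indicator(y, X, nu, K, B=100, seed=0):
    """Sketch of Algorithm 1 for one dimension M.

    y  : (n,) integer price classes in 1..K
    X  : (n, p) variables of dimension M
    nu : hypothetical callable (y_T, X_T, X_V) -> (N, K) class probabilities
    Returns the column of A for this dimension (length K) and the scalar q.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    U = np.zeros((K, B))  # sums of theta * gamma * delta by true class and iteration
    Y = np.zeros((K, B))  # frequencies of the true classes by iteration
    for b in range(B):
        t_idx = rng.integers(0, n, size=n)         # bootstrap sample T^b
        v_idx = np.setdiff1d(np.arange(n), t_idx)  # out-of-bag sample V^b
        P = nu(y[t_idx], X[t_idx], X[v_idx])
        y_v = y[v_idx]
        rows = np.arange(len(v_idx))
        theta = 1 - (1 - np.sum(P ** 2, axis=1)) * K / (K - 1)
        y_hat = np.argmax(np.cumsum(P, axis=1) >= 0.5, axis=1) + 1  # median class
        gamma = 1 - np.abs(y_hat - y_v) / (K - 1)
        delta = 1 - np.abs(P[rows, y_hat - 1] - P[rows, y_v - 1])
        prod = theta * gamma * delta
        for k in range(1, K + 1):
            ind = y_v == k                         # indicator I_k(y^b)
            U[k - 1, b] = prod[ind].sum()
            Y[k - 1, b] = ind.sum()
    counts = Y.sum(axis=1)
    A_col = np.divide(U.sum(axis=1), counts, out=np.full(K, np.nan), where=counts > 0)
    q = U.sum() / Y.sum()  # equals the class-frequency-weighted mean of A_col
    return A_col, q
```

Calling the function once per dimension and placing the returned columns side by side gives \(\textbf{A}\), while the returned scalars form \(\textbf{q}\). A model that always spreads probability uniformly yields q = 0, and one that always puts all probability on the true class yields q = 1.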

For convenience, the vectors \(\textbf{q}\) of all neighborhoods are then stacked into a matrix \(\textbf{Q}\), in which each row corresponds to the vector \(\textbf{q}\) of one neighborhood. In this way, \(\textbf{Q}\) gathers the importance of the dimensions for each neighborhood, showing how the impact of each dimension on the price varies across neighborhoods.

A second, normalized price indicator r is obtained from \(\textbf{Q}\): it measures the relative impact that each dimension has in a specific neighborhood with respect to the other neighborhoods. The price indicator r is estimated for each neighborhood and each dimension and is stored in a matrix \(\textbf{R}\), whose column j (corresponding to dimension j) is computed as

$$\begin{aligned} \textbf{R}_{\cdot j} = \frac{\textbf{Q}_{\cdot j}^0 - \min (\textbf{Q}_{\cdot j}^0)}{ \max (\textbf{Q}_{\cdot j}^0) - \min (\textbf{Q}_{\cdot j}^0)} \end{aligned}$$
(6)

with \(\textbf{Q}^0 = \textbf{Q} - \left( \textbf{Q}\textbf{1}/ |{\mathcal {M}} |\right) \textbf{1}'\) expressing the deviations of \(\textbf{Q}\) from its row means. The indicator thus measures how much the first price indicator of a dimension in a neighborhood differs from that neighborhood's average across dimensions: a low (high) value represents a small (large) relative importance of the dimension in the neighborhood. r takes values in [0, 1]: it equals one in the neighborhood where the dimension has the highest relative impact compared with the other neighborhoods, and zero in the neighborhood where it has the lowest.

The indicator therefore works as a rank for each dimension, identifying where a specific aspect is most relevant: the values between 0 and 1 position each neighborhood with respect to the importance of that dimension in price determination.
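Eq. (6) takes a few lines of NumPy, assuming \(\textbf{Q}\) is stored as a (neighborhoods × dimensions) array. The toy values below are invented and are not results of the study. A column whose centered values are all equal would make the denominator zero; the sketch does not handle that case.

```python
import numpy as np

def r_indicator(Q):
    """Eq. (6): per-dimension (column) min-max normalization of Q
    after centering each row (neighborhood) on its mean."""
    Q = np.asarray(Q, dtype=float)
    Q0 = Q - Q.mean(axis=1, keepdims=True)  # Q - (Q 1 / |M|) 1'
    lo, hi = Q0.min(axis=0), Q0.max(axis=0)
    return (Q0 - lo) / (hi - lo)            # each column spans [0, 1]

# toy Q: 3 neighborhoods (rows) x 2 dimensions (columns)
Q = [[0.1, 0.3],
     [0.2, 0.2],
     [0.4, 0.1]]
print(r_indicator(Q))  # approximately [[0, 1], [0.4, 0.6], [1, 0]]
```

The third neighborhood has the largest q for the first dimension relative to its own row mean, so it receives r = 1 for that dimension.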

4 Results and discussion

For each neighborhood, we have calculated, first, the impact of each dimension on the five price classes (i.e. a matrix \(\textbf{A}\) for each neighborhood); second, the average impact of each dimension on the price (i.e. price indicator q, stored in \(\textbf{Q}\)); and, finally, the relative importance of each dimension in the neighborhood compared with the other neighborhoods (i.e. price indicator r, stored in \(\textbf{R}\)).

4.1 Example of price indicators estimation for a single neighborhood

We now illustrate how a single neighborhood can be analyzed in depth using the price indicators described above. Specifically, we analyze Trastevere (R21), one of the traditional tourist districts (rioni) in the center of the city. The first piece of information comes from the matrix \(\textbf{A}\), which shows how the impact of the six dimensions changes across price classes within the neighborhood. The results in Table 2 show that the values of all dimensions increase moving from the lowest price class to the highest one. The low values recorded for the lowest price classes show that these dimensions cannot explain those classes: in other words, the price of cheaper accommodations is not affected by our six dimensions. On the contrary, the higher values recorded for the highest price classes indicate that the dimensions do influence the price. For instance, the dimension property takes the value 0.0466 for the first class and 0.1768 for the fifth, highlighting that the latter class is affected by that dimension, whilst the former is much less so. From a practical point of view, this means that hosts who want to set a very high price should offer accommodations with a large number of bedrooms and bathrooms. Nonetheless, the fact that the very low price class is almost unaffected by the property dimension also suggests that high standards in this dimension are a necessary but not sufficient condition for a very high price class: despite high standards in the property dimension, other elements can negatively influence the price.

Moreover, the two main price indicators q and r (Table 2) confirm that, to control the price as efficiently as possible, hosts should focus on the property dimension, whose overall importance is the highest (\(q=0.1038\)). The second indicator r, in turn, shows that the effect of the dimension culture in this neighborhood (\(r=0.9646\)) is one of the highest among all of Rome’s neighborhoods: consequently, hosts with accommodations in several neighborhoods who want to focus on this dimension can expect Trastevere to be among the places where it pays off most.
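The two readings of Table 2 correspond to two simple look-ups, sketched below with hypothetical helper names: within a neighborhood, pick the dimension with the largest q; across neighborhoods, rank the neighborhoods by r for a given dimension. The dictionary layout for \(\textbf{Q}\) and \(\textbf{R}\) is an assumption, and the neighborhoods N1 to N3 and their values are made up for illustration.

```python
def top_dimension(q):
    """Dimension with the largest q in one neighborhood ({dimension: q})."""
    return max(q, key=q.get)


def rank_neighborhoods(R, dimension, k=3):
    """Neighborhoods sorted by r for one dimension, highest first.

    R is assumed to be {neighborhood: {dimension: r}}.
    """
    return sorted(R, key=lambda nb: R[nb][dimension], reverse=True)[:k]


# Hypothetical values for three made-up neighborhoods, for illustration only.
Q = {"N1": {"property": 0.12, "culture": 0.05, "transports": 0.07}}
R = {
    "N1": {"culture": 0.95},
    "N2": {"culture": 0.40},
    "N3": {"culture": 0.70},
}
print(top_dimension(Q["N1"]))               # property
print(rank_neighborhoods(R, "culture", 2))  # ['N1', 'N3']
```

`max` returns the first dimension in insertion order when two q values tie, so a real analysis may want an explicit tie-breaking rule.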

4.2 Price indicator q

The values of the first price indicator reported in Fig. 2 and Table 4 show that the six dimensions affect the price of accommodations differently across neighborhoods. The dimension transports has a high impact in the neighborhoods located in the center of Rome and in the suburbs. The highest impact is recorded in the suburbs of Torre Maura (Z3) (\(q=0.3051\)), in the east of the city and recently included in an urban regeneration process, and Torrino (Z4) (\(q=0.2850\)), one of the most popular and livable places in Rome, with a variety of civic buildings such as shopping centers, sports facilities, recreational centers, and places of worship. These suburbs are far from the city center, and for this reason the presence of train or subway stops is crucial.

The dimension crowd has a high impact on the price of accommodations located in the center, such as in the neighborhood of Colonna (R6) (\(q=0.2060\)), a tourist quarter with well-known historical monuments and government buildings, and in suburban neighborhoods such as Ponte Mammolo (Q20) (\(q=0.3388\)), a residential area of the city. The results also show that the presence of other tourist facilities, such as hotels or Airbnb accommodations, makes a price increase more likely. They suggest that the presence of other accommodations in the neighborhood tends to attract more guests, since tourists might think that an area with more accommodations also offers more attractions, monuments, and services.

The dimension culture is significant in some neighborhoods located both in the outskirts and in the center of the city. Generally, the impact of this dimension is not high. However, in some neighborhoods the proximity to monuments seems to affect the price of accommodations. This suggests that cultural elements matter in defining the price, but only when they are close to the accommodation. The dimension is significant in the above-mentioned neighborhood of Torre Maura (Z3) (\(q=0.2561\)) and, in Rome’s city center, in the neighborhood of Colonna (R6) (\(q=0.2040\)).

Higher values have been recorded for the dimension property. They show that bookings are influenced by the characteristics of the property, and that the number of bedrooms and bathrooms influences prices. The highest impact of this dimension has been recorded in Torre Maura (Z3), with a value of 0.5592, and in Castel Giubileo (Z1), a neighborhood in the northeast of the city, with a value of 0.5096. Both areas have recently undergone urban regeneration, which apparently made them more attractive to new Airbnb guests.

The dimension management is more relevant in the outskirts, specifically in the neighborhoods of Giuliano-Dalmata (Q12) in the south of the city, Ponte Mammolo (Q20), Castel Giubileo (Z1) and Val Melaina (Z5), with estimated values of 0.3133, 0.5456, 0.4609 and 0.4306, respectively. This high relevance can be explained by the fact that guests book accommodations in the suburbs only if the services somehow enhance their experience. In this case, services become the main element influencing the level of guests’ satisfaction and, consequently, the pricing of future rentals.

Similar results have been recorded for the dimension time, which captures the effect of the tourist high season on accommodation rates. This dimension matters only in the outskirts, which means that the rates of accommodations located in the city center are not influenced by the days or months in which they are booked. In the outskirts, on the contrary, the tourist high season, which in Rome corresponds to all weekends and to the months of March, April, May, September and October, causes a price increase in the neighborhoods of Alessandrino (Q1) (\(q=0.4696\)), Ponte Mammolo (Q20) (\(q=0.4011\)) and Prenestino Centocelle (Q22) (\(q=0.3892\)).
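The high-season rule stated above can be encoded directly, for example to label each night of a booking before estimating the time dimension. This is a sketch with hypothetical function names; it assumes that "weekend" means Saturday and Sunday, which the text does not specify.

```python
from datetime import date, timedelta

# Months that the text lists as tourist high season in Rome.
HIGH_SEASON_MONTHS = {3, 4, 5, 9, 10}  # March, April, May, September, October


def is_high_season(day: date) -> bool:
    """True if `day` is a weekend day (ASSUMED: Saturday or Sunday) or falls in a high-season month."""
    return day.weekday() >= 5 or day.month in HIGH_SEASON_MONTHS


def high_season_nights(check_in: date, check_out: date) -> int:
    """Count the nights of a stay (check-in day up to the day before check-out) in high season."""
    nights = (check_out - check_in).days
    return sum(is_high_season(check_in + timedelta(days=i)) for i in range(nights))


print(is_high_season(date(2022, 10, 31)))                        # True (October)
print(high_season_nights(date(2022, 7, 13), date(2022, 7, 18)))  # 2 (Sat 16, Sun 17)
```

If the study defines weekends differently (for example, Friday and Saturday nights), only the `weekday()` test needs to change.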

Fig. 2

Distribution of the first price indicator q of the six dimensions across the neighborhoods
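The q values quoted in this subsection can be collected in a small lookup to recover, for each dimension, the neighborhood with the largest quoted value. The values below are exactly those given in the text; the full set for all neighborhoods is in Table 4, so the maxima returned here are maxima over the quoted values only, not necessarily over every neighborhood. The structure and function name are illustrative.

```python
# q values quoted in Subsection 4.2, keyed by dimension and then by neighborhood code.
REPORTED_Q = {
    "transports": {"Z3": 0.3051, "Z4": 0.2850},
    "crowd": {"R6": 0.2060, "Q20": 0.3388},
    "culture": {"Z3": 0.2561, "R6": 0.2040},
    "property": {"Z3": 0.5592, "Z1": 0.5096},
    "management": {"Q12": 0.3133, "Q20": 0.5456, "Z1": 0.4609, "Z5": 0.4306},
    "time": {"Q1": 0.4696, "Q20": 0.4011, "Q22": 0.3892},
}


def largest_reported(q_by_dimension):
    """For each dimension, return (neighborhood, q) with the largest quoted q."""
    return {
        dim: max(values.items(), key=lambda item: item[1])
        for dim, values in q_by_dimension.items()
    }


for dim, (nb, q) in largest_reported(REPORTED_Q).items():
    print(f"{dim:<11} {nb:<4} q={q:.4f}")
```

Among the quoted values, Torre Maura (Z3) has the largest q for three dimensions (transports, culture, property) and Ponte Mammolo (Q20) for two (crowd, management).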

4.3 Price indicator r

Next, we have evaluated the relative importance of the dimensions across neighborhoods. Figure 3 and Table 5 show that the dimensions transports, crowd and culture have greater relative importance in the central neighborhoods of the city and in the suburbs on the east and south sides. The dimensions property, management and time, by contrast, have greater relative importance only in the outskirts on the east and south sides of the city.

Overall, the relative impact of the dimensions property, management and time is significant in the suburban neighborhoods. For instance, the dimension property takes its maximum value (\(r=1.000\)) in Torre Maura (Z3), where the distance from the center is offset by larger, more comfortable accommodations. The dimension management takes its maximum value in the suburban neighborhood of Ponte Mammolo (Q20), where the distance from the center is offset by a variety of services that justify higher prices. The dimension time, instead, takes its maximum value in Alessandrino (Q1), where the distance from the center allows prices to be raised only in the tourist high season.

The dimensions transports, crowd and culture affect price both in the central neighborhoods of Rome and in the outskirts. Although the dimension transports takes its maximum value in the suburb of Torrino (Z4), one of the most popular and livable places in the Italian capital, high values are also recorded in central neighborhoods such as Sant’Angelo (R16) (\(r=0.8077\)) and Ponte (R12) (\(r=0.7189\)). Similarly, the dimension crowd takes its maximum value in the suburb of Monte Sacro Alto (Q14), but high values are also recorded in central neighborhoods such as Campitelli (R2) (\(r=0.9451\)), one of the most visited neighborhoods because of its numerous cultural monuments. Finally, the dimension culture takes its highest values in Ponte (R12) and in the other central neighborhoods that are rich in monuments, museums and churches.

Fig. 3

Distribution of the second price indicator r of the six dimensions across the neighborhoods

Table 2 Matrix \(\textbf{A}\) and the two price indicators q and r estimated for the neighborhood Trastevere (R21)

4.4 Cluster analysis

To conclude the investigation, we have used the information in the estimated matrices \(\textbf{Q}\) and \(\textbf{R}\) of the indicators q and r to perform a hierarchical clustering with Ward’s method and identify similarities among the neighborhoods. We have obtained four clusters, which we characterized by the averages of the two price indicators (\({\bar{q}}\) and \({\bar{r}}\)) for the six dimensions (Table 3). Figure 4 shows a contrast between the city center (green and blue clusters) and the suburbs (yellow and red clusters).
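Ward’s agglomerative procedure can be sketched in plain Python as below. The sketch assumes that each neighborhood is described by a single feature vector concatenating its six q values and six r values, and that Euclidean distance is used; the text does not state how the features were prepared (for example, whether they were standardized). At each step, the two clusters whose merger gives the smallest increase in within-cluster sum of squares are joined, until k clusters remain; cluster profiles such as \({\bar{q}}\) and \({\bar{r}}\) are then the cluster centroids. Function names and the toy data are hypothetical.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares caused by merging clusters a and b."""
    ca, cb = centroid(a), centroid(b)
    squared_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_dist


def ward_clusters(features, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    features: {neighborhood: feature vector}, e.g. six q values followed by six r values.
    """
    clusters = [[name] for name in features]
    while len(clusters) > k:
        # Pick the pair of clusters whose merger increases the total SSE the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(
                [features[n] for n in clusters[ij[0]]],
                [features[n] for n in clusters[ij[1]]],
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because combinations() yields i < j
    return clusters


def cluster_profile(features, cluster):
    """Average feature vector of a cluster (e.g. its q-bar and r-bar values)."""
    return centroid([features[n] for n in cluster])


# Toy 2-D vectors for eight hypothetical neighborhoods, for illustration only.
toy = {"A": [0, 0], "B": [0, 1], "C": [10, 10], "D": [10, 11],
       "E": [20, 0], "F": [21, 0], "G": [0, 20], "H": [1, 20]}
groups = ward_clusters(toy, k=4)
print(sorted(sorted(g) for g in groups))  # [['A', 'B'], ['C', 'D'], ['E', 'F'], ['G', 'H']]
print(cluster_profile(toy, ["A", "B"]))   # [0.0, 0.5]
```

This naive version recomputes centroids at every step and is only meant to show the criterion. With SciPy, `scipy.cluster.hierarchy.linkage(X, method="ward")` followed by `fcluster(Z, t=4, criterion="maxclust")` should give the same partition up to tie-breaking.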

The green cluster includes tourist and residential neighborhoods that are rich in monuments. In this cluster, the largest impact on the price is generated by the dimensions property (\({\bar{q}} = 0.2193\)) and management (\({\bar{q}} = 0.1597\)), whereas the largest relative impact is generated by the dimensions transports (\({\bar{r}} = 0.6665\)), crowd (\({\bar{r}} = 0.8236\)) and culture (\({\bar{r}} = 0.8743\)). Consequently, hosts should consider all dimensions when setting the price of their accommodations.

The blue cluster has characteristics similar to those of the green cluster. Government offices and embassies are located in the neighborhoods of this cluster. The dimensions that affect prices the most are property (\({\bar{q}} = 0.1425\)) and management (\({\bar{q}} = 0.0658\)). The dimensions with the highest relative importance are, in descending order, culture (\({\bar{r}} = 0.8526\)), crowd (\({\bar{r}} = 0.6285\)) and transports (\({\bar{r}} = 0.5949\)). In this case, the proximity to cultural monuments allows hosts to raise prices.

The red cluster consists of residential neighborhoods, which also host university buildings and the Cinecittà film studios. The dimensions property (\({\bar{q}} = 0.3104\)) and time (\({\bar{q}} = 0.2317\)) are the most relevant in this area. The relevance of the dimension time is also confirmed by its \({\bar{r}}\), which takes its highest value in this cluster. The type (apartment or single room) and size of the accommodation strongly influence price. The same holds for the high tourist season, since on those days and in those months hosts can raise the rates of their apartments.

Finally, the yellow cluster includes suburban neighborhoods that have recently been redeveloped and are far from the city center. Accommodation prices are generally lower in these areas, and the dimensions that influence the rates most are property (\({\bar{q}} = 0.3670\)) and management (\({\bar{q}} = 0.4110\)). However, the other dimensions are also highly relevant, with impacts ranging from \({\bar{q}} = 0.1816\) for culture to \({\bar{q}} = 0.2586\) for time. The largest relative impact is recorded for the dimensions management (\({\bar{r}} = 0.7118\)) and crowd (\({\bar{r}} = 0.5721\)).

Fig. 4

Distribution of the four clusters in the neighborhoods

Table 3 Averages of the indicators q and r for each dimension, by cluster

5 Conclusions

5.1 Theoretical implications

The new forms of accommodation and the need to manage them properly create a new challenge: determining the correct price and evaluating its determinants. For this reason, we have defined two price indicators, called q and r, to evaluate the relevance of different dimensions for the price and to highlight the differences among geographical areas.

The first indicator, q, identifies the relevance of each dimension in price determination by attributing a weight to each dimension. This weight makes it possible to rank the dimensions and to measure the impact of each group of variables on the price. The indicator r, on the other hand, measures the relative importance that each dimension has in specific neighborhoods.

Some innovative elements can be identified in the estimation of the two indicators. Firstly, we have estimated the indicators by combining different methodologies in three steps. In the first step, we have used the Proportional Odds Model to evaluate the impact that each dimension has on the different price levels; in this way, for each geographical area considered, we have estimated the impact of the dimensions on five price levels. In the second step, the estimated probabilities are used to evaluate how well the model predicts the price class of accommodations, by means of three indexes defined in [0, 1]. Finally, the three indexes are combined into the two price indicators.
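For readers unfamiliar with the first step, the sketch below shows how a fitted Proportional Odds Model turns a linear predictor into probabilities for the five ordered price classes, using the cumulative-logit form \(P(Y \le j \mid x) = \sigma(\theta_j - x^\top\beta)\) with increasing thresholds \(\theta_1 < \dots < \theta_4\). Sign conventions differ across software, and the coefficients and thresholds below are made up. The sketch does not reproduce the three indexes of the second step, whose definitions are not repeated in this section; taking the most probable class as the prediction is an assumption made here only for illustration.

```python
import math


def logistic(z):
    """Standard logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def pom_class_probabilities(x, beta, thresholds):
    """Class probabilities P(Y = 1..J) under a proportional odds model.

    Uses logit P(Y <= j | x) = theta_j - x . beta, with J - 1 strictly
    increasing thresholds theta_j and one coefficient vector beta shared
    by all classes (the proportional odds assumption).
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


def most_probable_class(probs):
    """1-based index of the class with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__) + 1


# Made-up thresholds for five price classes and a single made-up covariate.
theta = [-3.0, -1.0, 1.0, 3.0]
for x in ([0.0], [5.0]):
    p = pom_class_probabilities(x, beta=[1.0], thresholds=theta)
    print(x, [round(v, 3) for v in p], most_probable_class(p))
```

Because the same \(\beta\) applies to every threshold, a larger linear predictor shifts probability mass toward the higher price classes as a whole, which is what allows one model to describe the effect of a dimension across all five classes.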

Secondly, we have obtained different outputs that can be used to analyze Airbnb price determination. Specifically, the matrix \(\textbf{A}\) summarizes the impact that each dimension has on the different price levels for each analyzed neighborhood, providing a useful framework for understanding price behavior in a specific geographical area. The two indicators q and r estimate the relevance of each dimension in price determination and offer two distinct pieces of information: the indicator q measures the impact of the dimensions on the price within a single neighborhood, whereas the indicator r identifies the relative impact that each dimension has in a specific neighborhood with respect to the others. In other words, the two indicators evaluate the relevance of each dimension both inside the neighborhood and with respect to the other neighborhoods. In this way, we can analyze price determination for a specific area and, at the same time, compare the results across neighborhoods to understand where a specific dimension has greater relevance.

Finally, the proposed price indicators have not been tailored to the features of a specific location and can therefore be generalized: they can be applied to any kind of geographical unit, such as countries, regions, cities or neighborhoods, and allow comparisons among them. In the specific case of Rome, the results show that the indicators capture the differences among neighborhoods, that prices change with the location of the accommodation, and that there are differences between the city center and the outskirts.

5.2 Practical implications

From a practical point of view, it is crucial to understand which aspects really influence the price and which are irrelevant. Moreover, knowing the relevance of each dimension in each neighborhood helps hosts set prices correctly and become more competitive on online platforms such as Airbnb.

Our results can show each host which dimensions really affect the price of accommodations, depending on their location, the services included, and the time of year in which they are booked. Hosts thus have a tool to help them set a price in line with the market. Specifically, experienced hosts, that is, those who already have experience on the Airbnb platform, can better understand the dimensions that affect the rate and, if necessary, adjust it accordingly. New hosts, on the other hand, can use the results of the model to choose the most suitable price class for their accommodation. Moreover, estimating the indicators at the neighborhood level helps explain the price and its determinants while taking geographical proximity into account. We assume that accommodations close to each other are subject to the same effects. These externalities can also be perceived by tourists and can similarly affect the price. A model able to represent the importance of proximity is therefore relevant for evaluating the impact of geographical characteristics. The identification of different rankings for each neighborhood then reveals how the rate of an Airbnb accommodation is influenced by the local environment and by the elements located in the area. Specifically, transportation facilities, monuments, and the proximity of other accommodations affect the management of an accommodation and its price. This shows the need to include in the model variables that are not directly related to the management of the accommodation.

To conclude, we have defined two indicators able to estimate the price determinants and the differences among neighborhoods, offering an important tool to support hosts in defining one of the most important aspects of managing an Airbnb accommodation: its price.

5.3 Limitations

The study has some limitations. Firstly, we have considered six dimensions in price determination, and more aspects could be included in the model. For instance, a variable related to weather conditions could also be taken into account: we cannot be sure that the impact of nearby public transportation is the same in all seasons. On the contrary, we might suppose that a bus or train stop near an accommodation is more relevant in winter, when rain and cold push tourists to spend as little time as possible outdoors. Secondly, we have analyzed the determinants of price for only one city. Considering other cities would help to understand whether the weights of the dimensions differ in other tourist areas. Future research should therefore consider more dimensions and more destinations.