Adverse selection in iBuyer business models—don’t buy lemons!

The rise of instant buyer (iBuyer) businesses in the past years has made automated valuation models (AVMs) an important part of the property market. Although iBuyer services are in demand, large actors within the segment have reported unsatisfactory profits over time. The business model is subject to adverse selection: homeowners, drawing on superior knowledge of their homes, are more likely to accept overpriced bids than underpriced bids, so the iBuyer ends up purchasing more overpriced dwellings. In this paper, we use a dataset consisting of 84,905 apartment transactions from Oslo, the Norwegian capital. We use 80% of the dataset to train three different AVMs similar to those used by iBuyers. Next, we construct some simple purchasing rules from the predictive accuracies found in the training dataset. Finally, using the remaining 20% of the data as a test dataset, we introduce an adverse selection indicator based on acceptance probability distributions and calculate the average expected resale profits per apartment for a hypothetical iBuyer. We find that adverse selection has a large negative impact on average profits for the hypothetical iBuyer. Furthermore, the simple purchasing rules are able to improve the profit by 1 percentage point per apartment when adverse selection is present.


Introduction
Buying and selling homes are the largest transactions most people make during their lifetime. Traditionally, a house is sold through a broker, with a listing and an auction. Instant buyers, or iBuyers, challenge this process. The iBuyer business model involves automated valuation models (AVMs) to predict the market value of a home. The iBuyer uses this prediction to give a fast bid on the dwelling before a potential auction starts. The seller provides the iBuyer with information on the apartment, before quickly receiving a 'take-it-or-leave-it' bid (s)he can choose to accept or reject without an auction. This rapid and convenient process, however, does not come without challenges.
AVMs are statistical prediction models that try to predict the value of an object. Although progress is continuously made to develop AVMs with the highest accuracy possible, they are not able to fully reflect reality and capture all the factors that affect the price of a home. Apartments will be both over- and undervalued by the AVM algorithms with symmetric probability. In a case where all bids are treated equally by homeowners, the iBuyer is expected to buy an equivalent share of underpriced and overpriced apartments, suggesting that the additional resale profits and losses will cancel each other out. However, rational homeowners are intuitively more likely to accept a high bid than a low or a correct bid, suggesting that iBuyers may suffer from purchasing more of the overpriced dwellings. Buchak et al. (2020) argue that iBuyers must focus on the most liquid dwellings to limit adverse selection. Little or no research beyond Buchak et al. (2020) has studied adverse selection in the context of iBuyer businesses. The purpose of this paper is to examine the consequence and the extent of adverse selection as a built-in mechanism in the iBuyer business model and show how this effect can be limited through simple purchasing rules.
During the fall of 2021, one of the biggest companies competing in the iBuyer market in the United States, Zillow, pulled the plug on its iBuyer operations.¹ The purchase of overvalued dwellings in markets with low liquidity was identified as a cause of the failure. The company bought around 10,000 dwellings but managed to sell only 3,000 of them. Consequently, Zillow had to take a write-down on inventory of 300 million USD. Although the loss cannot solely be attributed to adverse selection, this event underlines some of the challenges in the iBuyer business model and motivates our research.
The dataset used in the paper consists of 84,905 apartment transactions in Oslo. First, three AVMs are trained to predict apartment prices, using 80% of the data. The valuation models are based on linear regression, gradient boosting, and a support vector machine. We also use the results from the training dataset to create simple purchasing rules and to simulate an ex-ante situation. After training the models, the most important predictors are subsequently used as dimensions for splitting the data into subgroups. We analyze the predictive performances of the AVMs for the different subgroups, before formulating purchasing rules to avoid bids in groups with bad performance. Lastly, the average expected resale profit per apartment is examined for a hypothetical iBuyer using a test set with the remaining 20% of the dwellings. The profit calculations are done with and without adverse selection and purchasing rules, to address their financial impacts. Adverse selection is implemented through acceptance probability distributions, indicating how likely a seller is to accept an offer when the bid is a certain percentage lower or higher than the seller's perceived valuation. Using out-of-sample observations allows replicating the ex-ante purchasing decisions that iBuyers need to make before they learn the actual market value for which they can resell the property.
The paper finds that the consideration of adverse selection leads to a large reduction in expected profit per dwelling for the hypothetical iBuyer, from 6.29-7.96% without adverse selection to 0.19-1.21% when adverse selection is taken into account. The results imply that adverse selection poses a noticeable threat to iBuyer businesses. In contrast, we also find that the implementation of simple purchasing rules increases the average expected profits per apartment by 1.03 to 1.57 percentage points. The results are robust to changes in market conditions such as the contribution margin of the iBuyer, the perceived convenience of the iBuyer service, the proxy for homeowners' personal valuation of their dwellings, and the probability of them accepting a bid deviating from this valuation.
The remainder of the paper is as follows: In Sect. 2, we present the relevant literature. Sect. 3 describes our dataset, the cleaning process, and the different dependent and independent variables. Sect. 4 introduces the hedonic price, extreme gradient boosting, and support vector machine models. Sect. 5 examines the model performances and develops the purchasing rules. Sect. 6 presents the average profits for a hypothetical iBuyer. Sect. 7 discusses the results in the light of previous literature as well as the impact for iBuyer businesses. Sect. 8 concludes.

Literature review
This section reviews the relevant literature. It first introduces the iBuyer business model and the use of AVMs in the real estate sector. We then discuss the related adverse selection and lemon problem. Finally, we give an overview of the property market in Oslo, Norway.

iBuyers and AVMs in real estate
iBuyers are PropTech companies which buy and re-sell dwellings. They profit from advanced automated valuation models (AVMs) in accurately assessing the value of the dwellings. iBuyers have the advantage of reducing the time spent in the traditional property sales process, as an offer can be received almost "instantly." The traditional sales process requires getting in touch with a real estate agent, who performs research, communicates with potential buyers, and creates listings before arranging a bidding process. The sale to an iBuyer removes most of these steps. The seller provides information on variables such as size, location, and condition, which allows the iBuyer to derive a price prediction. The offer to the seller equals this prediction minus a contribution margin captured by the iBuyer. The offer is a take-it-or-leave-it bid made before a potential auction on the open market starts.
After having purchased the apartments, the iBuyer proceeds to resell them. In Norway, this is normally done to private households through an English auction.² By accepting the offer from the iBuyer, the seller thereby gets a rapid sale and moves the cost of time on the market over to the company. The average profit margin of an iBuyer is 3.7% per dwelling, while Zillow Offers reported a negative per-unit profit of -2% in the 4th quarter of 2019. Furthermore, bids from Zillow and Opendoor generally corresponded to 98.6% of the AVM predicted value.³
The advantage of AVMs in real estate is a subject of debate. Kok et al. (2017) find strong evidence of the superiority of automated valuation models over traditional appraisals in terms of lower absolute error as well as time and cost efficiency. Furthermore, Mooya (2011) finds "no theoretical or practical reasons why AVMs should not completely replace traditional valuers." In contrast, Reed (2008) and Waller et al. (2001) suggest that AVMs should be used as a supplement to enhance manual appraisals rather than as an alternative to replace them. For the Norwegian housing market, Birkeland et al. (2021) compare the performance of AVMs to that of humans and find that AVMs are a valuable tool, but still fall short of human capability in stable markets.
For iBuyers, there is less focus on manual appraisals or physical inspections involved in the housing transactions. Manual appraisals are subject to human bias and subjectiveness, whereas the advantages of AVMs are quick and consistent valuations (Jahanshiri et al. 2011) that are not biased by undue stimulus (Fortelny and Reed 2005). However, due to this lack of physical inspection in the iBuyer business model, there may be aspects that affect the price of a dwelling that the AVMs do not fully capture.

Adverse selection and the "lemon problem"
Adverse selection is a well-known phenomenon within principal-agent theory, and a consequence of asymmetric information between two parties in a contractual agreement (Wilson 1989). Adverse selection generally occurs when a seller has more information about a product than the buyer (Wilson 1989). The buyer cannot observe the quality of the product to a full extent, only the distribution of "good" and "bad" products sold in the past. The seller thus has the incentive to market a "bad" unit as a "good" one (Akerlof 1970). Akerlof (1970) goes further in describing the "lemon problem." The buyer cannot know or observe whether an item is a lemon (bad) or not, and the risk of purchasing a lemon reduces the average reservation price of buyers. This reduced reservation price makes non-lemon sellers less interested in selling, which further increases the proportion of lemons. Genesove (1993) suggests four criteria that must be met for a market with adverse selection to suffer from a lemon problem: (i) there must be asymmetric information regarding the quality of the good between the seller and the buyer at the time of the purchase, (ii) both the seller and the buyer must value quality, (iii) the price must be determined by the party with less information, and (iv) there must be no institutions that completely remove uncertainty related to the quality of the good.
Buchak et al. (2020) point out the problems of adverse selection for iBuyers. iBuyers generally operate with higher fees than conventional realtors, indicating that the homeowners selling to these businesses trade profit for a quick transaction. However, the implementation of such quick transactions usually comes at a cost of information loss.
AVM algorithms will usually predict market prices with a symmetric probability of error on both sides, meaning the iBuyer should in principle give an equal share of too-high and too-low bids. In a context with no adverse selection, where all bids are accepted by homeowners, the profits from purchasing underpriced apartments would theoretically be cancelled out by the losses from purchasing overpriced apartments. The iBuyer would thus be left with the contribution margin it chooses. However, with asymmetric information about the true market value of the apartment, sellers who receive offers that are too high compared to their perceived valuation are more likely to accept than sellers who receive a correct offer, as illustrated in Fig. 1 (Akerlof 1970).
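The mechanism in the preceding paragraph can be illustrated with a small simulation. This is a minimal sketch under loudly stated assumptions, not the procedure used in this paper: the true resale value is normalised to 1, the AVM error is drawn from a normal distribution, and a "selective" seller rejects any bid below a hard reservation threshold. The paper itself uses acceptance probability distributions rather than a hard threshold, and all parameter values below are hypothetical.

```python
import random


def average_profit(n, margin, error_sd, discount, selective, seed=0):
    """Average resale profit, as a share of the bid, over accepted bids.

    Hypothetical setup: true value is 1, AVM errors are symmetric, and a
    selective seller accepts only bids at or above a reservation price.
    """
    rng = random.Random(seed)
    profits = []
    for _ in range(n):
        true_value = 1.0  # normalised resale price
        # Symmetric AVM error around the true value.
        prediction = true_value * (1.0 + rng.gauss(0.0, error_sd))
        bid = prediction * (1.0 - margin)
        # Seller is willing to give up `discount` for convenience.
        reservation = true_value * (1.0 - discount)
        if selective and bid < reservation:
            continue  # the seller rejects a bid that is too low
        profits.append((true_value - bid) / bid)
    return sum(profits) / len(profits)


no_selection = average_profit(50_000, 0.06, 0.10, 0.06, selective=False)
with_selection = average_profit(50_000, 0.06, 0.10, 0.06, selective=True)
print(f"all bids accepted:   {no_selection:.4f}")
print(f"selective acceptance: {with_selection:.4f}")
```

When every bid is accepted, the average profit stays close to the chosen margin. Once low bids are filtered out by the seller, only the overvalued purchases remain and the average profit drops.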
In the iBuyer case, the Genesove (1993) criteria hold, and adverse selection can result in a lemon problem. This happens if the iBuyer increases the premium to stay profitable after taking the increased risk of buying overvalued apartments (i.e., lemons) into account. These increased premiums can result in more "correct" valuations that are turned down, which increases adverse selection further.
Previous research thereby suggests that adverse selection might be a problem in the iBuyer business model. However, the most widely accepted actions to reduce adverse selection in general, signaling and screening, are difficult to implement efficiently within this segment. To reduce problems with adverse selection, the aim is to prevent homeowners from receiving "too good" or "too bad" offers, as illustrated in Fig. 1.
However, there is no exact definition of what makes a good, bad, or correct offer. "Too good" refers to an offer that is noticeably higher than the real market value and the perceived value of the homeowner. "Too bad" means that the offer is well below the actual value. Both scenarios are damaging for the business model. A "too good" offer results in negative profits, while a "too bad" offer damages the credibility of the business. Furthermore, the real market value of an apartment is unknown until it has been sold on the open housing market. The iBuyer relies on the AVM for settling the price, and the seller usually has her own perceived opinion about the value. This price perception leads to further deviations from the true (unknown) value.
Previous literature covers the area of creating well-performing AVMs. An AVM with good performance will, all other things being equal, result in profits for the iBuyer, if all offers are accepted. However, when introducing the aspect of adverse selection, the acceptance of all offers does not seem to be a realistic assumption for the iBuyer business. Buchak et al. (2020) suggest that iBuyers should only purchase the most liquid and easy-to-value houses, which is one way of dealing with potential adverse selection.
Fig. 1 Adverse selection for iBuyers. The graph illustrates how sellers of dwellings are more likely to accept an offer when it is "good" rather than correct or "bad" (according to Wilson (1989) and Akerlof (1970)).

The Norwegian property market
To utilize pricing models properly, it is helpful to understand the dynamics of the relevant market. Buying and selling homes in the Norwegian property market is, in general, done through an open auction, where 90% are sold via English auctions (Olaussen et al. 2018). Furthermore, the Norwegian real estate market differs from other international markets in terms of regulation. A central consequence of Norwegian regulations, imposed by the Regulation on Real Estate (2007), the Marketing Control Act (2009), and the Industry Norm (2014), is that strategies such as underpricing to attract more buyers are prohibited. Hence, Flått et al. (2022) claim that the asking price is a suitable indicator for the seller's reservation price.
In September 2021, the population of Oslo counted 698,660 inhabitants.⁴ The inner city typically consists of apartment buildings with four and five stories. Most of them are brick buildings from the construction boom in the 1870s, 1880s, and 1890s (Oust 2013). The late nineteenth century also saw segregation between the rich western and the poorer eastern part of the city. The pattern continued for several decades and probably explains much of the price difference between east and west today (Oust 2013).
The city boundary has been enlarged several times over the past century. The biggest enlargement took place in 1948, when the Aker region was incorporated. After the Second World War and through the 1980s, construction of residential buildings in Oslo mainly took place in the new suburbs in the former Aker region (Oust 2013). Oslo has 15 administrative districts, in addition to Marka and Sentrum. Each district has a district committee that organizes and provides services.

Data
The primary data used in the study were provided by Solgt.no, a Norwegian PropTech startup and iBuyer operating in Oslo. The dataset contains information on housing transactions listed on Finn.no, the largest online marketplace for private properties in Norway, which are merged with public data from the Norwegian Mapping Authority (NMA). The relevant transactions took place between 2007 and 2021 and involve apartments in Oslo. In this section, we describe the relevant variables and the data filtering and cleaning process. We then provide descriptive statistics and describe the construction of our indicator variable for adverse selection.

Data pre-processing
Before providing a descriptive examination of the final dataset, we elaborate on the process of removing missing values and erroneous observations. Missing values occur as many of the observations do not contain values for all the relevant variables. Furthermore, the data were extracted from ads on Finn.no and are thus subject to errors if realtors include obviously wrong information in the housing ads. Therefore, the data cleansing process does not only consider missing values, but also examines whether the existing values appear realistic. To ensure a suitable contribution to the prediction models, several modifications are applied to the variables (see Table 1).

Descriptive statistics
Table 2 shows descriptive statistics. The aim of AVMs is to predict the price of a good. The total transaction price, including both price and associated debt, for the different apartments in the dataset is therefore the dependent variable.⁵ Our explanatory variables are related to the size and the physical dimensioning of the apartments. These variables are related to the total interior livable area (living area), which floor the apartment is located on, the number of bedrooms, and the number of bathrooms.
In order to provide more information on the condition of the apartment, in addition to the physical measures, several binary variables on facilities were included, as seen in Table 3.
Based on text analysis, we also construct a binary indicator showing whether the relevant apartment is a renovation project, and include it in the dataset. This variable is derived from the information found in the title of the ad. If the title includes one or several keywords related to a need for renovation, the variable takes the value 1, otherwise 0.⁶ In total, there are 1,147 renovation projects out of the 67,924 transactions in the training dataset.
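A keyword-based indicator of this kind could be sketched as follows. The keyword list is hypothetical and serves only as an illustration; the keywords actually used in this study are not reproduced here.

```python
# Hypothetical keywords (lower case); not the list used in the study.
RENOVATION_KEYWORDS = ("oppussingsobjekt", "renovation project", "needs renovation")


def renovation_flag(title, keywords=RENOVATION_KEYWORDS):
    """Return 1 if the ad title contains any renovation keyword, else 0."""
    lowered = title.lower()
    return 1 if any(keyword in lowered for keyword in keywords) else 0


print(renovation_flag("Oppussingsobjekt med stort potensial"))
print(renovation_flag("Lys og moderne 2-roms"))
```

Matching on lower-cased substrings keeps the rule simple, at the cost of missing misspellings or paraphrases in the ad titles.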
Geographical location is a central factor in determining the price of an apartment. The location could have been included in the dataset in several ways, as the original dataset contains information on both coordinates and full address with postal code. In this study, the apartments are mainly sorted into geographical groups based on the administrative district to which the property belongs (Table 4). Finding a suitable trade-off between low and high spatial aggregation is important (Sommervoll and Sommervoll 2018). Low spatial aggregation captures more of the systematic spatial variation but also reduces the number of observations in each region. Meanwhile, a high aggregation has the opposite effect. Administrative districts were found to have satisfyingly similar intra-regional location premiums, thus making them a good candidate for capturing spatial effects. As the smallest of the districts, Sentrum, has a limited number of transactions, it is included in the district of St. Hanshaugen.
Table 4 note: This table shows the regions in the dataset with mean, standard deviation, minimum and maximum price in thousands NOK, and the number of transactions in each area. There are noticeable differences between areas in the mean price and number of transactions, which will be important to consider later when creating purchasing rules. Exchange rate per 31 December 2021: NOK 1 = EUR 0.1001.

Seller valuation
To incorporate how likely a homeowner is to accept an offer from the iBuyer, a measure for what the seller believes the apartment is worth is necessary. These variables are not used as predictors in the AVMs but are necessary to include adverse selection in our analysis.
In this paper, two different variables for homeowner valuation are used. The first one is the asking price found in the sales ad. This represents the price at which a professional real estate agent chose to value the apartment. Such an appraisal is available to homeowners considering selling their apartment and is thus deemed appropriate for representing the seller's valuation of the apartment. The second measure for the perceived apartment valuation of the homeowner is based on a repeat sales calculation. The repeat sales valuation takes the price of the previous transaction of an apartment and adds the general price development in the relevant market between the previous sale and the current one. As repeat sales valuations incorporate the previous purchase price of the apartment, they may seem like a reasonable estimate of what sellers think their home will sell for. However, a small proportion of the apartments in the dataset have had only a single owner since they were built. These apartments, without a previous transaction price, cannot be given a new valuation through the repeat sales calculation. For this reason, the part of the dataset with value estimates derived from the repeat sales technique consists of 52,667 apartments, compared to the complete training set of 67,924.
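As an illustration of the repeat sales idea, the previous transaction price can be scaled by the change in a market price index between the two sale dates. This is a minimal sketch that assumes a multiplicative index; the function name, the index values, and the price are hypothetical, and the exact index construction used in the study is not shown here.

```python
def repeat_sales_valuation(previous_price, index_at_previous_sale, index_now):
    """Scale the previous sale price by the market's price development.

    Assumes a multiplicative price index for the relevant market.
    """
    if index_at_previous_sale <= 0:
        raise ValueError("index_at_previous_sale must be positive")
    return previous_price * index_now / index_at_previous_sale


# Hypothetical example: bought for NOK 3.0 million when the index stood
# at 100; the index has since risen to 150.
estimate = repeat_sales_valuation(3_000_000, 100.0, 150.0)
print(estimate)
```

An apartment without a previous transaction has no `previous_price`, which is why such observations cannot receive a repeat sales valuation.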
The asking price on average deviates 5.8% from the actual sales price. Table 5 indicates that the asking price is most often lower than the sales price, as both the mean and the quantiles are lower. The asking price will typically be lower than the actual price in markets where prices are increasing (because of the lagging effect). There is also a price difference between the repeat sales valuation and the actual price. The difference arises because the repeat sales estimate is calculated on a different subset of the data, namely apartments that are sold more than once. To improve the estimates, we have extended the dataset back to 1992.

Methodology
With the aim of examining how adverse selection can be considered in an iBuyer business model, the first step is to create the AVMs. We will study three different models: a hedonic regression, a Support Vector Machine, and a gradient boosting model known as XGBoost (eXtreme Gradient Boosting). The reason for creating different AVMs is to underline that the methodology and results are applicable and generalizable for a wide range of models. The hedonic linear regression model is chosen for its interpretability in addition to the fact that it is widely used in the field of real estate valuations. This model will serve as a baseline for the others. The two machine learning approaches are chosen based on predictive accuracy found in previous literature, see Table 10 in Appendix 1. The XGBoost algorithm is still a relatively recent addition to the machine learning world at the time of writing, but, as seen in Table 10, studies of the approach have already shown encouraging results within the area of real estate valuation.⁷
Before running the experiment, the data were randomly split into a training and a test set. The test set is held out during training and is later used to replicate the new apartments the iBuyer is bidding on. The offers are determined before assessing the actual sales price, similar to the ex-ante bids an iBuyer needs to give in practice. We choose an 80%-20% split.
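A random 80%-20% split of this kind can be sketched as below. The seed and the helper name are assumptions made for illustration; the routine actually used for the split is not specified in the text.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and cut them into a training and a test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Row identifiers stand in for the 84,905 transactions.
train, test = train_test_split(range(84_905))
print(len(train), len(test))
```

With 84,905 transactions, an 80% share corresponds to 67,924 training observations, which matches the training-set size reported in the data section.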

Hedonic regression model
In a hedonic model, the price of a dwelling is explained as the sum of the implicit (hedonic) prices of its characteristics (Rosen 1974).
We apply the following hedonic regression:

$$\ln(P_i) = \beta_0 + \sum_{k=1}^{K} \beta_k X_{ki} + \varepsilon_i$$

where the dependent variable $\ln(P_i)$ is the natural logarithm of the sales price including common debt for apartment $i$, $\beta_0$ is the intercept, $\beta_k$ is the coefficient of predictor $k$, and $X_{ki}$ is the feature value of predictor $k$ for a given apartment $i$. Our independent variables are living area, location, sale period, building year, number of bathrooms, number of bedrooms, floor, renovation object, elevator, balcony, child friendly, garage, view, fireplace, and quiet. $\varepsilon_i$ is the error term of apartment $i$.
We follow earlier literature using hedonic models on property data, which indicates that the least absolute deviation (LAD) method is preferred over the ordinary least squares (OLS) method, because it is more robust towards outliers (Yoo 2001). LAD was first introduced by Koenker and Bassett (1978) and minimizes the sum of the absolute values of the residuals. As the errors are not squared, the LAD loss function is less sensitive to outliers (Stock and Watson 2019).
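The robustness argument can be seen in the simplest possible case, an intercept-only model: the OLS fit is then the sample mean, while the LAD fit is the sample median. The sketch below uses hypothetical prices and is only meant to illustrate the sensitivity to a single outlier, not to reproduce the full hedonic LAD regression.

```python
from statistics import mean, median

# Hypothetical prices in million NOK.
prices = [3.1, 3.3, 3.4, 3.6, 3.8]
with_outlier = prices + [30.0]  # one erroneous or extreme observation

# Intercept-only fits: OLS minimises squared errors (mean),
# LAD minimises absolute errors (median).
ols_shift = mean(with_outlier) - mean(prices)
lad_shift = median(with_outlier) - median(prices)

print(f"OLS intercept moves by {ols_shift:.2f}")
print(f"LAD intercept moves by {lad_shift:.2f}")
```

The single outlier moves the OLS intercept by several million NOK, while the LAD intercept barely changes.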

Support vector machine
While originally created for classification problems, support vector machines (SVM) have been developed to handle regressions as well. Support vector regression (SVR), which builds on the foundation of the SVM algorithm, was introduced by Drucker et al. (1997).
Support vector regression utilizes an ε-tube and slack variables to find the regression line. ε represents the allowed error within which all errors are disregarded. For observations outside the tube, errors are measured as the deviation between the actual response value and the ε-tube itself, rather than the regression line. This separates SVR from methods like the hedonic price model, in which the goal is to minimize the residual errors of all observations. Observations lying on or outside the ε-tube are referred to as support vectors.
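The way errors are measured relative to the tube corresponds to the ε-insensitive loss. A minimal sketch, with hypothetical numbers, is given below; it shows only the loss function, not the full SVR optimization.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Zero inside the tube; outside it, the distance to the tube's edge."""
    return max(0.0, abs(actual - predicted) - epsilon)


# Inside the tube (|error| = 0.3 <= epsilon = 0.5): no penalty.
print(epsilon_insensitive_loss(10.0, 10.3, 0.5))
# Outside the tube (|error| = 1.0): penalised by the excess over epsilon.
print(epsilon_insensitive_loss(10.0, 11.0, 0.5))
```

A larger ε widens the tube, so more observations contribute no loss at all.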
SVR in R is explained in Meyer (2021). We follow a random search method to obtain optimal hyperparameters for the cost, C, and ε, as suggested by Villalobos-Arias et al. (2020). The results of the tuning process and the chosen hyperparameters are shown in Table 11 of Appendix 2. We use the same variables as for the hedonic model.

XGBoost
XGBoost, or eXtreme Gradient Boosting, is an implementation of gradient boosting introduced by Chen in 2014. The XGBoost approach is an optimized implementation of gradient boosting where a variety of regularization options help avoid problems with overfitting. Furthermore, the XGBoost approach supports parallel processing to improve computation speed, as well as tree pruning and handling of missing data.
We follow the methodology of Chen and Guestrin (2016) and Choi (2019). In the AVM produced in this study, we used the "XGBoost" package in R. The implementation requires choosing a value for several hyperparameters. The results of the tuning process and the chosen hyperparameters are shown in Table 12 of Appendix 2. We use the same variables as for the hedonic model.

Model outputs and purchasing rules
The aim of this section is twofold. First, the predictive accuracies of the different AVMs are assessed. Next, we use feature importance and the models' predictive accuracies in different subgroups to create simple rules that prevent iBuyers from bidding on apartments from groups that are hard to price. The simple idea is that iBuyers have less information about apartments from groups that are hard to price, and therefore should stay away from these apartments when adverse selection is introduced in the model. By avoiding buying lemons, iBuyers' profits should increase.

Model performance
We start by looking at the predictive performance of the three AVMs on the test dataset, using Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Percentage Predicted Error (PPE) buckets as performance metrics. The linear model, as expected, performs worse than the two ML models. There is approximately a two-percentage-point difference between the ML AVMs and the hedonic price model in terms of MAPE, as seen on the left in Table 6. The XGBoost model performs slightly better than the SVM, yielding better results for MAPE and PPE10. However, the SVM has a somewhat lower RMSE.
In the rest of this section, the training data remains in focus, to prevent information in the test data from influencing methodological and strategic choices.
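The three performance metrics can be sketched as follows. The PPE bucket is interpreted here as the share of predictions whose absolute percentage error is at most a given threshold (PPE10 for 10%); this interpretation and the example numbers are assumptions made for illustration.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def rmse(actual, predicted):
    """Root mean squared error, in the unit of the prices."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def ppe(actual, predicted, threshold_pct):
    """Share of predictions (in percent) within threshold_pct of the actual price."""
    hits = sum(
        1 for a, p in zip(actual, predicted)
        if abs(a - p) / a * 100.0 <= threshold_pct
    )
    return 100.0 * hits / len(actual)


# Hypothetical prices: absolute errors of 5%, 15%, and 25%.
actual = [100.0, 200.0, 400.0]
predicted = [105.0, 170.0, 300.0]
print(mape(actual, predicted), rmse(actual, predicted), ppe(actual, predicted, 10.0))
```

MAPE and PPE are relative measures, whereas RMSE is in price units and weights large absolute misses more heavily, which is one reason two models can rank differently across the metrics.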

Feature importance
Feature importance can be evaluated in several ways. We chose to assess the standardized regression coefficients of the hedonic LAD model and the mean absolute SHAP values, following Lundberg and Lee (2017), of the XGBoost AVM. Both the standardized coefficients and the SHAP values indicate that living area is the most important variable by a clear margin for describing variation in apartment prices. District and building year are other variables that stand out for both models. These variables will be examined further in the next section.

Purchasing rules
In this subsection, performance is examined further on different subgroups of the data. The purpose of dividing the training set into different subgroups is to highlight potential systematic differences in predictive accuracy for certain types of apartments. The aim is to find out whether the models for some subgroups are less informed about the dwelling value. Throughout the subsection, MAPE is used as the measure of model performance for the different subgroups. In addition to living area, district, and building year, which were chosen as subgroups based on the feature importance, we also include predictive performance based on price groups.
The results are presented in Figs. 3, 4, 5 and 6 of Appendix 3. The results show that it is more difficult to predict house prices in some districts than in others. In our data, Nordre Aker is the district with the worst MAPE, followed by Nordstrand, Søndre Nordstrand, and Vestre Aker. A common trait for all these underperforming districts is that they contain fewer than 4,000 observations. This suggests a plausible correlation between having few observations and worse predictive performance. Additionally, they are also known to be quite heterogeneous compared with other small districts, such as Grorud and Stovner.
In contrast to the districts, living area and building year are not pre-divided into groups appropriate for examining the predictive performance. For this reason, the variables are divided into groups based on the feature values. There do not, however, appear to be any systematic patterns indicating that buildings from a certain year are significantly harder to predict for the AVMs, see Fig. 4 in Appendix 3. Since we did not find any systematic patterns, we did not create purchasing rules based on building year.
The training set observations are stratified based on the size of the living area, as displayed in Fig. 5 of Appendix 3. These groups are made based on distance to the mean apartment size, measured in standard deviations. The reason for dividing the data into groups based on standard deviations from the mean, rather than specific pre-defined values, is to make the study more generalizable to cities where the average apartment size differs noticeably from that in Oslo. The size of each group is 0.5 standard deviations. There appear to be systematic differences in training MAPE for apartments of different sizes, more specifically for the smallest and the largest apartments. These groups generally have higher MAPE across the models, although the differences are noticeably smaller from row to row for the XGBoost model.
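The grouping by distance from the mean can be sketched as below. The bands are assumed to be anchored at the mean and 0.5 standard deviations wide; the exact band boundaries used for Fig. 5 are not stated in the text, and the helper names and data are hypothetical.

```python
import math
from statistics import mean, pstdev


def sd_group(value, centre, spread, width=0.5):
    """Index of the band of `width` standard deviations that `value` falls in.

    Band 0 starts at the mean; negative indices lie below the mean.
    """
    return math.floor((value - centre) / (spread * width))


def mape_by_group(areas, actual, predicted, width=0.5):
    """MAPE (in percent) per living-area band, keyed by band index."""
    centre, spread = mean(areas), pstdev(areas)
    if spread == 0:
        raise ValueError("living areas must not all be identical")
    errors = {}
    for area, a, p in zip(areas, actual, predicted):
        group = sd_group(area, centre, spread, width)
        errors.setdefault(group, []).append(abs(a - p) / a * 100.0)
    return {g: sum(e) / len(e) for g, e in sorted(errors.items())}


# Hypothetical data: three apartments of 40, 60, and 80 square metres.
result = mape_by_group([40.0, 60.0, 80.0], [100.0, 100.0, 100.0], [120.0, 100.0, 110.0])
print(result)
```

Because the bands are defined relative to the sample mean and standard deviation, the same code applies unchanged to a city with a different typical apartment size.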
Lastly, the predictive performance is examined based on price groups (see Fig. 6 in Appendix 3). As price is not a value known to the iBuyer ex ante, a predicted price must be used instead. The XGBoost price predictions are used to divide the observations into different groups. These groups are, as for the living area variable, also separated based on distance from the mean price measured in standard deviations. Higher MAPE is found for the cheapest and most expensive apartments. The predicted price groups and the living area groups show similar systematic patterns, which is as expected since living area is the most important variable for determining the price prediction.
After having examined the predictive performance of the AVMs for different subgroups, we can use the results to create simple purchasing rules (see Table 7). The intuition is that adverse selection is a bigger problem for apartments that are harder to price, and that the iBuyer should thus not bid on such dwellings. This follows from the implication that hard-to-price apartments are more likely to be underpriced or overpriced. Underpricing will lead to a bid not accepted by the seller, while overpricing will lead to a bid from which the iBuyer in most cases loses money.

Note to Table 7: This table shows three different purchasing rules, motivated by systematically worse predictive performance by the AVMs for apartments in these groups. Before giving an actual offer on an apartment, the iBuyer will consider whether the apartment is in one of the highlighted districts, or is too small or too large to bid on. The iBuyer will also automatically get a price prediction from its AVM algorithm. If this prediction indicates that the apartment will be outside the price range from purchasing rule 3, it is also deemed too risky to bid on.
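The three purchasing rules amount to a pre-bid filter on district, living area, and predicted price. A minimal sketch of such a filter follows. The class and function names are hypothetical, and the thresholds in the usage example are placeholders chosen for illustration only, not the values from Table 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurchasingRules:
    """Hypothetical container for the three rules (district, size, predicted price)."""
    excluded_districts: frozenset
    min_area: float
    max_area: float
    min_predicted_price: float
    max_predicted_price: float


def should_bid(district, living_area, predicted_price, rules):
    """Return True only if the apartment passes all three purchasing rules."""
    if district in rules.excluded_districts:            # rule 1: district
        return False
    if not rules.min_area <= living_area <= rules.max_area:  # rule 2: living area
        return False
    # rule 3: AVM-predicted price must lie inside the accepted range
    return rules.min_predicted_price <= predicted_price <= rules.max_predicted_price


# Placeholder thresholds, for illustration only.
example_rules = PurchasingRules(
    excluded_districts=frozenset({"District A"}),
    min_area=30.0,
    max_area=120.0,
    min_predicted_price=2_000_000,
    max_predicted_price=8_000_000,
)
```

Because the price rule uses the AVM prediction rather than the realized price, the filter can be evaluated before any bid is made.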

Empirical results
In this section, we examine the financial impact of adverse selection and the purchasing rules for a hypothetical iBuyer. We begin by disclosing our assumptions, before reporting the results from calculating profits for a hypothetical iBuyer with and without adverse selection and purchasing rules. The results are computed for apartments in the separate test set, which affected neither the training of the models nor the formulation of the purchasing rules.

Assumptions
Before diving into the results of our study, we lay out the assumptions upon which our findings rest. Firstly, it is assumed that (I) the hypothetical iBuyer does not face competition from other corresponding companies. The iBuyer can therefore decide internally the size of the contribution margin to take. The iBuyer margin is the difference between the AVM predicted price and the actual bid, as shown in Eq. 4, and (II) is initially assumed to be 6%. Average expected profit per apartment is calculated as a percentage of the bid, illustrated in Eq. 3, and we assume that (III) this is the key performance indicator that the iBuyer will want to improve. We use percentage profits instead of absolute profits because the iBuyer cannot purchase all apartments in the market. Costs of financing, employee salaries, administration, and additional costs are not included in the equation. The aim of the iBuyer in this simplified illustration is to purchase apartments for less money than they are sold for, to generate a profit.

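$$\text{Percentage Profit}_i = \frac{\text{Total price}_i - \text{Bid}_i}{\text{Bid}_i} \cdot P(\text{accept})_i \qquad (3)$$
where Total price_i is the price at which apartment i is resold, Bid_i is the iBuyer's bid on apartment i, and P(accept)_i is the probability that the seller accepts the bid.
Note: The margin is included to allow the iBuyer to make a profit. Our margin is in line with the normal margin among iBuyers. The margin will influence which offers from the iBuyer our simulated sellers will accept.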
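As a worked illustration of the profit expression, the sketch below computes a bid from an AVM prediction and the resulting expected percentage profit for a single apartment. The function names and the example prices are hypothetical. The bid rule (prediction reduced by the margin) follows the stated assumption that bids are 6% lower than the AVM prediction.

```python
def bid_from_prediction(predicted_price, margin=0.06):
    """Bid that lies `margin` (as a fraction) below the AVM price prediction."""
    return predicted_price * (1.0 - margin)


def expected_percentage_profit(total_price, bid, p_accept):
    """Resale profit as a fraction of the bid, weighted by the acceptance probability."""
    return (total_price - bid) / bid * p_accept


# Hypothetical apartment: predicted and resold at NOK 4,000,000, bid always accepted.
example_bid = bid_from_prediction(4_000_000)
example_profit = expected_percentage_profit(4_000_000, example_bid, p_accept=1.0)
```

Setting `p_accept=1.0` corresponds to the case without adverse selection. Any value below 1 scales the expected profit down proportionally.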
The probability of a bid being accepted, P(accept), depends on the bid from the iBuyer, as well as both the homeowner's perceived valuation and a probability distribution for how likely a seller is to accept a bid that is X% higher or lower than the perceived valuation. The iBuyer works as a substitute for selling through a traditional real estate agency, which is assumed to charge a commission of 2% of the sales price.
With this in mind, (IV) we use 98% of the asking price in the sales ads as an initial proxy for what the seller believes (s)he will receive by selling through a real estate agent, and thus the seller's perceived valuation of the property. As indicated in Subsection 2.3, strategic underpricing to gain the interest of more potential buyers is not legal in Norway. For this very reason, previous research suggests the asking price to be a suitable representation of the seller's reservation price. It is also natural that a homeowner has already spoken to a real estate agent to get such an appraisal, before considering selling to an iBuyer.
The final factor needed in the equation is the probability distribution. In Sect. 2, adverse selection for iBuyers was introduced from a theoretical point of view. We presented the probable issue of homeowners being more likely to accept a bid based on a too-high predicted value than a bid based on a correct value. Sellers are biased upwards when it comes to their own valuations of the properties, as given by the theories of Lovallo and Kahneman (2003), further implying that an overvalued dwelling is more likely to be bought than an undervalued one. The purpose of the accept-probability distribution, P(accept), is to include adverse selection in the profit equation above.
The probability function is assumed to be normally distributed around a mean replicating a convenience factor. The convenience factor determines where the centre of the probability distribution is located, and thereby divides it into two equal parts: half of the distribution to its left and half of the distribution to its right. This implies that 50% of the homeowners that get a bid corresponding to their perceived valuation after subtracting brokering commissions, minus the convenience factor, will accept this bid. In essence, the convenience factor is thereby used to model how much the sellers value the quick transaction services of the iBuyer. In a market where these services are in high demand, the convenience factor will be large, and the centre of the distribution is moved extensively leftwards by this value. In a market where the demand is lower, this centre will be closer to zero. Initially, (V) the convenience factor is assumed to be 4% for the hypothetical iBuyer.
The width of the distribution is determined by its standard deviation. A narrow distribution means few sellers accept bids that are too low, and almost all accept bids that are too high. In contrast, a wider distribution implies more people accept low bids, and fewer accept high bids. This is illustrated in Fig. 2. In the narrow distribution, marked in red, a bid 15% below the seller valuation will practically never be accepted, while in the wider distribution, marked in blue, some choose to accept. As the accept probability distribution is a limited reflection of reality, three different scenarios of probabilities are created. The three scenarios are referred to as "pessimistic," "neutral," and "optimistic." (VI) The neutral distribution has a standard deviation of 6%, while the pessimistic scenario has a narrower distribution with a standard deviation of 4% and the optimistic one a wider distribution with 8%.
The neutral scenario works as a benchmark, and this is the distribution of probabilities that is assumed most likely to reflect reality. As Fig. 2 shows, this implies that about 1 in 6 homeowners will accept a bid that is 10% lower than their perceived market price after broker commission, while 3 in 4 will sell when they receive a bid equal to their valuation. 93% will accept an offer that is 5% higher than the seller's perceived valuation after broker commission, and nearly all bids more than 10% higher than the seller's valuation will be accepted.
After having described the different factors in the profit equation, the next assumption is that (VII) the iBuyer will be able to sell the apartment for the same price as the apartment was sold for in the dataset. Since an iBuyer in the Norwegian market is a small actor and resells apartments to private households through open English auctions, it is reasonable to assume the market price from the dataset to equal the sales price the iBuyer would achieve. Furthermore, the iBuyer can resell the apartments it has purchased within a short enough time frame to avoid general price changes in the market. If the AVMs were able to predict this selling price perfectly, i.e., the predictions were 100% accurate, the average profit per apartment in an initial market without adverse selection would equal the bid margin of 6%.
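The acceptance probabilities quoted for the neutral scenario can be reproduced by reading the acceptance curve as the cumulative distribution function of a normal distribution centred at minus the convenience factor. The sketch below makes that reading explicit. The function name `accept_probability` and the `SCENARIOS` dictionary are hypothetical, and measuring the bid deviation as a fraction of the seller's valuation is an assumption.

```python
import math

# Standard deviations of the three scenarios described in the text.
SCENARIOS = {"pessimistic": 0.04, "neutral": 0.06, "optimistic": 0.08}


def accept_probability(bid, seller_valuation, convenience=0.04, sd=0.06):
    """Probability that a seller accepts `bid`, given the seller's perceived valuation.

    Assumption: the relative deviation of the bid from the valuation is compared
    with a normal distribution with mean -convenience and standard deviation sd.
    """
    deviation = bid / seller_valuation - 1.0
    z = (deviation + convenience) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Neutral scenario: bids 10% below, equal to, and 5% above the seller's valuation.
p_low = accept_probability(0.90, 1.0)
p_equal = accept_probability(1.00, 1.0)
p_high = accept_probability(1.05, 1.0)
```

Under these assumptions the three values come out at roughly 0.16, 0.75, and 0.93, in line with the "1 in 6", "3 in 4", and 93% figures above, and a bid 4% below the valuation is accepted with probability one half in every scenario.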
To sum up, we assume (I) a hypothetical market with no competition, that (II) the iBuyer gives bids that are 6% lower than the AVM price prediction, and that (III) the company wants to improve average expected resale profit per apartment as a percentage of the bid. We assume that (IV) homeowners believe they will receive 98% of the asking price after realtor commission, and by extension that this is how people value their home. Furthermore, (V) a convenience factor of 4% is assumed, implying that half of all offers that are 4% lower than the sellers' perceived valuations are accepted, and that (VI) the market in a neutral state is reflected by a normally distributed acceptance probability distribution with a standard deviation of 6%. The corresponding distributions in the pessimistic and optimistic scenarios have standard deviations of 4% and 8%, respectively. Lastly, it is assumed that (VII) the iBuyer can sell the apartments for the same price as a broker.

Profit calculations
The next step is to examine profits. In a scenario without adverse selection, we assume that all bids are accepted, regardless of how high or low the bid is compared to the seller's opinion of a fair price. Both undervalued and overvalued dwellings are bought, and the average expected resale profit is determined with P(accept) from Eq. 3 equal to 1 for all apartments. The top left panel in Table 8 displays the average expected profits per apartment in the scenario without adverse selection with an iBuyer margin of 6% and a convenience factor of 4%. As seen in the first row, where none of the purchasing rules are applied, the profits for all models exceed the iBuyer margin of 6%, with SVM and LAD giving profits of 7.29% and 7.96%, respectively, and XGBoost 6.29%. Before implementing any purchasing rules, the test set contains 16,981 observations. These observations represent the available apartments on which the iBuyer can bid. When implementing all the purchasing rules specified in Table 7, the number of apartments to bid on decreases to 7142. In practice, this means that approximately 59% of all inquiries from homeowners wanting to receive an offer on their apartment get rejected, creating a more homogeneous dataset. We only allow the iBuyer to bid on apartments for which we assume it has enough information, to avoid buying lemons.

Note to Table 8: This table shows average expected resale profits with a 6% iBuyer margin and a 4% convenience factor. The iBuyer margin represents the difference between the AVM predictions and the bids, while the convenience factor represents the loss at which half of all bids are accepted. The top left panel is a market with no adverse selection, where all bids from the iBuyer are accepted. The remaining three panels show profits in markets with different assumed probabilities for accepting bids.
Not surprisingly, applying all the purchasing rules without adverse selection has little to no effect on the profit. The rules are intended to remove groups of apartments the models struggle to predict accurately, whether the apartments are over- or underpriced. Underpriced apartments are, however, profitable for the iBuyer when assuming that all bids are accepted, as homeowners may sell their homes for far below market value.
The top right panel in Table 8 gives the new average expected resale profits after introducing adverse selection with the neutral probabilities. For now, we do not consider the purchasing rules, but rather only the first rows in the top left and right panels, without and with adverse selection, respectively. The profits in the latter are 0.19%, 0.97%, and 1.21% for the LAD, the XGBoost, and the SVM models, respectively. This corresponds to reductions of 7.77, 5.32, and 6.08 percentage points compared to the scenario with no adverse selection. This sharp reduction implies that adverse selection poses a threat to the hypothetical iBuyer, as a large proportion of the iBuyer's profit margin is lost. The optimistic scenario is slightly better, and the pessimistic scenario is slightly worse, but both are very similar to the neutral scenario. This demonstrates how the iBuyer business model might look very profitable without taking adverse selection into account, and it may explain some of the lack of success of the business model so far.
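The mechanism behind this drop can be illustrated with a small toy simulation. It does not use the paper's data or reproduce the figures in Table 8. All inputs are assumptions made for illustration: resale prices are normalised to 1, AVM errors are drawn from a normal distribution with a 10% standard deviation, the asking price is taken to equal the resale price, and profits with adverse selection are aggregated as an acceptance-weighted average. The function names are hypothetical.

```python
import math
import random


def accept_probability(bid, valuation, convenience=0.04, sd=0.06):
    """Normal-CDF acceptance probability (same assumed form as in the neutral scenario)."""
    z = (bid / valuation - 1.0 + convenience) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(n=10_000, error_sd=0.10, margin=0.06, seed=0):
    """Return (average profit if all bids are accepted, acceptance-weighted average profit)."""
    rng = random.Random(seed)
    profits, probabilities = [], []
    for _ in range(n):
        price = 1.0                       # normalised resale price (toy assumption)
        valuation = 0.98 * price          # assumes asking price equals the resale price
        prediction = price * (1.0 + rng.gauss(0.0, error_sd))  # toy AVM prediction
        bid = (1.0 - margin) * prediction
        profits.append((price - bid) / bid)
        probabilities.append(accept_probability(bid, valuation))
    without_selection = sum(profits) / n
    with_selection = (
        sum(p * q for p, q in zip(profits, probabilities)) / sum(probabilities)
    )
    return without_selection, with_selection


profit_all_accepted, profit_with_selection = simulate()
```

Because overpriced bids are both more likely to be accepted and less profitable, the acceptance-weighted average is lower than the average over all bids, which is the adverse selection effect described above.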
Considering the reduced profits after introducing adverse selection, it is also interesting to assess whether implementing the purchasing rules can help limit the loss. The change in profits from none to all purchasing rules applied is displayed in the bottom row in the top right panel of Table 8 with neutral accept probabilities. The lower panels show the same effect in the pessimistic and optimistic probability scenarios. The increase in profits surpasses 1 percentage point for practically all models and scenarios, suggesting that implementing the purchasing rules will increase the profits of the hypothetical iBuyer with a 6% margin and a 4% customer convenience factor.
Furthermore, the purchasing rule based on living area gives the highest isolated increase in profits out of all the individual rules. The price rule gives the second-highest increase, followed by the district rule. However, a combination of the rules improves the average profit per apartment further.

Robustness tests
We have made several additional robustness checks. First, Fig. 3 of Appendix 3 displays the corresponding results with iBuyer margins of 3% and 9%. The 9% margin overall generates higher profits than the lower margins, although such a high margin leads to more bids being rejected. The 3% margin has the opposite effect, with more bids being accepted and lower average resale profits. However, despite differing profits, the panels show that both the higher and the lower margins give results similar to those under the initial assumptions in Table 8. Profits drop noticeably when adverse selection is introduced, and the purchasing rules improve the profits by around 1 percentage point.
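The trade-off between margin and acceptance can be made concrete with a short calculation. The sketch below assumes a perfectly priced apartment, a seller valuation of 98% of that price, and the neutral acceptance distribution. The function names are hypothetical and the example is illustrative only, not a reproduction of the appendix results.

```python
import math


def accept_probability(bid, valuation, convenience=0.04, sd=0.06):
    """Normal-CDF acceptance probability under the neutral scenario (assumed form)."""
    z = (bid / valuation - 1.0 + convenience) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def acceptance_by_margin(margins, price=1.0, valuation_share=0.98):
    """Acceptance probability for each margin, assuming the AVM prediction equals price."""
    valuation = valuation_share * price
    return {m: accept_probability((1.0 - m) * price, valuation) for m in margins}


acceptance = acceptance_by_margin([0.03, 0.06, 0.09])
```

Under these assumptions, a larger margin lowers the bid and therefore the probability that the seller accepts, which matches the direction of the effect described above.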
Figure 4 of Appendix 3 shows the results from calculating profits with 2% and 6% convenience factors, respectively. Both additional convenience factor values give results similar to the benchmark model with 4%. The findings are thus robust to changes in how much customers value the services of the iBuyer. A final robustness check was implemented related to the perceived valuation of the seller. In case the asking price does not serve as a realistic proxy for the actual seller valuations, the profits were computed with a repeat sales proxy as an alternative. It is reasonable to think that a homeowner will consider the price (s)he purchased the dwelling for, adjusted for price development, when establishing a reservation price. Repeat sales estimation thus offers another proxy for seller valuation, different enough to serve as a suitable robustness check for the asking price. Table 9 displays the results from using the repeat sales. When introducing adverse selection, the purchasing rules based on repeat sales have similar effects as under the asking price assumption. However, with profits decreasing from 7.94% to 1.38% on average across the AVMs, the iBuyer is slightly more profitable with this seller estimate. The effects of the purchasing rules are also lower.

Note to Table 9: This table shows average expected resale profits with a 6% iBuyer margin and a 4% convenience factor. Under these altered market assumptions, a repeat sales estimate is used as a proxy for the sellers' own valuations of the apartments. The top left panel is a market with no adverse selection, where all bids from the iBuyer are accepted. The remaining three panels show profits in markets with different assumed probabilities for accepting bids.
One potential reason for being more profitable in the repeat sales case is that around ¼ of the data is removed, due to not having any previous transactions. An examination of the removed data points shows that these are apartments that the AVMs generally struggle to price accurately. Furthermore, the repeat sales estimate has a larger error compared to the actual sales price. Too-low seller valuations are profitable for the company, while too-high ones do not affect the returns noticeably as iBuyer bids are likely rejected. Despite these differences in profitability, the general effects are similar, as adverse selection largely reduces the profits while the purchasing rules help limit this loss.
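The idea of a purchase price "adjusted for price development" can be sketched as a simple index adjustment. This is an assumption made for illustration, since the exact repeat sales estimation is not spelled out here. The function name and the price index values in the example are hypothetical.

```python
def repeat_sales_valuation(previous_price, index_at_purchase, index_now):
    """Previous purchase price scaled by the change in a house price index.

    Assumption: a single market-wide index captures the price development
    between the previous transaction and the time of the iBuyer's bid.
    """
    if index_at_purchase <= 0:
        raise ValueError("index_at_purchase must be positive")
    return previous_price * index_now / index_at_purchase


# Hypothetical example: bought for NOK 3,000,000 when the index stood at 100,
# valued when the index stands at 120.
example_valuation = repeat_sales_valuation(3_000_000, 100.0, 120.0)
```

Apartments without a previous transaction have no `previous_price`, which is why such observations drop out of the sample when this proxy is used.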

Conclusion
The use of AVMs in real estate has grown increasingly important in recent years. Suddenly, homeowners can sell apartments in a matter of days rather than weeks and months. In a market full of rich data, AVMs constantly improve to make the bids as correct as possible. However, during the same period, several iBuyers reported disappointing financial returns. One of the biggest actors, Zillow, recently pulled out of the automated bid segment. Whereas these financial challenges are complex and partly a result of external issues such as gearing of home purchases under falling house prices, this paper hypothesized that adverse selection is a strong built-in mechanism in the iBuyer business model and needs to be dealt with by companies.
In our study, we created three different AVMs, before we examined the predictive performance of each of these models for different groups of apartments. These groups were made based on the most important predictor variables. Subsequently, we introduced a set of simple purchasing rules to prevent iBuyers from purchasing apartments in groups with poor predictive performance. Thereafter, we examined a hypothetical iBuyer case by using the AVMs, where average expected resale profits were computed both with and without adverse selection and purchasing rules.
We find a sharp reduction in average expected resale profits per apartment when introducing adverse selection to our hypothetical iBuyer, thereby suggesting that adverse selection is problematic in the iBuyer business model. Furthermore, the introduction of simple purchasing rules can help improve the average profit, in our case by around 1 percentage point. This is a noticeable increase, considering the low initial profits with no rules. The suggested purchasing rules do not reduce or remove adverse selection directly, but rather limit its negative impact by avoiding hard-to-price dwellings where the risk of over- and underpricing is higher. More advanced purchasing rules might prove to increase the profits even more.
The findings confirm our hypothesis that adverse selection poses a potentially noticeable threat to the iBuyer business model (Buchak et al. 2020), a hypothesis that may also be derived from the well-known theories of Akerlof (1970) and Genesove (1993). Furthermore, Buchak et al. (2020) point out that iBuyers, in order to limit these problems, might tend to purchase the most liquid and easy-to-price homes.
Based on previous literature, the purpose of this paper has been to examine the effects of adverse selection for iBuyers and investigate whether simple strategic changes in the use of AVMs can help reduce the potential threat. In addition to the theoretical insights, our results have clear practical implications. iBuyers should learn from our study and implement purchasing rules in their algorithms that take adverse selection into consideration. In order to avoid buying lemons, iBuyers should not bid on apartments for which they have only limited information.

Fig. 2
Accept-Probabilities with Convenience Factor 0.04. a Density distribution curves for the normally distributed bid acceptance probabilities. The x-axis indicates in percentage how much higher/lower the bid from the iBuyer is than the seller's perceived value. The curves are all centered around a convenience factor of 4%, implying that half of the sellers in the market would accept a bid that is 4% lower than their own valuation in return for a quick sale. The three scenarios have curves that differ in width, with the neutral probability distribution having a standard deviation of 6% and the pessimistic and optimistic distributions having standard deviations of 4% and 8%, respectively. b The cumulative probabilities of a seller accepting a bid that is a certain percentage higher or lower than the seller's perceived valuation, for the same scenarios as the left panel. The three curves meet for bids 4% below the seller's valuation, where half of the sellers accept the bid for all three scenarios

Table 1
Data Pre-processing
Remove dwellings sold in months with fewer than 100 observations: 84,905
This table shows the data pre-processing. In addition to the steps, the right column shows how many observations are left in the data after implementing the relevant step. The raw dataset includes 178,001 observations, while the cleaned dataset includes 84,905 observations

Table 2
Summary Statistics
This table shows the number of observations in the training data having the different facilities listed in their respective sales ads

Table 4
District Prices (NOK in thousands)

Table 5
Repeat Sales, Asking Price, and Actual Price (NOK in thousands) Repeat sales against asking price and actual price (NOK in thousands)

Table 6
AVM Performance Metrics
The table shows predictive performance metrics for the three AVMs. The left panel gives metrics for the test set, while the right panel gives metrics for the training data. Mean Squared Error (RMSE). Generally, both ML models have satisfying accuracy, albeit lower than what would be expected from commercially used models.

Table 7
Purchasing Rules

Table 8
iBuyer Average Profits, 6% margin between predicted and bid price and 4% convenience factor, with list price as proxy for seller valuation

Table 9
iBuyer Average Profits-Repeat Sales

Table 10
Gradient Boosting outperforms five other estimation methods (linear least squares, robust regression, mixed-effects regression, random forests, and neural networks) in terms of accuracy, for predicting the value of single-family houses in Switzerland

Table 11
Predictive Performances in Size Groups (This figure shows the predictive performance for the AVMs in different sizes in living area. Red cells indicate that the group is underperforming (MAPE is more than 5% higher than the average), yellow are around average (within ±5% of the average), and green are overperforming (MAPE more than 5% lower than average))
Predictive Performances in Price Groups (This figure shows the predictive performance for the AVMs in different XGBoost predicted price groups. Red cells indicate that the group is underperforming (MAPE is more than 5% higher than the average), yellow are around average (within ±5% of the average), and green are overperforming (MAPE more than 5% lower than average))

margin between predicted and bid price and 4% convenience factor, with list price as proxy for seller valuation Without adverse selection With adverse selection, neutral probability
This table shows the iBuyer average profits with 3% and 9% margin between predicted and bid price

Table 14
iBuyer Average Profits-New Convenience Factors

margin between predicted and bid price and 6% convenience factor, with list price as proxy for seller valuation Without adverse selection With adverse selection, neutral probability
This table shows the iBuyer average profits with 2% and 6% convenience factor

Acknowledgements
We are indebted to Solgt.no for providing us the data.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.