Abstract
Housing markets are inherently spatial, yet many existing models fail to capture this spatial dimension. Here, we introduce a new graph-based approach for incorporating a spatial component in a large-scale urban housing agent-based model (ABM). The model explicitly captures several social and economic factors that influence the agents’ decision-making behaviour (such as fear of missing out, their trend-following aptitude, and the strength of their submarket outreach), and interprets these factors in spatial terms. The proposed model is calibrated and validated with housing market data for the Greater Sydney region. The ABM simulation results not only include predictions for the overall market, but also produce area-specific forecasts at the level of local government areas within Sydney, arising from individual buy and sell decisions. In addition, the simulation results elucidate agent preferences in submarkets, highlighting differences in agent behaviour, for example, between first-time home buyers and investors, and between local and overseas investors.
Notes
\(p_\mathrm{list}\) can technically be \(<0\) or \(>1\), so it is capped to lie between 0 and 1 in order to be a true probability. Such cases are exceptionally rare and do not appear to occur in Fig. 4.
Perfect knowledge in this paper is assumed to mean \(\alpha =1, O(v_i,v_j)=1\), i.e. the ability to view every listing across all of Greater Sydney (\({\mathcal {M}}\) in Sect. 4.1.1).
All movements are scaled by the population size to allow a fair comparison, as outlined in “Appendix J”.
References
Alhashimi H, Dwyer W (2004) Is there such an entity as a housing market. In: 10th annual pacific rim real estate conference (press), Bangkok
Arcaute E, Molinero C, Hatna E, Murcio R, Vargas-Ruiz C, Masucci AP, Batty M (2016) Cities and regions in Britain through hierarchical percolation. R Soc Open Sci 3(4):150691
Axtell R, Farmer D, Geanakoplos J, Howitt P, Carrella E, Conlee B, Goldstein J, Hendrey M, Kalikman P, Masad D, et al. (2014) An agent-based model of the housing market bubble in metropolitan Washington, DC. In: Whitepaper for Deutsche Bundesbank’s Spring conference on “Housing markets and the macroeconomy: challenges for monetary policy and financial stability”
Bahadir B, Mykhaylova O (2014) Housing market dynamics with delays in the construction sector. J Hous Econ 26:94–108
Bangura M, Lee CL (2019) The differential geography of housing affordability in Sydney: a disaggregated approach. Aust Geogr 50(3):295–313
Bangura M, Lee CL (2020) Housing price bubbles in Greater Sydney: evidence from a submarket analysis. Hous Stud 1–36. https://doi.org/10.1080/02673037.2020.1803802
Baptista R, Farmer JD, Hinterschweiger M, Low K, Tang D, Uluc A (2016) Macroprudential policy in an agent-based model of the UK housing market. Bank of England working papers 619, Bank of England. https://doi.org/10.2139/ssrn.2850414
Barbosa H, Barthelemy M, Ghoshal G, James CR, Lenormand M, Louail T, Menezes R, Ramasco JJ, Simini F, Tomasini M (2018) Human mobility: models and applications. Phys Rep 734:1–74
Barthelemy M (2016) The structure and dynamics of cities. Cambridge University Press, Cambridge
Barthelemy M (2019) The statistical physics of cities. Nat Rev Phys 1(6)
Barthelemy M, Bordin P, Berestycki H, Gribaudi M (2013) Self-organization versus top-down planning in the evolution of a city. Sci Rep 3:2153
Bergstra J, Yamins D, Cox D (2013) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In: International conference on machine learning, pp 115–123
Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. In: Proceedings of the 3rd international conference on knowledge discovery and data mining. AAAI Press, AAAIWS’94, pp 359–370
Bessant JC, Johnson G (2013) Dream on declining homeownership among young people in Australia? Hous Theory Soc 30(2):177–192
Burnside C, Eichenbaum M, Rebelo S (2016) Understanding booms and busts in housing markets. J Polit Econ 124(4):1088–1147
Campolongo F, Cariboni J, Saltelli A (2007) An effective screening design for sensitivity analysis of large models. Environ Model Softw 22(10):1509–1518
Carstensen CL (2015) An agent-based model of the housing market: steps toward a computational tool for policy analysis. MSc thesis, University of Copenhagen
Chang SL, Harding N, Zachreson C, Cliff OM, Prokopenko M (2020) Modelling transmission and control of the COVID-19 pandemic in Australia. arXiv preprint arXiv:2003.10218
Cheng IH, Raina S, Xiong W (2014) Wall street and the housing bubble. Am Econ Rev 104(9):2797–2829
Cliff OM, Harding N, Piraveenan M, Erten EY, Gambhir M, Prokopenko M (2018) Investigating spatiotemporal dynamics and synchrony of influenza epidemics in Australia: an agent-based modelling approach. Simul Model Pract Theory 87:412–431
Conlisk J (1996) Why bounded rationality? J Econ Lit 34(2):669–700
Crosato E, Nigmatullin R, Prokopenko M (2018) On critical dynamics and thermodynamic efficiency of urban transformations. R Soc Open Sci 5(10):180863
Crosato E, Prokopenko M, Harré MS (2021) The polycentric dynamics of Melbourne and Sydney: suburb attractiveness divides a city at the home ownership level. Proc R Soc A 477(2245):20200514
Edmonds B, ní Aodha L (2018) Using agent-based modelling to inform policy—what could possibly go wrong? In: International workshop on multi-agent systems and agent-based simulation. Springer, pp 1–16
Fernald M (2020) America’s rental housing 2020. Joint Center for Housing Studies of Harvard University, Cambridge
Frías-Paredes L, Mallor F, León T, Gastón-Romeo M (2016) Introducing the temporal distortion index to perform a bidimensional analysis of renewable energy forecast. Energy 94:180–194
Frías-Paredes L, Mallor F, Gastón-Romeo M, León T (2017) Assessing energy forecasting inaccuracy by simultaneously considering temporal and absolute errors. Energy Convers Manag 142:533–546
Gallegati M, Kirman A (1999) Beyond the representative agent. Edward Elgar Publishing, Cheltenham
Gauder M, Houssard C, Orsmond D, et al. (2014) Foreign investment in residential real estate. RBA Bulletin, June pp 11–18
Ge J (2013) Who creates housing bubbles? An agent-based study. In: International workshop on multi-agent systems and agent-based simulation. Springer, pp 143–150
Ge J (2017) Endogenous rise and collapse of housing price: an agent-based model of the housing market. Comput Environ Urban Syst 62:182–198
Geanakoplos J, Axtell R, Farmer JD, Howitt P, Conlee B, Goldstein J, Hendrey M, Palmer NM, Yang CY (2012) Getting at systemic risk via an agent-based model of the housing market. Am Econ Rev 102(3):53–58
Gilbert N, Hawksworth JC, Swinney PA (2009) An agent-based model of the English housing market. In: AAAI spring symposium: technosocial predictive analytics, pp 30–35
Glavatskiy KS, Prokopenko M, Carro A, Ormerod P, Harre M (2020) Explaining herding and volatility in the cyclical price dynamics of urban housing markets using a large scale agent-based model. arXiv preprint arXiv:2004.07571
Goldstein J (2017) Rethinking housing with agent-based models: Models of the housing bubble and crash in the Washington DC area 1997–2009. PhD thesis, George Mason University
Greater Sydney Commission (2018) Greater Sydney region plan: a metropolis of three cities. NSW Department of Planning and Environment. https://www.greater.sydney/metropolis-of-three-cities. Accessed 22 Aug 2020
Guest R, Rohde N (2017) The contribution of foreign real estate investment to housing price growth in Australian capital cities. Abacus 53(3):304–318
Haylen A (2014) House prices, ownership and affordability: trends in New South Wales. NSW Parliamentary Library
Herman J, Usher W (2017) SALib: an open-source Python library for sensitivity analysis. J Open Source Softw. https://doi.org/10.21105/joss.00097
House of Representatives Standing Committee on Economics (2014) Report on Foreign Investment in Residential Real Estate. The Parliament of the Commonwealth of Australia
Huang Y, Ge J (2009) House prices and the collapse of stock market in mainland China?-an empirical study on house price index. In: Pacific Rim real estate conference
Iggulden T (2014) ABS admits data on foreign real estate buyers is ‘hit and miss’. https://www.abc.net.au/news/2014-06-25/abs-admits-foreign-real-estate-purchase-data-unreliable/5549926 Accessed 10 May 2020
Kim JH, Pagliara F, Preston J (2005) The intention to move and residential location choice behaviour. Urban Stud 42(9):1621–1636
Kouwenberg R, Zwinkels R (2014) Forecasting the US housing market. Int J Forecast 30(3):415–425
Kouwenberg R, Zwinkels RC (2015) Endogenous price bubbles in a multi-agent system of the housing market. PLoS ONE 10(6):e0129070
Kupke V, Rossini P (2011) Housing affordability in Australia for first home buyers on moderate incomes. Property Management
La Cava G, Leal H, Zurawski A et al (2017) Housing accessibility for first home buyers. Reserve Bank of Australia Bulletin, pp 19–28
LeBaron B, Tesfatsion L (2008) Modeling macroeconomies as open-ended dynamic systems of interacting agents. Am Econ Rev 98(2):246–50
Louf R, Barthelemy M (2013) Modeling the polycentric transition of cities. Phys Rev Lett 111(19):198702
Mc Breen J, Goffette-Nagot F, Jensen P (2010) Information and search on the housing market: an agent-based model. In: Li Calzi M, Milone L, Pellizzari P (eds) Progress in artificial economics. Springer, Berlin, pp 153–164
McMaster R, Watkins C (1999) The economics of housing: the need for a new approach. In: PRRES/AsRES/IRES conference. Kuala Lumpur
Miles W (2008) Boom-bust cycles and the forecasting performance of linear and non-linear models of house prices. J Real Estate Finance Econ 36(3):249–264
Morris MD (1991) Factorial sampling plans for preliminary computational experiments. Technometrics 33(2):161–174
Myers C, Rabiner L, Rosenberg A (1980) Performance tradeoffs in dynamic time warping algorithms for isolated word recognition. IEEE Trans Acoust Speech Signal Process 28(6):623–635
Pangallo M, Nadal JP, Vignes A (2019) Residential income segregation: a behavioral model of the housing market. J Econ Behav Organ 159:15–35
Pawson H, Martin C (2020) Rental property investment in disadvantaged areas: the means and motivations of Western Sydney’s new landlords. Hous Stud 1–23 https://doi.org/10.1080/02673037.2019.1709806
Piazzesi M, Schneider M, Tuzel S (2007) Housing, consumption and asset pricing. J Financ Econ 83(3):531–569
Piovani D, Arcaute E, Uchoa G, Wilson A, Batty M (2018) Measuring accessibility using gravity and radiation models. R Soc Open Sci 5(9):171668
Poledna S, Miess MG, Hommes CH (2019) Economic forecasting with an agent-based model. Available at SSRN 3484768
Polhill G (2018) Why the social simulation community should tackle prediction. Rev Artif Soc Soc Simul. https://rofasss.org/2018/08/06/gp/. Accessed 1 May 2020
Power C (2009) A spatial agent-based model of n-person prisoner’s dilemma cooperation in a socio-geographic community. J Artif Soc Soc Simul 12(1):8
Raimbault J, Broere J, Somveille M, Serna JM, Strombom E, Moore C, Zhu B, Sugar L (2020) A spatial agent based model for simulating and optimizing networked eco-industrial systems. Resour Conserv Recycl 155:104538
Randolph B, Pinnegar S, Tice A (2013) The first home owner boost in Australia: a case study of outcomes in the Sydney housing market. Urban Policy Res 31(1):55–73
Rogers D, Lee CL, Yan D (2015) The politics of foreign investment in Australian housing: Chinese investors, translocal sales agents and local resistance. Hous Stud 30(5):730–748
Rogers D, Wong A, Nelson J (2017) Public perceptions of foreign and Chinese real estate investment: intercultural relations in Global Sydney. Aust Geogr 48(4):437–455
Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26(1):43–49
Saltelli A, Tarantola S, Campolongo F, Ratto M (2004) Sensitivity analysis in practice: a guide to assessing scientific models, vol 1. Wiley Online Library, New York
Saltelli A, Aleksankina K, Becker W, Fennell P, Ferretti F, Holst N, Li S, Wu Q (2019) Why so many published sensitivity analyses are false: a systematic review of sensitivity analysis practices. Environ Model Softw 114:29–39
Sanchez DG, Lacarrière B, Musy M, Bourges B (2014) Application of sensitivity analysis in building energy simulations: combining first-and second-order elementary effects methods. Energy Build 68:741–750
Scott DW (2015) Multivariate density estimation: theory, practice, and visualization. Wiley, New York
Shahriari B, Swersky K, Wang Z, Adams RP, De Freitas N (2015) Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE 104(1):148–175
Shimizu C, Nishimura KG, Watanabe T (2010) Housing prices in Tokyo: a comparison of hedonic and repeat sales measures. Jahrbücher für Nationalökonomie und Statistik 230(6):792–813
Simini F, González MC, Maritan A, Barabási AL (2012) A universal model for mobility and migration patterns. Nature 484(7392):96
Simon HA (1955) A behavioral model of rational choice. Q J Econ 69(1):99–118
Simon HA (1957) Models of man; social and rational. Wiley, New York
Sinai TM (2012) House price moments in boom-bust cycles. Technical report, National Bureau of Economic Research
Slavko B, Glavatskiy K, Prokopenko M (2019) Dynamic resettlement as a mechanism of phase transitions in urban configurations. Phys Rev E 99(4):042143
Slavko B, Glavatskiy K, Prokopenko M (2020a) City structure shapes directional resettlement flows in Australia. Sci Rep 10(1):1–11
Slavko B, Prokopenko M, Glavatskiy KS (2020b) Diffusive resettlement: irreversible urban transitions in closed systems. arXiv preprint arXiv:2009.04094
Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems, pp 2951–2959
Tesfatsion L (2002) Agent-based computational economics: growing economies from the bottom up. Artif Life 8(1):55–82
Thomas M, Hall A (2016) Housing affordability in Australia. Briefing Book: Key Issues for the 45th Parliament pp 86–90
Ustvedt S (2016) An agent-based model of a metropolitan housing market-linking micro-level behavior to macro-level analysis. Master’s thesis, NTNU
Vallance L, Charbonnier B, Paul N, Dubost S, Blanc P (2017) Towards a standardized procedure to assess solar forecast accuracy: a new ramp and time alignment metric. Sol Energy 150:408–422
Vincent L, Thome N (2019) Shape and time distortion loss for training deep time series forecasting models. In: Advances in neural information processing systems, pp 4189–4201
Wang W, Yang S, Hu F, Han Z, Jaeger C (2018) An agent-based modeling for housing prices with bounded rationality. In: Journal of physics: conference series, vol 1113. IOP Publishing, p 012014
Watkins CA (2001) The definition and identification of housing submarkets. Environ Plan A 33(12):2235–2253
Wei SJ, Zhang X, Liu Y (2012) Status competition and housing prices. Technical report, National Bureau of Economic Research
Wilkins R, Lass I (2015) The household, income and labour dynamics in Australia survey: Selected findings from waves 1 to 12. Melbourne Institute of Applied Economic and Social Research, University of
Wong PY (2017) Foreign real estate investment and the Australian residential property market: a study on Chinese investors. Int J Soc Behav Educ Econ Bus Ind Eng 11:1529–1538
Yetsenga R, Emmett F (2020) The ANZ CoreLogic housing affordability report 2020. ANZ Media Centre
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The authors are thankful to Paul Ormerod, Adrián Carro and Markus Brede for many helpful discussions of the baseline model. The authors acknowledge the HPC service at The University of Sydney for providing HPC resources that have contributed to the research results reported within this paper. The authors would also like to acknowledge the Securities Industry Research Centre of Asia-Pacific (SIRCA) and CoreLogic, Inc. (Sydney, Australia) for their data on Greater Sydney housing transactions.
Appendices
A: Implementation
The model is written from scratch in Python3, based on the C++ code from Glavatskiy et al. (2020).
B: Baseline model parameters
The work extends the model of Glavatskiy et al. (2020) (the baseline). For completeness, all parameters of the baseline method with explanations are given in Table 2.
Local sensitivity analysis of these parameters is presented in Fig. 9, where we vary each parameter within \(\pm 20\%\) of its default value, one at a time, while keeping the other parameters at their default values. The resulting analysis shows the model is robust to small changes in the parameters. The only parameters that stand out and deserve additional discussion are the listing price factor, the expectation downshift ratio, and the amount quality reference.
Local sensitivity analysis showing variation in output when varying the baseline model input parameters one at a time within \(\pm 20\%\) of their default value (with the others fixed at their defaults). The red lines represent the output for the default value, and the grey lines represent the output for the varied values (with shade indicating the distance to the default, black meaning very near the default, and light grey being further away). All show relatively small variations in the output based on the input parameters, indicating the robustness of the parameters (colour figure online)
The listing price factor \(b_\ell \) produces relatively large variation in prices, but as shown in Eq. (3.2) this is because it acts as a linear scaling factor on the list prices, so it also scales the output within the \(\pm 20\%\) range; the parameter is therefore behaving as expected. The expectation downshift ratio \(p_\mathrm{d}\) reflects the willingness of agents to downgrade their pre-purchase expectation based on the amount offered by the bank. Higher values indicate buyers wanting more from banks, and we see that this increases prices overall. It is interesting to note, however, that higher values also increase the volatility of the market, with larger fluctuations seen, for example, in late 2017 (at the peak of the market). This confirms the discussion in Glavatskiy et al. (2020), which highlighted the propensity to borrow as a key explanatory factor of volatility. The amount quality reference \(\bar{Q}_\mathrm{h}\) is an integer parameter, which explains why there are fewer comparison lines than for the continuous parameters, but we still observe well-behaved outputs in the \(\pm 20\%\) range.
For all other parameters, the outputs show only small variations from the default, indicating the robustness of the model to variations in the internal parameters.
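The one-at-a-time procedure described above can be sketched as follows. This is a minimal illustration rather than the code used for Fig. 9: the function `one_at_a_time`, the stand-in `toy_model`, the number of steps, and the default values are all hypothetical; only the parameter symbols \(b_\ell \) and \(p_\mathrm{d}\) and the \(\pm 20\%\) range come from the text.

```python
def one_at_a_time(defaults, model, rel_range=0.2, steps=5):
    """Vary each parameter alone within +/- rel_range of its default value."""
    results = {}
    for name, default in defaults.items():
        outputs = []
        for k in range(steps):
            # Evenly spaced factors from (1 - rel_range) to (1 + rel_range).
            factor = 1 - rel_range + 2 * rel_range * k / (steps - 1)
            params = dict(defaults)           # all other parameters stay at default
            params[name] = default * factor
            outputs.append((params[name], model(params)))
        results[name] = outputs
    return results


def toy_model(params):
    """Hypothetical stand-in for a full simulation run (returns one number)."""
    return 100.0 * params["b_l"] + 0.1 * params["p_d"]


defaults = {"b_l": 1.0, "p_d": 0.5}           # hypothetical default values
sweep = one_at_a_time(defaults, toy_model)
for name, outputs in sweep.items():
    print(name, outputs)
```

In the toy model the output is linear in `b_l`, so a \(\pm 20\%\) change in that parameter moves the output by roughly the same proportion, which mirrors the behaviour noted for the listing price factor.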
C: Market matching
The market matching process, in which buyers and sellers are matched, is relatively simple and is given in Algorithm 1. The highest-bidding buyer gets first preference over the listings, and every buyer attempts to purchase the most expensive listing they can afford. In certain cases, deals are rejected due to external influences (modelled by a random 20% chance of rejection). This matching process is performed once every simulation step, with bids and listings that did not clear persisting into the next step.
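Since Algorithm 1 is not reproduced in this text, the following is a minimal sketch of the steps described above, not the authors' implementation. The function name `match_market` and the data layout are hypothetical, and two details are assumptions: a deal clears at the listing price, and a buyer whose deal is rejected waits until the next step rather than trying another listing.

```python
import random


def match_market(bids, listings, reject_prob=0.2, rng=random):
    """One matching step.

    bids:     {buyer_id: maximum amount the buyer can pay}
    listings: {listing_id: list price}
    Returns (deals, remaining_bids, remaining_listings).
    """
    remaining = dict(listings)
    deals, unmatched = [], {}
    # The highest bidder chooses first.
    for buyer, bid in sorted(bids.items(), key=lambda kv: kv[1], reverse=True):
        affordable = [(price, lid) for lid, price in remaining.items() if price <= bid]
        if not affordable:
            unmatched[buyer] = bid            # nothing affordable: bid persists
            continue
        price, lid = max(affordable)          # most expensive affordable listing
        if rng.random() < reject_prob:
            unmatched[buyer] = bid            # deal rejected: bid and listing persist
            continue
        deals.append((buyer, lid, price))
        del remaining[lid]
    return deals, unmatched, remaining


bids = {"a": 900, "b": 700, "c": 300}
listings = {"x": 800, "y": 650, "z": 500}
# reject_prob=0.0 makes this illustration deterministic.
deals, open_bids, open_listings = match_market(bids, listings, reject_prob=0.0)
print(deals, open_bids, open_listings)
```

The returned `open_bids` and `open_listings` would be carried into the next simulation step, matching the persistence of uncleared bids and listings described in the text.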

D: Rental market
In the baseline model, there was no concept of rental matching: households were randomly assigned a rental, with no regard to the cost of the dwelling or the income of the household. Here, we add a matching process based on the idea that households should spend at most 30% of their income on housing when possible, to avoid housing stress (Thomas and Hall 2016; Fernald 2020).
New households are randomly assigned a “local” area (weighted by the population of each area), where they begin and have their characteristics (wealth, income, cash flow, etc.) assigned. From there, every household attempts to find a vacant rental in its price range (which will likely result in some households moving to another area out of financial necessity). Households with extremely high incomes, where all dwellings cost less than 10% of their income, get the most expensive rental available. Households with extremely low incomes, where all dwellings cost at least 30% of their income, get the cheapest one they can afford. All other households randomly choose a rental they can afford (in the 10–30% of income range).
Households remain in their rentals for the duration of the simulation. In this work, we do not attempt to capture the rental market in its entirety and leave this for future work, where we would like to model the relationship between renters and investors. The changes were made so that the cash flow situation of each household more closely matches that seen in the real world; in the previous model, many households would be in a poor cash flow situation due to the rental price. Other work, such as Mc Breen et al. (2010), looks in more depth at modelling the rental market.
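The rental selection rule above can be sketched as a single function. This is an illustrative reading of the rule, not the authors' code: `choose_rental` is a hypothetical name, income and rent are assumed to be expressed over the same period, and the handling of the mixed case (no dwelling inside the 10–30% band although some are cheaper) is an assumption, since the text does not specify it.

```python
import random


def choose_rental(income, vacant_rents, rng=random, low=0.10, high=0.30):
    """Pick a vacant dwelling id for a household.

    vacant_rents: {dwelling_id: rent}, with rent and income over the same period.
    """
    shares = {d: rent / income for d, rent in vacant_rents.items()}
    in_band = [d for d, s in shares.items() if low <= s <= high]
    if in_band:
        # Typical household: random choice among rentals costing 10-30% of income.
        return rng.choice(in_band)
    below = [d for d, s in shares.items() if s < low]
    if below:
        # Very high income (everything under 10%): most expensive such rental.
        # Assumption: the same choice is made if some dwellings exceed 30%.
        return max(below, key=lambda d: vacant_rents[d])
    # Very low income (everything above 30%): cheapest rental.
    return min(vacant_rents, key=vacant_rents.get)


print(choose_rental(10000, {"a": 500, "b": 800}))   # high income
print(choose_rental(1000, {"a": 400, "b": 350}))    # low income
print(choose_rental(2000, {"a": 100, "b": 400, "c": 500, "d": 900}))
```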
E: Bayesian optimisation
E.1 Details
Bayesian optimisation is performed using the Tree of Parzen Estimators approach with hyperopt from Bergstra et al. (2013). The optimisation process was run for 2000 iterations in all cases. The loss was measured as the average loss over several stochastic runs for each set of parameters, to minimise the effect of randomisation in the model and resulting loss.
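The averaging of the loss over several stochastic runs can be sketched as below. This is only an illustration under stated assumptions: `averaged_loss`, `toy_run`, and `toy_loss` are hypothetical names, the number of runs (here 5) is not given in the text, and the toy model stands in for a full simulation.

```python
import random
import statistics


def averaged_loss(params, run_model, loss_fn, n_runs=5, base_seed=0):
    """Mean loss over several seeded stochastic runs of the model."""
    losses = []
    for k in range(n_runs):
        rng = random.Random(base_seed + k)    # separate, reproducible stream per run
        losses.append(loss_fn(run_model(params, rng)))
    return statistics.mean(losses)


def toy_run(params, rng):
    """Hypothetical stand-in for one stochastic simulation run."""
    return params["h"] + rng.gauss(0.0, 0.1)


def toy_loss(simulated, target=0.3):
    """Hypothetical stand-in for the loss against observed data."""
    return (simulated - target) ** 2


print(averaged_loss({"h": 0.3}, toy_run, toy_loss))
```

A function of this kind is what would be handed to the optimiser as its objective, for example to hyperopt's `fmin` with `algo=tpe.suggest` and `max_evals=2000`; the search space passed alongside it would follow the parameter ranges of the model.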
E.2 Loss function
For measuring the goodness of fit, we use a loss function with two terms, a shape term and a temporal term, to try to capture the nonlinearities over time when predicting housing price trends. This loss function is a modification of DILATE (Vincent and Thome 2019), which was introduced as a loss function for neural networks for time series prediction. DILATE has been simplified here (with the removal of smoothing parameters), as Bayesian optimisation does not require the loss function to be differentiable.
The loss function is given in Eq. (E.1).
\(\lambda =0.5\) was used throughout since this was the most common in the original paper of Vincent and Thome (2019). However, as the terms are not normalised, the two do not have an equal contribution; instead, the temporal term serves more like a penalty on the shape (with \(\lambda \) controlling the strength of the penalty).
The shape term is based on dynamic time warping (DTW), which has commonly been used in speech recognition tasks (Sakoe and Chiba 1978; Myers et al. 1980) but has a wide range of applications in time series data (Berndt and Clifford 1994). Dynamic time warping can be expressed recursively as a minimisation problem, as in Eq. (E.2).
This can be read as minimising the cumulative distance (using distance measure d, in this case, Euclidean distance) on some warped path between x and y, by taking the distance between the current elements and the minimum of the cumulative distances of neighbouring points.
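For reference, the standard textbook form of the DTW recursion, which matches the verbal description above, is given below. The notation \(\mathrm{DTW}(i,j)\) for the cumulative distance up to elements \(x_i\) and \(y_j\) is an assumption and may differ from that used in Eq. (E.2).

```latex
\mathrm{DTW}(i, j) = d(x_i, y_j)
  + \min\bigl\{ \mathrm{DTW}(i-1, j),\; \mathrm{DTW}(i, j-1),\; \mathrm{DTW}(i-1, j-1) \bigr\},
\qquad
\mathrm{DTW}(0, 0) = 0,\quad \mathrm{DTW}(i, 0) = \mathrm{DTW}(0, j) = \infty \ \text{for } i, j \ge 1.
```

The three arguments of the minimum correspond to the three neighbouring points from which the warping path can arrive at \((i, j)\).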
Unlike the common applications in speech recognition, where words can be spoken at varying speeds (so the peaks do not necessarily match up), in financial markets, timing such peaks is important. This motivates the introduction of a temporal term, which tries to align such peaks and dips. The temporal term is based on the Time Distortion Index (TDI) (Frías-Paredes et al. 2016, 2017), which can be thought of as the normalised area between the optimal path and the identity path (where the identity path is (1, 1), (2, 2), ..., (N, N)) (Vallance et al. 2017) and aims to minimise the impact of shifting and distortion in time series forecasting (Frías-Paredes et al. 2016).
To see the usefulness over a more standard loss function for time series, such as MSE, consider the example in Fig. 10. We can see that MSE can be a problematic approach and, in some cases (as in the example, where the flat line in Fig. 10b has a lower loss), a misleading measure of goodness of fit. DTW helps to match points in the two time series, while TDI helps minimise the offset of the predictions. (Graphically, in the example this corresponds to shortening the dotted grey lines.) For a full analysis, we refer the reader to the original DILATE paper of Vincent and Thome (2019), noting that all smoothing terms have been removed in the modification here.
Motivation of time series-based loss using a constructed example. The line on the right is a very poor predictor of the true trend, failing to capture any of the peaks or dips, yet its MSE is significantly lower than that of the line on the left. DTW captures the shifts, and incorporating a penalty on time can penalise these shifts. The light grey lines show how DTW matches points together, even if they do not occur at the same time period
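The contrast between MSE and a shape-plus-time loss can be reproduced on a small constructed series. The sketch below is not the loss used in the paper: the combination `lam * shape + (1 - lam) * temporal` and the squared, \(N^2\)-scaled path penalty (the form typically chosen in DILATE) are assumptions, and all function names and the toy data are hypothetical.

```python
import math


def dtw(x, y):
    """Return (cost, path) of dynamic time warping between sequences x and y."""
    n, m = len(x), len(y)
    # D[i][j] = minimal cumulative distance aligning x[:i] with y[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])      # Euclidean distance for scalars
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m) to (1, 1) to recover one optimal warping path.
    i, j = n, m
    path = [(i, j)]
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
        path.append((i, j))
    return D[n][m], path[::-1]


def tdi(path, n):
    """Squared deviation of the warping path from the identity path, scaled by n^2."""
    return sum((i - j) ** 2 for i, j in path) / n ** 2


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def shape_time_loss(x, y, lam=0.5):
    """Assumed combination: lam * shape term + (1 - lam) * temporal term."""
    cost, path = dtw(x, y)
    return lam * cost + (1 - lam) * tdi(path, len(x))


true = [0, 0, 1, 3, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 3, 1, 0, 0]            # same peak, one step late
flat = [0.625] * 8                            # constant at the mean of `true`

for name, pred in [("shifted", shifted), ("flat", flat)]:
    print(name, "MSE:", mse(true, pred), "loss:", shape_time_loss(true, pred))
```

In this toy case the flat prediction has the lower MSE, while the shifted prediction, which captures the peak one step late, has a DTW cost of zero and is penalised only through the temporal term.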
E.3 Global constraints
We can see that 2011–2015 and 2016–2019 fit the trend very closely. Despite having a low loss, however, the 2006–2010 simulation path does not follow the dip well, as the loss function makes no distinction between being above or below the trend. Looking at the individual paths from every run, we can see that a peak and dip is predicted in many of the cases, although their distance to the data is greater than that of the path with the lowest loss, which matched a large portion of the training data almost perfectly but missed the dip. We apply a post-optimisation global constraint to 2006–2010, again only using this training period: the midpoint of the simulation must be higher than the start and end points (i.e. a peak must occur), and we take the parameters with the lowest loss matching this criterion. The process is shown in Fig. 11 and the result is shown in Fig. 5p. For 2006–2010, \(\ell \) is higher than before the constraint; however, the constraint clearly allows the simulation to follow the overall trend more closely in the training period.
The visualisation in Fig. 11 also begins to show the wide range of possible market outcomes for various combinations of the parameters. If a certain feature (e.g. the peak) occurs for many parameter combinations, we can deduce that such dynamics were likely to occur due to the agent characteristics alone, regardless of the parameters used. Here, many combinations lead to a peak and dip, perhaps due to mortgage rates and worrying mortgage-to-income ratios. This is in line with the suggestions of Edmonds and ní Aodha (2018) that ABMs be used to determine a range of potential future outcomes, which in this case shows a variety of paths leading to a peak and dip.
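The post-optimisation constraint described above amounts to filtering the evaluated parameter sets and then taking the lowest loss among those that remain. A minimal sketch follows; the function names, the tuple layout, and the example values are hypothetical, and the midpoint is taken as the middle index of the simulated series.

```python
def passes_peak_constraint(series):
    """True if the midpoint lies above both the first and the last value."""
    mid = series[len(series) // 2]
    return mid > series[0] and mid > series[-1]


def select_constrained(candidates):
    """Pick the lowest-loss candidate whose simulated path contains a peak.

    candidates: list of (loss, params, simulated_series) tuples.
    Returns None if no candidate satisfies the constraint.
    """
    feasible = [c for c in candidates if passes_peak_constraint(c[2])]
    return min(feasible, key=lambda c: c[0]) if feasible else None


# Hypothetical optimisation history: the lowest-loss path is flat and misses the peak.
candidates = [
    (0.10, {"h": 0.1}, [1.0, 1.0, 1.0, 1.0, 1.0]),
    (0.15, {"h": 0.3}, [1.0, 1.2, 1.4, 1.2, 1.0]),
    (0.30, {"h": 0.5}, [1.0, 1.5, 2.0, 1.5, 1.1]),
]
best = select_constrained(candidates)
print(best[0], best[1])
```

As in the text, the selected candidate has a higher loss than the unconstrained optimum, but its path contains the required peak.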
E.4 Parameter space
The parameter space is defined in Table 3.
Even though there are only three parameters to tune, the number of potential combinations exceeds 4 million (and this assumes the values are discretised, so the true number is far greater), making a grid search impractical.
The three parameters are h, \(\alpha \), and \(\beta \).
F: Sensitivity analysis
While in Sect. 5.1 we analysed the contribution of each new component by comparing the resulting optimised time series after introducing the components one at a time, here we verify and rank the importance of each of the contributions explicitly using global sensitivity analysis (GSA).
Specifically, we analyse the importance of the trend-following aptitude (h), the social contribution (\(\beta \)), and the role of \(\alpha \) in minimising the loss function.
We use the Morris method (Morris 1991) for a GSA and present the revised \(\mu ^*\) as suggested in Saltelli et al. (2004) and \(\sigma \). \(\mu ^*\) represents the mean absolute elementary effect and can be used to rank the contribution of each parameter, and this solves the problem of \(\mu \) where elementary effects can cancel out. We also analyse \(\sigma \), i.e. the standard deviation of the elementary effects, as a measure of the interactions.
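The difference between \(\mu \), \(\mu ^*\), and \(\sigma \) is easiest to see on a small set of elementary effects. The sketch below uses hypothetical effect values and a hypothetical helper `morris_stats`; taking \(\sigma \) as the sample standard deviation is a choice made here for illustration.

```python
import statistics


def morris_stats(effects):
    """Return (mu, mu_star, sigma) for one parameter's elementary effects."""
    mu = statistics.mean(effects)                        # signed mean: effects can cancel
    mu_star = statistics.mean([abs(e) for e in effects]) # mean absolute effect
    sigma = statistics.stdev(effects)                    # spread: nonlinearity or interactions
    return mu, mu_star, sigma


# Hypothetical elementary effects with opposite signs along different trajectories.
effects = [2.0, -2.0, 1.0, -1.0]
mu, mu_star, sigma = morris_stats(effects)
print(mu, mu_star, sigma)
```

Here the signed mean is zero even though the parameter clearly affects the output, whereas \(\mu ^*\) remains positive, which is the cancellation problem that the revised measure avoids.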
For the Morris method, we use \(r=20\) trajectories, \(p=10\) levels, and step size \(\Delta =p/[2(p-1)]\), i.e. \(\Delta \approx 0.56\) with \(p=10\). These are within the range of commonly used values, e.g. in Campolongo et al. (2007).
The results are presented in Table 4, and visualised in Figs. 14 and 15.
Checking the importance of each parameter, or \(\mu ^{\star }\), we can see h consistently ranks the most important, showing its changes have the largest effect on \(\ell \). This is followed in importance by \(\beta \), and then \(\alpha \) each year. However, we see that confidence bars do overlap in Fig. 14.
Viewing the Morris plots in Fig. 15, we can see all parameters are deemed important, where unimportant parameters would show up in the bottom leftmost portion of the plot. Using the classification strategy of Sanchez et al. (2014), all parameters are considered to be non-monotonic and/or to have high levels of interaction, since \(\frac{\sigma }{\mu ^{\star }} > 1\) in all cases.
This analysis agrees with the preliminary parameter analysis in Sect. 5.2.
Global sensitivity analysis with Morris plots. Diagonal lines represent the ranges for \(\sigma / \mu ^{\star }\). One classification strategy proposed by Sanchez et al. (2014) says factors which are almost linear should be below the 0.1 line, factors which are monotonic between 0.1 and 0.5 lines, or almost monotonic between the 0.5 and 1 line, and factors with non-monotonic nonlinearities or interactions with other factors above the 1 line
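The classification by the ratio \(\sigma / \mu ^{\star }\) described in the caption above can be written as a small function. The thresholds 0.1, 0.5, and 1 come from the text; the function name, the labels, and the treatment of values lying exactly on a threshold are assumptions.

```python
def classify_factor(mu_star, sigma):
    """Classify a factor by sigma / mu_star using the 0.1, 0.5 and 1 lines."""
    ratio = sigma / mu_star
    if ratio < 0.1:
        return "almost linear"
    if ratio < 0.5:
        return "monotonic"
    if ratio < 1.0:
        return "almost monotonic"
    return "non-monotonic and/or interacting"


# Hypothetical (mu_star, sigma) pairs, one per category.
for mu_star, sigma in [(1.0, 0.05), (1.0, 0.3), (1.0, 0.8), (1.0, 1.4)]:
    print(mu_star, sigma, classify_factor(mu_star, sigma))
```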
While the Morris method gives us the overall sensitivity across the parameter ranges (in a global way) and allows us to rank the factors in terms of importance, we also provide a fine-grained sensitivity analysis around the default values, i.e. a local sensitivity analysis (LSA). For this, we use \(p=100\) levels, but vary only one parameter at a time while keeping the others fixed at their default values. This is shown in Fig. 16. This analysis shows how robust the resulting default values are to small perturbations, but as this is a local method, the results should be interpreted with caution (and only in conjunction with the GSA method above), since this does not account for any parameter interactions as warned in Saltelli et al. (2019).
Viewing h (the left column), we can see that all values surrounding the default have a similar loss, showing the model is robust to small changes in the aptitude. Looking across the entire search space, choosing from within an appropriate range for the aptitude is clearly important, but the loss always varies relatively smoothly between neighbouring parameter values. Viewing \(\beta \) (the middle column), we can see the sharp transition above zero. There is a clear optimal range for \(\beta \), in which the default lies. Again, the loss is smooth in the area surrounding the default value, showing robustness around the default (assuming we do not vary past the sharp transition). Looking at \(\alpha \) (the final column), the plots initially seem somewhat jagged, although the scale of the y-axis makes clear that these are very small shifts in loss (as verified by the plotted time series with varying \(\alpha \) levels). \(\alpha \) was deemed the least important of the three parameters by the Morris method screening, but was still important based on its position on the Morris plot. We can verify this here, where changes in \(\alpha \) do not have a large impact on \(\ell \).
GSA was performed using SALib from Herman and Usher (2017).
G: Network topology
In Sect. 4, we introduced a novel graph-based structure for representing the region, at the same time introducing spatial submarkets into the simulation (based on nodes in the graph). In Sect. 5.1 and “Appendix F”, we have validated the usefulness of the newly introduced parameters and performed a sensitivity analysis of the parameters, whereas in this section, we look to validate the usefulness of the network structure itself.
To do this, we compare the newly proposed model (with all parameters included), against an identical model with only a single node. We also compare to a fully connected network, i.e. where the spatial element (in terms of neighbourhoods) is not considered directly, but specific areas still exist. These structures are visualised in Fig. 17.
Various potential network architectures. The left represents a single node (i.e. no spatial component). The middle is the proposed graph-based approach constructed from the topological layout of the region. The right is a complete graph, with individual areas, but no concept of spatial neighbours (due to the fully connected nature)
1.1 G.1 Topologies
1.1.1 G.1.1 No spatial component
To remove submarkets and all spatial components, we use a graph composed of a single node representing the overall Greater Sydney region. That is, agent characteristics and dwelling prices are assigned based on the overall Greater Sydney distributions, rather than specific area distributions. To implement this, rather than G being defined as in Sect. 4.1.1, instead, G contains a single node (i.e. a singleton graph) where the node represents the overall Greater Sydney region, i.e. it is the graph \(K_{1}\). With this single-node configuration, the spatial outreach from Eq. (4.3) is removed, as there is no concept of space. Likewise, \(\beta \) is no longer defined, as this is expressed in spatial terms. However, \(\alpha \) remains, which controls the boundedness of the agent as discussed in “Appendix L”, and h keeps the same interpretation.
1.1.2 G.1.2 Fully connected graph
To represent the fully connected areas, we use a complete graph where every LGA is connected to every other LGA, i.e. \(G=K_{38}\). Again, outreach [from Eq. (4.3)] need not be considered, since every area is now directly connected to every other area. With this representation, the introduced parameters still remain (i.e. \(\alpha , \beta , h\)). Note that \(\alpha \) again directly corresponds to the boundedness (discussed in “Appendix L”) and does not encompass outreach. \(\beta \) and h keep the meanings introduced in Sect. 4. This introduces individual submarkets (based on LGAs) into the simulation, but does not enforce any spatial-based search costs within the market. As in the proposed approach, agent calibration is also based on the area in which they reside, so agent characteristics match those of their area.
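The two comparison topologies can be written down directly as adjacency sets. The sketch below is illustrative: the LGA names are placeholders, and the proposed topology (built from the actual LGA adjacencies of Sect. 4.1.1) is not reproduced here.

```python
from itertools import combinations


def singleton_graph(node="Greater Sydney"):
    """K_1: a single node and no edges (no spatial component)."""
    return {node: set()}


def complete_graph(nodes):
    """K_n: every node is adjacent to every other node."""
    graph = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        graph[a].add(b)
        graph[b].add(a)
    return graph


def edge_count(graph):
    # Each undirected edge is stored twice (once per endpoint).
    return sum(len(neigh) for neigh in graph.values()) // 2


lgas = [f"LGA_{i}" for i in range(38)]   # placeholder names for the 38 LGAs
k1 = singleton_graph()
k38 = complete_graph(lgas)
```

In \(K_{38}\) each LGA has 37 neighbours (703 edges in total), so every area is a single hop from every other area and graph distance carries no spatial information.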
1.1.3 G.1.3 Analysis
The resulting comparisons are visualised in Fig. 18, which shows the models that include individual areas significantly outperforming the overall Greater Sydney model, indicating the usefulness of area-specific submarkets. Between the two area models, there was little difference in aggregate performance (as shown in Fig. 18), which shows the performance improvements come mainly from the introduction of submarkets, not necessarily from the overall spatial structure. This shows that \(\beta \) (which is based on the individual areas), the initialisation of agent characteristics based on location, and the area-based modification of price setting in terms of \(\overline{Q}_\mathrm{h}\) are key for the resulting prices (more so than the spatial outreach costs).
However, when considering the resulting agent preferences (in terms of suburbs to purchase in), the fully connected map had significantly more people moving to more remote regions, whereas the LGA-connected spatial topology prevented such drastic movements (through the spatial outreach term), capturing the fact that people are often tied to specific areas (i.e. those who work in the CBD and currently reside near there are unlikely to move to the outskirts of the Greater Sydney region). This is visualised in Fig. 19, where we show the proposed movements follow a diffusive-like pattern (Slavko et al. 2020b), compared to the fully connected topology, which had movements to more remote regions of Greater Sydney. We further discuss the resulting movements in Sect. 6.3, where we show that the agent movement patterns with the proposed spatial structure are logical and consistent with the actual reported movements in the Greater Sydney region, based on the observed trends reported in the recent literature.
Variations in agent preferences with the proposed spatial structure (left) and without a spatial structure enforced (right) from first-time home buyers situated in the Canterbury–Bankstown region (visualised in pink and labelled). The colour indicates the percentage of purchases in those areas, with dark indicating a high percentage of relative purchases (controlled for population sizes). Without a preserved spatial structure, we see a much higher rate of purchases on the outskirts of the Greater Sydney region, such as in Upper Lachlan Shire and Lithgow (labelled). We verify the proposed movements are logical in Sect. 6.3 (colour figure online)
In this section, we have shown the area-specific submarket extensions significantly outperform an equivalent model which does not include individual areas, highlighting the importance of capturing submarkets. Furthermore, the incorporation of area-specific submarkets allows for additional insights (such as those in Sect. 6.3.1 which would not otherwise be possible). We then further validated the choice of the network topology by comparing resulting agent preferences, and showing the proposed architecture (with spatial-based search costs) prevents drastic movements by the agent (in terms of relative distance from an agent’s current location), allowing the agents to act based on their location (preferring closer areas) in a manner consistent with the actual observed trends as discussed in Sect. 6.3.
H: Experiment settings
Due to the stochastic nature of ABMs, we run 100 Monte Carlo simulations per experiment (unless otherwise stated) and report the aggregate results over all simulations to get a robust estimate.
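As a rough sketch of this aggregation, the snippet below repeats a stochastic simulation 100 times and summarises the runs month by month. Both the random-walk `run_simulation` (a hypothetical stand-in for one ABM run) and the use of the mean and standard deviation as the aggregate are assumptions made for illustration.

```python
import random
import statistics


def run_simulation(seed, months=12):
    """Hypothetical stand-in for one stochastic ABM run: returns a
    monthly price series (a random walk, NOT the paper's model)."""
    rng = random.Random(seed)
    price, series = 100.0, []
    for _ in range(months):
        price *= 1.0 + rng.gauss(0.005, 0.01)
        series.append(price)
    return series


def aggregate_runs(n_runs=100, months=12):
    """Run the model n_runs times and summarise month by month."""
    runs = [run_simulation(seed, months) for seed in range(n_runs)]
    mean = [statistics.fmean(col) for col in zip(*runs)]
    spread = [statistics.stdev(col) for col in zip(*runs)]
    return mean, spread


mean_series, spread_series = aggregate_runs()
```

Seeding each run separately keeps the whole experiment reproducible while the individual runs still differ from one another.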
1.1 H.1 Scale
Experiments are run at a 1:100 scale of the true housing market, i.e. every one hundred households in the Greater Sydney region are represented by one household in the model. The 1:100 scale was chosen for efficiency, but results for 1:50, 1:100, and 1:200 are also presented in Fig. 20 to show robustness to scale. There is an upper limit on the scale beyond which performance will begin to degrade. For example, the number of overseas investments is given in Table 5, and by using a scale close to 1:1000, we would lose the contribution of foreign investments (since the scaled values would be \(<1\)). For lower scales (i.e. 1:1), the results may be more accurate, but this comes at the expense of increased computational cost, so the 1:100 scale provided a good trade-off between accuracy and efficiency.
I: Initialisation data
1.1 I.1 Data
All real estate listings and sales from 2006 to present (2020) were used from SIRCA–CoreLogic, including the sale price, LGA, and sale date. These data are used as the actual price, and to calibrate the ABM.
1.2 I.2 Spatial initialisation
1.2.1 I.2.1 Pricing distributions
Between LGAs, there is a wide range of dwelling sale prices, and different distributions of prices amongst the LGAs as well. To sample from this effectively, we use kernel density estimation (KDE) to create a probability density function for each LGA for each time period. The previous 3 months of sales from the beginning of the time period are used to generate the density function. Scott’s Rule (Scott 2015) is used to assign the bandwidth, which sets the bandwidth to \(n^{\frac{-1}{d+4}}\), where n is the number of data points (in this case dwelling sales in the LGA at the beginning of the time period), and d is the number of dimensions (in this case \(d=1\)). When new houses are created for an area, they are set with an initial quality based on this distribution. The resulting KDEs are shown in Fig. 21.
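A minimal one-dimensional version of this sampling step is sketched below. It assumes a Gaussian kernel and, following common practice, that the Scott factor \(n^{\frac{-1}{d+4}}\) multiplies the sample standard deviation to give the kernel width; the sale prices are made up for illustration and are not SIRCA–CoreLogic data.

```python
import math
import random
import statistics


def scott_factor(n, d=1):
    """Scott's Rule scaling factor n^(-1/(d+4))."""
    return n ** (-1.0 / (d + 4))


class PriceKDE:
    """1-D Gaussian KDE over recent sale prices for one LGA."""

    def __init__(self, prices):
        self.prices = list(prices)
        # Assumption: the factor scales the sample standard deviation.
        self.h = scott_factor(len(self.prices)) * statistics.stdev(self.prices)

    def pdf(self, x):
        norm = len(self.prices) * self.h * math.sqrt(2.0 * math.pi)
        return sum(math.exp(-0.5 * ((x - p) / self.h) ** 2)
                   for p in self.prices) / norm

    def sample(self, rng):
        # Pick an observed sale, then jitter it with the kernel.
        return rng.gauss(rng.choice(self.prices), self.h)


# Illustrative prices in AUD thousands (made up for this example).
recent_sales = [620, 655, 700, 710, 745, 790, 820, 905, 1100, 1250]
kde = PriceKDE(recent_sales)
new_dwelling_quality = kde.sample(random.Random(0))
```

Sampling by choosing an observed sale and adding kernel noise is equivalent to drawing from the estimated density, so a newly created dwelling inherits the price profile of its LGA without the density ever being inverted explicitly.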
1.2.2 I.2.2 Positioning
Households are not assigned to an LGA directly, as households can freely move areas. Instead, the household’s area is based on the residential dwelling of the household (and thus can change over time). When we reference a household’s area, we are referring to the LGA of the dwelling where the household currently resides.
At the beginning of the simulation, households which are homeowners are assigned to dwellings to match the population distribution amongst LGAs. The income and liquid wealth for the household are then assigned based on the brackets from the dwelling’s LGA. Renters are assigned a random LGA to begin with (again weighted by the population of each LGA) and income and wealth based on the distribution of that LGA. Households then try to find a rental they can afford (one with a rental price of approximately 10–30% of the household’s income), which may mean some have to move LGAs.
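The renter initialisation can be sketched as a population-weighted draw followed by an affordability search. The order in which other LGAs are searched (random here), the function name `assign_renter`, and all populations and rents are assumptions for illustration.

```python
import random


def assign_renter(rng, lga_population, lga_rents, income,
                  low=0.10, high=0.30):
    """Draw a starting LGA weighted by population, then pick a rental
    costing roughly 10-30% of income, moving LGA if necessary."""
    names = list(lga_population)
    start = rng.choices(names, weights=[lga_population[n] for n in names])[0]

    def affordable(lga):
        return [r for r in lga_rents[lga] if low * income <= r <= high * income]

    # Try the starting LGA first, then the others in a random order.
    others = [n for n in names if n != start]
    rng.shuffle(others)
    for lga in [start] + others:
        options = affordable(lga)
        if options:
            return lga, rng.choice(options)
    return start, None   # no affordable rental anywhere


# Hypothetical populations and annual rents (illustrative only).
population = {"A": 6000, "B": 2500, "C": 1500}
rents = {"A": [30000, 36000], "B": [18000, 22000], "C": [12000, 15000]}
lga, rent = assign_renter(random.Random(1), population, rents, income=80000)
```

With an income of 80,000 the affordable band is 8,000 to 24,000 per year, so a household initially drawn into area "A" has to relocate to "B" or "C".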
1.3 I.3 Time periods
In line with the previous work of Glavatskiy et al. (2020), and following the Australian census timelines (which are performed every 5 years), we choose the three most recent census periods for analysis. These are 2006–2010, 2011–2015, and 2016–2019. The length was chosen such that upon new census information becoming available, a new simulation is run. That is, a separate model (and optimisation process) is run for each of the time periods to ensure the model is calibrated to the most recent data available. In doing so, we ensure the agent characteristics of the model most closely match those in the true Greater Sydney market. As each period corresponds to the census years, there is a large array of available data for calibration to ensure the models begin in a state as close as possible to the true population’s state. Alternate (non-census) dates could be used; however, the model may not begin with as accurate a reflection of the true underlying agent characteristics (depending on the data availability). While the models are calibrated for the time periods outlined here, such calibrations would also work well for surrounding dates (or alternate run durations), or the model could be re-calibrated for alternative dates to provide additional forecasting. For example, after the 2021 census, the agent characteristics could be reassigned and a new optimisation process run to reflect updated agent behaviours; the same applies to past market behaviour, such as with the 2001 census.
1.4 I.4 Household characteristics
1.4.1 I.4.1 Area
Agents are initialised into an area based on the Australian census data, meaning the population of each area at the beginning of the simulations corresponds to the proportions from the census data for that time period. For example, if there are three areas “A”, “B”, “C”, and the true proportions in each are 60:25:15, the model will also populate agents into the three areas according to this proportion. Throughout the simulation, agents may move areas. They may be forced to move to a cheaper area if they cannot afford their current area, or they may move to a more affluent area if they can afford a dwelling there. So once the simulation begins, the movement dynamics are controlled by the agents’ cash flow position (again from census data, outlined below). Initialising agents into areas based on census data allows for correct agent characteristics (such as income and net worth) that directly line up with those observed throughout the Greater Sydney region.
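The 60:25:15 example can be made concrete with a small allocation routine. The largest-remainder rounding used here is an assumption, added only so that the integer counts always sum to the number of agents.

```python
def allocate_agents(n_agents, proportions):
    """Split n_agents across areas in the given proportions using the
    largest-remainder method, so the counts sum exactly to n_agents."""
    total = sum(proportions.values())
    exact = {a: n_agents * p / total for a, p in proportions.items()}
    counts = {a: int(x) for a, x in exact.items()}
    leftover = n_agents - sum(counts.values())
    # Hand remaining agents to the areas with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda a: exact[a] - counts[a], reverse=True)
    for area in by_remainder[:leftover]:
        counts[area] += 1
    return counts


counts = allocate_agents(1000, {"A": 60, "B": 25, "C": 15})
```

For 1000 agents this gives 600, 250, and 150 agents in areas "A", "B", and "C", respectively.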
1.4.2 I.4.2 Income
Income is assigned from the distribution based on the household’s area. This distribution comes from the census data. Income grows throughout the simulation. The income brackets follow those specified in the census data.
1.4.3 I.4.3 Liquid wealth
Again, the liquid wealth (liquidity) of a household is based on the true distributions from census data. However, in this case, liquidity is not available per LGA, only for Greater Sydney as a whole. So to map a household to an appropriate liquidity bracket, the household’s liquidity is based on the income of the household. That is, if a household is in the top X% of earners in an LGA, its liquidity will be in the top X% as well (approximately, since liquidity is from brackets).
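This percentile matching can be sketched as two lookups: the household’s income rank within its LGA, followed by the Greater Sydney liquidity bracket covering that rank. The bracket boundaries, labels, and incomes below are hypothetical and are not census values.

```python
import bisect


def income_percentile(income, lga_incomes):
    """Fraction of households in the LGA earning no more than `income`."""
    ordered = sorted(lga_incomes)
    return bisect.bisect_right(ordered, income) / len(ordered)


def liquidity_bracket(percentile, brackets):
    """Map a percentile to a Greater Sydney liquidity bracket.

    `brackets` is a list of (cumulative_share, label) pairs in ascending
    order, with the final cumulative share equal to 1.0."""
    for cumulative_share, label in brackets:
        if percentile <= cumulative_share:
            return label
    return brackets[-1][1]


# Hypothetical brackets and incomes (illustrative, not census values).
sydney_liquidity = [(0.25, "<$10k"), (0.60, "$10k-$50k"),
                    (0.90, "$50k-$200k"), (1.00, ">$200k")]
lga_incomes = [40000, 52000, 61000, 75000, 88000, 97000, 120000, 150000,
               210000, 300000]
pct = income_percentile(300000, lga_incomes)
bracket = liquidity_bracket(pct, sydney_liquidity)
```

Here the highest earner in the LGA is placed in the highest liquidity bracket, and the lowest earner in the lowest, which is the approximate rank-preserving behaviour described above.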
1.5 I.5 Population distribution
In this case, there are three measures of interest: the total number of dwellings, the total number of households, and the distribution of these households amongst LGAs. The dwelling and household estimates from the census data are used for each year, and simple linear projections are used for forecasting their growth. The distribution amongst LGAs is that recorded at the start of the simulation and is assumed to grow linearly with the overall population size. Individual LGA future population projections are available from 2016 onward, but as no projections existed before this date, we instead used this simplified measure of all LGAs growing by a fixed percentage within a given simulation period. As such, higher movements towards one particular LGA throughout the simulation could indicate the requirement of additional dwellings being built there to cater for the growth, which is another contribution we consider in later sections of this work.
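A minimal sketch of this projection is given below, assuming a straight line through two census estimates and LGA shares held at their starting values. The household counts and shares are hypothetical.

```python
def linear_projection(year0, value0, year1, value1, target_year):
    """Extend the straight line through two census estimates."""
    slope = (value1 - value0) / (year1 - year0)
    return value1 + slope * (target_year - year1)


def lga_households(total_households, start_shares):
    """Keep each LGA's share fixed at its starting value, so every LGA
    grows at the same rate as the overall population."""
    return {lga: total_households * share for lga, share in start_shares.items()}


# Hypothetical census counts (illustrative only).
total_2021 = linear_projection(2011, 1_500_000, 2016, 1_600_000, 2021)
shares = {"A": 0.60, "B": 0.25, "C": 0.15}
by_lga = lga_households(total_2021, shares)
```

Because the shares are fixed, any LGA that attracts disproportionately many movers in the simulation is exactly the kind of area flagged above as potentially needing additional dwellings.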
J: Movement pattern visualisations
Over 10 million total movements were tracked across the simulations (approximately 3.3 million per time period). All plots in this section represent the normalised heatmaps of these movements. The total number of movements to a particular LGA is scaled by the population size of this LGA, meaning the results can be interpreted as a preference for certain areas rather than visualising the population size of the LGAs. Therefore, the movements do not simply reflect larger populations, but rather a larger proportion of people moving there relative to the LGA’s size. All movements are then normalised such that the sum of all cells in the plot is 1, meaning that if a particular cell has a value of 0.05, then 5% of all matched movements moved to this LGA.
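The two normalisation steps (scaling by destination population, then normalising the cells to sum to 1) can be sketched as follows. The nested-dictionary layout and all counts, populations, and prices are illustrative assumptions.

```python
def normalise_movements(moves, population):
    """moves[origin][dest] = raw count of movements; population[dest] =
    households in dest. Returns a matrix whose cells sum to 1."""
    scaled = {
        origin: {dest: count / population[dest] for dest, count in row.items()}
        for origin, row in moves.items()
    }
    total = sum(v for row in scaled.values() for v in row.values())
    return {origin: {dest: v / total for dest, v in row.items()}
            for origin, row in scaled.items()}


def sort_by_median_price(lgas, median_price):
    """Row/column order: most affordable LGA first."""
    return sorted(lgas, key=lambda lga: median_price[lga])


# Hypothetical counts, populations and prices (illustrative only).
moves = {"A": {"A": 50, "B": 30}, "B": {"A": 100, "B": 10}}
population = {"A": 1000, "B": 100}
heat = normalise_movements(moves, population)
order = sort_by_median_price(["A", "B"], {"A": 900_000, "B": 650_000})
```

In this toy example, the 30 movements from "A" into the small area "B" outweigh the 100 movements from "B" into the much larger area "A" once population is accounted for, which is the preference interpretation intended by the scaling.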
The rows and columns of the plots are always sorted in ascending order based on median price, i.e. the most affordable LGAs first, and the most expensive LGA as the final row or column.
K: Exogenous variables
There are two main external influences on the model, which are governed by government approvals (in the case of overseas investments) and the central bank (in the case of mortgage rates).
1.1 K.1 Overseas investors
Overseas investments are often cited as a key driver of price growth in the Australian market (Rogers et al. 2017), and figures show that foreign investment has more than tripled since the mid-1990s (Haylen 2014). However, actual data on foreign investments are difficult to find. The ABS has described their own data on overseas investments to parliament as “hit or miss” (Iggulden 2014).
The purpose of this work is not a full investigation into overseas investments [overviews are given in Gauder et al. (2014), House of Representatives Standing Committee on Economics (2014)], but rather to examine the contribution overseas investments might have in relation to many other factors, using the readily available data (be this complete or not).
For this, we use the annual reports from the Foreign Investment Review Board (FIRB) from June 2006 to June 2019. The June 2019–June 2020 report was not available at the time of this writing (in 2020), as reports are not made available until the following year. Data are provided yearly at a NSW level, which is converted to monthly (simply dividing by 12). Again, data in this area are sparse, so this is the closest estimate we could derive. These data are provided in Table 5, and the average approval per year given in Fig. 25.
While the data are provided for the entirety of NSW, it has been shown that foreign investors prefer the inner city over rural areas, and thus, the NSW levels have been used for Greater Sydney. This is a fair assumption since the numbers are relatively conservative anyway. For the testing period, the most recent overseas approval value from the training period is used.
1.2 K.2 Mortgage rates
Mortgage rates are those set by the RBA. The final training month’s mortgage rate is used throughout the testing period, since no real value can be read.
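Both exogenous series are handled in the same way during testing: the last value seen in training is held constant. The sketch below also shows the yearly-to-monthly conversion of the approvals; the numbers are hypothetical and are not FIRB or RBA data.

```python
def yearly_to_monthly(yearly_approvals):
    """Spread each year's approvals evenly over 12 months."""
    monthly = []
    for value in yearly_approvals:
        monthly.extend([value / 12] * 12)
    return monthly


def extend_with_last(training_series, test_months):
    """Hold the final training value constant over the testing period."""
    return training_series + [training_series[-1]] * test_months


# Hypothetical yearly approvals and monthly rates (illustrative only).
approvals = extend_with_last(yearly_to_monthly([1200, 2400]), test_months=6)
rates = extend_with_last([0.050, 0.048, 0.045], test_months=6)
```

Holding the last value fixed means the forecasts never use information from the testing period itself.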
L: Utility function
Following Axtell et al. (2014), agents are assumed to choose the most expensive house they can afford, that is, the house price directly corresponds to the utility for the agent.
However, the introduction of \(\alpha \) alters this, such that there is some uncertainty or error in the agent’s choice. When \(\alpha =1\), the perfect utility maximisation behaviour is recovered, where the agent attempts to purchase the most expensive dwelling they can afford. For \(\alpha < 1\), the agent buys the most expensive dwelling they can afford with probability \(\alpha \), which then decreases for each subsequent listing in turn. This is visualised in Fig. 26. For high \(\alpha \), we can see the probability mass is contained only in the highest priced dwellings. For lower \(\alpha \), this probability mass becomes more distributed, meaning less focus on utility, and potential for cheaper houses to be purchased. For \(\alpha =0\), the utility is not considered at all and a random house within the agent’s budget is chosen (i.e. the probability mass is uniform across options). \(\alpha \), therefore, corresponds to the boundedness of the agent.
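One functional form that reproduces the limiting behaviour described above is a truncated geometric distribution over the affordable listings, ranked from most to least expensive. This form is an assumption made for illustration and is not taken from the paper: it is exactly greedy at \(\alpha =1\), exactly uniform at \(\alpha =0\), and assigns the top listing a probability of approximately \(\alpha \) when many listings are available.

```python
import random


def choice_probabilities(n_listings, alpha):
    """Probability of buying each affordable listing, ordered from most
    to least expensive. Assumed form: weights (1 - alpha)^k, normalised
    (a truncated geometric distribution)."""
    weights = [(1.0 - alpha) ** k for k in range(n_listings)]
    total = sum(weights)
    return [w / total for w in weights]


def choose_listing(rng, prices, budget, alpha):
    """Sample one listing within budget according to the assumed form."""
    affordable = sorted((p for p in prices if p <= budget), reverse=True)
    if not affordable:
        return None
    probs = choice_probabilities(len(affordable), alpha)
    return rng.choices(affordable, weights=probs)[0]


prices = [450, 520, 610, 700, 880]   # illustrative prices in AUD thousands
picked = choose_listing(random.Random(0), prices, budget=750, alpha=1.0)
```

With \(\alpha =1\) and a budget of 750, the agent always selects the listing priced at 700, the most expensive one it can afford.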
The above description considers the case of uniform knowledge, i.e. that of investors, who are assumed to be indifferent to the areas available. However, for first-time home buyers, we propose a space-based knowledge where buyers are more likely to consider listings close to where they are renting. The probability associated with the distance to the agent’s location is visualised in Fig. 27. The uniform knowledge of investors is given in green, and the spatial knowledge of first-time home buyers is given as the dotted black line.
First-time home buyers’ probability of viewing a listing for various values of \(\alpha \), showing the relationship between dwelling price (x-axis) and distance to dwelling (y-axis), and how \(\alpha \) adjusts this distribution. Low values of \(\alpha \) correspond to higher dispersion and less focus on utility-maximising behaviour. High values of \(\alpha \) focus the agent on dwellings which maximise utility
For first-time home buyers, the probability of viewing a listing is therefore controlled by both the proximity of the listing to the agent’s current (rental) location, and the price of the listing. This is visualised in Fig. 28. For \(\alpha =0\), the agent preference is uniform across all choices, placing no emphasis on utility (from either price or distance). As \(\alpha \) increases, the focus shifts to the more expensive dwellings, and does so based on the distance to the listing. This is shown in Fig. 28, where with increasing \(\alpha \) the emphasis focuses on the top right corner, which is the optimal value for both distance (closest) and price (most expensive in the agent’s budget). We can see that price remains the most important term in the agent’s utility though, with close listings with low prices having a low resulting probability, indicating the agent likely wants to move to a more affluent area if they can afford to do so. However, given an equal price, agents will prefer the closer listing.
Cite this article
Evans, B.P., Glavatskiy, K., Harré, M.S. et al. The impact of social influence in Australian real estate: market forecasting with a spatial agent-based model. J Econ Interact Coord 18, 5–57 (2023). https://doi.org/10.1007/s11403-021-00324-7
DOI: https://doi.org/10.1007/s11403-021-00324-7