The impact of social influence in Australian real estate: market forecasting with a spatial agent-based model

Journal of Economic Interaction and Coordination


Housing markets are inherently spatial, yet many existing models fail to capture this spatial dimension. Here, we introduce a new graph-based approach for incorporating a spatial component in a large-scale urban housing agent-based model (ABM). The model explicitly captures several social and economic factors that influence the agents’ decision-making behaviour (such as fear of missing out, their trend-following aptitude, and the strength of their submarket outreach), and interprets these factors in spatial terms. The proposed model is calibrated and validated with housing market data for the Greater Sydney region. The ABM simulation results not only include predictions for the overall market, but also produce area-specific forecasts at the level of local government areas within Sydney, arising from individual buy and sell decisions. In addition, the simulation results elucidate agent preferences in submarkets, highlighting differences in agent behaviour, for example, between first-time home buyers and investors, and between local and overseas investors.


  1. \(p_\mathrm{list}\) can technically be \(<0\) or \(>1\), so \(p_\mathrm{list}\) is capped to be between 0 and 1, in order to be a true probability, although this is exceptionally rare and does not appear to occur in Fig. 4.

  2. In this paper, perfect knowledge is taken to mean \(\alpha =1, O(v_i,v_j)=1\), i.e. the ability to view every listing across all of Greater Sydney (\({\mathcal {M}}\) in Sect. 4.1.1).

  3. All movements are scaled by the population size to allow a fair comparison, as outlined in “Appendix J”.


  • Alhashimi H, Dwyer W (2004) Is there such an entity as a housing market. In: 10th annual pacific rim real estate conference (press), Bangkok

  • Arcaute E, Molinero C, Hatna E, Murcio R, Vargas-Ruiz C, Masucci AP, Batty M (2016) Cities and regions in Britain through hierarchical percolation. R Soc Open Sci 3(4):150691

  • Axtell R, Farmer D, Geanakoplos J, Howitt P, Carrella E, Conlee B, Goldstein J, Hendrey M, Kalikman P, Masad D, et al. (2014) An agent-based model of the housing market bubble in metropolitan Washington, DC. In: Whitepaper for Deutsche Bundesbank’s Spring conference on “Housing markets and the macroeconomy: challenges for monetary policy and financial stability”

  • Bahadir B, Mykhaylova O (2014) Housing market dynamics with delays in the construction sector. J Hous Econ 26:94–108

  • Bangura M, Lee CL (2019) The differential geography of housing affordability in Sydney: a disaggregated approach. Aust Geogr 50(3):295–313

  • Bangura M, Lee CL (2020) Housing price bubbles in Greater Sydney: evidence from a submarket analysis. Hous Stud 1–36.

  • Baptista R, Farmer JD, Hinterschweiger M, Low K, Tang D, Uluc A (2016) Macroprudential policy in an agent-based model of the UK housing market. Bank of England working papers 619, Bank of England.

  • Barbosa H, Barthelemy M, Ghoshal G, James CR, Lenormand M, Louail T, Menezes R, Ramasco JJ, Simini F, Tomasini M (2018) Human mobility: models and applications. Phys Rep 734:1–74

  • Barthelemy M (2016) The structure and dynamics of cities. Cambridge University Press, Cambridge

  • Barthelemy M (2019) The statistical physics of cities. Nat Rev Phys 1(6)

  • Barthelemy M, Bordin P, Berestycki H, Gribaudi M (2013) Self-organization versus top-down planning in the evolution of a city. Sci Rep 3:2153

  • Bergstra J, Yamins D, Cox D (2013) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In: International conference on machine learning, pp 115–123

  • Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. In: Proceedings of the 3rd international conference on knowledge discovery and data mining. AAAI Press, AAAIWS’94, pp 359–370

  • Bessant JC, Johnson G (2013) Dream on declining homeownership among young people in Australia? Hous Theory Soc 30(2):177–192

  • Burnside C, Eichenbaum M, Rebelo S (2016) Understanding booms and busts in housing markets. J Polit Econ 124(4):1088–1147

  • Campolongo F, Cariboni J, Saltelli A (2007) An effective screening design for sensitivity analysis of large models. Environ Model Softw 22(10):1509–1518

  • Carstensen CL (2015) An agent-based model of the housing market: steps toward a computational tool for policy analysis. MSc thesis, University of Copenhagen

  • Chang SL, Harding N, Zachreson C, Cliff OM, Prokopenko M (2020) Modelling transmission and control of the COVID-19 pandemic in Australia. arXiv preprint arXiv:2003.10218

  • Cheng IH, Raina S, Xiong W (2014) Wall street and the housing bubble. Am Econ Rev 104(9):2797–2829

  • Cliff OM, Harding N, Piraveenan M, Erten EY, Gambhir M, Prokopenko M (2018) Investigating spatiotemporal dynamics and synchrony of influenza epidemics in Australia: an agent-based modelling approach. Simul Model Pract Theory 87:412–431

  • Conlisk J (1996) Why bounded rationality? J Econ Lit 34(2):669–700

  • Crosato E, Nigmatullin R, Prokopenko M (2018) On critical dynamics and thermodynamic efficiency of urban transformations. R Soc Open Sci 5(10):180863

  • Crosato E, Prokopenko M, Harré MS (2021) The polycentric dynamics of Melbourne and Sydney: suburb attractiveness divides a city at the home ownership level. Proc R Soc A 477(2245):20200514

  • Edmonds B, ní Aodha L (2018) Using agent-based modelling to inform policy—what could possibly go wrong? In: International workshop on multi-agent systems and agent-based simulation. Springer, pp 1–16

  • Fernald M (2020) America's rental housing 2020. Joint Center for Housing Studies of Harvard University, Cambridge

  • Frías-Paredes L, Mallor F, León T, Gastón-Romeo M (2016) Introducing the temporal distortion index to perform a bidimensional analysis of renewable energy forecast. Energy 94:180–194

  • Frías-Paredes L, Mallor F, Gastón-Romeo M, León T (2017) Assessing energy forecasting inaccuracy by simultaneously considering temporal and absolute errors. Energy Convers Manag 142:533–546

  • Gallegati M, Kirman A (1999) Beyond the representative agent. Edward Elgar Publishing, Cheltenham

  • Gauder M, Houssard C, Orsmond D, et al. (2014) Foreign investment in residential real estate. RBA Bulletin, June pp 11–18

  • Ge J (2013) Who creates housing bubbles? An agent-based study. In: International workshop on multi-agent systems and agent-based simulation. Springer, pp 143–150

  • Ge J (2017) Endogenous rise and collapse of housing price: an agent-based model of the housing market. Comput Environ Urban Syst 62:182–198

  • Geanakoplos J, Axtell R, Farmer JD, Howitt P, Conlee B, Goldstein J, Hendrey M, Palmer NM, Yang CY (2012) Getting at systemic risk via an agent-based model of the housing market. Am Econ Rev 102(3):53–58

  • Gilbert N, Hawksworth JC, Swinney PA (2009) An agent-based model of the English housing market. In: AAAI spring symposium: technosocial predictive analytics, pp 30–35

  • Glavatskiy KS, Prokopenko M, Carro A, Ormerod P, Harre M (2020) Explaining herding and volatility in the cyclical price dynamics of urban housing markets using a large scale agent-based model. arXiv preprint arXiv:2004.07571

  • Goldstein J (2017) Rethinking housing with agent-based models: Models of the housing bubble and crash in the Washington DC area 1997–2009. PhD thesis, George Mason University

  • Greater Sydney Commission (2018) Greater Sydney region plan: a metropolis of three cities. NSW Department of Planning and Environment. Accessed 22 Aug 2020

  • Guest R, Rohde N (2017) The contribution of foreign real estate investment to housing price growth in Australian capital cities. Abacus 53(3):304–318

  • Haylen A (2014) House prices, ownership and affordability: trends in New South Wales. NSW Parliamentary Library

  • Herman J, Usher W (2017) SALib: an open-source Python library for sensitivity analysis. J Open Source Softw.

  • House of Representatives Standing Committee on Economics (2014) Report on Foreign Investment in Residential Real Estate. The Parliament of the Commonwealth of Australia

  • Huang Y, Ge J (2009) House prices and the collapse of stock market in mainland China?-an empirical study on house price index. In: Pacific Rim real estate conference

  • Iggulden T (2014) ABS admits data on foreign real estate buyers is ‘hit and miss’. Accessed 10 May 2020

  • Kim JH, Pagliara F, Preston J (2005) The intention to move and residential location choice behaviour. Urban Stud 42(9):1621–1636

  • Kouwenberg R, Zwinkels R (2014) Forecasting the US housing market. Int J Forecast 30(3):415–425

  • Kouwenberg R, Zwinkels RC (2015) Endogenous price bubbles in a multi-agent system of the housing market. PLoS ONE 10(6):e0129070

  • Kupke V, Rossini P (2011) Housing affordability in Australia for first home buyers on moderate incomes. Property Management

  • La Cava G, Leal H, Zurawski A et al (2017) Housing accessibility for first home buyers. Reserve Bank of Australia Bulletin, pp 19–28

  • LeBaron B, Tesfatsion L (2008) Modeling macroeconomies as open-ended dynamic systems of interacting agents. Am Econ Rev 98(2):246–50

  • Louf R, Barthelemy M (2013) Modeling the polycentric transition of cities. Phys Rev Lett 111(19):198702

  • Mc Breen J, Goffette-Nagot F, Jensen P (2010) Information and search on the housing market: an agent-based model. In: Li Calzi M, Milone L, Pellizzari P (eds) Progress in artificial economics. Springer, Berlin, pp 153–164

  • McMaster R, Watkins C (1999) The economics of housing: the need for a new approach. In: PRRES/AsRES/IRES conference. Kuala Lumpur

  • Miles W (2008) Boom-bust cycles and the forecasting performance of linear and non-linear models of house prices. J Real Estate Finance Econ 36(3):249–264

  • Morris MD (1991) Factorial sampling plans for preliminary computational experiments. Technometrics 33(2):161–174

  • Myers C, Rabiner L, Rosenberg A (1980) Performance tradeoffs in dynamic time warping algorithms for isolated word recognition. IEEE Trans Acoust Speech Signal Process 28(6):623–635

  • Pangallo M, Nadal JP, Vignes A (2019) Residential income segregation: a behavioral model of the housing market. J Econ Behav Organ 159:15–35

  • Pawson H, Martin C (2020) Rental property investment in disadvantaged areas: the means and motivations of Western Sydney’s new landlords. Hous Stud 1–23

  • Piazzesi M, Schneider M, Tuzel S (2007) Housing, consumption and asset pricing. J Financ Econ 83(3):531–569

  • Piovani D, Arcaute E, Uchoa G, Wilson A, Batty M (2018) Measuring accessibility using gravity and radiation models. R Soc Open Sci 5(9):171668

  • Poledna S, Miess MG, Hommes CH (2019) Economic forecasting with an agent-based model. Available at SSRN 3484768

  • Polhill G (2018) Why the social simulation community should tackle prediction. Rev Artif Soc Soc Simul. Accessed 1 May 2020

  • Power C (2009) A spatial agent-based model of n-person prisoner’s dilemma cooperation in a socio-geographic community. J Artif Soc Soc Simul 12(1):8

  • Raimbault J, Broere J, Somveille M, Serna JM, Strombom E, Moore C, Zhu B, Sugar L (2020) A spatial agent based model for simulating and optimizing networked eco-industrial systems. Resour Conserv Recycl 155:104538

  • Randolph B, Pinnegar S, Tice A (2013) The first home owner boost in Australia: a case study of outcomes in the Sydney housing market. Urban Policy Res 31(1):55–73

  • Rogers D, Lee CL, Yan D (2015) The politics of foreign investment in Australian housing: Chinese investors, translocal sales agents and local resistance. Hous Stud 30(5):730–748

  • Rogers D, Wong A, Nelson J (2017) Public perceptions of foreign and Chinese real estate investment: intercultural relations in Global Sydney. Aust Geogr 48(4):437–455

  • Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26(1):43–49

  • Saltelli A, Tarantola S, Campolongo F, Ratto M (2004) Sensitivity analysis in practice: a guide to assessing scientific models, vol 1. Wiley Online Library, New York

  • Saltelli A, Aleksankina K, Becker W, Fennell P, Ferretti F, Holst N, Li S, Wu Q (2019) Why so many published sensitivity analyses are false: a systematic review of sensitivity analysis practices. Environ Model Softw 114:29–39

  • Sanchez DG, Lacarrière B, Musy M, Bourges B (2014) Application of sensitivity analysis in building energy simulations: combining first-and second-order elementary effects methods. Energy Build 68:741–750

  • Scott DW (2015) Multivariate density estimation: theory, practice, and visualization. Wiley, New York

  • Shahriari B, Swersky K, Wang Z, Adams RP, De Freitas N (2015) Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE 104(1):148–175

  • Shimizu C, Nishimura KG, Watanabe T (2010) Housing prices in Tokyo: a comparison of hedonic and repeat sales measures. Jahrbücher für Nationalökonomie und Statistik 230(6):792–813

  • Simini F, González MC, Maritan A, Barabási AL (2012) A universal model for mobility and migration patterns. Nature 484(7392):96

  • Simon HA (1955) A behavioral model of rational choice. Q J Econ 69(1):99–118

  • Simon HA (1957) Models of man; social and rational. Wiley, New York

  • Sinai TM (2012) House price moments in boom-bust cycles. Technical report, National Bureau of Economic Research

  • Slavko B, Glavatskiy K, Prokopenko M (2019) Dynamic resettlement as a mechanism of phase transitions in urban configurations. Phys Rev E 99(4):042143

  • Slavko B, Glavatskiy K, Prokopenko M (2020a) City structure shapes directional resettlement flows in Australia. Sci Rep 10(1):1–11

  • Slavko B, Prokopenko M, Glavatskiy KS (2020b) Diffusive resettlement: irreversible urban transitions in closed systems. arXiv preprint arXiv:2009.04094

  • Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems, pp 2951–2959

  • Tesfatsion L (2002) Agent-based computational economics: growing economies from the bottom up. Artif Life 8(1):55–82

  • Thomas M, Hall A (2016) Housing affordability in Australia. Briefing Book: Key Issues for the 45th Parliament pp 86–90

  • Ustvedt S (2016) An agent-based model of a metropolitan housing market-linking micro-level behavior to macro-level analysis. Master’s thesis, NTNU

  • Vallance L, Charbonnier B, Paul N, Dubost S, Blanc P (2017) Towards a standardized procedure to assess solar forecast accuracy: a new ramp and time alignment metric. Sol Energy 150:408–422

  • Vincent L, Thome N (2019) Shape and time distortion loss for training deep time series forecasting models. In: Advances in neural information processing systems, pp 4189–4201

  • Wang W, Yang S, Hu F, Han Z, Jaeger C (2018) An agent-based modeling for housing prices with bounded rationality. In: Journal of physics: conference series, vol 1113. IOP Publishing, p 012014

  • Watkins CA (2001) The definition and identification of housing submarkets. Environ Plan A 33(12):2235–2253

  • Wei SJ, Zhang X, Liu Y (2012) Status competition and housing prices. Technical report, National Bureau of Economic Research

  • Wilkins R, Lass I (2015) The household, income and labour dynamics in Australia survey: Selected findings from waves 1 to 12. Melbourne Institute of Applied Economic and Social Research, University of

  • Wong PY (2017) Foreign real estate investment and the Australian residential property market: a study on Chinese investors. Int J Soc Behav Educ Econ Bus Ind Eng 11:1529–1538

  • Yetsenga R, Emmett F (2020) The ANZ CoreLogic housing affordability report 2020. ANZ Media Centre

Author information


Corresponding author

Correspondence to Benjamin Patrick Evans.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors are thankful to Paul Ormerod, Adrián Carro and Markus Brede for many helpful discussions of the baseline model. The authors acknowledge the HPC service at The University of Sydney for providing HPC resources that have contributed to the research results reported within this paper. The authors would also like to acknowledge the Securities Industry Research Centre of Asia-Pacific (SIRCA) and CoreLogic, Inc. (Sydney, Australia) for their data on Greater Sydney housing transactions.


A: Implementation

The model is written from scratch in Python 3, based on the C++ code of Glavatskiy et al. (2020).

B: Baseline model parameters

The work extends the model of Glavatskiy et al. (2020) (the baseline). For completeness, all parameters of the baseline method with explanations are given in Table 2.

Table 2 Baseline model parameters

A local sensitivity analysis of these parameters is presented in Fig. 9, where we vary each parameter around its default value (within a range of \(\pm 20\%\)) one at a time while keeping the other parameters at their default values. The analysis shows that the model is robust to small changes in the parameters. The only parameters that stand out and deserve additional discussion are the listing price factor, the expectation downshift ratio, and the amount quality reference.

Fig. 9 Local sensitivity analysis showing the variation in output when the baseline model input parameters are varied one at a time by \(\pm 20\%\) around their default values (with the others fixed at their defaults). The red lines show the output for the default value, and the grey lines show the output for the varied values (the shade indicates the distance to the default: black is very near the default, light grey is further away). All panels show relatively small variations in the output, indicating robustness to the parameters (colour figure online)

The listing price factor \(b_\ell \) produces a relatively large variation in prices, but as shown in Eq. (3.2) this is because it acts as a linear scaling factor on the list prices. It therefore scales the output within the \(\pm 20\%\) range too, which means the parameter is behaving as expected. The expectation downshift ratio \(p_\mathrm{d}\) captures the willingness of agents to downgrade their pre-purchase expectation based on the amount offered by the bank. Higher values indicate buyers wanting more from banks, and we see that this increases prices overall. It is interesting to note, however, that higher values also increase the volatility of the market, with larger fluctuations seen, for example, in late 2017 (at the peak of the market). This confirms the discussion in Glavatskiy et al. (2020), which highlighted the propensity to borrow as a key explanatory factor of volatility. The amount quality reference \(\bar{Q}_\mathrm{h}\) is an integer parameter, which explains why there are fewer comparison lines than for the continuous parameters, but we still observe well-behaved outputs in the \(\pm 20\%\) range.

For all other parameters, the outputs vary only slightly from the default, indicating that the model is robust to variations in its internal parameters.
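The one-at-a-time variation used for this local analysis can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `one_at_a_time_sweep`, the number of steps, and the toy model are assumptions, and an integer parameter such as \(\bar{Q}_\mathrm{h}\) would need rounding that is not shown here.

```python
from typing import Callable, Dict, List, Tuple


def one_at_a_time_sweep(
    defaults: Dict[str, float],
    model: Callable[[Dict[str, float]], float],
    rel_range: float = 0.2,
    n_steps: int = 5,
) -> Dict[str, List[Tuple[float, float]]]:
    """Vary each parameter alone within +/- rel_range of its default.

    Returns, per parameter, a list of (parameter value, model output) pairs.
    `model` stands in for a full simulation run (hypothetical here).
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    results: Dict[str, List[Tuple[float, float]]] = {}
    for name, default in defaults.items():
        rows = []
        for k in range(n_steps):
            # Evenly spaced scale factors from (1 - rel_range) to (1 + rel_range).
            scale = 1.0 - rel_range + 2.0 * rel_range * k / (n_steps - 1)
            params = dict(defaults)  # every other parameter stays at its default
            params[name] = default * scale
            rows.append((params[name], model(params)))
        results[name] = rows
    return results


if __name__ == "__main__":
    # Toy stand-in for the simulation: output is linear in "b_l" only.
    toy = lambda p: 100.0 * p["b_l"] + 0.0 * p["p_d"]
    for name, rows in one_at_a_time_sweep({"b_l": 1.0, "p_d": 0.5}, toy).items():
        print(name, rows)
```

In the toy model the output scales linearly with `b_l` and ignores `p_d`, which mirrors the qualitative distinction drawn above between a linear scaling parameter and a parameter with little local effect.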

C: Market matching

The process that matches buyers and sellers is relatively simple and is given in Algorithm 1. The highest-bidding buyer gets first preference over the listings, and every buyer attempts to purchase the most expensive listing they can afford. In certain cases, deals are rejected due to external influences (modelled as a random 20% chance of rejection). Matching is performed once per simulation step, and bids and listings that did not clear persist into the next step.

Algorithm 1 Market matching (rendered as an image in the original; not reproduced here)
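Since Algorithm 1 is not reproduced in this text, the matching step described above can be sketched as follows. This is an illustrative reading of the prose, not the authors' algorithm. The assumptions are that a listing is affordable when its price does not exceed the bid, that a deal clears at the listing price, and that a rejected buyer makes no further attempt within the same step. The function and variable names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def match_market(
    bids: Dict[str, float],
    listings: Dict[str, float],
    reject_prob: float = 0.2,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str, float]]:
    """Run one matching step and return the cleared deals.

    bids:     buyer id -> maximum amount the buyer can pay
    listings: listing id -> list price
    Returns a list of (buyer id, listing id, price) tuples.
    """
    rng = rng or random.Random()
    available = dict(listings)  # listings still on the market in this step
    deals: List[Tuple[str, str, float]] = []

    # The highest bidder gets first preference over the listings.
    for buyer, bid in sorted(bids.items(), key=lambda kv: kv[1], reverse=True):
        affordable = [(price, lid) for lid, price in available.items() if price <= bid]
        if not affordable:
            continue  # nothing affordable: the bid persists into the next step
        price, lid = max(affordable)  # most expensive affordable listing
        if rng.random() < reject_prob:
            continue  # deal rejected: bid and listing both persist (assumption)
        deals.append((buyer, lid, price))
        del available[lid]
    return deals


if __name__ == "__main__":
    demo_bids = {"a": 900.0, "b": 700.0, "c": 300.0}
    demo_listings = {"x": 800.0, "y": 650.0, "z": 500.0}
    print(match_market(demo_bids, demo_listings, rng=random.Random(1)))
```

Bids and listings absent from the returned deals are the ones that would persist into the next simulation step.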

D: Rental market

In the baseline model, there was no concept of rental matching. Households were randomly assigned a rental, with no regard to the cost of the dwelling or the income of the household. Here, we add a matching process based on the idea that households should spend at most 30% of their income on housing, when possible, to avoid housing stress (Thomas and Hall 2016; Fernald 2020).

New households are randomly assigned a “local” area (weighted by the population of each area), where they begin and have their characteristics (wealth, income, cash flow, etc.) assigned. From there, every household attempts to find a vacant rental in its price range (which will likely result in some households moving out of their initial area for financial reasons). Households with extremely high incomes, for which all dwellings cost less than 10% of their income, get the most expensive rental available. Households with extremely low incomes, for which all dwellings cost at least 30% of their income, get the cheapest one available. All other households randomly choose a rental they can afford (in the 10–30% of income range).
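The three cases above can be sketched as a single selection function. This is a hedged illustration, not the authors' implementation: `choose_rental` is a hypothetical name, income and rents are assumed to be expressed over the same period, and the fallback used when no vacant dwelling falls inside the 10–30% band is an assumption that the text does not specify.

```python
import random
from typing import List, Optional


def choose_rental(
    income: float,
    rents: List[float],
    low: float = 0.10,
    high: float = 0.30,
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Return the index of the chosen vacant rental, or None if none is vacant."""
    if not rents:
        return None
    rng = rng or random.Random()
    shares = [rent / income for rent in rents]  # rent as a share of income
    indices = range(len(rents))

    # Very high income: every dwelling costs under 10% -> most expensive one.
    if all(s < low for s in shares):
        return max(indices, key=rents.__getitem__)
    # Very low income: every dwelling costs at least 30% -> cheapest one.
    if all(s >= high for s in shares):
        return min(indices, key=rents.__getitem__)
    # Otherwise: random choice among dwellings in the 10-30% band.
    in_band = [i for i, s in enumerate(shares) if low <= s <= high]
    if in_band:
        return rng.choice(in_band)
    # Assumption (not stated in the text): if the band is empty, take the
    # most expensive dwelling that still costs less than 30% of income.
    below = [i for i, s in enumerate(shares) if s < high]
    return max(below, key=rents.__getitem__)


if __name__ == "__main__":
    print(choose_rental(1000.0, [50.0, 150.0, 250.0, 400.0], rng=random.Random(0)))
```

The caller would remove the chosen dwelling from the vacancy list before the next household searches.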

Households remain in their rentals for the duration of the simulation. In this work, we do not attempt to capture the rental market in its entirety and leave this for future work, where we would like to model the relationship between renters and investors. The changes were made so that the cash flow situation of each household more closely matches that seen in the real world; in the previous model, many households were in a poor cash flow situation due to the rental price. Other work, such as Mc Breen et al. (2010), looks in more depth at modelling the rental market.

E: Bayesian optimisation

E.1 Details

Bayesian optimisation is performed using the Tree of Parzen Estimators approach with hyperopt from Bergstra et al. (2013). The optimisation process was run for 2000 iterations in all cases. The loss was measured as the average loss over several stochastic runs for each set of parameters, to minimise the effect of randomisation in the model and resulting loss.
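The averaging of the loss over several stochastic runs can be sketched as below. This is a minimal sketch under stated assumptions: `simulate` and `loss_fn` are hypothetical stand-ins for the ABM and the loss of Appendix E.2, and the number of runs and the seeding scheme are illustrative, since the text does not give them. In practice, a function of this kind would be passed as the objective to hyperopt's `fmin` with `algo=tpe.suggest`.

```python
import random
import statistics
from typing import Callable, Dict, List


def averaged_loss(
    params: Dict[str, float],
    simulate: Callable[[Dict[str, float], random.Random], List[float]],
    loss_fn: Callable[[List[float]], float],
    n_runs: int = 5,
    base_seed: int = 0,
) -> float:
    """Average the loss of several independently seeded simulation runs."""
    losses = []
    for run in range(n_runs):
        rng = random.Random(base_seed + run)  # one reproducible stream per run
        losses.append(loss_fn(simulate(params, rng)))
    return statistics.mean(losses)


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0]

    def toy_simulate(p, rng):
        # Hypothetical noisy model: a scaled trend plus Gaussian noise.
        return [p["h"] * t + rng.gauss(0.0, 0.1) for t in target]

    def toy_loss(series):
        return sum(abs(s - t) for s, t in zip(series, target))

    print(averaged_loss({"h": 1.0}, toy_simulate, toy_loss))
```

Averaging reduces the chance that a parameter set is preferred only because of one favourable random draw, at the cost of `n_runs` simulations per evaluation.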

E.2 Loss function

To measure the goodness of fit, we use a loss function with two terms, a shape term and a temporal term, to capture the nonlinearities over time when predicting housing price trends. This loss function is a modification of DILATE (Vincent and Thome 2019), which was introduced as a loss function for neural networks for time series prediction. DILATE has been simplified here (by removing the smoothing parameters), as Bayesian optimisation does not require the loss function to be differentiable.

The loss function is given in Eq. (E.1).

$$\begin{aligned} \ell = \lambda \cdot \hbox {shape} + (1 - \lambda ) \cdot \hbox {temporal} \end{aligned} \tag{E.1}$$

\(\lambda =0.5\) was used throughout, since this was the most common value in the original paper of Vincent and Thome (2019). However, as the terms are not normalised, the two do not contribute equally; instead, the temporal term serves more as a penalty on the shape term (with \(\lambda \) controlling the strength of the penalty).

The shape term is based on dynamic time warping (DTW), which has commonly been used in speech recognition tasks (Sakoe and Chiba 1978; Myers et al. 1980) but has a wide range of applications in time series data (Berndt and Clifford 1994). Dynamic time warping can be expressed recursively as a minimisation problem, as in Eq. (E.2).

$$\begin{aligned} \hbox {shape} = \hbox {DTW}(x, y) = d(x,y) + \min \begin{bmatrix} \hbox {DTW}(x-1, y), \\ \hbox {DTW}(x-1, y-1), \\ \hbox {DTW}(x, y-1) \end{bmatrix} \end{aligned} \tag{E.2}$$

This can be read as minimising the cumulative distance (using distance measure d, in this case the Euclidean distance) along some warped path between x and y, by taking the distance between the current elements plus the minimum of the cumulative distances of the neighbouring points.
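The recursion can be evaluated bottom-up with dynamic programming. The sketch below is a standard textbook DTW for scalar series, offered as an illustration rather than the authors' code. It also returns the warping path as 1-based index pairs, and the tie-breaking rule in the backtracking step is an arbitrary choice.

```python
import math
from typing import List, Sequence, Tuple


def dtw(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the DTW cost between x and y and the optimal warping path."""
    n, m = len(x), len(y)
    # cost[i][j] = cumulative cost of aligning x[:i] with y[:j] (1-based).
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # Euclidean distance between scalars
            cost[i][j] = d + min(cost[i - 1][j], cost[i - 1][j - 1], cost[i][j - 1])

    # Backtrack from (n, m) to (1, 1), always moving to the cheapest neighbour.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i, j))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # A peak shifted by one step is matched at zero cost by warping the path.
    print(dtw([0, 1, 0, 0], [0, 0, 1, 0]))
```

The example shows why DTW alone is insensitive to timing: a series whose peak arrives one step late has zero shape cost, which is the behaviour the temporal term is meant to penalise.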

Unlike the common applications in speech recognition, where words can be spoken at varying speeds (so the peaks do not necessarily match up), in financial markets the timing of such peaks is important. This motivates the introduction of a temporal term that tries to align peaks and dips. The temporal term is based on the Time Distortion Index (TDI) (Frías-Paredes et al. 2016, 2017), which can be thought of as the normalised area between the optimal path and the identity path (where the identity path is \((1, 1), (2, 2), \ldots , (N, N)\)) (Vallance et al. 2017), and which aims to minimise the impact of shifting and distortion in time series forecasting (Frías-Paredes et al. 2016).

$$\begin{aligned} P_{l}&= \int _{i_{l}}^{i_{l+1}} \left( x-\left[ \frac{(x-i_{l}) (j_{l+1} - j_{l})}{i_{l+1}-i_{l}} + j_{l} \right] \right) \hbox {d}x \end{aligned}$$
$$\begin{aligned} \hbox {temporal}&= \hbox {TDI} = \frac{2 \sum _{l} | P_{l} |}{N^2} \end{aligned}$$
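The temporal term can be computed from a warping path given as 1-based index pairs \((i_l, j_l)\). The sketch below is illustrative and makes one simplification explicit: on each segment the gap between the identity line and the path is linear in x, so the trapezoid rule gives the segment area exactly, and vertical steps (\(i_{l+1}=i_l\)) contribute zero area. The function names are hypothetical, and `combined_loss` simply restates the weighted sum of the shape and temporal terms.

```python
from typing import List, Tuple


def tdi(path: List[Tuple[int, int]], n: int) -> float:
    """Time Distortion Index of a warping path for series of length n."""
    total = 0.0
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        # Signed area between the identity line y = x and the path segment
        # from (i0, j0) to (i1, j1); exact because the gap is linear in x.
        area = (i1 - i0) * ((i0 - j0) + (i1 - j1)) / 2.0
        total += abs(area)
    return 2.0 * total / n ** 2


def combined_loss(shape: float, temporal: float, lam: float = 0.5) -> float:
    """Weighted sum of the shape and temporal terms."""
    return lam * shape + (1.0 - lam) * temporal


if __name__ == "__main__":
    identity = [(1, 1), (2, 2), (3, 3), (4, 4)]
    shifted = [(1, 1), (1, 2), (2, 3), (3, 4), (4, 4)]
    print(tdi(identity, 4), tdi(shifted, 4))
```

A path that follows the identity line has a TDI of zero, while a path that matches a shifted peak departs from the diagonal and is penalised.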

To see the usefulness of this loss over a more standard loss function for time series, such as the MSE, consider the example in Fig. 10. The MSE can be problematic and in some cases misleading as a measure of goodness of fit (as in the example, where the straight line in Fig. 10b has the lower loss). DTW helps to match points in the two time series, while TDI helps to minimise the offset of the predictions. (Graphically, in the example this corresponds to shortening the dotted grey lines.) For a full analysis, we refer the reader to the original DILATE paper of Vincent and Thome (2019), noting that all smoothing terms have been removed in the modification here.

Fig. 10 Motivation of time series-based loss using a constructed example. We can see the line on the right is a very poor predictor of the true trend, failing to capture any of the peaks or dips. However, the MSE is significantly lower than the line on the left. DTW captures the shifts, and incorporating a penalty on time can penalise these shifts. The light grey lines show how DTW matches points together, even if they do not occur at the same time period

Fig. 11 Global constraint process

E.3 Global constraints

The 2011–2015 and 2016–2019 simulations fit the trend very closely. The 2006–2010 simulation path, despite having a low loss, does not follow the dip well, because the loss function makes no distinction between being above or below the trend. Looking at the individual paths from every run, a peak and dip are predicted in many cases, although their distance to the data is greater than that of the lowest-loss path, which matched a large portion of the training data almost perfectly but missed the dip.

We therefore apply a post-optimisation global constraint to 2006–2010, again using only this training period: the midpoint of the simulation must be higher than the start and end points (i.e. a peak must occur). We then take the parameters with the lowest loss that satisfy this criterion. The process is shown in Fig. 11 and the result is shown in Fig. 5p. For 2006–2010, \(\ell \) is higher than before the constraint was applied, but the constraint clearly allows the simulation to follow the overall trend in the training period more closely.

The visualisation in Fig. 11 also begins to show the wide range of possible market outcomes for various combinations of the parameters. If a certain feature (such as the peak) arises from many parameter combinations, we can deduce that such dynamics were likely to occur due to the agent characteristics alone, regardless of the parameters used. Here, many combinations lead to a peak and dip, perhaps due to mortgage rates and worrying mortgage-to-income ratios. This is in line with the suggestions of Edmonds and ní Aodha (2018) that ABMs be used to determine a range of potential future outcomes, which in this case is a variety of paths leading to a peak and dip.
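The post-optimisation selection can be sketched as a filter followed by a minimum. This is an illustrative reading of the constraint, not the authors' code. The midpoint is assumed to be the middle index of the simulated series, and each candidate is assumed to be a `(loss, params, series)` tuple collected during optimisation. All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Candidate = Tuple[float, Dict[str, float], Sequence[float]]


def has_peak(series: Sequence[float]) -> bool:
    """True if the midpoint is higher than both the start and the end point."""
    mid = series[len(series) // 2]  # assumption: midpoint = middle index
    return mid > series[0] and mid > series[-1]


def select_with_constraint(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the lowest-loss candidate whose series satisfies the constraint."""
    feasible = [c for c in candidates if has_peak(c[2])]
    if not feasible:
        return None  # no evaluated parameter set produced a peak
    return min(feasible, key=lambda c: c[0])


if __name__ == "__main__":
    runs = [
        (0.10, {"h": 0.2}, [1.0, 1.1, 1.2, 1.3, 1.4]),  # lowest loss, no peak
        (0.15, {"h": 0.5}, [1.0, 1.3, 1.6, 1.2, 1.1]),  # peak, slightly higher loss
        (0.30, {"h": 0.9}, [1.0, 1.5, 2.0, 1.4, 0.9]),  # peak, highest loss
    ]
    print(select_with_constraint(runs))
```

As in the text, the selected candidate can have a higher loss than the unconstrained optimum, since the lowest-loss path is discarded when it contains no peak.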

Table 3 Three tunable hyperparameters

E.4 Parameter space

The parameter space is defined in Table 3.

Even though there are only three parameters to tune, the number of potential combinations exceeds 4 million (assuming the values are discretised; the true number is far greater), making a grid search impractical.

The three parameters are h, \(\alpha \), and \(\beta \).

Fig. 12 Search space exploration. Colour indicates the loss

Fig. 13 Parameter interactions and parameter sampling

F: Sensitivity analysis

While in Sect. 5.1 we analysed the contribution of each new component by comparing the resulting optimised time series after introducing the components one at a time, here we verify and rank the importance of each of the contributions explicitly using global sensitivity analysis (GSA).

Specifically, we analyse the importance of the trend-following aptitude (h), the social contribution (\(\beta \)), and the role of \(\alpha \) in minimising the loss function.

We use the Morris method (Morris 1991) for the GSA and report \(\sigma \) together with the revised measure \(\mu ^*\) suggested in Saltelli et al. (2004). \(\mu ^*\) is the mean absolute elementary effect and can be used to rank the contribution of each parameter; it avoids the problem with \(\mu \), where elementary effects of opposite sign can cancel out. We also analyse \(\sigma \), the standard deviation of the elementary effects, as a measure of interactions.

For the Morris method, we use \(r=20\) trajectories, \(p=10\) levels, and step size \(\Delta =p/[2(p-1)]\), i.e. \(\Delta \approx 0.56\) with \(p=10\). These are within the range of commonly used values, e.g. in Campolongo et al. (2007).
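The quantities \(\mu \), \(\mu ^*\) and \(\sigma \) can be illustrated on a small set of elementary effects. This is a minimal sketch, not the analysis pipeline used for Table 4 (the reference list includes SALib, which provides a Morris implementation). The function names are hypothetical, inputs are assumed to be scaled to the unit interval, and \(\sigma \) is computed here as the sample standard deviation.

```python
import statistics
from typing import Callable, List, Sequence, Tuple


def elementary_effect(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    i: int,
    delta: float,
) -> float:
    """Elementary effect of input i at point x: change in f per unit step."""
    stepped = list(x)
    stepped[i] += delta
    return (f(stepped) - f(x)) / delta


def morris_stats(effects: List[float]) -> Tuple[float, float, float]:
    """Return (mu, mu_star, sigma) for the elementary effects of one input."""
    mu = statistics.mean(effects)                        # signed effects can cancel
    mu_star = statistics.mean([abs(e) for e in effects])  # ranking measure
    sigma = statistics.stdev(effects)                    # spread: nonlinearity/interactions
    return mu, mu_star, sigma


if __name__ == "__main__":
    # Toy function with an interaction between its two inputs.
    f = lambda v: v[0] * v[1]
    points = [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)]
    effects = [elementary_effect(f, pt, 0, 0.5) for pt in points]
    print(effects, morris_stats(effects))
```

In the toy example the effect of the first input depends on the value of the second, so the elementary effects differ between base points and \(\sigma \) is non-zero, which is how the method flags interactions.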

The results are presented in Table 4 and visualised in Figs. 14 and 15.

Table 4 Morris method for sensitivity analysis

Ranking the parameters by importance (\(\mu ^{\star }\)), we can see that h is consistently the most important, i.e. changes in h have the largest effect on \(\ell \). It is followed in importance by \(\beta \), and then \(\alpha \), each year. However, the confidence bars do overlap in Fig. 14.

Viewing the Morris plots in Fig. 15, we can see that all parameters are deemed important; unimportant parameters would appear in the bottom-left portion of the plot. Using the classification strategy of Sanchez et al. (2014), all parameters are considered to be non-monotonic and/or to have high levels of interaction, since \(\frac{\sigma }{\mu ^{\star }} > 1\) in all cases.
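The classification by the ratio \(\sigma /\mu ^{\star }\) can be written as a small lookup, using the thresholds 0.1, 0.5 and 1 stated in the caption of Fig. 15. This is a sketch only: the function name is hypothetical, and the treatment of values lying exactly on a threshold is an assumption, as the source does not specify it.

```python
def classify_factor(mu_star: float, sigma: float) -> str:
    """Classify a factor by sigma / mu_star using the 0.1, 0.5 and 1 thresholds."""
    if mu_star <= 0.0:
        raise ValueError("mu_star must be positive to form the ratio")
    ratio = sigma / mu_star
    if ratio < 0.1:
        return "almost linear"
    if ratio < 0.5:
        return "monotonic"
    if ratio < 1.0:
        return "almost monotonic"
    # Assumption: a ratio of exactly 1 is grouped with the last class.
    return "non-monotonic and/or interacting"


if __name__ == "__main__":
    for mu_star, sigma in [(1.0, 0.05), (1.0, 0.3), (1.0, 0.8), (1.0, 1.6)]:
        print(mu_star, sigma, classify_factor(mu_star, sigma))
```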

This analysis agrees with the preliminary parameter analysis in Sect. 5.2.
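The classification by the ratio \(\sigma / \mu ^{\star }\) can be sketched as a simple threshold rule. This is a minimal illustration using the 0.1, 0.5 and 1 boundaries described in the caption of Fig. 15; the function name, the treatment of values exactly on a boundary, and the example inputs are assumptions.

```python
def classify_factor(mu_star, sigma):
    """Classify a factor by sigma / mu_star using the 0.1, 0.5 and 1 boundaries."""
    if mu_star <= 0:
        raise ValueError("mu_star must be positive to form the ratio")
    ratio = sigma / mu_star
    if ratio < 0.1:
        return "almost linear"
    if ratio < 0.5:
        return "monotonic"
    if ratio < 1.0:
        return "almost monotonic"
    return "non-monotonic and/or interacting"


# Hypothetical values: a ratio of 1.5 falls above the 1 line.
label = classify_factor(mu_star=1.0, sigma=1.5)
```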

Fig. 14 Importance plot showing \(\mu ^{\star }\). Error bars are displayed at the 95% confidence level

Fig. 15 Global sensitivity analysis with Morris plots. Diagonal lines represent the ranges for \(\sigma / \mu ^{\star }\). Under the classification strategy proposed by Sanchez et al. (2014), factors which are almost linear should lie below the 0.1 line, monotonic factors between the 0.1 and 0.5 lines, almost monotonic factors between the 0.5 and 1 lines, and factors with non-monotonic nonlinearities or interactions with other factors above the 1 line

While the Morris method gives us the overall sensitivity across the parameter ranges (in a global way) and allows us to rank the factors in terms of importance, we also provide a fine-grained sensitivity analysis around the default values, i.e. a local sensitivity analysis (LSA). For this, we use \(p=100\) levels, but vary only one parameter at a time while keeping the others fixed at their default values. This is shown in Fig. 16. This analysis shows how robust the resulting default values are to small perturbations, but as this is a local method, the results should be interpreted with caution (and only in conjunction with the GSA method above), since this does not account for any parameter interactions as warned in Saltelli et al. (2019).
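A one-at-a-time sweep of this kind can be sketched as follows. The loss function, default values, and parameter bounds below are hypothetical stand-ins (the real loss \(\ell \) requires running the ABM); only the sweep structure, with \(p=100\) levels and all other parameters held at their defaults, follows the description above.

```python
def one_at_a_time_sweep(loss, defaults, bounds, levels=100):
    """Vary each parameter over `levels` evenly spaced values, others at default."""
    results = {}
    for name, (low, high) in bounds.items():
        step = (high - low) / (levels - 1)
        curve = []
        for i in range(levels):
            params = dict(defaults)          # copy so other parameters stay fixed
            params[name] = low + i * step
            curve.append((params[name], loss(**params)))
        results[name] = curve
    return results


def toy_loss(h, beta, alpha):
    """Hypothetical smooth loss, used only to make the sketch runnable."""
    return (h - 0.5) ** 2 + (beta - 0.3) ** 2 + 0.01 * (alpha - 0.8) ** 2


defaults = {"h": 0.5, "beta": 0.3, "alpha": 0.8}                       # assumed values
bounds = {"h": (0.0, 1.0), "beta": (0.0, 1.0), "alpha": (0.0, 1.0)}    # assumed ranges
curves = one_at_a_time_sweep(toy_loss, defaults, bounds)
```

Each entry of `curves` is a list of (parameter value, loss) pairs, one univariate slice per parameter. By construction, no slice can reveal interactions between parameters.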

Viewing h (the left column), we can see that all values surrounding the default have a similar loss, showing the model is robust to small changes in the aptitude. Across the entire search space, choosing h from within an appropriate range is still important, but the loss varies smoothly with h throughout. Viewing \(\beta \) (the middle column), we can see the sharp transition above zero. There is a clear optimal range for \(\beta \), within which the default lies. Again, the area surrounding the default value is smooth, showing robustness to small perturbations (assuming we do not vary past the sharp transition). Looking at \(\alpha \) (the final column), the plots initially seem somewhat jagged, although the scale of the y-axis makes clear that these are very small shifts in loss (as verified by the plotted time series with varying \(\alpha \) levels). \(\alpha \) was deemed the least important of the three parameters by the Morris method screening, but was still important based on its position on the Morris plot. We can verify this here, where changes in \(\alpha \) do not have a large impact on \(\ell \).

Fig. 16 Univariate LSA of default parameters, varying one factor at a time with others at their optimised values. The plots give the change in parameter value (x-axis) versus \(\ell \) (y-axis). The dotted vertical black line shows the optimised value

GSA was performed using SALib from Herman and Usher (2017).

G: Network topology

In Sect. 4, we introduced a novel graph-based structure for representing the region, at the same time introducing spatial submarkets into the simulation (based on nodes in the graph). In Sect. 5.1 and “Appendix F”, we have validated the usefulness of the newly introduced parameters and performed a sensitivity analysis of the parameters, whereas in this section, we look to validate the usefulness of the network structure itself.

To do this, we compare the newly proposed model (with all parameters included), against an identical model with only a single node. We also compare to a fully connected network, i.e. where the spatial element (in terms of neighbourhoods) is not considered directly, but specific areas still exist. These structures are visualised in Fig. 17.

Fig. 17 Various potential network architectures. The left represents a single node (i.e. no spatial component). The middle is the proposed graph-based approach constructed from the topological layout of the region. The right is a complete graph, with individual areas, but no concept of spatial neighbours (due to the fully connected nature)

G.1 Topologies

G.1.1 No spatial component

To remove submarkets and all spatial components, we use a graph composed of a single node representing the overall Greater Sydney region. That is, agent characteristics and dwelling prices are assigned based on the overall Greater Sydney distributions, rather than specific area distributions. To implement this, rather than G being defined as in Sect. 4.1.1, instead, G contains a single node (i.e. a singleton graph) where the node represents the overall Greater Sydney region, i.e. it is the graph \(K_{1}\). With this single-node configuration, the spatial outreach from Eq. (4.3) is removed, as there is no concept of space. Likewise, \(\beta \) is no longer defined, as this is expressed in spatial terms. However, \(\alpha \) remains, which controls the boundedness of the agent as discussed in “Appendix L”, and h keeps the same interpretation.

G.1.2 Fully connected graph

To represent fully connected areas, we use a complete graph where every LGA is connected to every other LGA, i.e. \(G=K_{38}\). Again, outreach [from Eq. (4.3)] need not be considered, since every area is now directly connected to every other. With this representation, the introduced parameters remain (i.e. \(\alpha , \beta , h\)). Note that \(\alpha \) again directly corresponds to the boundedness (discussed in “Appendix L”) and does not encompass outreach. \(\beta \) and h keep the meanings introduced in Sect. 4. This introduces individual submarkets (based on LGAs) into the simulation, but does not enforce any spatial-based search costs within the market. As in the proposed approach, agent calibration is also based on the area in which they reside, so agent characteristics match those of their area.
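The two comparison topologies can be sketched as plain adjacency dictionaries. The helper names and the placeholder LGA labels are hypothetical; only the structures themselves (\(K_{1}\) and \(K_{38}\)) come from the text.

```python
from itertools import combinations


def singleton_graph(node="Greater Sydney"):
    """K_1: a single node with no neighbours (no spatial component)."""
    return {node: set()}


def complete_graph(nodes):
    """K_n: every node is adjacent to every other node."""
    adjacency = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


lgas = [f"LGA_{i}" for i in range(38)]   # placeholder names, not real LGA labels
k1 = singleton_graph()
k38 = complete_graph(lgas)
edge_count = sum(len(nbrs) for nbrs in k38.values()) // 2   # 38 * 37 / 2 = 703
```

In the complete graph every LGA is one hop from every other, which is why the outreach term carries no information in this configuration.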

G.1.3 Analysis

The resulting comparisons are visualised in Fig. 18, which shows the models that include individual areas significantly outperforming the overall Greater Sydney model, indicating the usefulness of area-specific submarkets. Between the two area models, there was little difference in aggregate performance (as shown in Fig. 18), which shows the performance improvements come mainly from the introduction of submarkets, and not necessarily from the overall spatial structure. This shows that \(\beta \) (which is based on the individual areas), the initialisation of agent characteristics based on location, and the area-based modification of price setting in terms of \(\overline{Q}_\mathrm{h}\) are key for the resulting prices (more so than the spatial outreach costs).

Fig. 18 Various network architectures (visualised as the mean ± standard deviation range); the actual trend is given in orange. We see the individual submarket methods perform well, whereas in this case the single area (meaning no submarkets) method fails to adequately capture the overall trends

However, when considering the resulting agent preferences (in terms of suburbs to purchase in), the fully connected graph had significantly more people moving to more remote regions, whereas the LGA-connected spatial topology prevented such drastic movements (through the spatial outreach term), capturing the fact that people are often tied to specific areas (e.g. those who work in the CBD and currently reside near there are unlikely to move to the outskirts of the Greater Sydney region). This is visualised in Fig. 19, where we show the proposed movements follow a diffusive-like pattern (Slavko et al. 2020b), compared to the fully connected topology, which had movements to more remote regions of Greater Sydney. We further discuss the resulting movements in Sect. 6.3, where we show that the agent movement patterns with the proposed spatial structure are logical and consistent with the actual movements in the Greater Sydney region reported in the recent literature.

Fig. 19 Variations in agent preferences with the proposed spatial structure (left) and without a spatial structure enforced (right) from first-time home buyers situated in the Canterbury–Bankstown region (visualised in pink and labelled). The colour indicates the percentage of purchases in those areas, with dark indicating a high percentage of relative purchases (controlled for population sizes). Without a preserved spatial structure we see a much higher rate of purchases on the outskirts of the Greater Sydney region, such as in Upper Lachlan Shire and Lithgow (labelled). We verify the proposed movements are logical in Sect. 6.3 (colour figure online)

In this section, we have shown the area-specific submarket extensions significantly outperform an equivalent model which does not include individual areas, highlighting the importance of capturing submarkets. Furthermore, the incorporation of area-specific submarkets allows for additional insights (such as those in Sect. 6.3.1 which would not otherwise be possible). We then further validated the choice of the network topology by comparing resulting agent preferences, and showing the proposed architecture (with spatial-based search costs) prevents drastic movements by the agent (in terms of relative distance from an agent’s current location), allowing the agents to act based on their location (preferring closer areas) in a manner consistent with the actual observed trends as discussed in Sect. 6.3.

H: Experiment settings

Due to the stochastic nature of ABMs, we run 100 Monte Carlo simulations per experiment (unless otherwise stated) and report the aggregate results over all simulations to get a robust estimate.
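The aggregation over Monte Carlo simulations can be sketched as a per-time-step mean and standard deviation, matching the mean ± standard deviation ranges shown in the figures. The toy simulation below is a hypothetical random walk standing in for the ABM, and all names and values are illustrative assumptions.

```python
import random
from statistics import mean, stdev


def aggregate_runs(run_simulation, n_runs=100, seed=0):
    """Run the simulation n_runs times and summarise each time step."""
    rng = random.Random(seed)                       # seeded for reproducibility
    series = [run_simulation(rng) for _ in range(n_runs)]
    n_steps = len(series[0])
    means = [mean([run[t] for run in series]) for t in range(n_steps)]
    spreads = [stdev([run[t] for run in series]) for t in range(n_steps)]
    return means, spreads


def toy_simulation(rng, n_steps=12):
    """Hypothetical price path; a stand-in for one stochastic ABM run."""
    price, path = 100.0, []
    for _ in range(n_steps):
        price *= 1.0 + rng.gauss(0.005, 0.01)
        path.append(price)
    return path


means, spreads = aggregate_runs(toy_simulation)
```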

H.1 Scale

Experiments are run at a 1:100 scale of the true housing market, i.e. every one hundred households in the Greater Sydney region are represented by one household in the model. The 1:100 scale was chosen for efficiency, but results for 1:50, 1:100, and 1:200 are also presented in Fig. 20 to show robustness to scale. There is an upper limit on the scale beyond which performance begins to degrade. For example, the number of overseas investments is given in Table 5, and by using a scale close to 1:1000, we would lose the contribution of foreign investments (since the values would be \(<1\)). For finer scales (e.g. 1:1), the results may be more accurate, but this comes at increased computational cost, so the 1:100 scale provided a good trade-off between accuracy and efficiency.

Fig. 20 Robustness of scale over the training phases (visualised as the mean ± standard deviation range for varying scales). We see that similar results are recovered with varying scales (with large areas of overlap), indicating invariance to the scales used (within acceptable bounds)

I: Initialisation data

I.1 Data

All real estate listings and sales from 2006 to present (2020) were used from SIRCA–CoreLogic, including the sale price, LGA, and sale date. These data are used as the actual price, and to calibrate the ABM.

I.2 Spatial initialisation

I.2.1 Pricing distributions

Between LGAs, there is a wide range of dwelling sale prices, and different distributions of prices amongst the LGAs as well. To sample from this effectively, we use kernel density estimation (KDE) to create a probability density function for each LGA for each time period. The previous 3 months of sales from the beginning of the time period are used to generate the density function. Scott’s Rule (Scott 2015) is used to assign the bandwidth, which sets the bandwidth to \(n^{\frac{-1}{d+4}}\), where n is the number of data points (in this case dwelling sales in the LGA at the beginning of the time period), and d is the number of dimensions (in this case \(d=1\)). When new houses are created for an area, they are set with an initial quality based on this distribution. The resulting KDEs are shown in Fig. 21.
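Sampling an initial dwelling quality from such a KDE can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation: it assumes a Gaussian kernel, and it scales the Scott factor \(n^{\frac{-1}{d+4}}\) by the sample standard deviation (an assumption, as is common in KDE implementations). The sale prices are hypothetical.

```python
import random
from statistics import stdev


def scott_factor(n, d=1):
    """Scott's Rule factor n^(-1/(d+4)) for n data points in d dimensions."""
    return n ** (-1.0 / (d + 4))


def sample_from_kde(prices, size, rng):
    """Draw from a Gaussian KDE: pick a data point, then add kernel noise."""
    bandwidth = scott_factor(len(prices)) * stdev(prices)   # scaling is an assumption
    return [rng.choice(prices) + rng.gauss(0.0, bandwidth) for _ in range(size)]


# Hypothetical sale prices for one LGA over the previous 3 months.
recent_sales = [650_000, 720_000, 810_000, 690_000, 950_000, 1_200_000, 780_000]
rng = random.Random(0)
new_dwelling_qualities = sample_from_kde(recent_sales, size=5, rng=rng)
```

Because the factor shrinks as n grows, LGAs with many recent sales get a narrower kernel and hence a density that follows the observed prices more closely.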

Fig. 21 KDE plots for each LGA based on SIRCA–CoreLogic data. Dark red indicates the Greater Sydney average, and this is assigned to LGAs without enough data to generate their own reliable KDE

I.2.2 Positioning

Households are not assigned to an LGA directly, as households can freely move areas. Instead, the household’s area is based on the residential dwelling of the household (and thus can change over time). When we reference a household’s area, we are referring to the LGA of the dwelling where the household currently resides.

At the beginning of the simulation, households which are homeowners are assigned to dwellings to match the population distribution amongst LGAs. The income and liquid wealth for the household are then assigned based on the brackets from the dwelling’s LGA. Renters are assigned a random LGA to begin with (again weighted by the population of each LGA) and income and wealth based on the distribution of that LGA. Households then try to find a rental they can afford (one with a rental price approximately 10–30% of the household’s income), which may mean some have to move LGAs.

I.3 Time periods

In line with the previous work of Glavatskiy et al. (2020), and following the Australian census timelines (which are performed every 5 years), we choose the three most recent census periods for analysis. These are 2006–2010, 2011–2015, and 2016–2019. The length was chosen such that a new simulation is run when new census information becomes available. That is, a separate model (and optimisation process) is run for each of the time periods to ensure the model is calibrated to the most recent data available. In doing so, we ensure the agent characteristics of the model most closely match those in the true Greater Sydney market. As each period begins in a census year, there is a large array of available data for calibration to ensure the models begin in a state as close as possible to the true population’s state. Alternate (non-census) dates could be used; however, the model may not begin with as accurate a reflection of the true underlying agent characteristics (depending on the data availability). While the models are calibrated for the time periods outlined here, such calibrations would also work well for surrounding dates (or alternate run durations), or the model could be re-calibrated for alternative dates to provide additional forecasting. For example, after the 2021 census, the agent characteristics could be reassigned and a new optimisation process run to reflect updated agent behaviours; likewise for past market behaviour, such as with the 2001 census.

I.4 Household characteristics

I.4.1 Area

Agents are initialised into an area based on the Australian census data, meaning the population of each area at the beginning of the simulations corresponds to the proportions from the census data for that time period. For example, if there are three areas “A”, “B”, “C”, and the true proportions in each are 60:25:15, the model will also populate agents into the three areas according to this proportion. Throughout the simulation, agents may move areas. They may be forced to move to a cheaper area if they cannot afford their current area, or they may move to a more affluent area if they can afford a dwelling there. So once the simulation begins, the movement dynamics are controlled by the agents’ cash flow position (again from census data, outlined below). Initialising agents into areas based on census data allows for correct agent characteristics (such as income and net worth) that directly line up with those observed throughout the Greater Sydney region.
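The 60:25:15 example can be sketched as weighted sampling of areas. Whether the model samples areas at random or allocates exact counts is not stated here, so random weighted sampling is an assumption, as are the function name and the number of agents.

```python
import random


def assign_areas(n_agents, proportions, rng):
    """Assign each agent an area, with probability proportional to its share."""
    areas = list(proportions)
    weights = [proportions[area] for area in areas]
    return rng.choices(areas, weights=weights, k=n_agents)


rng = random.Random(0)
assignments = assign_areas(10_000, {"A": 60, "B": 25, "C": 15}, rng)
share_a = assignments.count("A") / len(assignments)   # close to 0.60
```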

I.4.2 Income

Income is assigned from the distribution based on the household’s area. This distribution comes from the census data. Income grows throughout the simulation. The income brackets follow those specified in the census data.

I.4.3 Liquid wealth

Again, the liquid wealth (liquidity) of a household is based on the true distributions from census data. However, in this case, liquidity is not available per LGA, only for Greater Sydney as a whole. So to map a household to an appropriate liquidity bracket, the household’s liquidity is based on the income of the household. That is, if a household is in the top X% of earners in an LGA, the liquidity will be in the top X% as well (approximately, since liquidity is from brackets).
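This percentile matching can be sketched as follows: a household's income rank within its LGA is converted to a percentile, which is then looked up in the cumulative shares of the Greater Sydney liquidity brackets. The bracket labels, cumulative shares, and incomes are hypothetical, as are the function names.

```python
from bisect import bisect_right


def income_percentile(income, lga_incomes):
    """Fraction of households in the LGA earning at most `income`."""
    ordered = sorted(lga_incomes)
    return bisect_right(ordered, income) / len(ordered)


def liquidity_bracket(percentile, brackets):
    """Return the first bracket whose cumulative share covers the percentile.

    `brackets` is a list of (cumulative_share, label) in ascending order.
    """
    for cumulative_share, label in brackets:
        if percentile <= cumulative_share:
            return label
    return brackets[-1][1]


# Hypothetical LGA incomes (thousands) and region-wide liquidity brackets.
lga_incomes = [40, 55, 70, 90, 150]
brackets = [(0.25, "<10k"), (0.50, "10k-50k"), (0.75, "50k-200k"), (1.00, ">200k")]

top_earner = liquidity_bracket(income_percentile(150, lga_incomes), brackets)
low_earner = liquidity_bracket(income_percentile(40, lga_incomes), brackets)
```

The match is only approximate, as noted above, because every household falling within a bracket receives the same bracket regardless of its exact percentile.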

I.5 Population distribution

In this case, there are three measures of interest: the total number of dwellings, the total number of households, and the distribution of these households amongst LGAs. The dwelling and household estimates from the census data are used for each year, and simple linear projections are used for forecasting their growth. The distribution amongst LGAs is that recorded at the start of the simulation and is assumed to grow linearly with the overall population size. Individual LGA future population projections are available from 2016 onward, but as no projections existed before this date, we instead used this simplified measure, in which all LGAs grow by a fixed percentage within a given simulation period. As such, higher movements towards one particular LGA throughout the simulation could indicate that additional dwellings need to be built there to cater for the growth, which is another contribution we consider in later sections of this work.

J: Movement pattern visualisations

Over 10 million total movements were tracked across the simulations (approximately 3.3 million per time period). All plots in this section represent the normalised heatmaps of these movements. The total number of movements to a particular LGA is scaled by the population size of this LGA, meaning the results can be interpreted as a preference for certain areas rather than visualising the population size of the LGAs. Therefore, movements are not just reflecting larger populations, instead, reflecting a larger portion of people moving there relative to the size. All movements are then normalised such that the summation of all cells in the plot is 1, meaning if a particular cell has a value of 0.05, this means 5% of all matched movements moved to this LGA.
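The two-step normalisation described above can be sketched on a small origin-by-destination count matrix. The counts and populations are hypothetical, and scaling by the destination LGA's population follows the description of movements "to a particular LGA".

```python
def normalise_movements(counts, destination_population):
    """Scale each column by destination population, then normalise to sum to 1."""
    scaled = [
        [count / destination_population[j] for j, count in enumerate(row)]
        for row in counts
    ]
    total = sum(sum(row) for row in scaled)
    return [[value / total for value in row] for row in scaled]


# Hypothetical movements between two LGAs (rows: origin, columns: destination).
counts = [[10, 40],
          [20, 80]]
destination_population = [100, 400]
heatmap = normalise_movements(counts, destination_population)
```

In this toy case the second LGA receives four times as many movers but is also four times as large, so after scaling both destinations show the same preference within each row.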

The rows and columns of the plots are always sorted in ascending order based on median price, i.e. the most affordable LGAs first, and the most expensive LGA as the final row or column.

Fig. 22 Migrations. These plots capture new households in Greater Sydney throughout the simulation period, due to either migration or splitting of existing households. The first row is the 2006–2010 period, the middle row the 2011–2015 period, and the final row the 2016–2019 period

Fig. 23 Investors. These plots show the simulation difference between local and overseas investment patterns. The first row is the 2006–2010 period, the middle row the 2011–2015 period, and the final row the 2016–2019 period

Fig. 24 First-time home buyers. The first row is the 2006–2010 period, the middle row the 2011–2015 period, and the final row the 2016–2019 period

K: Exogenous variables

There are two main external influences on the model, which are governed by government approvals (in the case of overseas investments) and the central bank (in the case of mortgage rates).

K.1 Overseas investors

Overseas investments are often cited as a key driver of price growth in the Australian market (Rogers et al. 2017), and figures show that foreign investment has more than tripled since the mid-1990s (Haylen 2014). However, actual data on foreign investments are difficult to find. The ABS has described its own data on overseas investments to parliament as “hit or miss” (Iggulden 2014).

The purpose of this work is not a full investigation into overseas investments [overviews are given in Gauder et al. (2014), House of Representatives Standing Committee on Economics (2014)], but rather to examine the contribution overseas investment might have in relation to many other factors, using the readily available data (be this complete or not).

For this, we use the annual reports from the Foreign Investment Review Board (FIRB) from June 2006 to June 2019. The June 2019–June 2020 report was not available at the time of this writing (in 2020), as reports are not made available until the following year. Data are provided yearly at a NSW level, which is converted to monthly (simply dividing by 12). Again, data in this area are sparse, so this is the closest estimate we could derive. These data are provided in Table 5, and the average approval per year given in Fig. 25.

Fig. 25 Average overseas approval amount

Table 5 Overseas investment approval

While the data are provided for the entirety of NSW, it has been shown that foreign investors prefer the inner city over rural areas, and thus, the NSW levels have been used for Greater Sydney. This is a fair assumption since the numbers are relatively conservative anyway. For the testing period, the most recent overseas approval value from the training period is used.

K.2 Mortgage rates

Mortgage rates are those set by the RBA. The mortgage rate from the final training month is used throughout the testing period, since no real value can be read for the period being forecast.

L: Utility function

Following Axtell et al. (2014), agents are assumed to choose the most expensive house they can afford, that is, the house price directly corresponds to the utility for the agent.

Fig. 26 \(\alpha \)’s effect on utility maximising behaviour

However, the introduction of \(\alpha \) alters this, such that there is some uncertainty or error in the agent’s choice. When \(\alpha =1\), the perfect utility maximisation behaviour is recovered, where the agent attempts to purchase the most expensive dwelling they can afford. For \(\alpha < 1\), the agent buys the most expensive dwelling they can afford with probability \(\alpha \), which then decreases for each subsequent listing in turn. This is visualised in Fig. 26. For high \(\alpha \), we can see the probability mass is contained only in the highest priced dwellings. For lower \(\alpha \), this probability mass becomes more distributed, meaning less focus on utility, and potential for cheaper houses to be purchased. For \(\alpha =0\), the utility is not considered at all and a random house within the agent’s budget is chosen (i.e. the probability mass is uniform across options). \(\alpha \), therefore, corresponds to the boundedness of the agent.
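The exact functional form is not given in this section, so the following is only one possible form consistent with the behaviour described: a truncated geometric distribution over affordable listings ranked from most to least expensive. This choice is an assumption for illustration, as is the function name. It places all mass on the top listing at \(\alpha =1\), is uniform at \(\alpha =0\), and for intermediate \(\alpha \) gives the top listing a probability of roughly \(\alpha \) (when many listings are available) that then decays with rank.

```python
def choice_probabilities(n_listings, alpha):
    """Probability of choosing each affordable listing, ranked by price (rank 0 = dearest).

    Assumed form: weights (1 - alpha)^rank, normalised (a truncated geometric).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    weights = [(1.0 - alpha) ** rank for rank in range(n_listings)]
    total = sum(weights)
    return [w / total for w in weights]


perfect = choice_probabilities(4, alpha=1.0)   # all mass on the most expensive listing
bounded = choice_probabilities(4, alpha=0.5)   # mass decays with rank
uniform = choice_probabilities(4, alpha=0.0)   # utility ignored entirely
```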

The above description considers the case of uniform knowledge, i.e. for investors where they are assumed to be invariant to the areas available. However, for first-time home buyers, we propose a space-based knowledge where buyers are more likely to consider listings close to where they are renting. The probability associated with the distance to the agents’ location is visualised in Fig. 27. The uniform knowledge of investors is given in green, and the spatial knowledge of first-time home buyers is given as the dotted black line.

Fig. 27 Probability of viewing based on distance to the agent’s location

Fig. 28 First-time home buyers’ probability of viewing a listing for various values of \(\alpha \), showing the relationship between dwelling price (x-axis) and distance to dwelling (y-axis), and how \(\alpha \) adjusts this distribution. Low values of \(\alpha \) correspond to higher dispersion, and less focus on utility maximising behaviour. High values of \(\alpha \) focus the agent on dwellings which maximise utility

For first-time home buyers, the probability of viewing a listing is therefore controlled by both the proximity of the listing to the agent’s current (rental) location, and the price of the listing. This is visualised in Fig. 28. For \(\alpha =0\), the agent preference is uniform across all choices, placing no emphasis on utility (from either price or distance). As \(\alpha \) increases, the focus shifts to the more expensive dwellings, and does so based on the distance to the listing: with increasing \(\alpha \), the emphasis concentrates on the top right corner, which is the optimal value for both distance (closest) and price (most expensive in the agent’s budget). We can see that price remains the most important term in the agent’s utility though, with close listings with low prices having a low resulting probability, indicating the agent likely wants to move to a more affluent area if they can afford to do so. However, given an equal price, agents will prefer the closer listing.

Cite this article

Evans, B.P., Glavatskiy, K., Harré, M.S. et al. The impact of social influence in Australian real estate: market forecasting with a spatial agent-based model. J Econ Interact Coord 18, 5–57 (2023).


JEL Classification