
Forecasting building permits with Google Trends

Empirical Economics

Abstract

We propose a useful way to predict building permits in the USA, exploiting rich data from web search queries. Our work is relevant because the time series of building permits is used as a leading indicator of economic activity in the construction sector. Nevertheless, new data on building permits are released with a lag of a few weeks, so an accurate nowcast of this leading indicator is desirable. In this paper, we show that models including Google search queries nowcast and forecast better than several competitive, non-naïve benchmarks. We show this with both in-sample and out-of-sample exercises. In addition, we show that these predictions are robust to different specifications, to the use of rolling or expanding windows and, in some cases, to the forecasting horizon. Since Google Trends data are free, our approach is a simple and inexpensive way to predict building permits in the USA.


Notes

  1. For example, Strauss (2013) finds that building permits outperform other standard leading indicators of overall economic activity, such as interest rates and oil prices, in most US states.

  2. In the USA, the federal agency in charge of collecting these data from granting government agencies is the US Census Bureau, which provides a monthly estimate through the Building Permits Survey. See the data section for more information.

  3. Aruoba and Diebold (2010), for example, stressed the importance of having higher-frequency, real-time data to monitor macroeconomic variables. Also, the term nowcasting, which was coined by Giannone et al. (2008), was introduced in the literature to refer to their methodology for updating forecasts of lower-frequency variables, such as quarterly GDP, as relevant higher-frequency information, such as monthly industrial production, becomes available.

  4. Naturally, Google Trends has also been used in other research areas, such as oil spending (Yu et al. 2019), youth unemployment (Naccarato et al. 2018) and macroeconomics, for example to forecast inflation and consumer confidence (Niesert et al. 2019), just to name a few. For a review of the use of Google Trends in research during the last decade, see Jun et al. (2018).

  5. We are only interested in the aggregate number of building permits in the USA, a series with no missing data.

  6. To see the exact dates of data releases, see https://www.census.gov/construction/bps/schedule.html.

  7. D’Amuri and Marcucci (2017) clarify this point. They present the following equations for calculating the Google search index (GI):

    • The search participation of a certain term on a day (d) and in a geographical location (r) is given by the number of searches for the term (\(V_{d,r}\)) divided by the total number of searches (\(T_{d,r}\)). Therefore, the daily relative searches of the term are \(S_{d,r} = \frac{V_{d,r}}{T_{d,r}}\).

    • The relative weekly searches of the term are calculated as a simple average of the daily searches: \(S_{T,r} = \frac{1}{7}\sum_{d=\text{Sunday}}^{\text{Saturday}} S_{d,r}\).

    Google also scales the index as follows: \(\text{GI}_{T,r} = \frac{100}{\max_{t}\left(S_{t,r}\right)} S_{T,r}\). D’Amuri and Marcucci (2017) interpret GI as the probability that a random user in location r searches Google for a particular term during week T.
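    The construction above can be sketched numerically. The daily volumes below are made-up illustrations, since Google only publishes the scaled index, never the raw counts \(V_{d,r}\) and \(T_{d,r}\):

```python
# Sketch of the Google Index (GI) construction described above, over two
# weeks for one location r. All raw counts are invented for illustration.
searches_for_term = [120, 95, 80, 110, 130, 90, 70,     # week 1, Sun..Sat
                     140, 100, 85, 115, 135, 95, 75]    # week 2, Sun..Sat
total_searches    = [10_000, 9_500, 9_000, 9_800, 10_200, 9_300, 8_800,
                     10_500, 9_700, 9_100, 9_900, 10_300, 9_400, 8_900]

# Daily relative searches: S_{d,r} = V_{d,r} / T_{d,r}
s_daily = [v / t for v, t in zip(searches_for_term, total_searches)]

# Weekly average over the seven days of each week: S_{T,r}
s_weekly = [sum(s_daily[i:i + 7]) / 7 for i in range(0, len(s_daily), 7)]

# Scaling: GI_{T,r} = 100 * S_{T,r} / max_t S_{t,r}, so the peak week is 100
gi = [100 * s / max(s_weekly) for s in s_weekly]
```

By construction, the week with the highest relative search share maps to 100 and every other week is expressed relative to that peak.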

  8. It is also important to mention that D’Amuri and Marcucci (2017) show that the effects of sampling errors in Google Trends are quite negligible when applied to unemployment data.

  9. Examples of this literature are, to name a few, Ginsberg et al. (2009), who select 45 queries out of 50 million search terms using out-of-sample goodness of fit for illness data, and Scott and Varian (2014), who use Bayesian methods to automatically select predictors of initial claims and retail sales.

  10. Of course, it is possible to use both simultaneously: for example, using the first method to narrow down the terms and then using judgment to discard terms that are most likely spurious. Examples of this approach are Fondeur and Karamé (2013), Choi and Varian (2012) and D’Amuri and Marcucci (2017).

  11. For our preferred search queries, we find high correlations between building permits and each of these variables, with and without seasonal adjustment. The lowest correlation is 0.86 for the query “new housing development” while the highest is 0.96 for the seasonally adjusted query “real estate exam.”

  12. See a complete list of requirements to become a realtor at https://www.kapre.com/resources/real-estate/how-to-become-a-real-estate-agent.

  13. Notice that estimates of the drift terms are removed from Table 7.

  14. The penalty for the number of parameters is much higher with BIC than with AIC in estimation windows of 50 observations.
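    This comparison is easy to verify: with k parameters and n observations, AIC penalizes 2k while BIC penalizes k ln(n), and ln(50) is roughly 3.9, almost twice the AIC per-parameter penalty. A minimal check, with an illustrative k:

```python
import math

# Per-parameter penalties of the two information criteria, for an
# estimation window of n = 50 observations and an illustrative k = 3.
n, k = 50, 3
aic_penalty = 2 * k           # AIC: 2k
bic_penalty = k * math.log(n) # BIC: k * ln(n); ln(50) ~ 3.9 > 2
```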

  15. Here, \(\gamma \left( L \right)\) and \(x_{t}\) are defined as in expression (4).

  16. Table 8 in “Appendix” shows estimates and diagnostic statistics of models (10) and (12) for building permits and the four Google search queries under consideration. We have again removed estimates of the drift terms. Our SARIMA specifications seem to offer a better representation of the data than models (3) and (5). In particular, all the coefficients shown in Table 8 are statistically significant at usual levels, the Schwarz criteria in Table 8 are lower than the comparable figures in Table 7, and the Durbin-Watson statistics are closer to 2. This last point indicates that the SARIMA specifications are more successful at removing the excess first-order autocorrelation in the errors than our simple specifications in (3) and (5). Finally, while the coefficients of determination show an important degree of heterogeneity, relative to our univariate linear specifications the SARIMA models tend to produce a higher coefficient of determination for the Google search queries and a slightly lower one for building permits. This is the only aspect in which the basic linear model in (3) seems slightly better than the SARIMA specification in (9) and (11).

  17. Notice that for the estimation of our models we use only R observations for both building permits and Google Trends. The extra Google Trends observation is used only in the generation of nowcasts and forecasts.

  18. When recursive or expanding windows are used instead, the size of the estimation window grows with the number of available observations for estimation. For instance, the first nowcast is constructed estimating the models with R observations, whereas the last nowcast is constructed estimating the models with T observations.
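    The two window schemes can be sketched as index ranges over a sample; the values of T and R below are illustrative, not those used in the paper:

```python
# Rolling vs expanding (recursive) estimation windows, assuming a sample
# of T observations and an initial estimation window of R observations.
T, R = 100, 60
n_forecasts = T - R  # one nowcast per out-of-sample period

# Rolling: fixed length R, the start of the window slides forward.
rolling = [(t - R, t) for t in range(R, T)]

# Expanding: the start stays fixed, so the window grows each period.
expanding = [(0, t) for t in range(R, T)]

# Both schemes use R observations for the first nowcast; the expanding
# scheme uses all available observations for the last one.
```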

  19. Simulation evidence by Clark and McCracken (2013) and Pincheira and West (2016) shows that normal critical values tend to work well when multistep-ahead forecasts are constructed using the iterative method, at least when the data-generating process is not very persistent. This is very important because in this paper we use the iterative method for the construction of multistep-ahead forecasts.
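    The iterative method referred to here can be illustrated with an AR(1); the coefficients below are made up for the sketch and are not estimates from the paper:

```python
# Iterative (plug-in) h-step-ahead forecasts for an AR(1):
#   y_t = c + phi * y_{t-1} + e_t
# Each one-step forecast is fed back in as the conditioning value for
# the next step, instead of fitting a separate model per horizon.
c, phi = 0.5, 0.8   # illustrative coefficients
y_last = 2.0        # illustrative last observed value

def iterative_forecast(y, horizon):
    """Return the forecast path y_{t+1}, ..., y_{t+horizon}."""
    forecasts = []
    for _ in range(horizon):
        y = c + phi * y
        forecasts.append(y)
    return forecasts

path = iterative_forecast(y_last, 3)
# Closed-form check: E[y_{t+h}] = c*(1 - phi**h)/(1 - phi) + phi**h * y_t
```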

  20. We put the word “lags” in quotation marks because expression (1) also includes contemporaneous terms of the search queries.

  21. Let us recall that in nested environments the CW test removes a term that should be zero in population under the null hypothesis, but that is not zero in finite samples. Tables 4, 13, 14, and 15 corroborate this prior as the corresponding t-statistics of the GW/DMW test are always lower than the comparable t-statistics of the CW test.

  22. The pairwise Pearson correlations of the 15 series for “real estate exam” fluctuate between 0.97 and 0.99. For “new construction,” all correlations are at least 0.99. For “new housing development,” the correlations are between 0.90 and 0.96. Finally, the correlations for “new home construction” fluctuate between 0.97 and 0.99.


Acknowledgements

We would like to thank two anonymous referees and participants of workshops at the Central Bank of Chile, Central Bank of Argentina, Universidad de Lima, Peru; Universidad de Santiago and Universidad de Talca, Chile. Erik Hurst, Yan Carrière-Swallow and Felipe Labbé have provided wonderful comments. We are also grateful to Rodrigo Cruz and Montserrat Martí for outstanding research assistance.

Corresponding author

Correspondence to David Coble.

Ethics declarations

Conflict of interest

Both authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.


Appendix

1.1 Additional figures and tables

See Fig. 3 and Tables 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 and 16.

1.2 Between-days analysis

Over 15 days, we downloaded the series for the four search terms (real estate exam, new construction, new housing development and new home construction) using two different IPs and Google accounts. Four graphs are presented below, one for each Google index, showing the series downloaded for each term according to the IP and the day of download. The lines in the legend take the form query_ipi_j, where i and j denote the IP and the day, respectively.Footnote 22 For example, rex_ip1_2 represents the query for real estate exam, IP 1, day 2 (Figs. 4, 5, 6 and 7).

1.3 Intra-day analysis

Li (2018) raises some concerns about possible sampling errors if the series were downloaded at different moments during a day, from different computers (IPs) and Google accounts. To check the stability of the variables we use in this study, we carry out the present intra-day robustness analysis.

We conclude that all the series are highly robust. We find that for the same IP and Google account, each Google index downloaded on the same day is identical, consistent with what Li (2018) reports. However, the same series downloaded from different IPs, although very similar, are not exactly the same. We compute correlations of these series and find that they are almost equal to one. A summary of the correlation analysis can be found in Table 17.

We downloaded Google indices for the four queries (real estate exam, new construction, new housing development and new home construction) eight times a day for three days, using two different IPs and Google accounts. We then averaged the eight versions of each index (which are identical) for each IP address. Finally, we calculated the correlation of each term between the two IPs for each day.
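This averaging-then-correlating procedure can be sketched as follows; the index values below are invented stand-ins for the downloaded series, not actual Google Trends data:

```python
# Robustness check sketch: average the eight downloads per IP for one
# day, then correlate the two IP-level averages of the same query.
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Eight downloads per IP on one day; within an IP they are identical,
# while across IPs the series differ slightly (invented values).
ip1_downloads = [[55, 60, 58, 62, 65, 63]] * 8
ip2_downloads = [[54, 61, 57, 63, 64, 64]] * 8

avg_ip1 = [mean(col) for col in zip(*ip1_downloads)]
avg_ip2 = [mean(col) for col in zip(*ip2_downloads)]

r = pearson(avg_ip1, avg_ip2)  # high, but not exactly one
```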


Cite this article

Coble, D., Pincheira, P. Forecasting building permits with Google Trends. Empir Econ 61, 3315–3345 (2021). https://doi.org/10.1007/s00181-020-02011-1

