
A New Appraisal Model of Second-Hand Housing Prices in China’s First-Tier Cities Based on Machine Learning Algorithms

Computational Economics

Abstract

The accurate appraisal of second-hand housing prices plays an important role in second-hand housing transactions, mortgages and risk assessment. Machine learning, which is increasingly applied in finance and economics, can also be used to upgrade traditional appraisal methods for second-hand housing. A large number of appraisal indicators and price data for second-hand housing in Beijing, Shanghai, Guangzhou and Shenzhen, the four first-tier cities in China, can be obtained with web crawlers. The geographical location information of the housing can then be visualized with GIS technology, and its descriptive text can be processed with natural language processing. Finally, these features are combined with other numerical and categorical indicators to construct a second-hand housing appraisal model based on a two-tier stacking framework, with random forest, adaptive boosting, gradient boosting decision tree, light gradient boosting machine and extreme gradient boosting as base models and a back-propagation neural network as the meta-model. The training results show that the machine learning models are significantly more accurate than multiple linear regression and spatial econometric models, and that the stacking model outperforms the standalone machine learning models.
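
The two-tier stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn is available, uses synthetic data in place of the crawled housing indicators, and includes only three of the five base models (random forest, AdaBoost and gradient boosting). LightGBM and XGBoost are left out so that the example depends on a single library. The meta-model is scikit-learn's `MLPRegressor`, a feed-forward network trained by back-propagation, and all hyperparameters are arbitrary illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the appraisal indicators (X) and prices (y);
# the real study uses crawled listings, which are not reproduced here.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# First tier: tree-based base models (a subset of those named in the abstract).
base_models = [
    ("rf", RandomForestRegressor(n_estimators=30, random_state=0)),
    ("ada", AdaBoostRegressor(n_estimators=30, random_state=0)),
    ("gbdt", GradientBoostingRegressor(n_estimators=30, random_state=0)),
]

# Second tier: a small neural network that learns how to combine the
# base-model predictions. Inputs are standardized before reaching the network.
meta_model = make_pipeline(
    StandardScaler(),
    MLPRegressor(
        hidden_layer_sizes=(16,), solver="lbfgs", max_iter=1000, random_state=0
    ),
)

# cv=3: the meta-model is trained on out-of-fold base predictions,
# which limits leakage from the first tier into the second.
stack = StackingRegressor(estimators=base_models, final_estimator=meta_model, cv=3)
stack.fit(X_train, y_train)

predictions = stack.predict(X_test)
r2 = stack.score(X_test, y_test)
print(f"held-out R^2 of the stacked model: {r2:.3f}")
```

Training the meta-model on out-of-fold predictions rather than in-sample predictions is the usual design choice in stacking, since in-sample predictions from flexible tree ensembles would look more accurate to the meta-model than they really are.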



Author information

Corresponding author

Correspondence to Lulin Xu.


About this article

Cite this article

Xu, L., Li, Z. A New Appraisal Model of Second-Hand Housing Prices in China’s First-Tier Cities Based on Machine Learning Algorithms. Comput Econ 57, 617–637 (2021). https://doi.org/10.1007/s10614-020-09973-5


  • DOI: https://doi.org/10.1007/s10614-020-09973-5
