Abstract
Accurate appraisal of second-hand housing prices plays an important role in second-hand housing transactions, mortgages and risk assessment. Machine learning, which is increasingly applied in finance and economics, can also be used to upgrade traditional appraisal methods for second-hand housing. A large number of appraisal indicators and price data for second-hand housing in Beijing, Shanghai, Guangzhou and Shenzhen, China's four first-tier cities, can be obtained using web crawler technology. The geographical location of each property can then be visualized with GIS technology, and the descriptive text of each listing can be processed with natural language processing. Finally, these inputs are combined with other numerical and categorical indicators to construct a second-hand housing appraisal model based on a two-tier stacking framework, using random forest, adaptive boosting, gradient boosting decision tree, light gradient boosting machine and extreme gradient boosting as base models and a back-propagation neural network as the meta-model. The training results show that the machine learning models are significantly more accurate than multiple linear regression and spatial econometric models, and that the stacking model outperforms the standalone machine learning models.
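To make the two-tier stacking idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes scikit-learn is available, uses synthetic data in place of the crawled housing indicators, and picks all hyperparameters arbitrarily for illustration. Only three of the five base models are included so that the example depends on a single library; `lightgbm.LGBMRegressor` and `xgboost.XGBRegressor` could be appended to the base-model list in the same way. The meta-model is approximated by scikit-learn's `MLPRegressor`, a feed-forward network trained by backpropagation.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the appraisal indicators and prices
# (assumption: the real crawled data set is not reproduced here).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + X[:, 2] * X[:, 3] + rng.normal(scale=0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Tier 1: tree-ensemble base models named in the abstract.
# LightGBM and XGBoost are left out only to keep dependencies minimal.
base_models = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("ada", AdaBoostRegressor(n_estimators=50, random_state=0)),
    ("gbdt", GradientBoostingRegressor(n_estimators=50, random_state=0)),
]

# Tier 2: a small neural network meta-model. Its inputs are the base models'
# predictions, standardized first to help the optimizer converge.
meta_model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
)

# cv=5: the meta-model is trained on out-of-fold base predictions, so it never
# sees a base model's prediction for a row that base model was fitted on.
stack = StackingRegressor(estimators=base_models, final_estimator=meta_model, cv=5)
stack.fit(X_train, y_train)

predictions = stack.predict(X_test)
print("held-out R^2:", round(stack.score(X_test, y_test), 3))
```

Training the meta-model on out-of-fold predictions is the design choice that distinguishes stacking from simply averaging models: it lets the second tier learn how much to trust each base model without being misled by their in-sample fit.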
About this article
Cite this article
Xu, L., Li, Z. A New Appraisal Model of Second-Hand Housing Prices in China’s First-Tier Cities Based on Machine Learning Algorithms. Comput Econ 57, 617–637 (2021). https://doi.org/10.1007/s10614-020-09973-5
DOI: https://doi.org/10.1007/s10614-020-09973-5