Abstract
The Home Sales Index (HSI) is considered one of the most crucial factors for predicting economic trends in the real estate and construction industries. Accordingly, numerous studies have investigated how to estimate and predict the HSI precisely. However, previous studies have been limited in their ability to collect valuable data for such estimations and predictions. Several studies have shown that web search data have a significant relationship with various social trends, including the HSI. The goal of this study is to analyze the relationship between the HSI and usable web search data and, based on the results, to propose an HSI prediction model. Our analysis includes a method for enhancing the prediction accuracy using principal component analysis. The varimax rotation method is used to find significant factors, and a genetic algorithm is used for optimization. Our results demonstrate that the proposed model achieves higher prediction accuracy than previous studies, which makes it more suitable for practical field applications.
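The pipeline named in the abstract (principal component analysis, varimax rotation, then a regression on the resulting factors) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, every name (`search_volume`, `hsi`, `varimax`, the choice of two factors) is an assumption made for illustration, and the genetic algorithm step is left out because the abstract does not say what it optimizes.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.sum(rotated**2, axis=0)  # column sums of squares
        grad = loadings.T @ (rotated**3 - rotated * col_ss / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt                    # nearest orthogonal matrix
        new_objective = s.sum()
        if objective > 0 and new_objective < objective * (1 + tol):
            break
        objective = new_objective
    return loadings @ rotation, rotation

# Synthetic stand-in data (assumption): 60 months, 6 search keywords
# driven by 2 hidden factors, plus a made-up HSI series.
rng = np.random.default_rng(0)
n_months, n_keywords, n_factors = 60, 6, 2
latent = rng.normal(size=(n_months, n_factors))
mixing = rng.normal(size=(n_factors, n_keywords))
search_volume = latent @ mixing + 0.1 * rng.normal(size=(n_months, n_keywords))
hsi = 100 + 3.0 * latent[:, 0] - 2.0 * latent[:, 1] + 0.5 * rng.normal(size=n_months)

# PCA on the correlation matrix of the standardized keyword series.
z = (search_volume - search_volume.mean(axis=0)) / search_volume.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1][:n_factors]  # largest eigenvalues first
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

# Rotate the loadings so each factor loads strongly on few keywords.
rotated_loadings, rotation = varimax(loadings)

# Standardized component scores, rotated with the same orthogonal matrix.
scores = z @ eigvecs[:, order] / np.sqrt(eigvals[order])
rotated_scores = scores @ rotation

# Ordinary least squares regression of the HSI on the rotated factor scores.
design = np.column_stack([np.ones(n_months), rotated_scores])
coef, *_ = np.linalg.lstsq(design, hsi, rcond=None)
predicted = design @ coef
r_squared = 1 - np.sum((hsi - predicted) ** 2) / np.sum((hsi - hsi.mean()) ** 2)
print("rotated loadings:\n", np.round(rotated_loadings, 2))
print("in-sample R^2:", round(float(r_squared), 3))
```

Because the rotation is orthogonal, it leaves each keyword's communality (the row sum of squared loadings) and the in-sample regression fit unchanged. What it changes is how the loadings are distributed across factors, which is what makes the factors easier to interpret.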
Cite this article
Han, S., Ko, Y., Kim, JY. et al. Enhancement of Prediction Accuracy for Home Sales Index Prediction Model based on Integration of Multiple Regression Analysis and Genetic Algorithm. KSCE J Civ Eng 22, 2159–2166 (2018). https://doi.org/10.1007/s12205-017-1648-9
DOI: https://doi.org/10.1007/s12205-017-1648-9