Home sales index prediction model based on cluster and principal component statistical approaches in a big data analytic concept

  • Construction Management
KSCE Journal of Civil Engineering

Abstract

Data analysis has become one of the most important tools in various fields of engineering. The Home Sales Index (HSI) is considered one of the most crucial factors for predicting economic trends in the construction industry. Although a precise prediction of the HSI is highly valuable, it is limited by the difficulty of collecting useful information and analyzing it properly. Previous studies on similar subjects have mostly relied on time series analysis, under the assumption that the HSI is influenced by past patterns such as seasons and historical events. This paper presents the performance of a new HSI prediction model that uses a big data analysis method focused on social factors highly correlated with the HSI, without relying on any of those assumptions. In accordance with this objective, the paper proposes a methodology for analyzing data and developing a prediction model using cluster and principal component statistical approaches applied to comprehensive search queries on the Web. The results derived from applying the model to HSI prediction can serve as an initial phase of big data analysis in the construction field, in both academia and industry.
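To make the cluster-plus-principal-component idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the clustering algorithm, the number of components, or the regression form, so k-means, one principal component per cluster, and ordinary least squares are all assumptions chosen for illustration. The data are synthetic, and every function name here (`standardize`, `kmeans`, `first_component`, `fit_hsi_model`, `make_synthetic`) is hypothetical.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def first_component(Z):
    """Scores of the first principal component of standardized data Z."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

def fit_hsi_model(queries, hsi, k=2):
    """Cluster query series, summarize each cluster by its first PC, regress HSI."""
    Z = standardize(queries)
    labels = kmeans(Z.T, k)  # each query series (column) is one point
    feats = [first_component(Z[:, labels == c]) for c in range(k) if np.any(labels == c)]
    A = np.column_stack([np.ones(len(hsi))] + feats)
    coef, *_ = np.linalg.lstsq(A, hsi, rcond=None)
    fitted = A @ coef
    r2 = 1.0 - np.sum((hsi - fitted) ** 2) / np.sum((hsi - hsi.mean()) ** 2)
    return labels, coef, r2

def make_synthetic(n_months=60, seed=0):
    """Synthetic stand-in: six query series driven by two latent factors."""
    rng = np.random.default_rng(seed)
    f1, f2 = rng.normal(size=(2, n_months))
    queries = np.column_stack(
        [f1 + 0.1 * rng.normal(size=n_months) for _ in range(3)]
        + [f2 + 0.1 * rng.normal(size=n_months) for _ in range(3)]
    )
    hsi = 100.0 + 3.0 * f1 - 2.0 * f2 + 0.5 * rng.normal(size=n_months)
    return queries, hsi

queries, hsi = make_synthetic()
labels, coef, r2 = fit_hsi_model(queries, hsi)
print("cluster labels per query:", labels)
print("in-sample R^2:", round(r2, 3))
```

The reported fit is in-sample only. An actual evaluation would hold out later months and would substitute real search-query volumes and published HSI values for the synthetic series.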

Author information

Corresponding author

Correspondence to Do Hyoung Shin.

About this article

Cite this article

Han, S., Ko, Y., Kim, S. et al. Home sales index prediction model based on cluster and principal component statistical approaches in a big data analytic concept. KSCE J Civ Eng 21, 67–75 (2017). https://doi.org/10.1007/s12205-016-0574-6

  • DOI: https://doi.org/10.1007/s12205-016-0574-6