Abstract
The opacity of the real-estate market poses challenges for its agent-based simulation. While some real-estate Web sites publish the prices of a large number of houses, the prices of the remaining houses are not available. Estimating these missing prices is necessary for simulating the evolution of prices from a complete initial set of houses. This estimation could also be useful for other purposes, such as appraising houses, letting buyers know which offered prices are the best (i.e., the lowest ones compared with the appraisals) and recommending to buyers an initial price to set. This work proposes combining dimensionality reduction methods with machine learning techniques to obtain the estimated prices. In particular, it analyzes nonnegative matrix factorization, recursive feature elimination and feature selection with a variance threshold as dimensionality reduction methods. It compares linear regression, support vector regression, k-nearest neighbors and a multilayer perceptron neural network as machine learning techniques. A tenfold cross-validation is used to compare the estimations and errors and to assess the improvement over a basic estimator commonly used at the beginning of simulations. The developed software and the dataset are freely available from a data research repository to support reproducibility and other researchers.
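The comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses hypothetical synthetic data (prices in thousands of euros) in place of the real dataset, picks illustrative hyperparameters, and uses a mean predictor as a stand-in for the basic estimator, which the abstract does not specify.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.dummy import DummyRegressor
from sklearn.feature_selection import RFE, VarianceThreshold
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR


def make_synthetic_houses(n=200, seed=0):
    """Hypothetical stand-in data: nonnegative features and a price."""
    rng = np.random.default_rng(seed)
    area = rng.uniform(40, 200, n)       # square metres
    rooms = rng.integers(1, 6, n)        # number of rooms
    floor = rng.integers(0, 10, n)       # floor number (uninformative here)
    constant = np.ones(n)                # zero-variance column
    other = rng.uniform(0, 1, n)         # uninformative feature
    X = np.column_stack([area, rooms, floor, constant, other])
    # Assumed price model, in thousands of euros, with Gaussian noise.
    price = 1.0 * area + 5.0 * rooms + rng.normal(0, 5, n)
    return X, price


def cv_mae(estimator, X, y, cv):
    """Mean absolute error averaged over the cross-validation folds."""
    scores = cross_val_score(estimator, X, y, cv=cv,
                             scoring="neg_mean_absolute_error")
    return float(-scores.mean())


def evaluate(X, y, n_splits=10, seed=0):
    """Cross-validate every reducer/regressor pair plus a mean baseline."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    reducers = {
        "nmf": NMF(n_components=3, init="nndsvda", max_iter=1000,
                   random_state=seed),
        "rfe": RFE(LinearRegression(), n_features_to_select=3),
        "variance_threshold": VarianceThreshold(),  # drops constant columns
    }
    regressors = {
        "linear_regression": LinearRegression(),
        "svr": SVR(C=100.0),
        "knn": KNeighborsRegressor(n_neighbors=5),
        "mlp": MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                            max_iter=2000, random_state=seed),
    }
    # Stand-in for the basic estimator: always predict the training mean.
    results = {("none", "baseline_mean"):
               cv_mae(DummyRegressor(strategy="mean"), X, y, cv)}
    for r_name, reducer in reducers.items():
        for m_name, model in regressors.items():
            # clip=True keeps held-out features in [0, 1], so the input
            # to NMF is never negative.
            pipe = make_pipeline(MinMaxScaler(clip=True), reducer, model)
            results[(r_name, m_name)] = cv_mae(pipe, X, y, cv)
    return results


X, y = make_synthetic_houses()
results = evaluate(X, y)
for (reducer_name, model_name), mae in sorted(results.items(),
                                              key=lambda kv: kv[1]):
    print(f"{reducer_name:>18} + {model_name:<17} MAE = {mae:8.2f}")
```

Placing the scaler and the reducer inside the pipeline means both are refitted on the training folds only, so no information from a held-out fold leaks into the estimate of its error.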
Notes
https://www.idealista.com (last accessed October 22, 2018).
http://data.gov.uk/ (last accessed September 16, 2017).
https://www.fotocasa.es/es/ (last accessed October 22, 2018).
Acknowledgements
This work has been supported by the program “Estancias de movilidad en el extranjero José Castillejo para jóvenes doctores” funded by the Spanish Ministry of Education, Culture and Sport with reference CAS17/00005. This work also acknowledges the research project “Diseño de actividades de aprendizaje colaborativas con Big Data” with reference PIIDUZ_16_120 funded by the University of Zaragoza. We acknowledge the research project “Construcción de un framework para agilizar el desarrollo de aplicaciones móviles en el ámbito de la salud” funded by the University of Zaragoza and Foundation Ibercaja with grant reference JIUZ-2017-TEC-03. We also acknowledge support from “Universidad de Zaragoza,” “Fundación Bancaria Ibercaja” and “Fundación CAI” in the “Programa Ibercaja-CAI de Estancias de Investigación” with reference IT1/18. This work was partially supported by the Spanish research grant MTM2015-65433-P (MINECO/FEDER), Gobierno de Aragón and Fondo Social Europeo. Furthermore, we acknowledge the “Fondo Social Europeo” and the “Departamento de Tecnología y Universidad del Gobierno de Aragón” for their joint support with grant number Ref-T81.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.
Cite this article
García-Magariño, I., Medrano, C. & Delgado, J. Estimation of missing prices in real-estate market agent-based simulations with machine learning and dimensionality reduction methods. Neural Comput & Applic 32, 2665–2682 (2020). https://doi.org/10.1007/s00521-018-3938-7
DOI: https://doi.org/10.1007/s00521-018-3938-7