Abstract
The potato market in Agra, Uttar Pradesh, India, exhibits significant price volatility that affects stakeholders across the supply chain. This study addresses the need for accurate potato price forecasts, which are essential for optimising production, marketing strategies and inventory management. Existing forecasting models, however, often fail to provide the accuracy required for effective planning and resource allocation. This research aims to bridge that gap by investigating whether advanced predictive models can approximate potato prices more closely. Using price data covering the period from January 1, 2006, to July 31, 2023, the study employed the H2O AutoML framework to identify and evaluate predictive models under two train-test split ratios, 80:20 and 70:30. The top 20 models for each configuration were assessed using the root mean square error, and the 70:30 split performed better. Further analysis identified the top three models: stacked ensemble, gradient boosting machine and extreme gradient boosting. The stacked ensemble emerged as the optimal choice, with forecasting errors ranging from 0.08% to 2.09% for daily potato prices. This result illustrates the effectiveness of the stacked ensemble model in supporting strategic decision-making and resource distribution within the potato industry, where more accurate price predictions contribute to more efficient and better-informed operational strategies.
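The evaluation procedure summarised in the abstract (a time-ordered train-test split, ranking by root mean square error, and a percentage forecasting error per day) can be illustrated with a minimal sketch. This is not the authors' code and does not call H2O AutoML: the price values, the function names `chronological_split`, `rmse` and `percent_error`, and the naive last-value forecast are all assumptions introduced only for illustration.

```python
import math


def chronological_split(series, train_fraction):
    """Split a time-ordered series into train and test parts without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def percent_error(actual, predicted):
    """Absolute forecasting error as a percentage of the actual value."""
    return abs(actual - predicted) / abs(actual) * 100.0


if __name__ == "__main__":
    # Hypothetical daily prices; these are NOT values from the study.
    prices = [1000.0, 1020.0, 1010.0, 1050.0, 1040.0,
              1060.0, 1080.0, 1070.0, 1100.0, 1090.0]

    for fraction in (0.8, 0.7):  # the 80:20 and 70:30 configurations
        train, test = chronological_split(prices, fraction)
        # Naive stand-in forecast (assumption): repeat the last training price.
        forecast = [train[-1]] * len(test)
        errors = [percent_error(a, p) for a, p in zip(test, forecast)]
        print(f"split {fraction:.0%}: RMSE = {rmse(test, forecast):.2f}, "
              f"error range = {min(errors):.2f}% to {max(errors):.2f}%")
```

In the study itself, the forecasts would come from the models produced by H2O AutoML rather than from the naive stand-in used here; the sketch only shows how the split ratios and error measures fit together.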
Ethics declarations
Conflict of Interest
The authors declare no competing interests.
About this article
Cite this article
Kumari, P., M, S.K., Vekariya, P. et al. Predicting Potato Prices in Agra, UP, India: An H2O AutoML Approach. Potato Res. (2024). https://doi.org/10.1007/s11540-024-09726-z
DOI: https://doi.org/10.1007/s11540-024-09726-z