Abstract
Crop yield prediction is a burgeoning research area in the agriculture domain. Crop yield forecasting models are developed to enhance productivity through improved decision-making strategies. A highly efficient crop yield forecasting model helps farmers decide when, what and how much to plant on their cultivable land. The main objective of this research work is to build a highly effective crop yield prediction model from data covering the 21 years from 1997 to 2017, using machine learning and hybrid deep learning approaches. Two models are proposed to predict crop yield accurately. The first is a machine learning model based on CatBoost regression, whose hyperparameters are tuned with the Optuna framework to improve yield prediction performance. The second is a hybrid deep learning model that uses a spatio-temporal attention-based convolutional neural network (STACNN) to extract features and a bidirectional long short-term memory (BiLSTM) network to predict crop yield. The proposed models are evaluated using error metrics and compared with recent contemporary models. The evaluation results show that both proposed models significantly outperform the existing models they are compared with, and that the CatBoost regression model performs slightly better than the STACNN-BiLSTM model, achieving an R-squared value of 0.99.
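The abstract says the models are evaluated with error metrics, but the only metric it names is R-squared. The sketch below computes a common set of regression metrics for yield predictions: MAE, MSE, RMSE, MAPE and R². This particular set is an assumption; the abstract does not confirm it. The function name `regression_metrics` is hypothetical. It is a plain-Python illustration, not the authors' evaluation code.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: common regression error metrics for yield predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    rmse = math.sqrt(mse)                          # root mean squared error

    # MAPE is undefined when a true value is zero, so such rows are skipped here
    nonzero = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    mape = (100.0 * sum(abs((t - p) / t) for t, p in nonzero) / len(nonzero)
            if nonzero else float("nan"))

    # R-squared = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Illustrative yields (hypothetical values, not from the paper)
print(regression_metrics([2.0, 3.0, 4.0, 5.0], [2.1, 2.9, 4.2, 4.8]))
```

For these made-up values, R² comes out at about 0.98: the residual sum of squares is 0.1, against a total sum of squares of 5.0. An R² of 0.99 means the residual error is about 1% of the variance of the observed yields.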
Availability of data and materials
The crop yield data are collected from the Kaggle website (https://www.kaggle.com/datasets/abhinand05/crop-production-in-india/data or https://www.kaggle.com/code/anjali21/indian-production-analysis-and-prediction/data) and the Tata-Cornell Institute (TCI) website. The Kaggle data cover 646 districts across 33 Indian states and contain historical crop yield information for the years 1997–2015. The Kaggle dataset has seven attributes and 246,091 records. The attributes are State name, District name, Crop year, Season, Crop name, Area and Production. The dataset covers nearly 124 types of crops grown across India. The district-level Indian agriculture crop yield dataset on the TCI website was created by the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) and TCI. The TCI dataset covers the years 2016 and 2017 and has 18,009 records. Both datasets are used in the proposed work, merged on their common attributes.
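The text says the Kaggle data (1997–2015) and the TCI data (2016–2017) are merged on their common attributes. It does not say how yield is derived. The sketch below shows one plausible reading, under assumptions not confirmed by the paper:

- **Merging:** the TCI rows have already been renamed to the Kaggle column names. The two record sets are stacked, keeping only the columns both share.
- **Yield:** computed as Production / Area, the usual definition of crop yield.

The column names and the helper names `merge_on_common_columns` and `add_yield` are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]


def merge_on_common_columns(kaggle_rows: List[Record], tci_rows: List[Record]) -> List[Record]:
    """Hypothetical helper: stack two record lists, keeping only shared columns."""
    if not kaggle_rows or not tci_rows:
        raise ValueError("both datasets must contain at least one record")
    common = set(kaggle_rows[0]) & set(tci_rows[0])
    # Preserve the column order of the first (Kaggle) dataset
    columns = [c for c in kaggle_rows[0] if c in common]
    return [{c: row[c] for c in columns} for row in kaggle_rows + tci_rows]


def add_yield(rows: List[Record]) -> List[Record]:
    """Hypothetical helper: attach Yield = Production / Area.

    Rows with a missing or zero Area, or a missing Production, are dropped.
    This is an illustrative choice, not necessarily the paper's handling.
    """
    out = []
    for row in rows:
        area, production = row.get("Area"), row.get("Production")
        if area is None or area == 0 or production is None:
            continue
        out.append({**row, "Yield": production / area})
    return out


# Hypothetical records (column names assumed, values made up)
kaggle = [{"State_Name": "StateA", "District_Name": "DistrictA", "Crop_Year": 2015,
           "Season": "Kharif", "Crop": "Rice", "Area": 100.0, "Production": 250.0}]
tci = [{"State_Name": "StateA", "District_Name": "DistrictA", "Crop_Year": 2016,
        "Crop": "Rice", "Area": 50.0, "Production": 150.0}]
print(add_yield(merge_on_common_columns(kaggle, tci)))
```

In this example, `Season` is dropped because the hypothetical TCI records lack it. With real data, the columns would probably need to be harmonized before stacking, so that useful attributes are not lost.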
Funding
The authors received no funding for this research work.
Author information
Contributions
KSS contributed to the conceptualization, methodology, resources, data curation, writing—original draft preparation, writing—review and editing, investigation and validation. VB was involved in the methodology, resources, supervision and validation.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
Since this research work uses only publicly available crop production statistics, ethical approval is not applicable.
Additional information
Edited by Dr. Ahmad Sharafati (ASSOCIATE EDITOR) / Prof. Theodore Karacostas (CO-EDITOR-IN-CHIEF).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Saravanan, K.S., Bhagavathiappan, V. Prediction of crop yield in India using machine learning and hybrid deep learning models. Acta Geophys. (2024). https://doi.org/10.1007/s11600-024-01312-8
DOI: https://doi.org/10.1007/s11600-024-01312-8