Abstract
A study of past and current weather events is a prerequisite for extrapolation, climate change studies, and other meteorological analyses. NASA (National Aeronautics and Space Administration) has developed a model capable of predicting weather data for any location on Earth, including locations that lack weather stations, weather satellite coverage, and other weather-measuring instruments. This paper evaluates the prediction accuracy of the NASA temperature data against NiMet (Nigerian Meteorological Agency) ground-truth measurements, using Akwa Ibom Airport as a case study. Exploratory data analysis (descriptive and diagnostic analyses) of the temperature data retrieved from NiMet and NASA was performed to give a clear path for the predictive and prescriptive analyses. Using 2783 days of weather data retrieved from NiMet as ground truth, the accuracy of the NASA predictions at the corresponding resolution was calculated. For maximum temperature, the mean absolute error (MAE) was 2.184 °C and the root mean square error (RMSE) was 2.579 °C, with a coefficient of determination (R²) of 0.710; for minimum temperature, the MAE was 0.876 °C and the RMSE was 1.225 °C, with an R² of 0.620. There is a good correlation between the two datasets; hence, a model can be developed to generate more accurate predictions, using the NASA data as input. Predictive and prescriptive analyses were performed with five prediction algorithms: decision tree regression, XGBoost regression, and a multilayer perceptron (MLP) trained with each of the limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS), stochastic gradient descent (SGD), and Adam optimizers. The MLP with the LBFGS optimizer performed best, reducing the MAE by 35.35% and the RMSE by 31.06% for maximum temperature, and the MAE by 10.05% and the RMSE by 8.00% for minimum temperature.
The results show that, given sufficient data, using NASA predictions as input to an LBFGS-MLP model gives more accurate temperature predictions for the study area.
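The error measures quoted in the abstract can be illustrated with a minimal Python sketch. The function names and the temperature values below are hypothetical and are not taken from the NiMet or NASA datasets. The R² shown uses the common 1 - SS_res/SS_tot definition, which is an assumption, since the abstract does not state which definition was used.

```python
import math

def mae(truth, pred):
    """Mean absolute error between ground truth and predictions."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)

def rmse(truth, pred):
    """Root mean square error between ground truth and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

def r2(truth, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot

def percent_reduction(before, after):
    """Relative reduction of an error measure, in percent."""
    return 100.0 * (before - after) / before

# Made-up daily maximum temperatures in degrees Celsius (illustration only).
station = [31.0, 32.5, 30.0, 29.5, 33.0]      # stands in for ground truth
reanalysis = [29.0, 30.0, 28.5, 28.0, 30.5]   # stands in for the raw model output
corrected = [30.5, 32.0, 30.5, 29.0, 32.0]    # stands in for a post-processed output

raw_mae = mae(station, reanalysis)
new_mae = mae(station, corrected)
print(f"MAE before: {raw_mae:.3f}, after: {new_mae:.3f}")
print(f"RMSE before: {rmse(station, reanalysis):.3f}")
print(f"R2 before: {r2(station, reanalysis):.3f}")
print(f"MAE reduction: {percent_reduction(raw_mae, new_mae):.2f}%")
```

A percentage reduction of this kind compares an error measure before and after post-processing, which is how figures such as the 35.35% MAE reduction are usually read.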
Data availability
Processed data are available upon request to the authors.
Acknowledgements
The authors express profound gratitude to the management and staff of the Advanced Space Technology Applications Laboratory (ASTAL), the ASTAL Digital Image Processing Laboratory (DIPL), and the National Space Research and Development Agency (NASRDA) for providing an enabling environment for this work. The authors also thank Kanda Weather Group LLC and the Nigerian Meteorological Agency (NiMet) for providing the ground-truth weather data.
Author information
Contributions
All authors contributed to the study conceptualization and methodology. Data collection was carried out by SO. Data processing and analysis were carried out by AO. The first draft of the manuscript was written by AO, and all authors commented on all versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Oloyede, A., Ozuomba, S., Asuquo, P. et al. Data-driven techniques for temperature data prediction: big data analytics approach. Environ Monit Assess 195, 343 (2023). https://doi.org/10.1007/s10661-023-10961-z
DOI: https://doi.org/10.1007/s10661-023-10961-z