Assessment of the relevance of features associated with corn crop yield prediction in Colombia, a country in the Neotropical zone

Sierra-Forero, Brayan-Leonardo; Baron-Velandia, Julio; Vanegas-Ayala, Sebastian-Camilo

doi:10.1007/s41870-024-01762-9


Original Research
Open access
Published: 12 March 2024

Volume 16, pages 2129–2138, (2024)

International Journal of Information Technology


Abstract

Obtaining highly accurate predictive models to precisely estimate corn crop yields is essential for making informed decisions in a sustainable agriculture environment. There are various approaches to achieve this goal, including models based on Fuzzy Logic, Association Rules, and Machine Learning. However, some of these models have limitations in terms of the accuracy of their predictions, attributable to the high complexity and non-linearity in the interactions between factors. While Machine Learning techniques alone can achieve high precision, the inclusion of multiple attributes can reduce it. This study focuses on identifying the most influential factors at the regional level through a comprehensive analysis of the relevance of features associated with corn crop yields in Colombia, a country in the Neotropical zone. To accomplish this, climatological time series and historical yield records are used through a methodology based on CRISP-DM, widely used in the field of data mining, involving a review of related work, data cleaning and transformation, relevance evaluation using the RReliefF algorithm, and verification of the performance of the most influential factors through prediction algorithms. The results obtained demonstrate that solar radiation, precipitation, vapor pressure, and maximum and minimum temperatures exert the greatest influence on corn crop yield prediction, with a relevance factor of 0.033, 0.032, 0.026, 0.022, and 0.021, respectively. In the validation of the performance of the selected factors, two predictive models were implemented. The first, based on Artificial Neural Networks, yielded an RMSE of 0.1216 with the subset of variables and 0.1403 with all available variables. In the second, Linear Regression was applied, resulting in an RMSE of 0.1417 with the subset of variables and 0.1424 with all available variables. These results highlight the importance of the selected features as the most influential climatic factors in defining highly accurate predictive models in the Neotropical zone.


1 Introduction

From a climatic and topographic perspective, the Earth is divided into eight main biogeographic realms: Nearctic, Neotropical, Palearctic, Afrotropical, Indo-Malayan, Australian, Oceanic, and Antarctic. Colombia, due to its geographic location, lies within the Neotropical realm, a region rich in biological diversity that encompasses South America, Central America and Mexico [1].

Corn plays a crucial role as a global staple food, consumed daily by more than 4.5 billion people. It is highly relevant due to its caloric content and environmental adaptability, and its demand is expected to experience significant growth in the coming years due to the increase in the global population [2,3,4,5,6]. In the specific case of Colombia, corn accounts for 9% of the daily calorie supply; however, the country relies on imports for 74% of its demand [7]. This situation highlights the urgent need to strengthen domestic production, ensure crop availability, and reduce global dependence. Ensuring effective corn production with high yields requires preventive policies based on information, commonly obtained through decision-making guided by predictive data models [8,9,10,11,12].

The estimation of crop yields is a task of paramount importance for food security [12,13,14,15]. With this information, farmers, commercial improvement organizations, and government agencies make informed decisions that allow for proper crop management, the implementation of development policies, the promotion of national food policies, and the promotion of international trade [4, 16,17,18,19]. In the case of corn crops, estimating their yield helps to understand their response to different environmental stresses [20, 21] and, thus, provides relevant information for their management in a sustainable agriculture environment [22,23,24]. However, making estimates with a high degree of accuracy is a complex task. This process involves multiple factors that directly and indirectly affect plant growth [4, 25,26,27].

The constant spatial and temporal changes in planting environments, as well as the continuous interaction between factors, result in highly complex and non-linear effects that, in practice, make it difficult to provide accurate estimates [28,29,30]. To achieve successful predictions, a representative dataset for each study case is required, containing a subset of features capable of appropriately describing the target concept. Predictive models that employ many features often include irrelevant and noisy ones, which lowers their precision and accuracy; the features to be used must therefore be analyzed and selected to ensure the accuracy of the results obtained [31].

The objective of this research is to identify the climatic factors that are critical in the accurate prediction of corn crop yields in Colombia, a country belonging to the Neotropical zone, to guide the construction of more accurate predictive models with regional applicability. To achieve this, a feature relevance estimation technique is employed, and prediction algorithms are used to validate their performance. While some studies have revealed highly influential factors that can improve model accuracy [32, 33], very few have focused their efforts on identifying those that are predominant in the Neotropical zone for this type of crop.

The rest of the document is organized as follows. Section 2 presents the related works in the case study. Section 3 shows the methodology used in the research. Section 4 describes the development and results obtained. Section 5 provides the discussion of results. Section 6 addresses the conclusions and Sect. 7 discusses future work followed by the references.

2 Related works

According to a systematic review conducted in SCOPUS, Science Direct, Web of Science, PubMed, IEEE and Google Scholar, studies that examine the influence of the various factors involved in corn crop yield take varied approaches to estimating the relevance of those factors. The quality of this estimation depends, to a large extent, on the availability of data and on the methods used: their complexity and performance significantly impact the selection of attributes and, therefore, the precision of the proposed predictive models.

A common approach is the use of statistical methods. In the study [34], the “C–D production function model” algorithm was applied to evaluate the relevance of factors such as the application of fertilizers, pesticides, sown area and precipitation with respect to the yield of corn crops in Daqing city, China. It was concluded that the application of pesticides and fertilizers significantly influences yield, as does precipitation, whose impact varies depending on the variety of corn planted. Similarly, in [35], the Pearson correlation coefficient and the coefficient of determination were used to analyze variables associated with soil and topography in the states of Illinois and Indiana, USA, highlighting elevation and terrain curvature as the most influential factors on yield. In [36], multiple linear regression was used to evaluate 18 factors related to crop growth in the USA, highlighting precipitation and late-season temperature as the most influential.

Another approach focuses on checking the relevance of factors by testing different subgroups of attributes directly on predictive models. The research in [37] uses the Random Forest (RF) algorithm to estimate the yield of corn crops in the USA, trying different combinations of variables until finding the one with the best performance for the predictive model. In this case, the year, region, irrigation and seasonal climate turn out to be the most relevant factors for predicting with high accuracy the yield to be obtained at the end of the season. The authors of [38] evaluate the influence of 20 attributes related to soil, topography and type of corn crop in two fields in Illinois, USA, using different variations of Artificial Neural Networks (ANN). They use an intelligent problem solver to randomly test 150 combinations of attributes and ANN configurations in order to find the model with the best precision; corn hybrid, relative terrain elevation and cation exchange capacity emerge as the factors with the highest degree of influence on the estimation of crop yield. Finally, in [39], the Hybrid-Maize model, which simulates corn crop yield and the influence of each factor on it, is used to analyze 12 factors related to corn hybrids, crop management and climatic factors in Huanghuaihai, China. Through verification of multiple variations in these factors, precipitation and temperature were identified as the most relevant, explaining approximately 50% of the yield obtained at the end of the season.

3 Methodology

This research is based on the CRISP-DM methodology, widely used by various authors to describe the life cycle of standard data mining projects [40]. It aims to develop the necessary mechanisms for identifying and selecting the climatic factors with the highest degree of influence on predicting corn crop yield in Colombia.

Below, we present a detailed description of the phases that comprise the methodology used:

1. Understanding the Problem: This phase addresses the lack of precision in predictive models for corn crop yield, emphasizing the importance of selecting influential factors for model training. This identifies the knowledge gap to be addressed, leading to the case study and the research objective.
2. Understanding the Data: Involves the search and collection of data in the form of historical records related to climatic factors and corn crop yield in the region. Through analysis, the necessary processing and transformation procedures are identified to create a dataset ready for use.

There is a fundamental relationship between the phase of understanding the problem and understanding the data; understanding the problem implies the need to have access to the data and its adequate interpretation.
3. Data Preparation: Various techniques are employed to transform the data according to the study’s requirements. Initially, attribute cleansing is carried out on the datasets by analyzing their frequency of use in research related to the case study, establishing a starting point for transformation processes. Subsequently, the climatic dataset is adjusted to a semi-annual periodicity corresponding to the yield data. Next, the datasets are integrated by matching them based on date and location, outlier values are removed, and normalization is performed. Finally, using RReliefF as a method to estimate feature relevance, a subset of attributes representing the best configuration for the predictive model is selected, resulting in a single transformed dataset ready for use in the modeling phase.
4. Modeling: ANN and Linear Regression (LR) are used to build two predictive models, each with two different configurations: one employing the selected subset of variables and the other using the total available variables in the study. This is done to validate the performance of the selected attributes.

In this context, the modeling phase is closely linked to the data preparation phase. During model configuration and testing, it may be necessary to make additional adjustments to the data set to ensure efficient and accurate integration with the required information.
5. Evaluation: The results obtained in the modeling phase are evaluated to verify the performance of the selected subset of factors and determine if there is a significant improvement compared to using the total available variables.

This phase seeks to ensure the fulfillment of the research objective. Depending on the result of the model evaluation, it is determined whether the process can advance to the next phase. If the result is insufficient, the process returns to the phase of understanding the problem to carry out the necessary changes and make the corresponding adjustments.
6. Deployment: The obtained results are discussed, highlighting the most important findings regarding the factors with the highest degree of relevance in estimating corn crop yield, potential associations between features, and the overall performance of the models with and without the use of the selected subset of attributes.

Figure 1 illustrates the general structure of the methodology and the interaction between its phases.

4 Results

4.1 Data description

The dataset used in this study consists of historical records of climatic factors and corn crop performance in Colombia. Yield data has been recorded semi-annually from 2006 to 2021 in various regions of the country.

4.1.1 Climatic data

These data are provided by the Consultation and Download of Hydrometeorological Data system of the Institute of Hydrology, Meteorology, and Environmental Studies of Colombia [41]. They originate from over 4,400 meteorological stations located throughout the country. The dataset comprises 20 climatological variables, including Maximum Temperature, Minimum Temperature, Average Temperature, Precipitation, Vapor Pressure, Solar Radiation, Sunshine Hours, Cloud Cover, Evaporation, Wind Speed, and Relative Humidity, with over 1.5 million daily records.

4.1.2 Yield data

These are obtained from the Ministry of Agriculture and Rural Development [42] and come from historical records of traditional maize harvests across the country. The dataset includes 17 variables related to production data, including Yield, Planted Area, Harvested Area, Physical Production Status, and Production, totaling 22,440 individual records with semi-annual periodicity, commonly used for transient crop types.

4.2 Data preparation

Data preparation begins with the refinement of features in the previously described datasets based on their frequency of use in related research concerning the case study. This reduces the number of attributes to be transformed and analyzed in later phases.

To determine which attributes are commonly used in other research, the results obtained from 19 related studies are reviewed, with reference to the variables available in the climatic dataset. Table 1 presents the results, with attributes analyzed horizontally and the reviewed research studies listed vertically, marked with an (X) for attributes used in each study.

Table 1 Frequency of use of climatic factors for corn crop yield prediction

The results reveal that precipitation, minimum temperature, maximum temperature, average temperature, vapor pressure, solar radiation, and evaporation are the most frequently used climatic characteristics, with a total of 17, 17, 17, 8, 8, 5, and 3 appearances, respectively. In contrast, cloud cover, wind speed, and dew point temperature are excluded from the study due to their low frequency of use in the reviewed research. Next, the climatic dataset is transformed by calculating the arithmetic mean of each attribute semi-annually, reducing the annual records from 364 to 2, corresponding to each year’s A and B semesters, resulting in a total of 10,225 records with semi-annual periodicity. Subsequently, using the Department, Municipality, Year, and Period fields as reference points, the climatic and yield datasets are integrated, resulting in 2,984 records and 13 attributes, including yield. Temporal and location attributes are then removed, leaving the dataset with 8 variables.
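
The aggregation and integration steps described above can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' code: the record layout, the field names (`department`, `municipality`, `date`, `precip`, `yield`), the sample values, and the assumption that semester A covers January to June and semester B covers July to December are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def semester(d):
    """Return 'A' for January-June and 'B' for July-December (assumed split)."""
    return "A" if d.month <= 6 else "B"

def aggregate_semiannual(daily_records, variables):
    """Average each climatic variable per (department, municipality, year, period)."""
    groups = defaultdict(list)
    for rec in daily_records:
        key = (rec["department"], rec["municipality"],
               rec["date"].year, semester(rec["date"]))
        groups[key].append(rec)
    return {
        key: {v: mean([r[v] for r in recs]) for v in variables}
        for key, recs in groups.items()
    }

def integrate(climate_by_key, yield_records):
    """Inner-join yield records with the semi-annual climate means on the shared key.

    Location and time fields are dropped from the result, as in the text.
    """
    merged = []
    for rec in yield_records:
        key = (rec["department"], rec["municipality"], rec["year"], rec["period"])
        if key in climate_by_key:
            merged.append({**climate_by_key[key], "yield": rec["yield"]})
    return merged

# Made-up records for illustration only (not values from the study).
daily = [
    {"department": "Meta", "municipality": "Granada", "date": date(2010, 2, 1), "precip": 4.0},
    {"department": "Meta", "municipality": "Granada", "date": date(2010, 3, 1), "precip": 6.0},
    {"department": "Meta", "municipality": "Granada", "date": date(2010, 9, 1), "precip": 10.0},
]
yields = [
    {"department": "Meta", "municipality": "Granada", "year": 2010, "period": "A", "yield": 2.5},
]

climate = aggregate_semiannual(daily, ["precip"])
dataset = integrate(climate, yields)
print(dataset)
```

An inner join is used here because only records present in both sources can pair climate with yield, which would be consistent with the drop from 10,225 climatic records to 2,984 integrated ones.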

Through the calculation of Z-Score by attribute, which indicates how far a particular value deviates from its arithmetic mean, outlier values accounting for 5.46% of the data are identified and removed, resulting in a total of 2,821 records. The Min–Max data normalization method is applied, using the minimum and maximum values of each attribute as reference. Table 2 displays a segment of the resulting dataset after applying these transformation techniques.
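
A minimal sketch of these two transformations is given below. The text does not state the Z-Score cut-off or whether the population or sample standard deviation was used, so the threshold of 3 and the use of the population standard deviation are assumptions, as are the function names and the toy data.

```python
from statistics import mean, pstdev

def remove_outliers(rows, threshold=3.0):
    """Drop rows in which any attribute has |z-score| above the threshold.

    The threshold (3.0) and the population standard deviation are assumptions.
    """
    columns = list(rows[0].keys())
    stats = {}
    for c in columns:
        values = [r[c] for r in rows]
        stats[c] = (mean(values), pstdev(values))
    kept = []
    for r in rows:
        within = all(
            sd == 0 or abs((r[c] - mu) / sd) <= threshold
            for c, (mu, sd) in stats.items()
        )
        if within:
            kept.append(r)
    return kept

def min_max_normalize(rows):
    """Rescale every attribute to [0, 1] using its own minimum and maximum."""
    columns = list(rows[0].keys())
    lo = {c: min(r[c] for r in rows) for c in columns}
    hi = {c: max(r[c] for r in rows) for c in columns}
    return [
        {c: (r[c] - lo[c]) / (hi[c] - lo[c]) if hi[c] > lo[c] else 0.0
         for c in columns}
        for r in rows
    ]

# Toy data: twenty ordinary values plus one extreme value.
rows = [{"temp": float(10 + i % 5)} for i in range(20)] + [{"temp": 100.0}]
cleaned = remove_outliers(rows)
normalized = min_max_normalize(cleaned)
print(len(rows), len(cleaned))
```

Normalizing after outlier removal keeps a single extreme value from compressing the remaining records into a narrow part of the [0, 1] range.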

Table 2 Fragment of the resulting dataset after data preparation

As a final step, the RReliefF algorithm is applied to identify the variables with the highest relevance to the dependent variable. RReliefF is an extension of the Relief and ReliefF methods, which identify statistically influential attributes with respect to a target attribute through instance-based learning. Relief assigns a weight to each attribute and updates it based on the differences in attribute values between randomly selected instances and their nearest neighbors (found by Euclidean distance), both from the same class (near-hit) and from a different class (near-miss). Finally, attributes that exceed a predefined threshold are selected [52]. RReliefF, in turn, incorporates the ability to address regression problems, where the target variable is continuous [53].
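
The weight update behind RReliefF can be sketched as follows. This is a simplified illustration rather than the implementation used in the study: it assumes all inputs are already Min–Max normalized (so an attribute difference is a plain absolute difference), visits every instance once instead of sampling at random, and weights the k neighbors uniformly instead of by rank. The function name and toy data are hypothetical.

```python
import math

def rrelieff(X, y, k=5):
    """Simplified RReliefF weights of each feature in X for a continuous target y.

    Assumptions: X and y are scaled to [0, 1]; every instance is used once
    (m = n); the k nearest neighbours get uniform weight 1/k.
    """
    n, n_features = len(X), len(X[0])
    n_dc = 0.0                    # accumulated target differences
    n_da = [0.0] * n_features     # accumulated attribute differences
    n_dcda = [0.0] * n_features   # accumulated joint (target x attribute) differences
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # k nearest neighbours of instance i by Euclidean distance
        neighbours = sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]
        for j in neighbours:
            d_target = abs(y[i] - y[j])
            n_dc += d_target / k
            for a in range(n_features):
                d_attr = abs(X[i][a] - X[j][a])
                n_da[a] += d_attr / k
                n_dcda[a] += d_target * d_attr / k
    m = n
    # W[A] = N_dC&dA / N_dC - (N_dA - N_dC&dA) / (m - N_dC)
    return [
        n_dcda[a] / n_dc - (n_da[a] - n_dcda[a]) / (m - n_dc)
        for a in range(n_features)
    ]

# Toy data: the target equals feature 0; feature 1 only separates two groups.
X = [[0.0, 0.0], [0.1, 1.0], [0.4, 0.0], [0.5, 1.0], [0.9, 0.0], [1.0, 1.0]]
y = [row[0] for row in X]
weights = rrelieff(X, y, k=1)
print(weights)
```

In this toy case the feature that drives the target receives a positive weight, while the second feature, which never differs between an instance and its nearest neighbor, receives a weight of zero.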

Table 3 presents the results of applying the RReliefF method, which indicates that the attribute with the highest degree of relevance to yield is solar radiation, followed by precipitation, vapor pressure, and maximum and minimum temperatures. Average temperature and evaporation have a lower degree of influence compared to the other attributes.

Table 3 Selection of relevant characteristics using the RReliefF method

4.3 Predictive model

In order to evaluate the effectiveness of the chosen attributes in estimating corn crop yields, two predictive models were developed: one using ANN and another using LR. For each, two different configurations are presented: one involving all the variables used in the research and another based on the selected subset of attributes. Each configuration is detailed below:

Artificial Neural Network: The Multi-layer Perceptron type was used, which, due to its structure and high performance in pattern association for predictions [54], is well suited to the needs of this research. The ANN is configured with 10 or 5 neurons in the input layer, corresponding to the number of input variables in the model when considering all available attributes or the selected subset, respectively. It has 3 hidden layers with 64, 32, and 16 neurons each and an output layer with 1 neuron. Adam is used as the optimization function, ReLU as the activation function, Mean Squared Error (MSE) as the loss function, and training runs for 200 epochs.
Linear Regression: The Ridge algorithm (L2) is employed, an alternative regularized version of least squares that reduces variance and mean absolute error [55], configured with a penalty coefficient (Alpha) of 0.0001.
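
For reference, the standard ridge (L2-regularized least squares) objective can be written as below. The symbols $X$ (feature matrix), $y$ (yield vector), $w$ (coefficient vector) and $I$ (identity matrix) are introduced here only for illustration and do not appear in the original text; the intercept term is omitted for brevity, and $\alpha$ is the penalty coefficient reported above.

```latex
\hat{w} \;=\; \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
\;=\; \left( X^{\top} X + \alpha I \right)^{-1} X^{\top} y,
\qquad \alpha = 0.0001
```

With a penalty this small, the solution stays very close to ordinary least squares, and the regularization mainly stabilizes the estimate when attributes are strongly correlated.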

70% of the data, randomly selected from the dataset, were used for model training. The remaining percentage was used to validate the accuracy in each case. Table 4 presents the results obtained, specifying each model’s configuration and performance, evaluated using metrics such as MSE, Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), both in the training and validation phases.
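
The split and the three error metrics can be sketched as follows. The helper names, the fixed random seed and the toy values are assumptions made for illustration; the text does not say how the random selection was seeded.

```python
import math
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into training and validation parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example: ten record indices and four made-up predictions.
train, valid = train_test_split(list(range(10)))
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.1, 0.5, 0.6, 0.6]
print(len(train), len(valid))
print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

Because the yield is Min–Max normalized, all three metrics are expressed on a 0 to 1 scale, which is what makes the RMSE values reported for the different models directly comparable.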

Table 4 Performance of predictive models in the training and validation phase

The results reveal that the ANN model with the selected subset of variables exhibited the best performance, achieving an RMSE of 0.1216, translating to a notable accuracy of 87.84%. In contrast, the LR model with the same subset of variables achieved an RMSE of 0.1417, equivalent to an accuracy of 85.83%.

On the other hand, the ANN model that used all available variables showed significantly lower performance compared to its counterpart that used the selected subset of variables, registering an RMSE of 0.1403 and an accuracy of 85.97%. Similarly, the LR model obtained an RMSE of 0.1424, equivalent to an accuracy of 85.76%.
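
The reported accuracies are consistent with the relation accuracy = (1 − RMSE) × 100 on the normalized yield scale. The article does not state this formula explicitly, so it is an inference from the published numbers; the short check below, with a hypothetical helper name, reproduces 87.84%, 85.97% and 85.76%, as well as the 85.83% given for LR with the selected subset in the discussion of results.

```python
def accuracy_from_rmse(rmse):
    """Accuracy in percent, ASSUMING accuracy = (1 - RMSE) * 100 on [0, 1]-scaled yield."""
    return round((1.0 - rmse) * 100.0, 2)

# RMSE values as reported for the validation of each configuration.
reported_rmse = {
    "ANN, selected subset": 0.1216,
    "ANN, all variables": 0.1403,
    "LR, selected subset": 0.1417,
    "LR, all variables": 0.1424,
}

accuracies = {name: accuracy_from_rmse(v) for name, v in reported_rmse.items()}
for name, acc in accuracies.items():
    print(f"{name}: {acc}%")
```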

5 Discussion of results

Most research on factors affecting corn crop yields focuses on two groups of biogeographic zones. The first is the Nearctic, represented by the United States, which exhibits climatic diversity ranging from arid conditions in the southwest to a temperate climate on the east coast, and a topography that encompasses vast plains, mountains and extensive plateaus. The second comprises the Palearctic and Indo-Malayan zones, represented by China, which experience climates from arid and cold in the north to tropical in the south, with a topography that includes vast plains in the east, mountains in the west and plateaus in the center [1].

According to the literature review, there is little research on countries near the equator within the Neotropical zone, which is characterized by tropical and equatorial climates, with warm temperatures throughout the year and rainy seasons. Climate variability has a differentiated impact in each region, affecting corn crop yields in a unique way in each geographic area. The presence of seasons in countries such as the United States and China introduces specific challenges, such as seasonal droughts, which can have direct consequences on corn production [56], while in equatorial countries the absence of distinct seasons minimizes these climatic risks, offering different conditions for corn cultivation. This diversity in the response of corn to climate highlights the need for research that analyzes the relevance of climatic factors on crop yields, considering the biogeographic and topographic heterogeneity of the region.

Studies such as [35, 38], which focus on topographic factors and soil properties, cannot be compared directly with the present study because they do not address climatic aspects. They nevertheless make it possible to understand and compare equally relevant elements, including the type of methods used to evaluate the relevance of characteristics, the methodology employed and the relative influence of other types of factors on corn crop yields.

This study proposes an approach that identifies the most influential factors in the prediction of corn crop yields by evaluating the degree of relevance of each attribute with respect to crop yield, considering the interaction between attributes by assigning weights based on their neighborhood, using the RReliefF algorithm. The results show that in the Neotropical zone, solar radiation exerts the greatest degree of influence, followed closely by precipitation. In addition, vapor pressure and maximum and minimum temperature exhibit values greater than 0.020. Although their magnitudes are smaller than those corresponding to solar radiation and precipitation, they have a high influence on the estimation of corn crop yields. In the studies that consider the influence of climatic factors, a high correspondence is evidenced in relation to precipitation and temperature, when accurately estimating the yield to be obtained at the end of the season, as presented in Table 5, ratifying the results obtained in this research.

Table 5 Most influential factors in corn crop yields by research

The choice of RReliefF in feature selection before implementing a predictive model offers key advantages compared to other approaches such as using statistical methods [34,35,36] or direct implementation of predictive models [37,38,39]. RReliefF stands out for its sensitivity to local interactions, predictive model independence, robustness to noise, interpretability, and computational efficiency. By focusing on evaluating feature relevance at the local level, RReliefF can capture specific patterns and provide more robust and efficient feature selection, independent of the subsequent prediction algorithm. These features make RReliefF an attractive option in situations where feature interpretation, noise resilience and computational efficiency are valued.

On the other hand, the values obtained for average temperature and evaporation are slightly lower compared to the other attributes. Although these factors have also been identified in previous studies as relevant for estimating corn crop yield, their relative influence in the case study is lower. It is important to note that despite their lower degree of influence, these attributes can still play a significant role in yield prediction when considered together with other climatic factors.

According to the results, solar radiation is the most important factor in estimating corn crop yield. It plays a fundamental role by providing the necessary energy for the photosynthesis process in plants, which directly influences carbohydrate production and crop growth. The energy captured through solar radiation is essential for driving the biological processes that determine corn crop production and yield [57].

Precipitation also plays a crucial role due to its significant influence on yield. Its variability during the crop growth cycle has a significant impact on the outcomes. This factor is the primary source of water for crops, making it crucial for meeting their water needs. Unlike other factors like temperature, solar radiation, and wind, which are consumers, precipitation directly supplies the water needed for plant development and growth [58].

Vapor pressure is another determinant factor in this estimation process. It describes the pressure exerted by the water vapor in the air in a specific area and, consequently, how much water vapor is present. Additionally, it determines the air’s capacity to hold water molecules and has a direct relationship with temperature: when temperature increases, both the air’s capacity to hold water molecules and the vapor pressure increase [59].
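
The relationship between temperature and the air's capacity to hold water vapor can be illustrated with the widely used Tetens-type approximation for saturation vapor pressure, in the form found in common agrometeorological guidelines. The article does not say which formulation, if any, underlies its vapor pressure data, so the choice of this equation and the function name are assumptions made only for illustration.

```python
import math

def saturation_vapor_pressure_kpa(temp_c):
    """Saturation vapor pressure in kPa at air temperature temp_c (degrees Celsius).

    Tetens-type approximation: e_s = 0.6108 * exp(17.27 * T / (T + 237.3)).
    """
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))

# The capacity of the air to hold water vapor grows quickly with temperature.
for t in (10, 20, 30):
    print(t, round(saturation_vapor_pressure_kpa(t), 3))
```

The increase is far from linear: under this approximation the saturation value at 30 °C is more than three times the value at 10 °C.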

Maximum and minimum air temperatures are essential characteristics in the yield forecasting process as they regulate plant development rates and the duration of growth processes. Moreover, they control the capacity of the air to hold water molecules, and their variation over time determines a significant part of the corn crop growth stages [60].

During the training and validation process of the models, an increase in accuracy was observed when reducing the input variable set from 10 to 5. In the case of the ANN-based model, the configuration with 5 variables achieved an accuracy of 87.84%. In contrast, the accuracy of the model with 10 variables was 85.97%. Similarly, in the LR model, the configuration with 5 variables resulted in an accuracy of 85.83%, surpassing the accuracy of the model with 10 variables, which was 85.76%. The reduction of variables allowed the models to more effectively capture the relationships between climatic factors and corn crop yield, confirming the importance of the selection process in the predictive capability of the models.

The use of climatic factors and models based on them plays a fundamental role in addressing the issue of corn production. By identifying the relevance of attributes such as solar radiation, precipitation, vapor pressure, and maximum and minimum temperatures in predicting corn crop yield, a more comprehensive understanding of how climate changes affect agricultural production is achieved. This information is crucial for making informed decisions in crop planning, irrigation, and other agricultural practices. Anticipating the values of these attributes and their influence on corn crop yield allows farmers to take necessary management measures to improve results, adapting agricultural practices according to forecasted weather conditions.

6 Conclusions

The climatic and topographic diversity between biogeographic zones gives rise to specific challenges and different patterns in corn crop yields. The Nearctic biogeographic zone experiences seasonal challenges such as droughts, the Palearctic and Indo-Malayan zones have climates ranging from arid and cold to tropical, and the Neotropical zone, to which the equatorial countries belong, presents warm temperatures throughout the year and rainy seasons. From the study carried out in Colombia, a country in the Neotropical zone, it is evident that solar radiation, precipitation, vapor pressure and maximum and minimum temperature are the climatic factors that have the greatest influence on the estimation of corn crop yields, with a relevance factor (RReliefF) of 0.033, 0.032, 0.026, 0.022 and 0.021, respectively. These factors, both individually and in their interaction during the crop growth cycle, play a determining role in the yield obtained at the end of the cycle. The significant relevance of these variables in the estimation of agricultural yield is essential for the construction of high-precision predictive models, which are crucial for improving production processes at the regional level.

When contrasting the performance of predictive models that used the complete set of variables with the subset representing the most relevant climatic factors, the importance of identifying the fundamental variables that should be considered when defining high-precision and reliable models becomes evident. In this case study, the ANN model achieved higher accuracy in yield estimation when using the selected variable set, achieving an RMSE of 0.1216, compared to the configuration that used all available variables, which obtained an RMSE of 0.1403. Similarly, the LR-based model showed better performance when using the subset of variables, obtaining an RMSE of 0.1417 compared to the configuration that used all variables and obtained an RMSE of 0.1424.

7 Future works

To optimize the predictive capacity of the models and adapt them to specific contexts, additional variables could be considered in future work, such as soil quality, fertilizer use, and presence of pests. Detailed temporal analyses of corn crops could establish how climatic factors affect each phase of the growth cycle. Accuracy and efficiency may be improved with techniques such as deep learning algorithms and optimization methods, and by integrating satellite datasets for more accurate measurements and improved spatial resolution. These improvements would result in a better understanding of the factors that influence corn crop yield estimation and facilitate effective application of the results in agricultural decision making.

Data availability

In accordance with the transparency and data access policy, we provide the following data availability statement to support the results and analysis of our research. Climatic Data: These data are available through the Hydrometeorological Data Query and Download System of the Institute of Hydrology, Meteorology, and Environmental Studies of Colombia (IDEAM). We recommend visiting the official IDEAM portal. Maize Crop Performance Data: These data are available on the AGRONET portal of the Ministry of Agriculture and Rural Development (MinAgricultura). We recommend visiting the official AGRONET portal.

Funding

Open Access funding provided by Colombia Consortium. The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Julio Baron-Velandia and Sebastian-Camilo Vanegas-Ayala have contributed equally to this work.

Authors and Affiliations

Faculty of Engineering, Universidad Distrital Francisco José de Caldas, 111611, Bogotá D.C., Colombia
Brayan-Leonardo Sierra-Forero, Julio Baron-Velandia & Sebastian-Camilo Vanegas-Ayala
Systems Engineering Program, Faculty of Engineering and Basic Sciences, Fundación Universitaria Los Libertadores, 111221, Bogotá D.C., Colombia
Sebastian-Camilo Vanegas-Ayala

Contributions

The authors confirm their contributions to the paper as follows. Conceptualization: BS and SV; data curation: BS; formal analysis: BS and JB; methodology: BS and SV; project administration: BS and JB; supervision: BS and JB; visualization: BS; writing (original draft): BS; writing (review and editing): SV and JB. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Brayan-Leonardo Sierra-Forero.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sierra-Forero, BL., Baron-Velandia, J. & Vanegas-Ayala, SC. Assessment of the relevance of features associated with corn crop yield prediction in Colombia, a country in the Neotropical zone. Int. j. inf. tecnol. 16, 2129–2138 (2024). https://doi.org/10.1007/s41870-024-01762-9

Received: 26 October 2023
Accepted: 27 January 2024
Published: 12 March 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s41870-024-01762-9
