Application of machine learning for estimating reference evapotranspiration and crop yield based on climatological data
Abstract
Accurate estimation of reference evapotranspiration (ET₀) provides information on crop water requirements. Knowing crop water requirements in advance allows water managers to make decisions that improve agricultural production and food security at regional and global scales. ET₀ and crop yield are complex, nonlinear phenomena that depend on various environmental factors. Because of this complexity, this study implemented various machine learning algorithms to predict ET₀ and crop yield. The performance of the implemented models was evaluated using the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and relative absolute error (RAE). The overall question this study aimed to address was: can machine learning algorithms model the complex phenomena of reference evapotranspiration and crop yield across the central United States? To address this research question, two studies were performed in which machine learning algorithms were developed and applied to estimate reference evapotranspiration and crop yield. The first part of the study evaluated the performance of four machine learning algorithms, namely support vector machine (SVM), random forest (RF), artificial neural network (ANN), and extreme learning machine (ELM), in predicting daily ET₀. Four input combinations drawn from minimum and maximum temperature (Tₘᵢₙ and Tₘₐₓ), net solar radiation (Rₙ), wind speed (U₂), and relative humidity (Hᵣ) were considered, using warm-season (May through September) weather data for 1979-2014 from 39 weather stations across the central United States. The predicted ET₀ values were compared with ET₀ estimated using the FAO-56 Penman-Monteith model.
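For orientation, the FAO-56 Penman-Monteith benchmark can be sketched in a few lines of Python. This is a minimal illustration of the standard daily equation, not the code used in the study. The function names, the `elevation_m` parameter, the zero daily soil heat flux, and the derivation of actual vapor pressure from a single mean relative humidity value are all assumptions made for this sketch.

```python
import math


def saturation_vapor_pressure(temp_c):
    """Saturation vapor pressure in kPa at air temperature temp_c (deg C)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def fao56_daily_et0(t_max, t_min, net_radiation, wind_speed_2m,
                    rel_humidity, elevation_m, soil_heat_flux=0.0):
    """Daily reference evapotranspiration (mm/day), FAO-56 Penman-Monteith.

    Assumed units: temperatures in deg C, net_radiation and soil_heat_flux
    in MJ m^-2 day^-1, wind_speed_2m in m/s, rel_humidity in percent
    (treated here as a daily mean), elevation_m in metres.
    """
    t_mean = (t_max + t_min) / 2.0

    # Mean saturation vapor pressure and actual vapor pressure (kPa).
    # Assumption: actual vapor pressure is taken from mean relative humidity.
    e_s = (saturation_vapor_pressure(t_max)
           + saturation_vapor_pressure(t_min)) / 2.0
    e_a = rel_humidity / 100.0 * e_s

    # Slope of the saturation vapor pressure curve (kPa per deg C).
    delta = 4098.0 * saturation_vapor_pressure(t_mean) / (t_mean + 237.3) ** 2

    # Atmospheric pressure (kPa) from elevation, then psychrometric constant.
    pressure = 101.3 * ((293.0 - 0.0065 * elevation_m) / 293.0) ** 5.26
    gamma = 0.000665 * pressure

    radiation_term = 0.408 * delta * (net_radiation - soil_heat_flux)
    aerodynamic_term = (gamma * 900.0 / (t_mean + 273.0)
                        * wind_speed_2m * (e_s - e_a))
    return (radiation_term + aerodynamic_term) / (
        delta + gamma * (1.0 + 0.34 * wind_speed_2m))


# Hypothetical inputs for a single summer day (illustration only).
et0 = fao56_daily_et0(t_max=21.5, t_min=12.3, net_radiation=13.28,
                      wind_speed_2m=2.078, rel_humidity=73.5,
                      elevation_m=100.0)
print(f"ET0 = {et0:.2f} mm/day")
```

The five meteorological arguments correspond to Tₘₐₓ, Tₘᵢₙ, Rₙ, U₂, and Hᵣ, which is why a machine learning model trained on only a subset of them is estimating ET₀ from less information than the benchmark itself requires.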
Results revealed that all implemented models performed well, with R² > 0.952, when all five parameters (Tₘₐₓ, Tₘᵢₙ, Rₙ, U₂, and Hᵣ) were used as input variables. When fewer meteorological parameters (only Tₘₐₓ, Rₙ, and Hᵣ) were used, all models still showed satisfactory results, with R² of at least 0.868. ELM outperformed the other models for all input combinations. The second part of the study implemented six machine learning models, namely SVM, ANN, ELM, RF, multiple linear regression (MLR), and deep neural network (DNN), to predict corn yield using a satellite-based vegetation index and weather data for the 2003-2020 growing seasons (May through September) across Kansas, Iowa, and Nebraska in the United States Corn Belt. The normalized difference vegetation index (NDVI) was used as the satellite-based vegetation index, taken from the biweekly composite MODIS Aqua 1-km NDVI product. The predicted crop yield was compared with USDA yield statistics, which report corn production in tons per hectare for each county. All models were trained using “with NDVI” and “without NDVI” input combinations. The “without NDVI” input combination comprises weather data on day length, snow water equivalent, vapor pressure, solar radiation, maximum and minimum temperature, and precipitation; the “with NDVI” input combination includes all parameters of the “without NDVI” combination plus satellite-based NDVI. Across the implemented models, R² ranged from 0.351 to 0.634 with the “with NDVI” input combination and from 0.343 to 0.591 with the “without NDVI” input combination. All models performed slightly better with the “with NDVI” input combination than with the “without NDVI” input combination. DNN showed the best performance, followed by ELM. MLR showed the worst performance with both input combinations.
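The four evaluation metrics named in this abstract can be computed as in the following sketch. It is illustrative only: the function name `regression_metrics` is hypothetical, and R² is assumed here to be defined as one minus the ratio of residual to total sum of squares (some studies instead report the squared Pearson correlation, and the abstract does not say which was used). RAE is taken as the total absolute error divided by the total absolute deviation of the observations from their mean.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAE, and RAE for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    abs_err = sum(abs(r) for r in residuals)               # total absolute error
    abs_dev = sum(abs(o - mean_obs) for o in observed)     # total absolute deviation

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": abs_err / n,
        "rae": abs_err / abs_dev,
    }


# Made-up numbers for illustration; these are not data from the study.
scores = regression_metrics([4.0, 5.0, 6.0, 7.0], [4.2, 4.9, 6.3, 6.8])
print(scores)
```

Under these definitions, a model that always predicts the observed mean scores R² = 0 and RAE = 1, which gives a baseline for reading the reported R² ranges.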
The developed machine learning models estimated ET₀ with very high accuracy compared with previously developed machine learning and empirical models. This study could help in choosing the best predictive machine learning models for estimating ET₀ across the central United States with higher accuracy. The developed models estimate ET₀ from fewer parameters with accuracy competitive with the FAO-56 PM model. Accurate estimation of ET₀ helps quantify crop water requirements in advance and make effective use of limited irrigation supplies. This, in turn, helps increase irrigation system efficiency and supports water resource conservation and irrigation water management. Secondly, the implemented machine learning models showed average performance in forecasting crop yield across part of the Corn Belt in the United States. Because the models use NDVI data from the entire growing season, they are limited in their ability to estimate crop yield before harvest. The predictive model could predict crop yield at any point before harvest if it were used with NDVI from different stages of crop growth. The performance of the models could be improved by adding soil data as an input variable. This study provides information on improving model performance by using NDVI as an input.