Forecasting population of corn earworm (Helicoverpa zea) using classical and machine learning models
Abstract
Corn earworm (Helicoverpa zea) (CEW) is a major pest of several field crops in the southern U.S., and its population dynamics are closely linked to environmental conditions. Accurate forecasting of seasonal CEW populations can help growers implement timely and cost-effective pest management strategies. This study evaluated seven forecasting models: an autoregressive model with exogenous inputs (ARX), multiple linear regression (MLR), random forest (RF), extreme gradient boosting (XGB), support vector regression (SVR), a multi-layer perceptron (MLP), and long short-term memory (LSTM). Each model predicted CEW abundance from lagged weather variables at three trap sites in Alexandria, Louisiana, from 2020 to 2024. Weather data aggregated over 15-day windows were used to create the lagged features, and model performance was assessed through walk-forward cross-validation using RMSE, MAE, R², and bias. SVR and MLP consistently provided better forecasting accuracy, especially in years other than 2023, when CEW patterns were stable, while LSTM performed well in some cases but tended to overfit. Forecasting accuracy dropped in 2023, a year with extreme population spikes and atypical weather patterns. Feature importance analysis showed that temperature, cold hour accumulation, rainfall, and humidity were among the most influential predictors across models and years. All analyses were done in JupyterLab with Python 3.8 using libraries such as scikit-learn, pandas, NumPy, mlxtend, TensorFlow, and seaborn. These findings highlight the potential of machine learning models to support data-driven pest forecasting, though improvements are needed in model generalization and data availability.
Keywords
AI, forecasting, corn earworm, machine learning, time series.
DOI
10.5703/1288284318182
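The evaluation scheme named in the abstract (lagged weather features, walk-forward cross-validation, and the RMSE, MAE, R², and bias metrics) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the expanding-window split, the sign convention for bias (predicted minus observed), the mean-of-training-targets placeholder model, and the toy numbers are all assumptions made for illustration, and only the Python standard library is used.

```python
from math import sqrt
from statistics import mean


def make_lagged_rows(weather, counts, n_lags):
    """Pair each trap count with the previous n_lags weather aggregates.

    weather: per-period aggregate values (e.g. a 15-day mean temperature)
    counts:  trap counts aligned with weather, one value per period
    """
    rows = []
    for t in range(n_lags, len(counts)):
        # Features come from periods t-n_lags .. t-1, oldest first
        rows.append((weather[t - n_lags:t], counts[t]))
    return rows


def walk_forward_splits(n_samples, min_train):
    """Yield (train_indices, test_index) using an expanding training window."""
    for t in range(min_train, n_samples):
        yield list(range(t)), t


def evaluate(y_true, y_pred):
    """Return RMSE, MAE, R^2, and bias (assumed here: mean of predicted - observed)."""
    errors = [p - y for y, p in zip(y_true, y_pred)]
    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return {
        "rmse": sqrt(ss_res / len(errors)),
        "mae": mean(abs(e) for e in errors),
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
        "bias": mean(errors),
    }


# Made-up toy series for illustration only (not data from the study)
weather = [10.0, 12.0, 15.0, 18.0, 22.0, 25.0, 27.0, 26.0]
counts = [1, 2, 4, 7, 12, 18, 20, 19]

rows = make_lagged_rows(weather, counts, n_lags=2)
y_true, y_pred = [], []
for train_idx, test_idx in walk_forward_splits(len(rows), min_train=3):
    # Placeholder model: predict the mean of the training targets.
    # A real model (e.g. SVR or MLP) would be fit on the training rows here.
    y_pred.append(mean(rows[i][1] for i in train_idx))
    y_true.append(rows[test_idx][1])

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Each test period is predicted only from earlier periods, which is the property that distinguishes walk-forward validation from a shuffled k-fold split. The placeholder model ignores the lagged features entirely, so its scores serve only as a naive baseline against which a fitted model could be compared.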