Abstract

This study combines Sentinel-2 satellite imagery with field-level yield monitor data to evaluate yield prediction along spatial, temporal, and modeling dimensions. Using Matérn kriging, vegetation indices, and machine learning models, it examines four aspects: the effect of spatial resolution on ground-truth interpolation, the effect of the spatial proximity of training fields, single-year versus multi-year prediction performance, and the effect of spatial smoothing on yield maps. Random forest serves as the main predictive model. The results show that a smaller kriging grid size retains more spatial detail, whereas a larger grid loses detail. Leave-one-field-out cross-validation (LOFOCV) revealed significant differences in prediction accuracy among fields, which depend largely on site-specific soil and management conditions. Training on spatially closer fields consistently improved prediction accuracy compared with training on more distant fields. Furthermore, models trained on a single year achieved accuracy similar to that of models trained on multiple years, which means that targeted, time-constrained datasets can reduce the computational burden without significantly lowering performance. Adding spatial smoothing further improved the random forest predictions by reducing noise and aligning the maps more closely with observed yield patterns. Overall, accounting for spatial resolution and for temporal and spatial proximity in the model can significantly enhance the robustness and applicability of the yield prediction framework, providing practical insights for precision agriculture and scalable food production monitoring.
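As a rough illustration of the field-wise evaluation described above, the following minimal sketch holds out one field at a time, trains on the remaining fields, and reports a per-field RMSE. It rests on several assumptions that are not taken from the study: NDVI stands in for the unspecified vegetation index, a one-variable least-squares line stands in for the random forest so that the example needs no external libraries, and the function names (`ndvi`, `fit_line`, `leave_one_field_out`) and all data values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x.

    Assumption: a simple stand-in for the random forest used in the study.
    Returns the tuple (intercept a, slope b).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def leave_one_field_out(samples):
    """Evaluate each field using a model trained on all other fields.

    samples: list of (field_id, index_value, yield_value) tuples.
    Returns a dict mapping each held-out field_id to its RMSE.
    """
    by_field = defaultdict(list)
    for field_id, x, y in samples:
        by_field[field_id].append((x, y))

    scores = {}
    for held_out in by_field:
        # Training set: every sample that does not belong to the held-out field.
        train = [p for f, pts in by_field.items() if f != held_out for p in pts]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        sq_errs = [(a + b * x - y) ** 2 for x, y in by_field[held_out]]
        scores[held_out] = sqrt(sum(sq_errs) / len(sq_errs))
    return scores


# Hypothetical toy data: (field id, NIR reflectance, red reflectance, yield)
raw = [
    ("A", 0.45, 0.08, 9.1), ("A", 0.40, 0.10, 8.2), ("A", 0.35, 0.12, 7.0),
    ("B", 0.42, 0.09, 8.8), ("B", 0.38, 0.11, 7.5), ("B", 0.30, 0.14, 6.1),
    ("C", 0.48, 0.07, 8.0), ("C", 0.36, 0.12, 6.0), ("C", 0.28, 0.15, 4.9),
]
samples = [(f, ndvi(nir, red), y) for f, nir, red, y in raw]
scores = leave_one_field_out(samples)
for field_id, rmse in sorted(scores.items()):
    print(f"field {field_id}: RMSE = {rmse:.2f}")
```

Splitting by field rather than by pixel keeps samples from the same field out of both the training and test sets, so the per-field scores reflect how well a model transfers to an unseen field. In practice, a grouped splitter and a random forest regressor from a machine learning library would likely replace the hand-written pieces here.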

Keywords

Yield prediction, Satellite imagery, Machine learning, Vegetation indices

DOI

10.5703/1288284318187


Generalizing yield prediction and evaluating the common factors influencing the models
