Field Data Collection Methods Strongly Affect Satellite-Based Crop Yield Estimation
Crop yield estimation from satellite data requires field observations to fit and evaluate predictive models. However, it is unclear how much the choice of field data collection method matters for predictive performance. To evaluate this, we used maize yield estimates obtained with seven field methods (two farmer estimates, two point transects, and three crop cut methods) and the “true yield” measured from a full-field harvest for 196 fields in three districts in Ethiopia in 2019. We used combinations of nine vegetation indices and five temporal aggregation methods, computed over the growing season from Sentinel-2 surface reflectance (SR) data, as yield predictors in linear regression and Random Forest models. Crop-cut-based models had the highest model fit and accuracy, similar to that of full-field-harvest-based models. When farmer estimates were used as the training data, the prediction gain was negligible, indicating very little advantage to using remote sensing to predict yield when the training data quality is low. Our results suggest that remote sensing models to estimate crop yield should be fit with data from crop cuts or comparable high-quality measurements, which give better predictions than low-quality training data, even when far more low-quality observations are available.
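The predictor construction described above can be illustrated with a minimal sketch. It is not the study's pipeline: NDVI stands in for the nine vegetation indices, the three aggregations (mean, maximum, median) are assumed examples rather than the five methods actually used, the reflectance and yield values are synthetic, and all function names are hypothetical. Only the linear regression variant is shown, fit by ordinary least squares with NumPy.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def aggregate_season(index_series):
    """Collapse a (fields x dates) index array into per-field seasonal features.

    The aggregations here are illustrative assumptions, not the paper's list.
    """
    return np.column_stack([
        index_series.mean(axis=1),
        index_series.max(axis=1),
        np.median(index_series, axis=1),
    ])

def fit_linear(features, yields):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, yields, rcond=None)
    return coef

def predict_linear(coef, features):
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef

def r_squared(observed, predicted):
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data: 196 fields, 12 image dates (the date count is assumed).
rng = np.random.default_rng(0)
n_fields, n_dates = 196, 12
red = rng.uniform(0.03, 0.15, size=(n_fields, n_dates))
nir = rng.uniform(0.20, 0.50, size=(n_fields, n_dates))

features = aggregate_season(ndvi(nir, red))
# Synthetic "field-measured" yields; real labels would come from crop cuts etc.
yields = 2.0 + 5.0 * features[:, 0] + rng.normal(0.0, 0.3, n_fields)

coef = fit_linear(features, yields)
fit_r2 = r_squared(yields, predict_linear(coef, features))
```

Swapping the `yields` vector for labels from a different field method (for example farmer estimates instead of crop cuts) while holding `features` fixed is the kind of comparison the study describes; the model fit then reflects the quality of the labels rather than of the satellite predictors.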