Using an incomplete block design to allocate lines to environments improves sparse genome-based prediction in plant breeding
Genomic selection (GS) is a predictive methodology that trains statistical machine-learning models on a reference population and uses them to perform genome-enabled predictions of new lines. In plant breeding, it has the potential to increase the speed and reduce the cost of selection. To optimize resources further, sparse testing methods have been proposed. A common approach allocates lines to locations at random while guaranteeing set proportions of overlapping and nonoverlapping lines, so that lines appear in some locations but not in all. In this study we propose using the principle of incomplete block designs (IBD) to allocate lines to locations in such a way that not all lines are observed in all locations. We compare this allocation with a random allocation in which each line is assigned to the same number of locations as under the IBD. We implemented this benchmarking on several crop data sets under the Bayesian genomic best linear unbiased predictor (GBLUP) model and found that allocation under the principle of IBD outperformed random allocation by 1.4% to 26.5% in terms of mean square error across locations, traits, and data sets. Although the size of the improvement varied widely, our results provide evidence that using IBD for the allocation of lines to locations can improve predictive performance compared with random allocation. This approach has the potential to be applied to large-scale plant breeding programs.
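The contrast between the two allocation schemes can be illustrated with a minimal sketch. It is not the procedure used in the study: the cyclic construction below is an assumed stand-in for an incomplete block design, and the function names, the number of lines (12), the number of locations (4), and the number of locations per line (2) are all hypothetical values chosen for illustration. Both schemes place every line in the same number of locations; they differ in how evenly the lines are spread over locations.

```python
import random
from collections import Counter


def cyclic_allocation(n_lines, n_locations, reps):
    """Assumed stand-in for an IBD: line i goes to `reps` consecutive
    locations (modulo n_locations), starting at location i % n_locations."""
    if not 0 < reps < n_locations:
        raise ValueError("reps must be between 1 and n_locations - 1")
    return {
        i: sorted((i + j) % n_locations for j in range(reps))
        for i in range(n_lines)
    }


def random_allocation(n_lines, n_locations, reps, seed=0):
    """Benchmark: each line goes to `reps` distinct locations chosen at random."""
    rng = random.Random(seed)
    return {
        i: sorted(rng.sample(range(n_locations), reps))
        for i in range(n_lines)
    }


def location_sizes(allocation, n_locations):
    """Number of lines observed in each location."""
    counts = Counter(loc for locs in allocation.values() for loc in locs)
    return [counts.get(loc, 0) for loc in range(n_locations)]


# Hypothetical sizes: 12 lines, 4 locations, each line tested in 2 locations.
ibd = cyclic_allocation(12, 4, 2)
rnd = random_allocation(12, 4, 2)

print(location_sizes(ibd, 4))  # cyclic scheme: 6 lines in every location
print(location_sizes(rnd, 4))  # random scheme: sizes generally differ
```

With these sizes the cyclic scheme yields equally sized locations because the number of lines is a multiple of the number of locations; the random scheme matches the per-line replication but gives no such guarantee per location.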