Genomic prediction in a large African maize population
Genomic prediction (GP) combines genome-wide marker data with phenotypic data from a training population to predict the genomic estimated breeding values (GEBVs) of untested individuals in a relevant testing population. Our objective was to evaluate the effects of population structure, of genotype × trial, tester, and management interactions, and of imputation methods on the accuracy of GP for grain yield in CIMMYT's African maize (Zea mays L.) program. The dataset comprised 2022 diverse breeding lines evaluated in 156 Stage 1 yield trials and genotyped with 66,000 single-nucleotide polymorphism (SNP) markers. The first two principal components of the marker data explained 10.5% of the marker variance. Five clusters were detected from the marker data, but cluster of origin explained only 2% of the phenotypic variation. Prediction accuracy, assessed by cross-validation, ranged from 0.20 to 0.36 within clusters and from 0.04 to 0.26 across clusters. Mean within-cluster GP accuracy (0.27) was far higher than that of pedigree-based prediction (0.03). The choice of imputation method did not strongly affect prediction accuracy. Testers and management had large effects. To achieve acceptable GP accuracy within such a diverse population, one can use (i) a very large training population, (ii) carefully planned, relevant testers, and (iii) training and validation populations that share trial environments and management and consist of related genetic material.
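The general GP workflow summarized above (impute missing marker calls, inspect population structure with principal components, fit a model on a training set, and score prediction accuracy by cross-validation) can be illustrated with a minimal sketch. It rests on several assumptions not taken from the study: ridge-regression BLUP (RR-BLUP) stands in for the prediction model, mean imputation stands in for the imputation methods compared, accuracy is taken as the Pearson correlation between predicted and observed values in each held-out fold (some studies further divide by the square root of heritability), and the shrinkage parameter `lam` is set to the residual-to-marker-effect variance ratio of the simulation. All function names are hypothetical, and the data are simulated rather than drawn from the CIMMYT dataset.

```python
import numpy as np


def mean_impute(markers):
    """Fill missing genotype calls (np.nan) with each marker's mean dosage (assumed method)."""
    filled = markers.copy()
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


def pc_variance_fraction(markers, n_pc=2):
    """Share of total marker variance captured by the first n_pc principal components."""
    centered = markers - markers.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    var = singular_values ** 2
    return float(var[:n_pc].sum() / var.sum())


def rrblup_fit(markers, y, lam):
    """Hypothetical RR-BLUP: solve (Z'Z + lam*I) b = Z'(y - mean(y)) on centered markers."""
    mu = y.mean()
    x_mean = markers.mean(axis=0)
    z = markers - x_mean
    b = np.linalg.solve(z.T @ z + lam * np.eye(z.shape[1]), z.T @ (y - mu))
    return mu, x_mean, b


def rrblup_predict(model, markers):
    """Predict GEBVs for new individuals, centering with the training-set marker means."""
    mu, x_mean, b = model
    return mu + (markers - x_mean) @ b


def cv_accuracy(markers, y, k=5, lam=1.0, seed=0):
    """Random k-fold cross-validation; returns mean Pearson r(predicted, observed) over folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = rrblup_fit(markers[train_idx], y[train_idx], lam)
        pred = rrblup_predict(model, markers[test_idx])
        accs.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accs))


# Simulated toy data (NOT the CIMMYT dataset): 200 lines x 500 SNPs coded 0/1/2.
rng = np.random.default_rng(1)
n_lines, n_snps = 200, 500
X = rng.binomial(2, 0.3, size=(n_lines, n_snps)).astype(float)
y = X @ rng.normal(0.0, 0.1, size=n_snps) + rng.normal(0.0, 0.5, size=n_lines)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.05] = np.nan  # about 5% missing calls
X_imputed = mean_impute(X_missing)

# lam = residual variance / marker-effect variance = 0.25 / 0.01 in this simulation.
print("PC1+PC2 variance share:", round(pc_variance_fraction(X_imputed), 3))
print("CV accuracy:", round(cv_accuracy(X_imputed, y, k=5, lam=25.0), 3))
```

The folds here are assigned at random. Comparing within- and across-cluster accuracy would instead require assigning training and validation sets by cluster membership, and comparing imputation methods would mean swapping `mean_impute` for alternatives; the sketch leaves both as extensions.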