Application of genomic selection at the early stage of breeding pipeline in tropical maize
In maize, the doubled haploid (DH) line production capacity of large maize breeding programs often exceeds the capacity to phenotypically evaluate the complete set of testcross candidates in multi-location trials. The ability to partially select DH lines based on genotypic data while maintaining or improving genetic gains for key traits using phenotypic selection can result in significant resource savings. The present study aimed to evaluate genomic selection (GS) prediction scenarios for grain yield and agronomic traits in one of CIMMYT’s tropical maize breeding pipelines in eastern Africa, using multi-year empirical data, in order to design a GS-based strategy for the early stages of the pipeline. We used field data from 3,068 tropical maize DH lines genotyped using rAmpSeq markers and evaluated as testcrosses in well-watered (WW) and water-stress (WS) environments in Kenya from 2017 to 2019. Three prediction schemes were compared: (1) 1 year of performance data to predict a second year; (2) 2 years of pooled data to predict performance in the third year; and (3) individual-year or pooled data, plus a proportion of individuals moved from the testing set (TST) to the training set (TRN), to predict the next year’s data. With five-fold cross-validation, the mean prediction accuracies for grain yield (GY) ranged from 0.19 to 0.29 under WW and from 0.22 to 0.31 under WS when a 1-year dataset was used as the training set to predict a second year’s data as the testing set. The mean prediction accuracies increased to 0.32 under WW and 0.31 under WS when the 2-year datasets were used as the training set to predict the third-year dataset. In a forward prediction scenario, good predictive abilities (0.53 to 0.71) were found when the training set consisted of the previous year’s breeding data plus 30% of the next year’s data moved from the testing set to the training set.
The prediction accuracies for anthesis date and plant height across WW and WS environments, obtained using 1-year data and moving 10, 30, 50, 70, or 90% of the TST to the TRN, were much higher than those of models trained on individual years alone. We demonstrate that expanding the TRN to include genotypic and phenotypic data from the previous year, combined with only 10–30% of the lines from the year of testing, increases prediction accuracy. GS could therefore partially replace the first stage of field-based screening, saving significant costs associated with testcross formation and multi-location testcross evaluation.
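The third prediction scheme (previous-year data plus a proportion of the testing-year lines moved from TST to TRN) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the statistical model, so ridge regression on marker scores is used as a stand-in; the function names (`split_with_transfer`, `ridge_fit_predict`, `prediction_accuracy`) are hypothetical; and the genotypes and phenotypes are simulated, not the study's data. Prediction accuracy is taken here as the Pearson correlation between observed and predicted values in the remaining TST lines.

```python
import numpy as np


def split_with_transfer(n_prev, n_next, proportion, rng):
    """Return (train_idx, test_idx) for rows stacked as [previous year; next year].

    All previous-year lines go to TRN. A random `proportion` of next-year
    lines is moved from TST to TRN; the remaining next-year lines stay in TST.
    """
    next_idx = n_prev + rng.permutation(n_next)
    n_move = int(round(proportion * n_next))
    train_idx = np.concatenate([np.arange(n_prev), next_idx[:n_move]])
    test_idx = next_idx[n_move:]
    return train_idx, test_idx


def ridge_fit_predict(X_trn, y_trn, X_tst, lam=1.0):
    """Ridge regression on marker scores (assumed stand-in model)."""
    mu = y_trn.mean()
    x_mean = X_trn.mean(axis=0)
    Xc = X_trn - x_mean
    # Dual form: beta = Xc' (Xc Xc' + lam I)^-1 (y - mu),
    # which is cheaper when there are more markers than lines.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y_trn - mu)
    beta = Xc.T @ alpha
    return mu + (X_tst - x_mean) @ beta


def prediction_accuracy(y_obs, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])


# Simulated example (not the study's data): DH lines coded 0 or 2 per marker.
rng = np.random.default_rng(0)
n_prev, n_next, n_markers = 200, 150, 500
X = rng.choice([0.0, 2.0], size=(n_prev + n_next, n_markers))
effects = rng.normal(0.0, 0.1, n_markers)
y = X @ effects + rng.normal(0.0, 1.0, n_prev + n_next)

for p in (0.1, 0.3, 0.5):
    trn, tst = split_with_transfer(n_prev, n_next, p, rng)
    y_hat = ridge_fit_predict(X[trn], y[trn], X[tst], lam=float(n_markers))
    acc = prediction_accuracy(y[tst], y_hat)
    print(f"moved={p:.0%} TRN={len(trn)} TST={len(tst)} accuracy={acc:.2f}")
```

In this simulation the two "years" share the same marker effects, so it does not reproduce genotype-by-environment or year effects; it only shows how the TRN and TST are composed as the transferred proportion changes.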