Loss of accuracy in the genomic prediction for prevalent two-stage analysis with single site

Emi Tanaka, Ky Mathews, Alison B. Smith and Brian R. Cullis

Abstract

The linear mixed model is the prevailing method in genomic prediction and selection as it fits the structure of the data well. There are many forms of analysis using linear mixed models; however, it is widely agreed that a single model that analyses the individual plot data, i.e. a one-stage analysis, is superior to a two-stage analysis. Briefly, a two-stage analysis involves computation of the adjusted genotype means in the first stage, followed by a weighted or unweighted analysis in the second stage. A prevalent form of two-stage analysis neither includes spatial modelling nor considers variance heterogeneity for genotype by environment effects. Often in crop breeding trials, analytical approaches do not take into account the non-genetic sources of variation, whether through the adoption of more suitable designs (Butler et al, 2014), an appropriate analysis (Stefanova et al, 2009), or both. In particular, a two-stage analysis is frequently used in the analysis of crop breeding trials. We present one-stage and two-stage models for single trial analysis. Our one-stage model utilises both the pedigree and marker information and is an extension of the approach in Oakey et al (2006). Our simulation results, based on 48 early generation wheat selection trials, show that there is almost always a loss in accuracy for the prevalent weighted and unweighted two-stage analyses relative to a one-stage analysis. The loss of accuracy was noticeably larger from omitting spatial modelling than from using a weighted analysis. In addition, the loss was more pronounced for partially replicated designs (Cullis et al, 2006), which are becoming widely adopted in plant improvement programs in Australia.

Keywords: two-stage analysis, linear mixed models, genomic prediction, breeding values, pedigree.

Preprint