## Abstract

The rapid increase in high-throughput single-nucleotide polymorphism data has led to a great interest in applying genome-wide evaluation methods to identify an individual's genetic merit. Genome-wide evaluation combines statistical methods with genomic data to predict genetic values for complex traits. Considerable uncertainty currently exists in determining which genome-wide evaluation method is the most appropriate. We hypothesize that genome-wide methods deal differently with the genetic architecture of quantitative traits and genomes. A genomic linear method (GBLUP), and a genomic nonlinear Bayesian variable selection method (BayesB) are compared using stochastic simulation across three effective population sizes and a wide range of numbers of quantitative trait loci (*N*_{QTL}). GBLUP had a constant accuracy, for a given heritability and sample size, regardless of *N*_{QTL}. BayesB had a higher accuracy than GBLUP when *N*_{QTL} was low, but this advantage diminished as *N*_{QTL} increased and when *N*_{QTL} became large, GBLUP slightly outperformed BayesB. In addition, deterministic equations are extended to predict the accuracy of both methods and to estimate the number of independent chromosome segments (*M*_{e}) and *N*_{QTL}. The predictions of accuracy and estimates of *M*_{e} and *N*_{QTL} were generally in good agreement with results from simulated data. We conclude that the relative accuracy of GBLUP and BayesB for a given number of records and heritability are highly dependent on *M*_{e,} which is a property of the target genome, as well as the architecture of the trait (*N*_{QTL}).

THE rapid progress and reducing costs of genome sequencing and high-throughput DNA techniques have led to a great interest in applying genome-wide evaluation methods to identify individuals of high genetic merit. Genome-wide evaluation uses associations of a large number of SNP (single nucleotide polymorphism) markers across the whole genome with phenotypes to produce accurate estimates of breeding values (EBVs) for candidates to selection (Meuwissen *et al*. 2001). The accuracy of genome-wide selection (*i.e.,* selection based on genomic EBVs) is expected to be substantially higher than that of traditional best linear unbiased prediction (BLUP) selection, which is based on pedigree and phenotypic data (Daetwyler *et al*. 2008; Goddard 2009; Hayes *et al*. 2009c). In addition, genome-wide selection has the potential to reduce inbreeding rates because of the increased emphasis on own rather than family information (Woolliams *et al*. 2002; Daetwyler *et al*. 2007; Dekkers 2007). Furthermore, the application of genome-wide evaluation approaches can significantly aid our understanding of quantitative trait genetic architecture.

The genome-wide evaluation methods suggested to date can be broadly categorized into groups according to whether there is an assortment of the SNP by magnitude of effect or contribution to the variance. One group treats SNP homogeneously and includes variants of genomic best linear unbiased prediction (GBLUP). This group includes a form of ridge regression (Meuwissen *et al*. 2001) and the use of a realized relationship matrix computed from the markers instead of the traditional pedigree matrix (NejatiJavaremi *et al*. 1997; Villanueva *et al*. 2005; Hayes *et al*. 2009c). Both approaches have been shown to be equivalent (Habier *et al*. 2007; Goddard 2009). A second group provides for heterogeneity among SNP contributions to the variance, with some contributions permitted to be large while the remainder are small, possibly zero. This assortment is helped by Bayesian approaches, which place priors on numbers of SNP with major contributions (*e.g.,* BayesA and BayesB; see Meuwissen *et al*. 2001, 2009; Lee *et al*. 2008), or with some penalty based on functions of the magnitude of effect for each SNP (*e.g.,* Lasso; see Tibshirani 1996; Yi and Xu 2008) or with other smoothing metrics (Long *et al*. 2007). A third group attempts to reduce dimensionality by using principal components or partial least squares (Raadsma *et al*. 2008; Solberg *et al*. 2009) to identify an informative subset of SNP genotypes. The main two methods currently used in real data sets are a linear prediction method, GBLUP, and variants of nonlinear Bayesian variable selection approaches such as BayesB.

In most simulated published data, the accuracy of BayesB outperformed that of GBLUP (*e.g.,* Meuwissen *et al*. 2001; Habier *et al*. 2007; Lund *et al*. 2009). However, real data results have not consistently supported this conclusion. Two reviews of empirical results in dairy cattle to date have shown that GBLUP and BayesB result in very similar accuracies for most traits (Hayes *et al*. 2009a; Vanraden *et al*. 2009). One reason for the disagreement between simulated and real data results could be that the genetic architecture simulated is significantly different from what is found in real populations. Most studies published to date that compare methods using simulated architectures have considered only 50 or fewer QTL affecting the trait (*e.g.,* Meuwissen *et al*. 2001; Habier *et al*. 2007; Lund *et al*. 2009). In this article we hypothesize that the relative utility of genome-wide evaluation methods depends significantly on both the genomic structure of the population and the genetic trait architecture.

The main objective of this study was to compare a linear method, GBLUP, and a nonlinear variable selection method, BayesB, using simulated data across a range of population and trait genetic architectures to further understand the mechanics of genome-wide evaluation methods. An important secondary objective was to extend deterministic prediction models to predict the accuracy of both methods. Theoretical models complement stochastic simulation by helping the understanding of the factors involved in genome-wide evaluation performance and, in return, stochastic simulation is used to confirm theoretical derivations.

## METHODS

#### Theoretical development:

Daetwyler *et al*. (2008) derived equations for predicting the accuracy of a simple least-squares genome-wide evaluation approach for continuous and dichotomous traits. The original formula for genome-wide accuracy for a continuous trait is , where *r _{gĝ}* is the correlation between true and estimated additive genetic values (

*i.e.,*accuracy),

*N*

_{P}is the number of individuals in the training population,

*h*

^{2}is the heritability, and

*n*is the number of independent loci (Daetwyler

_{G}*et al*. 2008). The accuracy was independent of how large the subset of loci was that make nonzero contributions. Thus, it did not matter whether there were many nonzero loci effects of small magnitude or only a few nonzero loci effects of large magnitude. In Daetwyler

*et al*. (2008), the formulae were derived by considering the regression of phenotypes on one locus at a time as loci were assumed independent. Therefore the formula will work for small numbers of dispersed loci in a genome but the accuracy will tend to zero as

*n*becomes large; erroneously, because loci cannot be added independently in a finite genome due to linkage. Daetwyler

_{G}*et al*. (2008) discussed that an empirical value for the number of independent chromosome segments () could be used in place of

*n*, because

_{G}*n*was assumed independent. Goddard (2009) also proposed accuracy predictions for GBLUP, which used the concept of predicted

_{G}*M*

_{e}. His derivation builds on work by Visscher

*et al*. (2006) in which the variance of identical-by-descent sharing for full sibs was developed and provides a prediction for

*M*

_{e}, which is

*M*

_{e}= 2

*N*

_{e}

*L*/log(4

*N*

_{e}

*L*), where

*L*is the genome length in Morgans (Goddard 2009). Substituting

*M*

_{e}in place of

*n*in the original formula of Daetwyler

_{G}*et al*. (2008) results in(1)which also predicts the accuracy of GBLUP. At no time does the argument moving from the original formula of Daetwyler

*et al*. (2008) to Equation 1 depend on the distribution of the effects of loci, so we come to the first hypothesis in this study that states that GBLUP accuracy is independent of the number of quantitative trait loci (

*N*

_{QTL}) associated with the phenotypic trait.

Our second hypothesis was that the accuracy of BayesB when *N*_{QTL} is high would tend to that of GBLUP. If our first hypothesis is confirmed, then the dependence of GBLUP on *M*_{e} is an advantage at high *N*_{QTL}, even though *N*_{QTL} may be higher than *M*_{e}. Heuristically, if GBLUP delivers accuracy as if there are a *M*_{e} number of QTL, the benefit from prior information that there are approximately *M*_{e} (or more) QTL is unclear given Equation 1. On the other hand, it is a clear disadvantage if *N*_{QTL} < *M*_{e} because GBLUP cannot adapt the model to suit the data. In contrast, BayesB is a variable selection method that attempts to determine the “optimum dimensionality” given the data and prior information. When *N*_{QTL} is high this optimum is likely to be *M*_{e} in both methods. Hence, the accuracy of BayesB at high *N*_{QTL} can be predicted in the same way as GBLUP, but if *N*_{QTL} < *M*_{e}, variable selection may deliver an advantage in accuracy because choosing a subset of variables will reduce the dimensionality of the model. Thus, substituting *N*_{QTL} for *M*_{e} is likely to better predict the accuracy of BayesB. This results in the following equation,(2)Further rearrangement of Equations 1 and 2 allows for empirical estimates of *M*_{e} () to be made in the following way,(3)where is the squared accuracy of estimates of genetic values using GBLUP or BayesB (when *N*_{QTL} ≥ *M*_{e}) for individuals without phenotypes. Predicting with GBLUP requires molecular relatedness to be known, whereas this is not required when using BayesB. This result gives a further subhypothesis that the empirical *M*_{e} is predicted by the formula for independent chromosome segments given by Goddard (2009). Also, if *N*_{QTL} < *M*_{e}, additional information on *N*_{QTL} can be gathered using BayesB accuracy because it can choose a subset of loci or variables, by applying the following formula:(4)Therefore, additional insight into quantitative traits can be gained by combining genome-wide evaluation and deterministic prediction. The accuracy of BayesB is of course influenced by the priors used in the analyses (especially priors on the proportion of loci with no effect); hence it is important to use appropriate priors to get accurate .

#### Simulations:

Our study consisted of three main steps. First, populations of individuals were simulated to be in mutation drift equilibrium. Second, effects were assigned to a number of QTL that were randomly selected from the whole set of segregating loci, and true genetic values and phenotypes were generated for each individual. The third step consisted of the genetic evaluations of the individuals generated with both GBLUP and BayesB.

#### Populations and genome:

Populations in mutation drift equilibrium were simulated by random mating individuals for many generations with recombination and mutation. The number of male and female parents was across generations. A total of 1000, 5000, and 10,000 generations were simulated until linkage disequilibrium and heterozygosity values were stable for *N*_{e} = 200, *N*_{e} = 1000, and *N*_{e} = 2000, respectively. In the final generation, a set of training individuals (of variable size) in which the loci effects were to be estimated was generated by random mating. Using the same parents, a set of validation individuals of size equal to the training set was produced whose genetic values were to be predicted. In scenarios where the size of the training sets (*N*_{P}) was larger than *N*_{e}, population size was increased by increasing the number of offspring per mating in the final generation.

The total genome size was 10 M (10 chromosomes of 1 M each). In generation zero all individuals were completely homozygous for the same allele and mutations were applied at a rate of 2.5 × 10^{−5} per locus per meiosis in the following generations. Mutations switched allele one to two and vice versa. The number of randomly distributed mutations per chromosome was sampled from a Poisson distribution with mean corresponding to the product of the number of loci per chromosome and the mutation rate. Similarly, recombinations per chromosome were sampled from a Poisson distribution with a mean of one per morgan and were then randomly placed along the chromosome. Linkage disequilibrium (LD) statistics, *i.e., R*^{2} (Hill and Robertson 1968), between adjacent segregating loci were averaged among all pairs exceeding a minor allele frequency of 0.05 and matched expected *R*^{2} values (Sved 1971; Tenesa *et al*. 2007). Allele frequency distributions were found to follow a U-shaped distribution.

The number of loci at the start of the simulation (generation zero) required several considerations concerned with obtaining an appropriate number of segregating loci (*N*_{e}) and *N*_{QTL} in the final generation. The realized relationship matrix used in GBLUP can be singular if *N*_{L} is less than the number of individuals in the matrix (Vanraden 2008), preventing the inversion needed to compute solutions. Thus, *N*_{L} at mutation drift equilibrium was made larger than the maximum sum of training and validation individuals to be used, and a similar *N*_{L} was used across all scenarios. However, as *N*_{e} increased, the proportion of segregating loci in the last generation also increased and for *N*_{e} = 200, *N*_{e} = 1000, and *N*_{e} = 2000, approximately 0.04, 0.28, and 0.52 of initial loci were segregating at mutation drift equilibrium, respectively. This required the adjustment of the number of initial loci. Across all scenarios, *N*_{L} varied between 4576 and 4721 loci (SE < 9.5 loci). To obtain *N*_{QTL,} *M*_{e} in a random mating population, as derived by Goddard (2009), was used as a guide to allow comparisons to be made across *N*_{e}. The following *N*_{QTL} scenarios were simulated: 0.03, 0.05, 0.15, 0.30, 0.50, 0.75, and 1 *M*_{e}. Table 1 outlines the corresponding *N*_{QTL} for these proportions of *M*_{e} for the three *N*_{e}, whereas Table 2 shows all scenarios that were carried out. Note that throughout this study our use of the terms “low” or “high” *N*_{QTL} may refer to different actual *N*_{QTL} across the three *N*_{e} because *N*_{QTL} was scaled to be proportional to *M*_{e} (Table 1).

The desired *N*_{QTL} were randomly chosen from *N*_{L}. True allele substitution effects were sampled from *N* (0, 1). True breeding values for 2*N*_{P} (*i.e.,* training and validation set) individuals were calculated for each QTL as 2(1 − *p _{j}*)β

_{j}(where

*p*is the major allele frequency at locus

_{j}*j*), −2

*p*β

_{j}_{j}, and ((1 −

*p*) −

_{j}*p*)β

_{j}_{j}for the major and minor homozygote and heterozygote genotype, respectively (Falconer and Mackay 1996). Total breeding values were obtained by summing over

*N*

_{QTL}. Breeding values were scaled to have the variance of

*h*

^{2}(

*i.e.,*phenotypic variance is 1). Phenotypic records were simulated for

*N*

_{P}(training set) animals by adding independent environmental terms drawn from

*N*(0, 1 −

*h*

^{2}) to true breeding values. In addition, sampling β

_{j}from a Laplace (double exponential) distribution was investigated in scenario 4. The accuracy of both genome-wide evaluation methods was computed as the correlation between true and estimated breeding values.

#### GBLUP analysis:

The evaluation with GBLUP applied the following model, which was fit in ASReml (Gilmour *et al*. 1995): **y** = μ**1** + **Za** + **e**, where **y** is the vector of phenotypic values, μ is the population mean, **Z** is an incidence matrix for random individual effects, **a** is a vector of random individual additive genetic values, and **e** is the residual. Random effects **a** and **e** were assumed normally distributed as and , respectively, where **G** was the realized relationship matrix computed using the *N*_{L} loci. In **G**, the relationship between a pair of individuals was based on identical-by-state probabilities and included all training individuals with phenotypes and validation individuals without phenotypes. The total allelic relationship at a locus between a pair of individuals was calculated as , where δ_{ij} is 1 if allele *i* in the first individual is identical to allele *j* in the second individual and 0 otherwise. Averaging over loci as yields the numerator relationship between all individual pairs required for **G** (NejatiJavaremi *et al*. 1997). Breeding values were obtained by solving the mixed model equations (Henderson 1975).

#### BayesB analysis:

We implemented a variant of the original BayesB (Meuwissen *et al*. 2001). The model applied was , where μ was the mean, **X** was an incidence equal to −1, 0, and 1 for 11, 12/21, and 22, respectively, β is the allele substitution effect for locus *j*, and **e** was the vector of residuals distributed as . Allele substitution effects were assumed to come for a mixture distribution where a proportion (π) had no effect and a proportion (1 − π) had effects distributed as .

The prior used for μ was uniform, over a long range (*i.e.,* to ), and for was uniform (*i.e.,* 0 to ), where is the phenotypic variance of the data set analyzed. The prior for was taken from a scaled inverted chi-square distribution of the form ∼ , where *s*^{2} represented the prior value for and υ was its degree of belief (Wang *et al*. 1994). A weak prior was chosen to be 1 for both υ and *s*^{2} across all scenarios. The influence of our *s*^{2} on estimates from BayesB was investigated and was found to be small. A critical discussion of the impact of priors on BayesB can be found in Gianola *et al*. (2009).

To test effect of π on breeding value accuracy, two types of scenarios were carried out. In the majority of scenarios, 1 − π was fixed to true values simulated for a particular scenario, 1 − π = *N*_{QTL}[*N*_{L}]^{−1}, which we call the “informed prior.” However, in scenario 8, 1 − π was fixed to 57 QTL for all *N*_{QTL} scenarios, 1 − π = 57 QTL[*N*_{L}]^{−1}, which we term the low prior. While not done in this study, one could estimate 1 − π instead of using a prior and this may be an advantage in real data with unknown *N*_{QTL} (*e.g.,* Pong-Wong and Hadjipavlou 2010).

Our implementation of sampling loci variances was slightly different from that of Meuwissen *et al*. (2001). They performed a haplotype analysis and therefore several haplotype effects would need estimating per locus and having more than one effect per locus required sampling of locus variances. In this study, biallelic loci were simulated and, because only one effect was estimated per locus, it was not necessary to sample a per-locus variance. This process of sampling from one variance is more similar to that of Xu (2003). However, his implementation assumed all loci had an effect, which is comparable to BayesA of Meuwissen *et al*. (2001). In contrast, our model assumed that some loci will not have an effect (π), which is more similar to BayesB. In addition, while Xu (2003) used a naïve prior for , we used an informative prior (*i.e.,* υ and *s*^{2} = 1) as proposed by Ter Braak *et al*. (2005).

The length of the Gibbs chain was 105,000 iterations and the first 5000 iterations were discarded as burn-in. Estimates at every 20th iteration were stored as a sample resulting in a total of 5000 samples. Several analyses were carried out to test if the sampling protocol was appropriate. Autocorrelations of effects and estimated breeding values were found to be close to zero in stored samples, which showed that they were almost independent, so Monte Carlo error was < 0.0003 in all scenarios (Geyer and Thompson 1992). This allowed for shortening of the chain length to 45,000 iterations (2000 samples) for scenarios with *N*_{P} = 2000 to reduce running time. Furthermore, convergence was investigated by using a variety of starting values for *s*^{2} in the BayesB. Using different starting values resulted in nearly identical estimates of and breeding value accuracy suggesting that our Gibbs chains were converging. In addition, a long chain of 160,000 iterations was run and estimates and accuracy values were very similar to the shorter chain. While convergence cannot be guaranteed in any MCMC study, evidence suggests that our MCMC protocol converged.

## RESULTS

#### GBLUP accuracy:

The accuracy of GBLUP for a given set of values for *N*_{P} and *h*^{2} stayed constant regardless of *N*_{QTL} in all scenarios simulated (Figure 1 and Table 3). This confirmed our first hypothesis. The constant accuracy results from the unique *M*_{e} of a random mating population, which, in turn, depends on *N*_{e} and *L* (Goddard 2009). The plateau of GBLUP accuracy increased when more phenotypic records were used in the estimation of breeding values and when *h*^{2} increased (Figure 1). Sampling β_{j} from a Laplace (double exponential) distribution resulted in the same GBLUP accuracy as sampling from *N*(0, 1).

#### BayesB accuracy:

In contrast to GBLUP, with BayesB the accuracy was highest at low *N*_{QTL} and then decreased as *N*_{QTL} increased (Figure 1 and Table 3). Once *N*_{QTL} was high, BayesB reached a plateau where the accuracy does not decrease anymore despite increasing *N*_{QTL}. This plateau was observed in all BayesB scenarios and the value of the accuracy at this plateau depended on *N*_{e}, *h*^{2}, and *N*_{P} (Tables 3 and 4). The plateau decreased when *N*_{e} increased. An increase in *h*^{2} and *N*_{P} influenced the accuracy in two ways: first, it raised the overall accuracy in all *N*_{QTL} scenarios, and second, it slightly shifted the onset of the accuracy plateau to higher *N*_{QTL}. Sampling effects from a Laplace distribution instead of *N*(0, 1) raised BayesB accuracy slightly, but when *N*_{QTL} was equal to 0.03 *M*_{e} or *N*_{QTL} > 0.5 *M*_{e} no difference in accuracy was observed between sampling from both distributions.

The use of low *N*_{QTL} priors for 1 − π yielded a lower accuracy than informed priors (Table 2, scenario 8, and Figure 2). The gap between the accuracy of informed and low priors increased as *N*_{QTL} increased because the proportion of the genetic variance explained by the low *N*_{QTL} prior became smaller.

#### Comparison of GBLUP and BayesB:

The comparison of GBLUP and BayesB leads to several key observations. BayesB always performed better than GBLUP at low *N*_{QTL}. However, as *N*_{QTL} increased, the difference between the two methods became smaller and eventually both approaches achieved very similar accuracy. The *N*_{QTL} at which this equivalence occurred was increased with increasing *N*_{e}, *N*_{P}, and *h*^{2} (Figure 1 and Tables 3 and 4). Once *N*_{QTL} increased past the equivalence point, BayesB had a slightly lower accuracy than GBLUP and settled at a constant accuracy (Table 3). The difference between GBLUP and BayesB at high *N*_{QTL} decreased when *N*_{P} was increased. Sampling effects from a Laplace instead of a Normal distribution did not affect these general trends.

In Figure 2, the maximum *x*-value of *N*_{QTL} plotted is equal to the predicted *M*_{e} from Goddard (2009) and one observes that BayesB accuracy approaches the plateau and becomes similar to GBLUP accuracy well below *M*_{e}. A first inspection therefore suggests that the second hypothesis does not hold. However the argument for the second hypothesis is based upon the empirical . calculated with Equation 3 for *h*^{2} = 0.1, 0.3, and 0.5 by averaging over the values of *N*_{QTL} gives values of 890, 900, and 700. In this context, hypothesis 2 is shown to be broadly valid, in that superiority of BayesB over GBLUP disappears when *N*_{QTL} approaches , although there is a trend for BayesB accuracy to become similar to GBLUP accuracy sooner than *M*_{e}. These observations held for other scenarios. The comparison between *M*_{e} and is addressed in more detail below.

#### Decay in accuracy:

The decay in accuracy between training and validation individuals was also greater for GBLUP than that of BayesB at low *N*_{QTL} (Table 4) as observed by Habier *et al*. (2007). However, this trend diminished as *N*_{QTL} increased and the decay of accuracy reached similar levels in both methods at high *N*_{QTL}.

#### Predictions of accuracy:

Figure 3 shows the accuracies of GBLUP and BayesB predicted with Equations 1 and 2, respectively, and the accuracies from simulations in the validation set. Predictions of GBLUP and BayesB (at high *N*_{QTL}) accuracy were generally accurate. The accuracy of the predictions was highly dependent on *M*_{e}. In BayesB, the drop in accuracy as *N*_{QTL} increased was predicted well. Equation 2 tended to overpredict BayesB accuracy, particularly in scenarios where *N*_{QTL} was a low proportion of *M*_{e}, using Goddard (2009), and low *h*^{2} and *N*_{P}.

#### Empirical and :

We estimated *M*_{e} using the accuracy of GBLUP or BayesB (when *N*_{QTL} = *M*_{e}) (Equation 3). When GBLUP accuracy was used, we averaged the accuracy across all *N*_{QTL} scenarios simulated for a given set of values for *h*^{2} and *N*_{P}. This was done for each population replicate to obtain a standard error. It is a subhypothesis that *M*_{e} as predicted by Goddard (2009) approximates . Empirical estimates of *M*_{e} using GBLUP were always lower than those using BayesB (Table 5) due to the higher GBLUP accuracy when *N*_{QTL} is high. The estimates using BayesB accuracy were more variable than GBLUP as shown by the larger SE of BayesB. A general trend was apparent showing that increased as *N*_{P} increased, which suggests that has not reached a bound; however, the change in is small in relation to the difference from *M*_{e}. Furthermore, does not increase linearly with *N*_{P} and this may indicate that it may be approaching asymptotic values.

The number of QTL controlling the trait (*N*_{QTL}) was estimated using Equation 4 with reliability values from BayesB when *N*_{QTL} < *M*_{e}. As shown in Figure 4 for scenario 7, the estimated *N*_{QTL} do follow the actual *N*_{QTL} well and are predictive of the trend. Empirical were better estimated with higher *N*_{P}. Note that incorrect priors will reduce accuracy.

## DISCUSSION

We have compared GBLUP and BayesB at various population and trait genetic architectures and at various *N*_{P}. We demonstrated that GBLUP had a constant accuracy, for a given *N*_{P} and *h*^{2}, regardless of *N*_{QTL}. The accuracy of BayesB was greatest at low *N*_{QTL}, decreased with increasing *N*_{QTL}, and eventually reached a lower accuracy plateau below which the accuracy did not fall even when *N*_{QTL} was further increased. BayesB has an advantage over GBLUP at low *N*_{QTL}, but this advantage decreased as *N*_{QTL} increased and it finally diminished completely or, in some cases, the advantage switched to GBLUP depending on *N*_{e} and *N*_{P}. The point at which GBLUP and BayesB accuracy became equal was related to the empirical number of independent segments estimated from the GBLUP accuracy, , which was less than the theoretical prediction of *M*_{e} provided by Goddard (2009). It is clear from this study that quantifying the superiority of GBLUP over BayesB or vice versa depends upon three sets of attributes: the population genome structure (*e.g., N*_{e}), the trait genetic architecture (*e.g., N*_{QTL}, *h*^{2}), and the size of the training set. Superiority is, therefore, not a property of the method and general statements to that effect should be avoided. Furthermore, we have proposed and tested equations for the prediction of GBLUP and BayesB accuracy and the estimation of and . Our predictions follow achieved GBLUP and BayesB accuracy well. Empirical values seem to be approaching an asymptote with increasing *N*_{P} and our estimates of follow the trend of true *N*_{QTL}.

The constant accuracy of GBLUP, for a given *h*^{2} and *N*_{P}, confirmed our first hypothesis and clearly shows that this accuracy depends crucially on genomic properties, and not on properties of the trait, and is summarized in the concept of *M*_{e}. In turn, *M*_{e} will depend on *N*_{e} and *L*, which can be viewed, in the short term, as constants in a random mating population (Goddard 2009). In a wider sense, *M*_{e} and the more commonly known haplotype blocks are both measures resulting from population history and as such are related, although not interchangeable. Haplotype blocks are physical segments of the genome within which haplotype diversity is low, bounded by areas where evidence for historical recombination exists (*e.g.,* Goldstein 2001; Gabriel *et al*. 2002; Frazer *et al*. 2007). While haplotype blocks are a physical measure, *M*_{e} is a more statistical concept related to the behavior of the genome in genomic evaluations. It is theoretically derived from variation in relationship between relatives and from the continuum of variation in linkage disequilibrium across the genome (Goddard 2009). Both the number of haplotype blocks and *M*_{e} increase with increasing *N*_{e} and in close relatives haplotype blocks will be long and *M*_{e} will be small (Hayes *et al*. 2009c). It should be noted that the dependence of GBLUP on *M*_{e} shown in this study does not support the conclusion that GBLUP assumes an infinitesimal model in which there are a very large number of genes each contributing a small portion to the genetic variance. In fact, GBLUP is indifferent to *N*_{QTL}, unless *N*_{QTL} is very small (*i.e.,* one or < 10 QTL; unpublished results), as demonstrated in this study.

While it is clear that in GBLUP the accuracy depends on *M*_{e} and not on *N*_{QTL}, in BayesB the accuracy depends on the interplay of both features of genetic architecture (*N*_{QTL} and *M*_{e}). Our results follow our second hypothesis that the behavior of BayesB accuracy at high *N*_{QTL} is similar to that of GBLUP. The accuracy of BayesB declines as *N*_{QTL} increases and eventually becomes similar to GBLUP as *N*_{QTL} increases. The point at which this occurs approaches *M*_{e} with increasing *N*_{P}. Therefore, we show that the accuracy of BayesB at high *N*_{QTL} is also dependent on *M*_{e} just like in GBLUP. This is also supported by the accuracy plateau being observed across similar proportions of *M*_{e} and that near equivalence is approximated closely by *N*_{QTL} = , where is the empirical estimate of *M*_{e} obtained from GBLUP. Therefore, the plateau is not a function of actual *N*_{QTL} but of *M*_{e}. Another argument for to be a major determinant in BayesB accuracy at high *N*_{QTL} is that it can be accurately predicted with Equation 3. In addition, a recent study of Barley using real data has also shown that GBLUP accuracy exceeded that of BayesB when *N*_{QTL} was large (Zhong *et al*. 2009).

The difference in accuracy between GBLUP and BayesB may be explained by the way both methods process and model information. With GBLUP, each independent segment is estimated irrespective of whether it has an effect whereas with BayesB, an additional step is involved in which for each locus it is estimated if the locus has an effect or not. The fact of choosing a near-correct subset of loci with effect with BayesB increases accuracy but there is also an error associated with determining π, which depends on *N*_{L}. When *N*_{QTL} < *M*_{e} the advantage of choosing a subset is clear but when *N*_{QTL} ≥ *M*_{e} this diminishes (heuristically, it is likely that each independent segment contains QTL). Thus, under the latter scenario, GBLUP performs slightly better than BayesB. This argument is further supported by the decreasing difference in accuracy between GBLUP and the BayesB accuracy plateau at high *N*_{QTL} when *N*_{P} is increased, because BayesB can identify the loci with effect better with more information.

The use of QTL effects sampled from a Laplace distribution instead of a Normal distribution resulted in no quantitative change to GBLUP accuracy and in only minor changes in accuracy to BayesB. Overall, the trends observed were fully consistent with the conclusions obtained with normally distributed effects, confirming other studies that have compared these effect distributions (Habier *et al*. 2007; Daetwyler *et al*. 2008; Meuwissen 2009).

It is challenging to compare statistical approaches that embody different statistical philosophies. In the genomic and animal breeding context, comparing approaches using point estimates of breeding values, such as their accuracy, is common. Breeding itself is an application of decision theory (*e.g.,* Woolliams and Meuwissen 1993) and in model terms the value of a breeding decision may be measured as the expected rate of genetic gain (Δ*G*) arising from the decision. The focus on accuracy in this article arises from the “breeders equation” *E*[Δ*G*] = *ir*σ_{A}/*L*, where *i* is standardized intensity, σ_{A} is the additive genetic standard deviation, *L* is generation interval, and *r* is the accuracy. The concept of accuracy is therefore central to the infrastructure of genetic evaluation.

This does not mean that accuracy as presented here will remain the key comparison in the future, and the richer information available from Bayesian methods may at some point overturn this paradigm. The expectation of the posterior distribution for an individual's breeding value is a natural estimate arising from a Bayesian analysis (*e.g.,* Goddard 2009; Meuwissen *et al*. 2009) and also gives a natural comparison with the estimates from BLUP approaches. In addition, alternative measures of comparison to expected gain (in a Bayesian sense, minimizing the squared loss) have been considered, such as percentiles of the posterior, which will be influenced by (co)variances of candidates (*e.g.,* Woolliams and Meuwissen 1993), or other approaches to selection such as minimax regret.

Genome-wide evaluation methods are popular because they seem to offer a solution to predicting a large numbers of parameters (*i.e.,* loci effects) from a limited number of phenotypes, but a number of concerns have been expressed. For example, Bayesian methods without an influential prior may experience convergence problems. Investigation with noninformative priors for scale factors have resulted in BayesB accuracy similar to those using priors in this study (unpublished results). However, more investigation of convergence of genome-wide methods is needed, including minimum length of Gibbs chains, influence of priors on convergence, and optimal protocols for parallel computing. At the same time, there is concern that the priors on degree of belief and scale factors used in the BayesB (Meuwissen *et al*. 2001) exert too much influence, thereby preventing Bayesian learning (Gianola *et al*. 2009). Gianola *et al*. (2009) clearly show that priors influence estimates.

The findings that both GBLUP and BayesB depend significantly on *M*_{e} are given more weight by the fact that the accuracy of both methods can be predicted with Equations 1 and 2, respectively. The predictions were generally accurate but limitations have also been highlighted, especially in predicting BayesB accuracy. Extensions to the formulae may be needed to predict BayesB more accurately at low *N*_{QTL} relative to *M*_{e} when *h*^{2} or *N*_{P} are also low, and there is also a need to review whether *M*_{e} as formulated by Goddard (2009) is a good predictor of . However, being able to predict the trend in BayesB accuracy is a significant step forward (Figure 3). One of the assumptions in the original derivation (Daetwyler *et al*. 2008) was that all of the genetic variance was tagged by the loci used in the analysis. This represents a complication when applying our equations to predict the accuracy using a commercially available SNP chip, because the current chips are likely to miss a portion of the genetic variance. First, it is likely that the number of SNP on current chips is not high enough to tag all the genetic variance and variation not associated with SNP (*e.g.,* copy number variation; Redon *et al*. 2006) will also be missed. Second, SNP with higher than average heterozygosity are selected for developing the chips and therefore loci with low minor allele frequency are proportionally underrepresented (*i.e.,* ascertainment bias). The result of this missing genetic variance in the analysis of real populations is that our deterministic equations are likely to overpredict the accuracy in both methods.

The fact that our equations account for the entire genetic variance will, however, be a clear advantage as the scientific community moves toward the analysis of sequence data for which our formulae are appropriate in their current form. In sequence data analysis, all basepairs are included and therefore no rare alleles would be missing. Thus, all the genetic variance is contained in the sequence and the prediction does not rely on capturing LD with the true mutation.

Additional insight into quantitative traits can be gained by combining genome-wide evaluation and deterministic prediction. We have shown that *M*_{e} can be estimated with Equation 3 if the accuracy of GBLUP or BayesB is known. Two theoretical values for *M*_{e} have been proposed to date, *M*_{e} = 2*N*_{e}*L*/log(4*N*_{e}*L*) (Goddard 2009) and 2*N*_{e}*L* (Hayes *et al*. 2009c). Our estimates of *M*_{e} (), even though still increasing with increasing *N*_{P}, remain lower than both theoretical values but were of the right order of magnitude when using Goddard's equation (Goddard 2009) rather than 2*N*_{e}*L* (Table 5). In real data using 2*N*_{e}*L* in Equation 1 appears to predict GBLUP accuracy well (Hayes *et al*. 2009b), but this may be due to the fact that SNP arrays miss a significant proportion of the genetic variance. Once more of the genetic variance is captured with new technology we would expect that estimates of from real data would likely tend toward the derivation of Goddard (2009), or possibly lower. In addition to , *N*_{QTL} can be estimated with BayesB accuracy if *N*_{QTL} < *M*_{e}. As Figure 4 shows, this can be a coarse measure of *N*_{QTL}, because small changes in accuracy can cause relatively large fluctuations in . A complication in estimating *N*_{QTL} is that several SNP may be in partial LD with a particular QTL and this could lead to overestimates of *N*_{QTL}. In addition, BayesB requires knowledge of the true *N*_{QTL} in its prior. Nevertheless, estimates of *N*_{QTL} could aid investigations into complex trait architectures, perhaps through examining the correspondence between the assumed prior on *N*_{QTL} and the resulting estimate.

The trends observed in this study are supported by experiences in real data. Results in dairy cattle genotyped with a 50K SNP chip show that GBLUP and BayesB lead to very similar accuracies in most traits (Hayes *et al*. 2009a; Vanraden *et al*. 2009). Vanraden *et al*. (2009) report correlations between linear and nonlinear methods of >0.99 in a vast majority of traits. This suggests that in real animal populations quantitative traits are controlled by a large number of QTL and for most traits *N*_{QTL} ≥ *M*_{e}. There are of course exceptions to the rule and, for example, in dairy cattle BayesB performed better than GBLUP in milk fat content (Vanraden *et al*. 2009). This is likely due to a significant portion of the variation being explained by few genes of large effect, such as DGAT (Grisart *et al*. 2004). Hence, in this trait it is likely that *N*_{QTL} < *M*_{e} or that a relatively small number of QTL explain the majority of the genetic variance in the trait. We have investigated a scenario where one QTL explained 25% of the genetic variance and 1000 very small QTL explained the rest of the variance (results not shown) and the results confirmed our hypothesis.

The principles established in this study should be transferable to other populations as the trends have been confirmed across three different *N*_{e}. In our view, investigators need to gather evidence to answer two questions. First, what is the population's *M*_{e} and, second, how many *N*_{QTL} are likely contributing to the genetic variance in a particular trait? When *N*_{QTL} ≥ *M*_{e} GBLUP will result in higher accuracy than BayesB, but when *N*_{QTL} < *M*_{e} BayesB will outperform GBLUP.

## Acknowledgments

We are grateful to Piter Bijma, Jesús Fernández, and two anonymous reviewers for their helpful and constructive comments. H.D.D. was supported by the SABRETRAIN Project, which is funded by the Marie Curie Host Fellowships for Early Stage Research Training, as part of the Sixth Framework Programme of the European Commission. B.V. received support from the Scottish Executive Environment and Rural Affairs Department (SEERAD) and Instituto Nacional de Investigación y Technologica Agraria y Alimentaria, and J.A.W. received funding from the Biotechnology and Biological Sciences Research Council (BBSRC). This work has made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF, http://www.ecdf.ed.ac.uk/). The ECDF is partially supported by the eDIKT initiative (http://www.edikt.org.uk).

## Footnotes

Communicating editor: E. Arjas

- Received March 21, 2010.
- Accepted April 7, 2010.

- Copyright © 2010 by the Genetics Society of America