Functional Mapping Model of Quantitative Trait Loci in a Full-sib Family

Functional mapping of quantitative trait loci (QTL) has embedded the biological mechanisms and processes of dynamic traits in the statistical method of QTL mapping. Although the statistical model for functional mapping has been well developed in inbred lines, it has not been proposed in a full-sib family, in which the number of alleles may vary over loci and the linkage phases are usually unknown between loci. Since forest trees have several unique biological characteristics and most of them are outbred species, we should not directly apply the functional mapping model for inbred species to them. The objective of this study is to develop a functional mapping statistical model for a full-sib family in forest trees. We suppose that the QTL genotype of one parent is heterozygous but the other is homozygous, and consider all pairwise combinations of marker segregation types and all possible linkage phases between mark loci in a diploid full-sib family into our model. A maximum likelihood approach based on a logistic-mixture model, implemented with EM algorithm, was developed to provide the estimation of QTL positions and the parameters responsible for growth trajectories. The results of a large number of simulations showed that our model was robust for identifying the QTL position and estimating the parameters of QTL effects and residual (co)variance.(Van der Geer, 2000, p. 51-59)


Introduction
Most important traits of biology, biomedicine and agriculture are dynamic complex traits, such as body height and weight, milk production, tumor size, HIV load, circadian clock and drug response that all change with time or other independent variables.The genetic control of dynamic complex traits should be accordingly expressed as a function of some independent and continuous variable (Wu and Lin 2006).They are under the control of an interacting network of minor genes and of environmental factors (Lynch and Walsh 1998).Therefore, the genetic study of these dynamic complex traits has been one of the most difficult tasks in biology.Since Lander and Botstein (1989) developed the interval mapping method to map quantitative trait loci (QTL) within a particular chromosomal interval bracketed by two flanking markers, many improved or new statistical methods for QTL mapping have been proposed (Haley and Knott 1992;Zeng 1994;Jansen and Stam 1994;Satagopan et al. 1996;Kao et al. 1999;Sillanpaa and Arjas 1999;Xu and Yi 2000).Although these traditional statistical approaches to QTL mapping are useful, they neglect the developmental features of traits because they are all analytic methods only for a single time point.
A series of statistical models have been developed to locate and identify QTL that affect dynamic complex trait.Wu et al. (2002) first presented a logistic mixture model for detecting major genes responsible for growth trajectories to illustrate how the genes causing differentiation in growth trajectories.In the meantime, Ma et al. (2002) developed a general statistical framework called functional mapping, which integrates the mathematical features of dynamic complex traits within the QTL mapping theory to identify QTL affecting growth trajectories in inbred lines.Ma et al. (2004) proposed a likelihood approach for mapping growth trajectories in a phase-unknown full-sib family, and used dominant markers only.More recently, Yang et al. (2006) presented an approach based on Legendre polynomials for QTL mapping of longitudinal traits in inbred lines.Further more£Yang et al. (2007) developed a semiparametric approach for composite functional mapping of dynamic QTL.The time-dependent genetic effects of QTL tested within a marker interval were modeled by functional mapping method based on a biologically meaningful parametric function and the marker effects outside the test interval were modeled by a nonparametric approach based on Legendre polynomials.It has tremendous power to detect and separate linked QTL located on the similar regions of a chromosome.
Forest trees have several unique biological characteristics including a long generation cycle, high genetic heterozygosity, high genetic load and high environmental heterogeneity (Yi et al. 1998).It is difficult or even impossible to gain two homozygous parents through self-fertilization or backcross and to perform accurate QTL mapping through inbred genetic design.We should not directly apply the QTL mapping model for inbred lines to outbreed species (Shi and Tong 2006).For this reason, it is necessary to establish the statistical model for functional QTL mapping in outbreed species.
We proposed a functional mapping statistical model for a full-sib family in forest trees.In a diploid full-sib family, we supposed that the QTL genotype of one parent was heterozygous but the other was homozygous, i.e., QQ×Qq, and considered all pairwise combinations of marker segregation types and all possible linkage phases between any two mark loci into our model.A maximum likelihood approach based on a logistic-mixture model, implemented with EM algorithm (Dempster et al. 1977), was developed to provide the estimates of QTL positions, QTL effects, and other model parameters responsible for growth trajectories.A series of simulation studies showed that our model was robust for identifying the QTL position and estimating the parameters of QTL effects and residual (co)variance.A real example was given for illustrating the validation of our functional QTL mapping model in practice.

Growth trajectory
The common property of dynamic complex traits is that they can be described as a function (or stochastic process), which relates some independent and continuous variables consisting of an infinite number of points to the measures of the phenotypic size.Throughout this article, growth trajectories (Wu et al. 2002) are used as a concrete example to illustrate the idea that functional mapping relies on the concepts from functional analysis and stochastic processes.It is well known that growth trajectories are governed by biological laws, and a biological law controls and propels the phenotypic change of ontogeny and traces out the ontogenetic trajectories (Wu et al. 2002).Many functions have been established to describe the growth trajectories, such as logistic (or sigmoid) growth curve, which is regarded as one of the most important functions to represent the change of development.In the present study, we use logistic growth curve to represent growth trajectories of an individual.It can be mathematically described by Where a is the asymptotic or limit value ofgwhen t → ∞, a/(1+b) is the initial value of gwhen t= 0, and c is the relative rate of growth.

Genetic design
We consider a full-sib family derived from two parents that have some traits of interest differing substantially.There are up to four different alleles at an informative marker locus in the two parents.Maliepaard et al. (1997) summarized that the combinations of the two parental marker genotypes (e.g., segregation types) may be ab x aa, aa x ab, ab x ab, ab x cd, a0 xa0, ab xa0 or a0 x ab, where a, b, c and d denote different alleles at a marker locus, 0 denotes the null allele, the two characters left of the crossing symbol represents the maternal marker genotype and the two characters on the right the paternal marker genotype.The linkage phases between any two marker loci include several cases: (1) coupling (c) in the maternal parent and uninformative in the paternal parent, or vice versa, (2) repulsion (r) in the maternal parent and uninformative in the paternal parent, or vice versa, (3) coupling in both parents (cxc), (4) repulsion in both parents (rxr) and ( 5) coupling in the maternal parent and repulsion in the paternal parent (cxr), or vice versa (rxc).Here, the linkage phases between any two adjacent loci on a linkage group are assumed to be known.
Although the number of QTL alleles is not fixed for the two heterozygous parents, we consider a QTL with genotype QQ for the maternal parent and Qq for the paternal parent for simplicity.For a particular genotype j ( j =1 for QQand j =2 for Qq), the logistic curve is described by the parameters a j , b j and c j .These parameters of two different genotypes will be compared to determine whether and how this QTL affects growth trajectories.
Assume that there are a total of Nprogeny measured at T time points in a full-sib family.The phenotypic trait value of progeny i measured at time t (t = 1, • • • , T ), governed by the QTL genotype located on a marker interval flanked by two markers, can be expressed by a linear statistical model (Wu et al. 2002;Ma et al. 2002).
where x i j is an indicator variable for the possible QTL genotype of progeny i and defined as 1 if the QTL genotype is j or 0 otherwise; g j (t) is the genotypic value of the QTL genotype j for the trait at time t; e i (t) is the residual effect of progeny i at time t and distributed as N(0, σ 2 e (t)).The residual effect includes the aggregate effect of polygenes and error effect.The probability of x i j which takes 1 or 0 depends on the marker genotype of the two flanking markers and the position of the QTL on the marker interval.
The conditional probability of a QTL genotype (QQ or Qq) is determined by the QTL position and the different genotypes of the two flanking markers.The principle about how to derive the conditional probabilities of QTL genotypes between the two flanking markers was described by Tong and Shi (2002).According to the approach, we derive a uniform formula for computing conditional probabilities of QTL genotypes given the two flanking markers with different alleles and linkage phases in a full-sib family.As an example, we consider one of the combinations of marker segregation types, abab×aacd, e.g., the maternal genotype being abab and the paternal genotype aacd at the two marker loci.A putative QTL that governs growth trajectory is presumably located between the two marker loci.Let r be the recombination fraction between the two marker loci, r 1 be the recombination fraction between the first marker locus and the QTL, and r 2 be the recombination fraction between the QTL and the second marker locus, then we have the relationship, r = r 1 £r 2 £2r 1 r 2 , under the assumption of no interruption between two intervals on chromosomes.The frequencies of the combined genotypes in the progeny at the two marker loci and QTL are shown in table 1. Dividing each element of every row by the sum of all the elements of the same row in table 1, we can obtain the conditional probabilities of QTL genotypes given flanking markers.

Statistical methods
The phenotypic observation of progeny i measured at T time points for QTL genotype j is conditional on the unknown parameters (Ω) and the marker data (M) and follows a multivariate normal distribution with density, where is the vector of phenotypic observation measured at T time points; g j is the vector of the expected genotypic values for QTL genotype j ( j = 1, 2); Σ is the residual variance-covariance matrix of the phenotypes measured at different time points.According to the logistic curve of equation 1, g j is modeled as for QTL genotype j and the set of parameters a j , b j , c j defines the shape of the growth curve.Σ is assumed identical among different genotypes and modeled by using AR(1) (first-order autoregressive) model as where, σ 2 isthe same residual variance for the trait at each time; ρ (0 < ρ < 1) is the proportion parameter with which the correlation decays with time lag.
The joint likelihood function conditional on the observed phenotype (y) and marker data (M) for a sample of size N can be represented by a multivariate mixture model where the vector Ω = a j , b j , c j , ρ, σ 2 contains the unknown parameters to be estimated for the QTL effect, QTL position, and residual (co)variances; p j|i is the probability of QTL genotype j conditional upon the genotypes of the two flanking markers for progeny i in a full-sib family.
The maximum-likelihood estimates (MLEs) of the unknown parameters for a pleiotropic QTL can be computed by implementing the EM algorithm.In the E-step (i.e., expectation), a posterior probability of progeny ihaving a QTL genotype jis calculated as In the M-step (i.e., maximization), all parameters are estimated from the equations obtained by differentiating the loglikelihood of equation 6 with respect to each unknown parameter in Ω and letting the derivatives equal zero.The estimates are then used to update P (P = {P j|i , j = 1, 2; i = 1, • • • , N}), and the process is repeated until the iterative value of each parameter converge at a satisfying critical value.The values at convergence are the MLEs of Ω. Ma et al. (2002) gave the iterative expressions for estimating the parameters in Ω, where the conditional probabilities of QTL genotype in our model is completely different from the inbred lines.The standard errors of the MLEs can be estimated using the inverse of the Fisher information matrix.

Hypothesis tests
The hypothesis about the existence of a major gene affecting an overall growth curve can be formulated as H 1 : at least one of the equalities above does not hold.( 8) where H 0 corresponds to the reduced model, in which the data can be fit by a single logistic curve, and H 1 corresponds to the full model, in which there exist two different logistic curves to fit the data.The test statistic for testing the above hypotheses is calculated as the log-likelihood ratio of the full model over the reduced model: where Ω and Ω denote the MLEs of the unknown parameters under H 0 and H 1 , respectively.
LR is used as a kind of probable measurement of the potential QTL position on chromosome.The probable position of QTL is determined by the critical thresholds of LR.

Simulation design
A series of simulation studies were performed to investigate the theoretical power of our model.Suppose that there are six equidistant makers on a linkage group of 100 cM orderly, whose segregation types are ab×aa, ab×cd, aa×ab, aa×ab, ab×ab and ab×cd, respectively.The linkage phases between neighboring markers are assumed to be r, c, c, r and c × r, respectively.A putative QTL is also assumed to be located at the position of 50 cM from the first marker of the linkage group.The simulations were performed by considering different number of time points (T = 8 vs. T = 16), different heritability levels (H 2 = 0.1 vs. H 2 = 0.4) and various sample size (N = 100 vs. N = 400).The true parameters in our simulations are set to be the corresponding values in the first column of Table 2 and Table 3, where the σ 2 is determined by the heritability H 2 and the largest variance σ 2 G explained by the QTL over the T time points.The precision of the parameter estimation was evaluated based on the square roots of the mean square errors (SMSE).

Simulation results
In general, our model can provide reasonable estimates for the QTL position and the parameters of QTL effects.Tables 2-3 lists the mean of the QTL location, the means of MLEs of the QTL effects and their SMSE based on 1000 simulation replicates under different simulation schemes.We found that the estimation precision of the QTL effects and position was dependent on heritability level, sample size and the number of time points.Even though heritability level, sample size and the number of time points are all small, the means of all the parameter estimates approximate to the true values well and all the SMSEs are also small.If one of the three factors (i.e., heritability level, sample size and the number of time points) increases, the SMSE of partial or all parameters will decrease obviously: (1) The SMSE of QTL position and QTL effects parameters decrease by about twofold when N increases from 100 to 400 for the fixed number of time points and fixed heritability level; (2) When H 2 increasesfrom 0.1 to 0.4 for the fixed number of time points and fixed sample size, the precision of QTL position dose not increase obviously, but the estimation precision of the parameters (a j , b j , c j ) which describe the logistic curve of different QTL genotypes increases by about 2.5-3 times; (3) As T increases from 8 to 16 for fixed heritability level and fixed sample size, the estimation precision of the logistic curve parameters (a j , b j , c j ) and residual variance (σ 2 ) increases with different degrees.

Discussion
The statistical model for functional QTL mapping has been well developed in inbred lines (Ma et al. 2002;Yang et al. 2007), it has not been proposed in a full-sib family, in which the number of alleles may vary over loci and the linkage phases are usually unknown between loci.In this article, we developed a functional mapping model based on a logistic mixed model for mapping QTL in a full-sib family in forest trees.
There are many factors that affect the power and precision of QTL mapping in a full-sib family in forest trees, such as the heterozygous degree of two parents, the marker density, the population size, the heritability level, and the linkage phases between loci.In the present study, we illuminated how the three controllable factors (i.e., heritability level, sample size and the number of time points) impacted on the power of functional QTL mapping through a series of simulation studies.We found that our model could achieve adequate power of QTL mapping and accuracy of the parameters estimation even if sample size was smaller and heritability level was lower.
Our functional QTL mapping model is based on the assumption that the QTL genotype of one parent is heterozygous and the other is homozygous in a full-sib family in forest trees.In summary, there are several major features in this study.First, there are up to four marker alleles at a single locus and the number of alleles may vary over loci in a full-sib family (Maliepaard et al. 1997).We considered all pairwise combinations of segregation types and all possible linkage phases between marker loci in a full-sib family into functional mapping model of QTL for the first time.Therefore, our model can be used to locate QTL on the genetic linkage maps which are constructed with a mixed set of different segregation markers (Yin et al. 2002;Zhang 2005) in a full-sib family in forest trees.Second, functional mapping can enhance the understanding of genetic architecture of quantitative traits by viewing the developmental process of dynamic traits as a growth trajectory (Wu et al. 2006).We can draw growth trajectory of each QTL genotype by estimating the QTL parameters and detect the dynamic pattern of QTL expression over time (Ma et al. 2002).With the growth trajectories of trait, we can determine how each QTL genotype affects development of traits, how a trait expresses under different environmental conditions and whether there is a common genetic basis for individual development (Wu et al. 2006).Third, the simulation results indicate that functional mapping model has high computational efficiency and robust precision for identifying the QTL position and estimating the parameters of QTL effects and residual variance.An important finding is that increasing heritability level can make more substantial improvements in the accuracy of QTL parameters than increasing sample sizes.This conclusion suggests that an optimal experiment design is more advisable for precise estimation of QTL parameters.For example, we should pay more attention to using silvicultural treatment to increase site homogeneity rather than planting a large sample size on nonuniform sites (Lin et al. 2003).
In a general full-sib family, the segregation pattern of QTL genotype can be various for major genes governing growth trajectories in different regions of the genome (Wu et al. 2002).A further study would be done by performing a model selection process for many different QTL segregation types.That would discriminate the linkage phase between the marker and QTL, and improve the power of QTL mapping through identifying the true QTL genotype.

Table 1 .
Genotype frequencies in the progeny generated by hybridization, abQQab×aaQqcd, without interruption under different linkage phases (coupling and repulsion) in a full-sib family.

Table 2 .
Mean of the QTL position and the MLEs of the model parameters for T = 8 under different heritability levels (H 2 = 0.1 and 0.4) and sample size (N = 100 and 400) based on 1000 Monte Carlo simulations.The SMSE of each parameter is in the parentheses.

Table 3 .
Mean of the QTL position and the MLEs of the model parameters for T = 16 under different heritability levels (H 2 = 0.1 and 0.4) and sample sizes (N = 100 and 400) based on 1000 Monte Carlo simulations.The SMSE of each parameter is in the parentheses.