Population Structure, Linkage Disequilibrium and Selective Loci in Natural Populations of Prunus davidiana

Prunus davidiana (Carrière) Franch. is an important resource for restoration of dry and arid areas, genetic improvement of peach, and extraction of health-promoting components. To use this resource effectively, we need a measure of the genetic diversity of P. davidiana and of its population structure. Linkage disequilibrium (LD) provides information for association mapping of the loci underlying observed phenotypic variation. Selective loci reveal adaptive evolutionary processes resulting from natural selection. A set of 190 genotypes from seven natural populations (SXTB, SIYQ, SXFX, NXXJ, SIJC, GSHT, GSHS) collected across the range of P. davidiana in China was fingerprinted with 23 SSR markers and analyzed for spatial structure, pairwise Fst (differentiation coefficient), PCA (principal coordinate analysis), and the number of population groups estimated with STRUCTURE software; selective loci were identified from lnRH values tested against the standard normal distribution and with Grubbs' test. Complementary analyses with these methods show that the populations form four groups; the number of locus pairs in significant LD within unstructured populations ranged from 22 to 129; five loci showed evidence of selection across all populations, and two loci were detected in common as targets of local natural selection between populations. Four of the seven populations should be conserved; the selective loci and their alleles may help to reveal adaptive evolution and to identify candidate genes; and the LD results indicate how these populations can be used for association analysis.

genetic characterization of natural populations of P. davidiana, which provides some information for conservation and utilization. SSR (simple sequence repeat, or microsatellite) technology is usually preferred among molecular methods because SSR markers display co-dominant inheritance, hypervariability and high cross-species transferability (Tauraz, 1989; Sosinski et al., 2000; Wünsch, 2009). More than 300 SSR markers have been isolated and characterized in the subgenus Amygdalus of Prunus (Sook et al., 2008). These markers provide a reliable and convenient tool for analyzing the genetic diversity of P. davidiana. Genetic diversity studies have been performed in peach and other species of the subgenus Amygdalus (Aranzana et al., 2002, 2003, 2010; Bouhadida et al., 2007; Cheng, 2007a,b; Cheng & Huang, 2009; Cipriani et al., 1999; Dirlewanger et al., 2002; Shiran, 2007; Sosinski et al., 2000; Testolin et al., 2000); however, there are no studies on the population structure, LD and selection of natural populations of P. davidiana.
Measuring the population structure of P. davidiana using neutral markers is an important first step in association genetic studies, because it avoids false associations between phenotypes and genotypes that may arise from nonselective demographic factors (Krutovsky et al., 2009), and it makes management and utilization of germplasm more efficient (Cho et al., 2008). Several software packages and methods are available: cluster analyses using UPGMA (unweighted pair group method with arithmetic mean), NJ (neighbor joining) and MP (maximum parsimony) (Nei and Kumar, 2000) identify groups and subgroups according to similarity or distance; STRUCTURE software (Pritchard et al., 2000) determines K groups by analysis of genetic background; and PCA (principal coordinate analysis), implemented in GenAlEx 6.2 software (Peakall & Smouse, 2006), characterizes population structure by means of principal coordinates. Such analyses have been published for both wild populations (Belaj et al., 2007; Besnard et al., 2007) and cultivars (Inghelandt et al., 2010; Li et al., 2010) of plants and used to guide conservation and LD (linkage disequilibrium) analyses.
Generally, genetic mapping relies on two basic methods: traditional QTL (quantitative trait locus) mapping and the more recent LD (linkage disequilibrium) mapping. QTL mapping requires segregating populations derived from biparental crosses and has limited resolution (Ecke et al., 2009). LD mapping exploits a higher number of recombination events and achieves a higher resolution in polymorphic populations (Ewens & Spielman, 2001; Jannink et al., 2001). LD can be measured as pairwise LD, multi-locus LD, haplotype-specific LD, model-based LD and recombination (Mueller, 2004). LD has been used for genetic mapping of traits or disease loci in humans and model organisms (Mueller, 2004). LD has also been used successfully in plants: significant LD between loci has been detected, and the extent and decay of LD have been observed to vary among species, populations and subpopulations (Agrama & Eizenga, 2007; Berloo et al., 2008; Rossi et al., 2009; Comadran et al., 2010; Inghelandt et al., 2010; Brazauskas et al., 2011; Myles et al., 2011). These results provide the preconditions for selecting populations or subpopulations for LD mapping and for evaluating the number of markers needed for LD mapping.
There are no published studies on the population structure, LD and selective loci of P. davidiana. In this study, we investigated seven populations with 23 SSRs that cover the peach genome and do not appear to be tightly linked (Aranzana et al., 2003). We aim to 1) analyze population structure and measure genetic variation among populations, and use the information to guide the conservation and use of the germplasm; 2) determine whether LD exists between loci among populations, and use the information to choose an appropriate strategy for genetic association mapping; 3) search for loci showing evidence of selection in the whole population or between populations, to detect genes associated with adaptive evolution.

Plant Materials
We selected accessions from seven natural populations of P. davidiana from the center of origin, Shaanxi and Gansu provinces, and the surrounding Shanxi and Ningxia provinces in China. Young leaves from more than 30 accessions from each population were collected. The distance between any two accessions collected in a [...] population Yangquan abbreviate containing 1).

When performing PCR for multifluorophore fragment analysis, the conditions mentioned above were followed except for primer pairs with an annealing temperature (Ta) significantly lower than 58 °C (the Ta of the M-13 forward primer). In such cases, 4 additional cycles were performed at the annealing temperature of the SSR marker, followed by 35 cycles at the annealing temperature of the M-13 primer, as described above. PCR amplicons were visualized with ethidium bromide under UV light on 3% MetaPhor® 1X TBE agarose gels alongside New England Biolabs' low molecular weight DNA marker; after pooling the four amplicons (four different fluorophores), the samples were cleaned with ExoSAP-IT (USA Scientific or USB) according to the manufacturer's protocols and run on an ABI 3130 with the GeneScan™ 600 LIZ® (Applied Biosystems) internal size standard. PCR products were analyzed by GeneScan on the ABI 3130 and read with GeneMapper v4.0 (Applied Biosystems) for multifluorophore fragments.

Data Analysis
DNA from the accessions of each population was amplified with the 23 SSR markers, and the bands, sized exactly by GeneMapper v4.0, were recorded in Excel.
Population structure was assessed with four complementary analyses of the genotypic data. First, spatial structure was tested with GenAlEx 6.2 software (Peakall & Smouse, 2006) based on genetic distances among populations. Second, PCA was implemented with GenAlEx 6.2 software; the distribution of all accessions along the first three axes showed whether individuals from the populations formed groups. Third, we used the natural populations as a priori groups and tested for differentiation between populations with Wright's Fst index (Weir & Cockerham, 1984); the empirical distribution under no differentiation was obtained with Arlequin ver 3.5.1.2 using 10000 permutations. Fourth, STRUCTURE 2.3 software, based on a Bayesian model-based clustering method (Pritchard et al., 2000), was run under the admixture model to identify K groups of individuals. K was varied from 2 to 10, with thirty replicate runs per K value, a burn-in length of 100000 and a post-burn-in simulation length of 200000. The final number of groups K was determined from the LnP(D) values according to the method recommended by Evanno et al. (2005). Each individual is allocated to the groups with membership coefficients that sum to 1.
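The ΔK calculation of Evanno et al. (2005) can be sketched as follows. This is a minimal illustration, not the procedure actually run in this study: the function name `evanno_delta_k` and the LnP(D) values are hypothetical, and the second difference is applied to the per-K means of the replicate runs, which is a simplification.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from {K: [LnP(D) of each replicate run]}.

    Delta K = |L(K+1) - 2 L(K) + L(K-1)| / s[L(K)], where L(K) is the mean
    LnP(D) over replicates and s[L(K)] is its standard deviation.
    The smallest and largest K have no second difference and are skipped.
    """
    ks = sorted(lnp_by_k)
    avg = {k: mean(lnp_by_k[k]) for k in ks}
    sd = {k: stdev(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks:
        if k - 1 in avg and k + 1 in avg and sd[k] > 0:
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            delta[k] = second_diff / sd[k]
    return delta

# Made-up LnP(D) values for illustration only (three replicates per K).
example_runs = {
    2: [-5200.0, -5210.0, -5190.0],
    3: [-5000.0, -5010.0, -4990.0],
    4: [-4700.0, -4702.0, -4698.0],
    5: [-4690.0, -4720.0, -4660.0],
    6: [-4680.0, -4730.0, -4630.0],
}
delta_k = evanno_delta_k(example_runs)
best_k = max(delta_k, key=delta_k.get)  # K with the highest delta K peak
```

In this made-up example LnP(D) keeps rising slowly after K = 4, yet ΔK peaks sharply at K = 4, which is the situation the method is designed to resolve.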
LD analysis of the unstructured populations detected by STRUCTURE was performed with Arlequin ver 3.5.1.2, assuming unknown gametic phase between alleles at two heterozygous loci. Alleles with frequencies below 5% were removed before LD testing. The number of permutations was 10000, and genotypes were kept intact during permutation so that disequilibrium within loci (departure from Hardy-Weinberg) would not affect the significance of disequilibrium between loci. LD between a pair of loci was tested on genotypic data with a likelihood-ratio test whose empirical distribution was obtained by a permutation procedure (Slatkin & Excoffier, 1996).
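The idea of permuting whole genotypes between loci can be sketched as follows. This is a simplified stand-in, not Arlequin's procedure: Arlequin estimates haplotype frequencies with an EM algorithm, whereas this sketch uses a plain likelihood-ratio (G) statistic on the table of two-locus genotypes. The function names and the data are hypothetical.

```python
import math
import random
from collections import Counter

def g_statistic(geno_a, geno_b):
    """Likelihood-ratio G statistic for association between the genotypes
    at two loci (one genotype per individual at each locus)."""
    n = len(geno_a)
    joint = Counter(zip(geno_a, geno_b))
    count_a = Counter(geno_a)
    count_b = Counter(geno_b)
    g = 0.0
    for (a, b), observed in joint.items():
        expected = count_a[a] * count_b[b] / n
        g += 2.0 * observed * math.log(observed / expected)
    return g

def permutation_ld_test(geno_a, geno_b, n_perm=1000, seed=0):
    """Shuffle intact locus-B genotypes among individuals, so that
    within-locus (Hardy-Weinberg) disequilibrium is left untouched."""
    rng = random.Random(seed)
    observed = g_statistic(geno_a, geno_b)
    shuffled = list(geno_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if g_statistic(geno_a, shuffled) >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: 20 individuals, genotypes as (allele, allele) sizes,
# with locus B perfectly associated with locus A.
locus_a = [(100, 100)] * 10 + [(100, 104)] * 10
locus_b = [(210, 210)] * 10 + [(210, 216)] * 10
g_obs, p_value = permutation_ld_test(locus_a, locus_b)
```

Because each individual's genotype at a locus is moved as a unit, the permutation breaks only the association between loci, which is the property the text above attributes to the permutation scheme.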
Selective loci were detected with Arlequin ver 3.5.1.2 from the alleles at the 23 loci of all accessions, formatted as the program requires, under the finite model with 30000 permutations. A plot of Fst values against Het/(1-Fst), showing permutations and observations, was generated. A locus that falls outside the plot area is an outlier and the most likely candidate for a selected locus. Schlotterer and Dieringer (2005) developed a quantitative, model-free statistic, ln RH, to identify loci that exhibit the largest reduction in microsatellite diversity; it is more powerful than ln RV (Schlotterer, 2002). Ln RH is obtained from the expected heterozygosities of the compared populations, based on a stepwise mutation model (Ohta & Kimura, 1973), and should be approximated by a Gaussian distribution under neutrality. Selective loci between populations were identified as outlier loci with the ln RH method (Schlotterer & Dieringer, 2005). For loci monomorphic in a population, we followed Kauer et al. (2003) and added one allele different from the others to avoid division by zero in the calculation of the ratio between populations. A locus that falls beyond the predetermined confidence bounds (i.e. 95% of a standard normal distribution) indicates a significant reduction in genetic diversity (Harr et al., 2002). Grubbs' test (Motulsky, 2003), also known as the maximum normed residual test, detects outliers in a univariate data set assumed to come from a normally distributed population. We also used Grubbs' test to detect outlier values, which were considered selective loci. Diagrams of allelic frequencies of selective loci between populations were produced in Excel.
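The ln RH screen described above can be sketched as follows. This is a minimal illustration under stated assumptions: the heterozygosity values are made up, the function names are hypothetical, monomorphic loci (H = 0) are assumed to have been adjusted beforehand, and the Grubbs critical value (which depends on the number of loci and the significance level) is not computed here and would have to be taken from a table.

```python
import math
from statistics import mean, stdev

def ln_rh(h_pop1, h_pop2):
    """ln RH for one locus from the expected heterozygosities of two
    populations (both strictly between 0 and 1), using the stepwise
    mutation model relation theta ~ (1 / (1 - H))**2 - 1."""
    numerator = (1.0 / (1.0 - h_pop1)) ** 2 - 1.0
    denominator = (1.0 / (1.0 - h_pop2)) ** 2 - 1.0
    return math.log(numerator / denominator)

def standardize(values):
    """Convert values to z-scores (mean 0, sample standard deviation 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def grubbs_statistic(values):
    """Return (index, G) where G = max |x - mean| / s is Grubbs' statistic."""
    m, s = mean(values), stdev(values)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - m))
    return idx, abs(values[idx] - m) / s

# Hypothetical expected heterozygosities at 8 loci in two populations;
# the last locus has strongly reduced diversity in population 1.
h1 = [0.60, 0.62, 0.58, 0.61, 0.59, 0.60, 0.63, 0.10]
h2 = [0.60] * 8

values = [ln_rh(a, b) for a, b in zip(h1, h2)]
z_scores = standardize(values)
# Loci outside the central 95% of a standard normal distribution.
outliers = [i for i, z in enumerate(z_scores) if abs(z) > 1.96]
grubbs_index, grubbs_g = grubbs_statistic(values)
```

The z-score screen flags every locus beyond the 95% bounds, whereas Grubbs' test asks only whether the single most extreme value is an outlier, so the two can report different numbers of loci.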

Genetic Diversity of SSR Markers and Populations
The 23 loci amplified by the SSR markers revealed a total of 148 alleles. The number of genotypes identified per marker ranged from 3 with CPPCT002 to 37 with UDP98-412, with a mean of 14.2 across all markers for the 190 accessions. Gene diversity and PIC ranked the markers in similar order, with a few slight differences; their values were highest for CPPCT002 and lowest for UDP96-005. The inbreeding coefficients of the markers indicated extreme homozygosity at all loci except BPPCT006 (Table 3). Among populations, the mean number of alleles per locus ranged from 4.261 in SIJC to 2.826 in SXTB, with a mean of 3.7 across the seven populations. Expected heterozygosity and Theta (H) under the infinite model ranked the populations in the same order as the mean number of alleles per locus, except for SIJC (Table 4).
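For readers unfamiliar with the per-locus statistics reported above, the following sketch computes gene diversity (expected heterozygosity), PIC and an inbreeding coefficient from diploid genotypes. It uses the textbook formulas without small-sample correction, which may differ from the estimators of the software used in this study; the function names and genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations

def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as (allele, allele)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}

def gene_diversity(freqs):
    """Expected heterozygosity: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())

def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * p) * (q * q)
                for p, q in combinations(freqs.values(), 2))
    return gene_diversity(freqs) - cross

def inbreeding_coefficient(genotypes):
    """F = 1 - Ho / He, with Ho the observed share of heterozygotes."""
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    he = gene_diversity(allele_frequencies(genotypes))
    return 1.0 - ho / he

# Hypothetical genotypes at one SSR locus (allele sizes in bp).
locus = [(100, 100), (100, 104), (104, 104), (100, 104)]
freqs = allele_frequencies(locus)
he_value = gene_diversity(freqs)
pic_value = pic(freqs)
f_value = inbreeding_coefficient(locus)
```

A value of F near 1 corresponds to the strong excess of homozygotes described in the text, and F near 0 to genotype proportions close to Hardy-Weinberg expectations.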

Population Structure
From the latitude and longitude of the populations and the amplified bands of their accessions, geographic distances and genetic distances between populations were obtained with GenAlEx 6.2 software. When we analyzed the correlation coefficients between genetic distance and geographic distance, no distance class was significant at the 5% level, which means that the populations showed no spatial structure; that is, the points did not fall beyond the upper or lower red dotted lines (Figure 2; Table 5).

Ur 0.000 0.000 0.000 0.000 0.374 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.100
Lr 0.000 0.000 0.000 0.000 -0.028 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.199
r: correlation coefficient
Fst was calculated for all pairs of populations, and all pairwise differences between populations were significant at the 5% level (Table 6), indicating genetic differentiation between the populations. When all accessions from the populations were used for PCA, the first three axes explained 24.81%, 23.08% and 16.39% of the total variation, respectively. Accessions from SXTB and from SIYQ mostly clustered together according to their origins, whereas accessions from the other five populations overlapped (Figure 3).
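The logic of testing pairwise differentiation by permutation can be sketched as follows. This is a simplified illustration, not the Weir & Cockerham (1984) estimator implemented in Arlequin: it uses a single locus and the simple ratio Fst = (Ht - Hs) / Ht, and the function names and genotypes are hypothetical.

```python
import random
from collections import Counter

def expected_het(genotypes):
    """Expected heterozygosity 1 - sum(p_i^2) from diploid genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def simple_fst(pop1, pop2):
    """Fst = (Ht - Hs) / Ht, with Hs the size-weighted mean within-population
    heterozygosity and Ht the heterozygosity of the pooled sample."""
    n1, n2 = len(pop1), len(pop2)
    hs = (n1 * expected_het(pop1) + n2 * expected_het(pop2)) / (n1 + n2)
    ht = expected_het(pop1 + pop2)
    return 0.0 if ht == 0 else (ht - hs) / ht

def fst_permutation_test(pop1, pop2, n_perm=1000, seed=0):
    """Reassign individuals to the two populations at random to obtain the
    distribution of Fst under no differentiation."""
    rng = random.Random(seed)
    observed = simple_fst(pop1, pop2)
    pooled = pop1 + pop2  # new list, the inputs are not modified
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if simple_fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical single-locus genotypes: two populations fixed for
# different alleles, i.e. complete differentiation.
pop_a = [(100, 100)] * 10
pop_b = [(104, 104)] * 10
fst_obs, fst_p = fst_permutation_test(pop_a, pop_b)
```

A pairwise comparison is called significant at the 5% level when fewer than 5% of the permuted data sets give an Fst at least as large as the observed one.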

Figure 3. Principal coordinates analysis of 190 individuals from seven populations
We used STRUCTURE 2.3 software with the admixture model to analyze the genetic structure of the populations. With assumed K = 2 to 10, no distinct number of groups could be chosen directly, because LnP(D) increased only slightly with K; we therefore used the method suggested by Evanno et al. (2005) to determine K from the LnP(D) values, and the highest peak of the curve was found at K = 4 groups (Figure 4). The membership probabilities of accessions of the seven populations allocated to the four groups were above 0.83, except for 0.77 in GSHT. Accessions from the populations were distributed among the four assumed groups in a pattern similar to the groups found by PCA (Figure 3).

LD differed greatly between the whole population and the unstructured populations. The highest value occurred in the whole population because of its population structure, which causes spurious LD. For example, Wang et al. (2008) found that 63.89% of locus pairs were in LD at the 1% level in the entire sample, compared with 18.75-40.28% in the subgroups. We therefore selected the unstructured populations for further LD analysis. Many locus pairs in LD were detected in the unstructured populations, which suggests that the natural populations may have experienced a genetic bottleneck from their progenitor and natural selection over a long time, and that self-compatible individuals generated genetic drift because some lethal genes became homozygous. The LD detected in these populations creates the precondition for association mapping and marker-assisted selection (MAS). In this study, the mean of 25.7% of locus pairs in LD in P. davidiana (8.6%, 14.6%, 51.0% and 28.5% at the 5% significance level in SXTB; SIYQ; SXFX, SIJC and GSHT; and NXXJ and GSHS, respectively) (Figures 7, 8, 9, 10) was higher than the 15.1% in the three subpopulations of cultivars of the related P. persica (13.9%, 13.4% and 18% in melting peaches, nectarines and non-melting peaches, respectively) (Aranzana et al., 2010). The two species belong to the same genus but differ considerably in LD; the lower LD in the latter may reflect more recombination, because cultivars are bred by crossing. Other studies (Barnaud et al., 2009; Rossi et al., 2009) also reported that domestication bottlenecks and vegetative propagation are the primary factors responsible for this kind of difference between cultivated and wild grapevine. Differences in LD among the unstructured populations may be explained by their different numbers of accessions, the membership of accessions, and differential selection for adaptation to complex environments or for special traits, but the information can still help us select suitable populations for association mapping.

To adapt to various environments, natural populations have undergone selection, which has caused variation in alleles. Generally, once favorable genes are fixed by positive selection, usually appearing as outlier loci, the variation of gene frequencies becomes low. We used Arlequin ver 3.5.1.2 with the finite model to detect outlier loci for individuals from all populations. Five loci were significant for selection across all populations, implying that these genes evolved together with the demographic history (Table 7). Two of these loci, significant at the 1% level and located beyond the plot of 30000 permutations (Figure 11), represented two types of selection: the one with low Fst at the bottom of the plot indicated balancing selection, and the other, with high Fst and high heterozygosity at the top of the plot, indicated directional selection (Excoffier et al., 2009).

We also detected selective loci between populations. The numbers of selective loci identified from lnRH differed between the test against the standard normal distribution and Grubbs' test (Electronic supplementary material S1, Electronic supplementary material S2). Statistically, Grubbs' test, which is designed for detecting outliers, is stricter than the test based on standard deviations. Both methods detected two loci in common: BPPCT025 between SXFX and SIJC, and CPPCT022 between SIYQ and NXXJ. Allele frequencies at these loci clearly differed between the populations (Figure 12a, b), as in the local selective sweeps found in human populations (Kayser et al., 2003; Schlotterer, 2002). Genes at or near these two loci are likely to help reveal adaptive evolution and to serve as candidate genes.

Figure 2. Spatial structure analyses of seven populations
Note: r, U and L values are adjusted by the correction factor. Uncorrected values are shown as r uc, U uc, L uc. Bootstrap mean, Ur, Lr are also adjusted by the correction factor. Upper (Ur error) and lower (Lr error) error bars bound the 95% confidence interval about r as determined by bootstrap resampling. Upper (U) and lower (L) confidence limits bound the 95% confidence interval about the null hypothesis of no spatial structure for the combined data set as determined by permutation.

Figure 4. Detecting the number of clusters of 190 individuals from seven populations. ΔK was calculated as ΔK = m|L''(K)|/s[L(K)]. The modal value of this distribution is the true K (*) or the uppermost level of structure, here four clusters (Evanno et al., 2005).


Table 2. Twenty-three SSR markers used for amplification of individuals from seven populations of P. davidiana

Table 5. Correlation of genetic distance and geographic distance among populations