Additional Quantitative Trait Loci and Candidate Genes for Seed Isoflavone Content in Soybean

Seed isoflavone content of soybean (Glycine max L. Merr.) is a trait of moderate heritability and an ideal target for marker-assisted selection. To date, over 20 QTL underlying this trait have been identified among seven populations. The objectives of this study were to identify additional QTL and candidate genes controlling isoflavone content in a set of recombinant inbred lines (RILs) of soybean grown in two different seasons. Variation in the contents of the isoflavones daidzein, glycitein, and genistein over two growing seasons and locations suggests that isoflavones are influenced by both genes and environments. Six QTL were identified on five different chromosomes (Chr) or linkage groups (LG) that controlled daidzein (Chr_7/LG-M; Chr_17a/LG-D2), glycitein (Chr_2/LG-D1b; Chr_8/LG-A2), and genistein (Chr_8/LG-A2; Chr_12/LG-H), respectively, in seeds grown in the 2010 season. Two QTL were identified for daidzein (Chr_6/LG-C2; Chr_13b/LG-F), two QTL for glycitein (Chr_1/LG-D1a; Chr_17c/LG-D2), and five QTL for genistein (Chr_3/LG-N; Chr_8/LG-A2; Chr_9/LG-K; Chr_18/LG-G) in seeds of the 2011 growing season. Genes located within QTL confidence intervals were retrieved, and gene ontology (GO) terms were used to identify those related to the flavonoid biosynthesis process. Twenty-six candidate genes were identified that may be involved in isoflavone accumulation in soybean seeds.


Introduction
Isoflavones are secondary metabolites produced mainly by the Papilionoideae subfamily, to which soybean [Glycine max (L.) Merr.] belongs. The most studied soybean seed isoflavones are genistein, daidzein, and glycitein, which have numerous health benefits in humans and some animals (Alderkreutz, Banwart, Wahala, Makela, Brunow, Hase, & Arosemena, & Vickery, 1993; Setchell, 1998; Regal, Frazer, Weeks, & Greenberg, 2000; Hedlund, Johannes, & Miller, 2003). These compounds have antimutagenic activity (Miyazawa, Sakoano, Nakamura, & Kosaka, 1999), reduce the risk of breast and prostate cancer (Wang, Wang, Lu, Kao, & Chen, 2009; Jiang, Payton-Stewart, Elliott, Driver, Rhodes, Zhang, Zheng, & Wang, 2010), reduce the risk of cardiovascular diseases and osteoporosis, and ameliorate the symptoms of menopause in women (Regal et al., 2000; Hedlund et al., 2003). In plants, isoflavones play a role in responses to different types of stress and in defense against pathogens (Mabrouk, Zourgui, Delavault, Simier, & Belhadj, 2007). In soybean, isoflavones can be produced in all parts of the plant, but the seeds are the major sites of isoflavone synthesis and storage. However, the biological effects of isoflavones on humans appear to depend on the purity and dose of the isoflavones, the age of the consumer, and other dietary cofactors (Knight & Eden, 1996) (... & Chung, 2007; Tsai et al., 2007; Bennett, Yu, Heatherly, & Krishnan, 2004; Lee et al., 2008).
Selecting soybean cultivars for high isoflavone content is one of the goals of breeders. To date, genetic linkage analysis has been used to identify loci for isoflavones in two or more populations. However, it is likely that the genetic map locations of all quantitative trait loci (QTL) associated with isoflavone content have not yet been determined. The control of isoflavone content in soybean seeds is largely genetic rather than environmental (Gutierrez-Gonzalez et al., 2009; Hoeck, Fehr, Murphy, & Welke, 2000; Murphy et al., 2009), and several minor-effect QTL that determine the amount of isoflavones in soybean have been found. Numerous studies have shown soybean isoflavone content to be a quantitatively inherited trait (Kassem et al., 2004; Primomo et al., 2005; Meksem et al., 2001; Kassem et al., 2006; Zeng et al., 2009). Identifying and mapping isoflavone QTL could help breeders improve the level of isoflavones in soybean through marker assisted selection (MAS).
Researchers have been creating genetic maps for soybean using several different DNA markers in order to identify allelic polymorphisms. To date, different molecular techniques, including amplified fragment length polymorphisms (AFLPs), restriction fragment length polymorphisms (RFLPs), simple sequence repeats (SSRs or microsatellites), and single nucleotide polymorphisms (SNPs), have been used for map construction and subsequent QTL identification. Updated genetic maps have been created for the soybean genome, and the recent consensus map was created using 1,536 SNP markers, which showed that the soybean genome is approximately 2,300 centimorgans (cM) in length (Hyten et al., 2010). The creation of genetic linkage maps for soybean has proved useful for identifying QTL (Hyten et al., 2010), and for this purpose we recently developed a genetic linkage map of soybean from RILs of MD 96-5722 by 'Spencer' using the SoySNP6K Illumina Infinium BeadChip Genotyping Array (Akond et al., 2013). Another high-density genetic linkage map was constructed using the 1,536 Universal Soy Linkage Panel 1.0 SNP markers in the 'PI 438489B' by 'Hamilton' population; 31 linkage groups were obtained and used in this study (Kassem et al., 2012). To date, several isoflavone QTL have been identified from linkage maps constructed with different markers (Kassem et al., 2004; Gutierrez-Gonzalez et al., 2009; Zeng et al., 2009; Gutierrez-Gonzalez et al., 2010). SNP markers allow the anchoring of genetic maps and QTL on the soybean genome sequence, thereby identifying large genomic regions each containing several hundred genes.
Because anchoring genetic maps and QTL on the soybean genome sequence identifies genomic regions that contain hundreds of genes, functional annotations can be applied to reduce the number of candidate genes. Identifying more QTL and candidate genes will enhance marker assisted selection (MAS) for the genetic improvement of soybean. The objectives of this research were (1) to map additional QTL controlling isoflavone content in soybean and (2) to identify candidate genes underlying the QTL intervals using annotation-based enrichment tools and Gene Ontology databases.

Plant Material, Isoflavone Quantification, and Data Analysis
In this study, we used a set of 50 recombinant inbred lines (RILs) of soybean derived from a cross between the cultivars PI 438489B and Hamilton. This RIL population was developed at the Agronomy Research Center, Iowa State University, by Dr. Silvia Cianzio and was provided to us by Dr. Khalid Meksem of Southern Illinois University. The population was grown in two different seasons and locations (FSU campus, Fayetteville, NC, in 2010 and Saint Pauls, NC, in 2011) as described earlier (Ragin, Bazelle, Clark, Kantartzi, Meksem, Akond, & Kassem, 2012). The RIL population and parental lines were harvested at maturity, and seeds were dried to provide samples for isoflavone analysis. Five grams (g) of seeds were ground to flour in hexane using a laboratory blender and incubated at room temperature with shaking for 1 hour. To the defatted flour samples, 8 mL of methanol was added, and the mixtures were then centrifuged at 1,000 g for 20 min and incubated overnight under a fume hood. The supernatant was then placed under nitrogen flow, and evaporation was stopped when the solvent reached a final volume of 2 mL (Xu & Godber, 2002). For quantification of isoflavones using HPLC, 200 µL of each sample was diluted with 200 µL of mobile phase diluent, which separated the liquid into two layers. The resulting mixture was vortexed and then centrifuged at 12,000 g for 5 min, after which a 200 µL aliquot of the supernatant was dispensed into an HPLC vial for analysis. Daidzein, glycitein, and genistein standards (Chromadex, Irvine, CA) were used for quantification by HPLC.
To determine the interactions between the genotypic effect of the RILs and growing seasons or locations, the isoflavone data were analysed via combined analyses of variance using the general linear model (GLM) procedure of the Statistical Analysis System (SAS).
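The combined analysis of variance described above partitions trait variation into genotype, environment, and genotype by environment components. The following is a minimal Python sketch of that partition for a balanced layout, not the SAS PROC GLM analysis used in this study. The function name `two_way_anova`, the replicate structure, and the example values are hypothetical and serve only as illustration; replicated observations per line and environment are assumed so that the interaction can be separated from error.

```python
from statistics import mean

def two_way_anova(data):
    """Balanced fixed-effects two-way ANOVA with interaction.

    data[g][e] is a list of replicate trait values for genotype g in
    environment e; every cell must hold the same number of replicates.
    Returns (sums of squares, degrees of freedom, F ratios).
    """
    genotypes = sorted(data)
    envs = sorted(next(iter(data.values())))
    a, b = len(genotypes), len(envs)
    r = len(data[genotypes[0]][envs[0]])  # replicates per cell

    grand = mean(v for g in genotypes for e in envs for v in data[g][e])
    g_mean = {g: mean(v for e in envs for v in data[g][e]) for g in genotypes}
    e_mean = {e: mean(v for g in genotypes for v in data[g][e]) for e in envs}
    cell = {(g, e): mean(data[g][e]) for g in genotypes for e in envs}

    ss = {
        "G": b * r * sum((g_mean[g] - grand) ** 2 for g in genotypes),
        "E": a * r * sum((e_mean[e] - grand) ** 2 for e in envs),
        "GxE": r * sum((cell[g, e] - g_mean[g] - e_mean[e] + grand) ** 2
                       for g in genotypes for e in envs),
        "error": sum((v - cell[g, e]) ** 2
                     for g in genotypes for e in envs for v in data[g][e]),
    }
    df = {"G": a - 1, "E": b - 1, "GxE": (a - 1) * (b - 1),
          "error": a * b * (r - 1)}
    ms_error = ss["error"] / df["error"]
    f_ratio = {k: (ss[k] / df[k]) / ms_error for k in ("G", "E", "GxE")}
    return ss, df, f_ratio

# Hypothetical trait values: three lines, two environments, two replicates.
example = {
    "RIL1": {"2010": [10, 12], "2011": [8, 10]},
    "RIL2": {"2010": [14, 16], "2011": [9, 11]},
    "RIL3": {"2010": [20, 22], "2011": [20, 22]},
}
ss, df, f_ratio = two_way_anova(example)
print(ss, df, f_ratio)
```

Each F ratio would then be compared against the F distribution with the corresponding degrees of freedom to obtain a P value, which this sketch leaves out.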

SNP Genotyping, Genetic Map Construction and QTL Mapping
DNA from the RIL population was genotyped using the 1,536 Universal Soy Linkage Panel 1.0 (Hyten et al., 2010). SNP markers were assayed with the GoldenGate assay on the BeadStation 500G (Illumina Inc., San Diego, CA) following the manufacturer's protocols; detailed information on the SNPs and procedures was described previously by Fan et al. (2003) and Hyten et al. (2008). BeadStudio version 3.2 software (Illumina Inc., San Diego, CA) was used for allele calling at each locus.
The genetic linkage map (Kassem et al., 2012) was computed using Join-Map 4.0 software (Stam, 1993). Win-QTL Cartographer version 2.5 (Wang, Basten, & Zeng, 2005) was used to map QTL and estimate their effects. Composite interval mapping (CIM) was performed using Model 6 with its default settings.
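To illustrate what a LOD score measures, the sketch below computes one for a single marker by comparing a model with one overall mean against a model with a separate mean per marker class. This is plain single-marker regression, a much simpler technique than the composite interval mapping performed in Win-QTL Cartographer, which also tests positions between markers and conditions on cofactor markers. The function name, the 'A'/'B' genotype coding, and the example values are hypothetical.

```python
import math

def single_marker_lod(genotypes, phenotypes):
    """LOD and R^2 for one marker in a RIL population.

    genotypes: list of 'A' or 'B' (parental allele carried by each line).
    phenotypes: list of trait values in the same order.
    LOD = (n / 2) * log10(RSS_null / RSS_marker).
    """
    n = len(phenotypes)
    grand = sum(phenotypes) / n
    rss_null = sum((y - grand) ** 2 for y in phenotypes)

    rss_marker = 0.0
    for allele in ("A", "B"):
        group = [y for g, y in zip(genotypes, phenotypes) if g == allele]
        if group:
            group_mean = sum(group) / len(group)
            rss_marker += sum((y - group_mean) ** 2 for y in group)
    if rss_marker == 0.0:
        raise ValueError("marker classes fit the data exactly; LOD is unbounded")

    lod = (n / 2) * math.log10(rss_null / rss_marker)
    r_squared = 1 - rss_marker / rss_null
    return lod, r_squared

# Hypothetical data: six lines, three per marker class.
lod, r2 = single_marker_lod(list("AAABBB"), [1.0, 2.0, 3.0, 5.0, 6.0, 7.0])
print(round(lod, 3), round(r2, 3))
```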

Detecting Interactions Between QTL
SNP markers were analyzed by two-way mixed ANOVA using the SAS PROC GLM (General Linear Model) procedure to detect non-additive interactions between unlinked QTL (Lark, Chase, Adler, Mansur, & Orf, 1995). Non-additive interactions between markers associated with the traits were analysed at the P≤0.05 level.
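A non-additive interaction between two markers means that the effect of one marker depends on the allele carried at the other. The sketch below computes that interaction contrast from the four two-marker class means. It is only an illustration of the quantity being tested, not the PROC GLM analysis; the function name, the 'A'/'B' coding, and the data are hypothetical, and no significance test is performed.

```python
def epistasis_contrast(marker1, marker2, phenotypes):
    """Interaction contrast for two biallelic markers scored 'A' or 'B'.

    Returns (mean_AA - mean_AB) - (mean_BA - mean_BB). A value of zero
    means the two marker effects are purely additive in the sample.
    """
    cells = {}
    for g1, g2, y in zip(marker1, marker2, phenotypes):
        cells.setdefault((g1, g2), []).append(y)

    required = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")]
    missing = [c for c in required if c not in cells]
    if missing:
        raise ValueError(f"no lines observed in genotype classes: {missing}")

    m = {c: sum(v) / len(v) for c, v in cells.items()}
    return (m["A", "A"] - m["A", "B"]) - (m["B", "A"] - m["B", "B"])

# Hypothetical example: one line per two-marker class.
m1 = ["A", "A", "B", "B"]
m2 = ["A", "B", "A", "B"]
print(epistasis_contrast(m1, m2, [10, 12, 13, 15]))  # additive effects only
print(epistasis_contrast(m1, m2, [10, 12, 13, 20]))  # non-additive
```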

Analysis of QTL Regions for Candidate Genes
The identification of genes within the QTL regions was performed as described earlier (Barakat, Staton, Cheng, Park, Yassin, Ficklin, Yeh, Hebard, & Sederoff, 2012; Kassem et al., 2012). Briefly, each QTL was bounded by two single nucleotide polymorphism (SNP) markers. The SNP DNA sequences were obtained from the NCBI database (http://www.ncbi.nlm.nih.gov/projects/SNP/). SNP-harboring sequences were mapped to the Glycine max genome (http://soybase.org/gbrowse/cgi-bin/gbrowse/gmax1.01/). Soybean predicted coding DNA sequences (CDS) from QTL regions were retrieved from the Phytozome website (www.phytozome.net/soybean) and were annotated by querying them against the plant proteome using Blastx (e-value < 1e-6) as previously described by Barakat et al. (2012). The Gene Ontology (GO) system (The Gene Ontology Consortium, 2008) was used for functional classification of each gene. Annotation categories were parsed to identify genes from the flavonoid biosynthesis pathways as described previously (Barakat et al., 2012).
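The last two steps of this procedure, collecting genes between the two flanking SNPs and keeping those annotated to flavonoid biosynthesis, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline of Barakat et al. (2012): the `Gene` record, the function names, the gene identifiers, the coordinates, and the keyword match on GO term names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Gene:
    gene_id: str          # hypothetical identifier
    chrom: str            # chromosome name, e.g. "Chr_8"
    start: int            # start position in bp
    end: int              # end position in bp
    go_terms: list = field(default_factory=list)  # GO term names

def genes_in_interval(genes, chrom, left_bp, right_bp):
    """Genes on `chrom` that overlap the interval between two flanking SNPs."""
    lo, hi = sorted((left_bp, right_bp))
    return [g for g in genes
            if g.chrom == chrom and g.end >= lo and g.start <= hi]

def flavonoid_candidates(genes, keyword="flavonoid biosynthetic process"):
    """Keep genes with at least one GO term name containing `keyword`."""
    return [g for g in genes
            if any(keyword in term.lower() for term in g.go_terms)]

# Hypothetical gene models and flanking SNP positions.
genes = [
    Gene("gene_1", "Chr_8", 1_000, 2_500, ["flavonoid biosynthetic process"]),
    Gene("gene_2", "Chr_8", 3_000, 4_000, ["protein transport"]),
    Gene("gene_3", "Chr_8", 90_000, 91_000, ["flavonoid biosynthetic process"]),
    Gene("gene_4", "Chr_12", 1_500, 2_000, ["flavonoid biosynthetic process"]),
]
in_qtl = genes_in_interval(genes, "Chr_8", left_bp=500, right_bp=5_000)
candidates = flavonoid_candidates(in_qtl)
print([g.gene_id for g in candidates])
```

Matching on the term name keeps the sketch readable; a real analysis would more likely match GO identifiers and include descendant terms.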

Isoflavone Traits in Parents and RILs
Mean values showed that plants grown in the 2010 season at the FSU campus had higher seed daidzein and genistein but less glycitein than those grown in the 2011 season at St. Pauls (Table 1). Daidzein contents were higher than the parental means in both seasons/locations, whereas glycitein and genistein contents were lower. The variation in isoflavones among soybean lines over seasons and locations suggested that isoflavone contents were influenced by both location/environment and genes (Table 2). To verify the probable causes of such variation, an analysis of variance was conducted using the GLM procedure of SAS. Genotypic and environmental effects were highly significant at P<0.0001 for genistein, daidzein, and glycitein accumulation. The results of the GLM analysis showed that genotype x environment interaction (GXE) effects were important components of genistein, daidzein, and glycitein accumulation in soybean grown in two seasons and locations. All were significant at the 0.001 level.

Detecting Interactions Between QTL
SNP markers were analyzed by two-way mixed ANOVA using the SAS PROC GLM procedure for non-additive interactions between markers that were significantly (P≤0.05) associated with the trait responses. Twenty SNP markers most likely to be associated with isoflavone QTL were identified in seeds grown in 2010. Among these QTL, 8 underlay daidzein content and 6 each underlay seed genistein and glycitein content (Table 4).

Analysis of QTL Regions for Candidate Genes
Annotation of the sequences from QTL genomic regions provided the opportunity to search for genes that could be related to the traits of this study. Genes from each QTL region were retrieved and parsed based on their annotation categories to identify those from the flavonoid biosynthesis or regulatory pathways. Twenty-six genes involved in flavonoid biosynthesis pathways (Table 5), which could influence isoflavones, were found on seven different chromosomes. These genes encode various proteins, including BEL-like homeodomain protein, Transparent Testa Glabra 1, Syntaxin Related Protein 1, BEL1-Like Homeodomain 1, alpha/beta-Hydrolase, Syntaxin Related Protein, NAC domain transcription factor, Fe(II)/ascorbate oxidase, Squalene epoxidase 1, basic helix-loop-helix (bHLH) DNA-binding protein, UDP-L-Rhamnose synthase, Acetohydroxy Acid Synthase, NAD(P)H dehydrogenase B2, Quinone reductase, anthocyanidin 3-O-glucosyltransferase, and ATBS1 Interacting Factor 3. One gene on Chr_8 (LG-A2; glycitein), three genes on Chr_12 (LG-H; genistein), and two genes on Chr_17a (LG-D2; daidzein) were within the QTL intervals from the 2010 data. The second group of QTL (from 2011) contained eight genes likely to be involved in genistein synthesis (Chr_3/LG-N), eleven genes for daidzein (Chr_6/LG-C2 and Chr_13b/LG-F), and one gene for glycitein (Chr_17c/LG-D2).

Discussion
Biotic and abiotic factors can act as inducers or repressors of the accumulation of some secondary metabolites during crop production, even in crops that are genetically stabilized. Isoflavone accumulation and concentration in plants vary widely across environments (Hoeck et al., 2000; Mebrahtu, Mohamed, Wang, & Andebrhan, 2004; Caldwell, Britz, & Mirecki, 2005; Lozovaya, Lygin, Ulanov, Nelson, Dayde, & Widhohm, 2005), and plants generally accumulate more isoflavones when grown in a controlled environment. This might be due to the favorable water and temperature conditions. Temperature, water, and soil nutrient conditions are known to play a significant role in isoflavone production in soybean (Gutierrez-Gonzalez et al., 2009), and these findings could explain the difference in isoflavone accumulation between the two seasons reported in this study. However, the variability observed for seed isoflavones among RILs was attributable not only to environment but also to genetic effects and to genotype by environment (GXE) interactions.
The mixed-model approach yielded slightly different results from composite interval mapping with the same sets of marker cofactors in detecting multiple genetic main effects of QTL. The results of the approach, shown in Table 4, indicated that the mixed-model approach was powerful in detecting QTL with relatively large additive and/or epistatic effects (coefficient of determination R2 > 8.3%) (Wang, Zhu, Li, & Paterson, 1999). Linkage, particularly close linkage between QTL, had a significant impact on the detection power of QTL mapping using the mixed-model approach. The nature of this impact depends on the directions of the QTL main effects (including epistatic effects) among linked QTL. When the effects of the linked QTL are in coupling (in the same direction), the QTL tended to be detected with increased power. However, when they are in repulsion (having effects in opposite directions), the QTL tended to be detected with decreased power. The LR threshold of P=0.005 (equivalent to LOD = 2.79 for df = 3) (Wang et al., 1999) used in our simulations is considered a typical one for most QTL mapping studies, and it gave us consistently high power in detecting QTL of moderate additive/epistatic effects (R2~5%). However, increasing the stringency of the threshold to 0.001 in our simulations resulted in significantly reduced power for detecting QTL with moderate additive/epistatic effects (Wang et al., 1999).
Interaction among QTL that underlie daidzein content in soybean explained 33% of the total variation in daidzein in the seeds harvested in 2010 and 27% in the seeds harvested in 2011. Interaction among QTL that underlie genistein content explained 34% of the total variation in genistein in the seeds harvested in 2010 and 18% in the seeds harvested in 2011. Interaction among QTL that underlie glycitein content explained 24% of the total variation in glycitein in the seeds harvested in 2010.
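The stated equivalence between the LR threshold at P = 0.005 with df = 3 and LOD = 2.79 follows from LOD = LR / (2 ln 10), where LR is the chi-square critical value. The short Python sketch below reproduces the number as a check; the function names are hypothetical, and the closed-form tail probability used here holds only for 3 degrees of freedom.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

def lr_threshold_df3(p, lo=0.0, hi=100.0):
    """LR value whose upper-tail probability equals p, found by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_df3(mid) > p:
            lo = mid   # tail still too heavy: threshold lies further right
        else:
            hi = mid
    return (lo + hi) / 2

def lr_to_lod(lr):
    """Convert a likelihood ratio statistic to a LOD score."""
    return lr / (2 * math.log(10))

lr = lr_threshold_df3(0.005)
print(round(lr, 2), round(lr_to_lod(lr), 2))  # 12.84 2.79
```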
Annotation of several hundred genes from the QTL regions allowed us to identify twenty-six candidate genes that may control flavonoid synthesis in soybean seeds under different environments. The candidate genes identified in this study represent a valuable resource for studying the genetic basis of isoflavone biosynthesis in soybean.

Figure 1. Locations of SNP markers and the QTL that underlie soybean seed daidzein (DAID), genistein (GEN), and glycitein (GLY) contents in the 'PI 438489B' by 'Hamilton' RIL population grown at the FSU campus (Cumberland County, NC). LGs were drawn using MapChart according to the PIxH genetic linkage map.

Table 1. Comparisons of daidzein, glycitein, and genistein contents in the RILs of 'PI 438489B' by 'Hamilton' between two growing periods, at Fayetteville State University, North Carolina, USA, in 2010 (FSU2010) vs. St. Pauls, North Carolina, USA, in 2011 (SP2011), based on a two-tailed t test (type 2) at α = 0.05.
*Values in parentheses are mean values of each trait from the two parents ('PI 438489B' and 'Hamilton') in each year.

Table 2. The GLM analysis of variance for contents of isoflavone components in 51 soybean RILs in two locations in NC in two cropping seasons.

Table 4. SNP markers encompassing QTL for seed isoflavone content in soybean seed of the 'PI 438489B' by 'Hamilton' population across two environments.

Table 5. Candidate genes with GO terms overrepresented in QTL corresponding to the flavone biosynthetic process in the Arabidopsis genome.
Isoflavones QTL: genistein, daidzein, and glycitein values across two years 2010 and 2011
Zeng et al. (2009) ... et al. (2009) identified ... for seed isoflavones content on 12 different chromosomes (Chr) or linkage groups (LG). Other studies identified QTL for total and individual isoflavones on the same Chr or LG. The QTL for daidzein (qDAID001-2010 on Chr_7/LG-M) reported in this study is 16 cM from the QTL (dai-M) identified on the same LG in the 'Essex' by 'Forrest' RIL population (n=100) by Gutierrez-Gonzalez et al. (2009). They also identified two QTL for glycitein (gly-D2_1 and gly-D2_2) on LG-D2, the same LG on which we report a second QTL for daidzein (qDAID002-2010). We also report a QTL for glycitein (qGLY002-2011) on the same LG and at the same position as the glycitein QTL (gly-D2_1) identified on LG-D2 by Gutierrez-Gonzalez et al. (2009). Two QTL for genistein (qGEN002-2010 and qGEN003-2010) were identified in this study on LG-H/Chr_12; they are located close to the QTL for daidzein (dai-H) and total isoflavones (tot-H) on LG-H reported by Gutierrez-Gonzalez et al. (2009). Kassem et al. (2004) identified two QTL for genistein and glycitein on LG-H/Chr_12 in the 'Essex' by 'Forrest' population, which are very close to the QTL qGEN002-2010 and qGEN003-2010 of this study. Primomo, Poysa, Ablett, Jackson, & Rajcan (2005) identified another QTL that controls daidzein on LG-A2/Chr_8, which is approximately 100 cM from the QTL for glycitein (qGLY002-2010) of this study. The same study identified QTL for daidzein, genistein, and glycitein on LG-H/Chr_12. Moreover, the total isoflavones QTL of the same study is approximately 60 cM from the QTL we identified for genistein (qGEN002-2010 and qGEN003-2010) in this study. Primomo, Poysa, Ablett, Jackson, & Rajcan (2005) also identified QTL for daidzein, genistein, glycitein, and total isoflavones on LG-M/Chr_7, which are approximately 14.6 cM from the QTL qDAID001-2010 of this study. All QTL identified by Primomo, Poysa, Ablett, Jackson, & Rajcan (2005) were from the 'AC756' by 'RCAT Angora' RIL population (n=207), and they also observed significant genotype by environment interactions for isoflavone contents, which is consistent with our study. Zeng et al. (2009) identified a QTL that controls genistein content on LG-