Identification of Differentially Expressed Genes in Shoot Apex of Garlic ( Allium sativum L . ) Using Illumina Sequencing

Garlic is widely used as a spice throughout the world. In this study, transcriptional profilings of garlic shoot apex were performed by using the Illumina technology. A total of 45,363 significantly changed expressed transcripts were detected between the dormant and sprouting garlic shoot apex libraries. The expression of 22,836 unigenes was increased by more than 2-fold in sprouting garlic shoot apex as compared with dormant shoot apex (up-regulated unigene), and 22,526 unigenes were identified as having been down-regulated. Gene ontology (GO) annotations indicated that the differentially expressed genes were mainly played role in nucleotide binding, plastid, hydrolase activity, transferase activity, protein metabolic process, nucleic acid binding, protein binding, mitochondrion. A total of 8,725 differentially expressed genes were assigned to five Kyoto Encyclopedia of Genes and Genomes (KEGG) biochemical pathways, including metabolism, genetic information processing, organism system, cellular processes, and environmental information processing. Real-time quantitative RT-PCR (qRT-PCR) was performed to dissect shoot apex sprouting. The differential expression of genes, such as ENHYDROUS, DAG1, DAM, DTH8, indicate that they play a critical role in shoot apex sprouting. These differentially expressed genes comprise related candidates for shoot apex sprouting regulation in Allium species.


Introduction
Garlic (Allium sativum L.) is one of the cultivated Allium vegetables widespred the world and is widely used as a spice.Garlic is harvested in the Shandong region of China from May to June and stored to maintain a year round supply.At harvest, garlic bulbs are usually dormant (Ledesma et al., 1980).Length of the dormant period depended on both the storage conditions and the genetic background of the cultivar (Cantwell et al., 2003).During post-harvest storage, dormancy gradually diminished with the beginning of inner sprout growth.Sprouting of garlic bulbs during storage is the major factor limiting storage life, as it results in loss of dry matter, quality decreases, and onset of disease.
Abscisic acid (ABA) and Gibberlin acid (GA) plays a major regulatory part in seed dormancy and germination (Sreenivasulu et al., 2008).While ABA increases during seed maturation and dormancy, GA levels peak during seed germination and postgerminative growth.However, evidence is growth for a role of auxin during this process (Holdsworth et al., 2008).Ethylene promotes dormancy breaking through interactions with ABA signaling.Seeds of ethylene resistant 1 (etr1), ethylene insensitive 2 (ein2), and enhanced response to aba 3 (era3) mutants display enhanced dormancy and increased sensitivity to ABA (Chiwocha et al., 2005;Beaudoin et al., 2000).A number of studies utilized transcriptomic approaches to investigate seed dormancy and germination.Nearly 12,000 mRNA categories in mature dry seeds were found in barley (Hordeum vulgare L.) and Arabidopsis (Arabidopsis thaliana) (Nakabayashi et al., 2005;Sreenivasulu et al., 2008).These authors assumed that the stored mRNAs take part in de novo synthesis of protein during the early stages of germination.Global gene expression during vegetative buds dormancy maintenance and release were examined in potato (Faivre-Rampant et al., 2004;Campbell et al., 2008), and leafy spurge (Horvath et al., 2008).These studies have identified a suit of differentially expressed genes, and several of them are specifically involved in vegetative buds dormancy maintenance and sprouting.However, the molecular mechanism of dormancy maintenance and sprouting in Allium crops is poorly studied, and remains largely unknown.
Over the past several years, RNA-sequencing (RNA-Seq) has emerged as a mighty and cost-effective tool for transcriptome analysis and this has noticeably enhanced the efficiency and velocity of gene discovery (Schuster, 2008;Ansorge, 2009).The RNA-Seq technology was revealed to have comparatively little variety between technical replicates for discovering differentially expressed genes (Marioni et al., 2008).Transcriptome analysis using Illumina sequencing approach is one of the most universal tools for gene identify, and it has been applied recently to several species that lack genomic sequence information, including sweet potato (Ipomoea batatas) (Wang et al., 2010), Eichhornia paniculata (Ness et al., 2011), alfalfa (Medicago sativa) (Yang et al., 2011), and tea (Camellia sinensis) (Shi et al., 2011).
In this study, transcripts involved in garlic shoot apex dormancy and sprouting were identified, and their putative functions were discussed.The gene expression profiles will suppiy a better comprehending of the mechanisms regulating dormancy release in Allium crops, and provide useful information for further research.

Plant Material and RNA Extraction
Garlic bulbs (Allium sativum, cultivar Cangshan 15) were harvested from entirely mature plants grown in the field.Garlic bulbs were grown in the Province of Shan Dong, China.After harvest, the bulbs were stored at room temperature.To analyze the gene expression during sprouting, shoot apex were excised from cloves at four stages: bulbs stored for 2 weeks, 6 weeks, 10 weeks and 16 weeks.Samples were frozen in liquid nitrogen and stored at -80 °C ahead of the RNA extraction.Total RNA was isolated using the TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to the producer's book of instructions.

Identification of Differentially Expressed Genes
In a previous study, two garlic Illumina libraries were constructed from tissues of dormancy and sprouting shoot apex (Sun et al., 2012).A total of 127,933 unigenes were achieved form the two libraries.A total of 47,095 unigenes had annotation in the public databases.To identify genes differently expressd between two samples, a statistical analysis of the frequency of each gene in the two cDNA libraries was developed using the method described by Audic et al. (1999).The total unigene number of the sample 1 is N1, and total unigene number of sample 2 is N2; gene A holds x unigenes in sample1 and y unigenes in sample2.The probability of gene A expressed equally between two samples can be calculated with the following formula: The threshold of P value was estimated by false discovery rate (FDR) in multiple test and analysis.Gene expression difference was judged by using FDR ≤ 0.001 and the absolute value of log2Ratio ≥ 1. Differentially expressed genes were identified by using more strict standards with little FDR and larger fold-change values.RPKM method (reads per kilobase per million reads) was used to calculate the expression of Unigene (Mortazavi et al., 2008).

Functional Annotation and Classification
The assembled unigenes were aligned to the NCBI non-redundant protein (Nr) and Swiss-Prot protein databases using the BLASTX algorithm with an E-value cut-off of 10-5.The functional annotation by gene ontology terms was performed using the BLAST2GO software (Conesa et al., 2005).The COG annotation was performed using the BLASTX algorithm (E-value threshold: 10-5) against the Cluster of Orthologous Groups database.Unigenes were also compared with the sequences in the Kyoto Encyclopedia of Genes and Genomes database (Kanehisa et al. 2004) using BLASTx with an E-value threshold of 10-5.

Quantitative Real-Time RT-PCR
The differentially expressed genes were chosen by function of interest.First-strand cDNA was synthesized from DNase-treated RNA using Superscript II reverse transcriptase (Invitrogen).Primers were designed with Primer Premier 5.0 (Table 1).The reaction was performed in a total volume of 20 μL, containing 2 μl of cDNA and 10 μl of SYBR Green Master Mix (TaKaRa) with a pair of primers.Thermal cycling was done on the Bio-Rad Sequence Detection System with the following program: 95 °C for 10 s, then 40 cycles of 95 °C for 15 s, and 60 °C for 30 s. Triplicates of each reaction were performed, and the results were expressed relative to the expression levels of actin in each sample by using the 2 ΔΔCT method (Livaka & Schmittgen, 2001).

Identification of Differentially Expressed Genes
The putative differentially expressed unigenes between two samples were identified by an algorithm developed by Audic et al. (Audic & Claverie, 1997).A total of 45,363 significantly changed expressed unigenes were detected between the dormant and sprouting garlic shoot apex libraries (Figure 1).Of these, 34,906 unigenes showed > 5-fold changes in expression level, and 26,410 unigenes showed > 10-fold changes in expression.The expression of 22,836 unigenes was increased by more than 2-fold in sprouting garlic shoot apex as compared with dormant shoot apex (up-regulated unigene, Figure 1), and 22,526 unigenes were identified as having been down-regulated.Of these differentially expressed unigenes, 7,784 unigenes were only expressed in sprouting shoot apex while 7,975 unigenes were only expressed in dormant shoot apex.
Figure 1.Gene expression levels of dormant garlic shoot apex and sprouting garlic shoot apex The differentially expressed genes are revealed in red and green.Genes without expression changes are revealed in blue.RPKM: Reads Per kb per Million reads.FDR: false discovery rate.We used "FDR ≤ 0.001 and the absolute value of log2Ratio ≥ 1" as the threshold to judge the significance of gene expression difference.Y axes display the observed log10 ratios of RPKM of sprouting bud.X axes display the observed log10 ratios of RPKM of dormant bud.

Functional Annotation of Differentially Expressed Genes
All differentially expressed unigenes were compared with the sequences in public databases, including the NCBI non-redundant protein (Nr) database, the NCBI Clusters of Orthologous Groups (COGs) database, the Swiss-Prot protein database, and the Kyoto Encyclopedia of Genes and Genomes database (KEGG) using BLASTx algorithm with an E value threshold of 10 -5 .A total of 21,044 unigenes had significant hits (E-value < 10 -5 ) to the sequences in the five databases.A total of 20,549 unigenes had significant hits (E-value < 10 −5 ) to the sequences in the Nr database.Gene Ontology (GO) is an international standardized gene functional classification system that provides a standardized vocabulary that is used to assign function to uncharacterized sequences.We used GO analysis to classify the functions of differentially expressed genes.Putative functions were assigned to 10,840 unique sequences involved in the categories of biological process, cellular component and molecular function (Figure 2).The main functional groups of up-regulated genes are related to nucleotide binding, plastid, hydrolase activity, transferase activity, protein metabolic process, nucleic acid binding, protein binding, mitochondrion, while the main functional groups of down-regulated genes are related to plastid, mitochondrion, transferase activity, protein metabolic process, nucleotide binding, nucleic acid binding, protein binding, ion binding.The number of "genes" up-regulated are much more than the ones down-regulated.This event appears to operate in a coordinated fashion with the establishment of the full metabolic profile, and the biogenesis of organelles in the sprouting garlic bud.The assembled unigenes were assigned to the biochemical pathways described in KEGG based on their EC numbers.Pathway-based analyses help to understand the biological function of genes further.A total of 9,704 differentially expressed genes demonstrated sequence similarities to the genes in the KEGG database, representing 6,321 up-regulated and 3,383 down-regulated transcripts (Additional file 1).These differentially expressed genes were assigned to five main KEGG biochemical pathways, including metabolism (4,624 unigenes), genetic information processing (2,976), organism system (704), cellular processes (421), and environmental information processing (170) (Table 2).Those pathways related to metabolic pathways were the largest group, with a majority of the proteins involved in biosynthesis of secondary metabolites (1,058), carbohydrate metabolism (945), energy metabolism (568), amino acid metabolism (511), lipid metabolism (477), and Nucleotide Metabolism (372).The second largest group comprised genetic information processing, including those genes involved in transcription (978), folding, sorting and degradation (541), translation (357).The third largest group was the organismal systems, most of which were associated with environmental adaptation (666).Pathways related to cellular processes and environmental information processing were also well represented by the unigenes from Allium sativum L. Functional analysis of the up-regulated genes revealed that a large proportion of them are enzymes involved in carbohydrate metabolism, such as ascorbate oxidase, beta-galactosidase, biotin carboxylase, endoglucanase, enolase, glyceraldehyde 3-phosphate dehydrogenase, pyruvate kinase, sucrose synthase (additional file 1).Similar results were also observed in the germination of barley seeds (Zhang et al., 2004).An increase in the expression of genes belonging to energy metabolism was observed at the stage of shoot apex sprouting.This group was mainly comprised of genes involved in photosystem I and II proteins, ribulose biphosphate carboxylase, plastocyanin, and ATP synthase.Accumulation of genes which are involved in energy production was associated with bud burst in sessile oak (Quercus petraea) (Derory et al., 2006).These results indicated that energy metabolism is one of the most important factors associated with garlic shoot apex sprouting.Among those up-regulated genes, many were involved in amino acid metabolism, such as methionine synthase, adenosylhomocysteinase，glutamate synthase, leucyl synthetase, glycine dehydrogenase, arginine decarboxylase, suggesting an increase in protein synthesis essential for bud sprouting.Changes in the expression of genes participated in amino acid metabolism have also been reported during barley germination (Wilson et al., 2005).Several genes (methionine synthase and adenosylhomocysteinase) encoding enzymes participated in methionine biosynthesis and regeneration pathways, are discovered to be actively transcribed in germinating sugar beet seeds (Pestsova et al., 2008).In Arabidopsis, methionine and/or its derivatives was also play important roles in promoting germination and seedling growth (Gallardo et al., 2002).
In the sprouting library, genes involved in transcription and translation were relatively more abundant.Examples include ribosomal proteins, transcription initiation factor, DNA-directed RNA polymerase, pre-mRNA-processing factor, splicing factor.This finding emphasizes a central role of protein biosynthesis during bud sprouting.A prevailing number of transcripts abundant in sprouting bud encode eukaryotic elongation factors.This included EF2 which has been shown to be important for seed germination in Norway maple (Pawłowski, 2009).Changes in the expression of EF1 have been similarly reported during germination in Norway maple seeds (Twardowski & Szczotka, 1989).EF-Tu is involved in the buildup of the photosynthetic system during seeds germination (Gallardo et al., 2001).The increase of elongation factors during germination implies that they may take part in the mechanism of bud dormancy breaking, where they are in charge of cell division and protein synthesis in meristematic tissues.
Genes involved in membrane transport were up-regulated during the bud sprouting.This group was comprised of genes encoding ABC transporter.PED3, a peroxisomal ABC transporter, promotes seed germination by inducing pectin degradation in Arabidopsis thaliana (Kanai et al., 2010).Plant ATP-binding cassette (ABC) proteins, the majority of which are transporters, are one of the largest known superfamilies.ABC transporters are involved in lipid catabolism, xenobiotic detoxification, formation of Fe/S clusters, chlorophyll biosynthesis, ion fluxes, polar auxin transport, and stomatal function (Martinoia et al., 2002).ABC transporters are also takes part in cell elongation processes (Marin et al., 2006).Therefore, ABC transporters may play important role in plant developmental and growth processes, and the increase in ABC transporter gene expression corresponds to the fact that plant tissues are at growing during the sprouting period.
Genes involved in cell division and growth exhibited increased expression in non-dormant tissue.This group was comprised of genes encoding actin, tubulin, cyclin and cyclin dependent kinase.Actin, an essential component of the cytoskeleton, is critical for differentiation, cell expansion, and cell wall deposition (Wasteneys & Galway, 2003).In Arabidopsis, ACT7 is highly expressed in young seedlings (McDowell et al., 1996).From previous studies it is known that de novo synthesis and accumulation of beta-tubulins is associated with seed dormancy breaking and germination in several species such as Arabidopsis (Chibani et al., 2006), tomato (Castro et al., 2000) and Norway maple (Pawłowski et al., 2004).β-Tubulin has been suggested to be a genetic marker for dormancy status in tree buds (Hedley et al., 2010), and is induced in expression during dormancy release in buds of woody perennial species, including blackcurrant Malus and Rosa (Bergervoet et al., 1999) and poplar (Rohde et al., 2008).D-type cyclins promote seed germination due to its division activity in the root apex in Arabidopsis (Masubelele et al., 2005).Altogether, these results demonstrate that genes involved in cell division and growth are the important determinant of completion of dormancy breaking and growth initiation.
A prevailing number of transcripts encoding heat shock proteins (HSPs) were abundant both in dormant and sprouting buds.Small heat-shock proteins (sHSPs) have also been shown to accumulate during seed maturation in several species, and a role for these proteins in desiccation tolerance, dormancy has been hypothesized (Wehmeyer et al., 1996).The 70-kD heat shock proteins (Hsp70s) involved in a variety of cellular processes including protein folding, assembly and degradation (Bukau et al. 2006).Several HSP70s were identified in Norway maple seed, which were associated with dormancy breaking (Pawłowski, 2009).The results of this study indicated that HSPs are required during garlic shoot apex dormancy maintenance and sprouting to maintain the proper folding of other proteins.A striking finding was the high percentage of oxygen stress related genes such as catalase, superoxide dismutase, glutathione S-transferase and glutathione reductase present in both libraries.Several studies indicate that levels of antioxidant enzymes increase during seed dehydration (Chen et al., 2010).Antioxidant enzymes, induced in dormant garlic shoot apex, may play a role in protecting shoot apex from desiccation damage by exposure to reactive oxygen species.Furthermore, other studies report that seed and bud dormancy release coincides with an up-regulation of the antioxidant system (Hedley et al., 2010).In sunflower (Helianthus annuus L.) seeds, ROS caused lipid peroxidation, carbonylation of a specific subset of proteins and mRNA oxidation which were associated with seed dormancy release (Bazin et al., 2011).The high level of expression of antioxidant enzymes in sprout garlic shoot apex indicate these enzymes play a role in decreasing the level of ROS after it has acted as a chemical signal to induce dormancy release.A large proportion of transcripts abundant both in dormant and sprouting buds encode ribosomal proteins, transcription initiation factor, DNA-directed RNA polymerase, pre-mRNA-processing factor, splicing factor.This finding emphasizes a central role of protein biosynthesis during both shoot apex dormancy maintenance and sprouting.

Hormone Related Genes
ABA levels are correlated with dormancy in vegetative reproductive organs and tend to drop during dormancy release (Yamazaki et al., 1999).Our expression profiling studies show that key ABA biosynthetic transcripts 9-cis-epoxycarotenoid dioxygenase (NCED) and zeaxanthin epoxidase (ZEP) are expressed at high levels in dormant shoot apex as compared with sprouting shoot apex (Additional file 2).Our results also provide hints that ABA signaling network genes such as ABA receptor protein, ABA-responsive element binding protein, ABA-insensitive protein 5 (ABI5), and protein phosphatase 2C are expressed to a higher extent in the garlic shoot apex during dormancy than during sprouting.In this study, a set of putative ABA-regulated genes were down-regulated during the transition from dormancy to sprouting, including Serine/threonine-protein kinase, abscisic stress-ripening protein 1, possibly indicating changing ABA levels in the shoot apex.Thomas (1969) found that gibberellin activities decreased before sprouting, but that there was an increase in gibberellin and auxin activities as soon as sprouting had begun in onion (Allium cepa).Our expression profiling studies display that key GA biosynthetic transcripts, gibberellin 2 oxidase (GA 2 ) and gibberellin 20 oxidase (GA 20 ), are expressed at high levels in sprouting shoot apex as compared with dormant shoot apex.Our results also show that GA signaling network genes, GA receptor protein (GID1L2) are expressed to a higher extent in the garlic shoot apex during sprouting than during dormancy.Numerous auxin response factors (ARF), auxin transport proteins are activated preferentially in the sprouting garlic shoot apex.Perturbing auxin transport proteins leads to a reduction in meristematic activity in several plants (Morris et al., 2005;Blilou et al., 2005).Our results also show that ethylene signaling network genes such as ethylene insensitive 2 (EIN2), and ethylene receptor 2 are expressed to a higher extent in the garlic shoot apex during sprouting.The preferential expression of these genes in the sprouting shoot apex suggests an important role in cell expansion during shoot apex sprouting.

Transcription Patterns of Candidate Genes
A group of genes were chosen to study the expression patterns in detail at the four stages by qRT-PCR (Figure 3).They were selected based on their putative function in dormancy and sprouting determination, and on our research interest in epigenetic regulation in garlic.The PCR products were sequenced, and all the products were specific amplification.In both libraries, we found a large number of unigenes having a Zn-finger DNA-binding motif.The expressions of zinc finger gene Dof affecting germination 1 (DAG1) were down-regulated during garlic shoot apex sprouting, while the C2H2 type zinc finger gene ENHYDROUS was up-regulated, indicating that these zinc finger genes play positive or negative roles during shoot apex dormancy transition.. DAG1 has been reported to be involved in the control of seed dormancy in Arabidopsis thaliana (Papi et al., 2000).ENHYDROUS has been reported to promotes seed germination in Arabidopsis thaliana (Feurtado et al., 2011).
A cluster of SVP-like genes, named dormancy associated MADS-box (DAM) genes, had shown to be important components in dormancy transition in several species.In Arabidopsis, SVP acts as a direct repressor of flowering (Yant et al., 2009).In this study, a homologue of DAM transcription factor was down-regulated during sprouting.
A putative homologue of the nuclear transcription factor DTH8 was down-regulated during garlic shoot apex sprouting.DTH8 has been reported to play a role in suppresses flowering in Rice (Wei et al., 2010).Horvath (Horvath, 2009) has hypothesized that similar mechanisms regulate bud flowering and dormancy.The result of this study indicates DAM and DTH8 play a role in regulating garlic shoot apex dormancy.The expression level of the dehydration-responsive element-binding protein 2A (DREB2A) had a decreasing trend, reaching a minimum after shoot apex sprouting.A putative double APETALA2 repeat transcription factor CHOTTO1 was down-regulated during sprouting.CHOTTO1 is involved in abscisic acid-mediated repression of gibberellin biosynthesis during seed germination in Arabidopsis.Other genes such as multiprotein-bridging factor 1C (MBF1C), Homeodomain leucine zipper (HDZIP) were first induced to a high expression level and then decreased, indicating that they might play major roles in specific dormancy stages.A subset of genes had an increased expression during dormancy transition.The putative cell division control protein (CDC48), TCP transcription factor 4 (TCP4) and auxin response factor 6 (ARF6) were included in this group.In potato, the increased expression of auxin response factor (ARF6) was associated with dormancy release (Faivre-Rampant et al. 2004).The expression level of eukaryotic elongation factor 2 (EF2) was up-regulated during sprouting.

Conclusion
This study has confirmed the usefulness of Illumina sequencing approach in identifying genes related to shoot apex sprouting, and has provided new insights in the understanding of gene expression during shoot apex sprouting.The substantial amount of transcripts obtained in the present study provides a useful resource to investigate the molecular bases of shoot apex dormancy and sprouting.Several candidate genes that are most likely involved in shoot apex dormancy and sprouting have also been isolated and their differential expression validated by qRT-PCR.Further functional analysis of the differentially expressed genes will provide deeper insight into the regulation garlic shoot apex sprouting.

Figure 2 .
Figure 2. Gene ontology classification of assembled unigenes Y axes display the categories of biological process, cellular component and molecular function.X axes display the number of genes.

Figure 3 .
Figure 3.The expression profiles of selected genes

Table 1 .
Sequences of primers used for qRT-PCR

Table 2 .
Mapping of Allium sativum differentially expressed genes to KEGG biochemical pathways