Complete Chloroplast Genome Sequence of Freshwater Araphid Pennate Diatom Alga Synedra acus from Lake Baikal

The complete chloroplast genome of the diatom alga Synedra acus has a canonical quadripartite structure, with two inverted repeats containing the ribosomal RNA gene loci that separate the small and large single-copy regions. The chloroplast genome maps as a circular molecule of 116 251 bp. It encodes 27 tRNAs, three rRNAs, two small RNA genes, and 128 protein-coding genes. Comparison of genic features across diatom chloroplast genomes reveals that the atpD and atpF coding sequences do not overlap in S. acus, whereas they do overlap in other plastid genomes of diatom origin. This feature is a clear synapomorphy of the S. acus plastid genome and is likely a result of either relaxed constraints or extensive selection forces acting upon the atpF gene. We also characterized a nuclear-encoded acyl-carrier protein gene with chloroplastic targeting in S. acus. The transfer of the acpp gene into the nuclear host genome is hypothesized to have occurred independently in several lineages of diatoms.


Introduction
Diatom algae are a group of unicellular eukaryotes and one of the major photosynthetic producers of organic carbon in the world, providing about 20% of net primary production (Falkowski et al. 2004). The chloroplast of diatoms is ultimately of cyanobacterial origin and was inherited from an ancient red alga during its secondary endosymbiosis with a heterotrophic eukaryote. In the course of endosymbiosis, several genes were transferred to the nuclear host genome, whereas others were lost once host genes had been incorporated into the chloroplast molecular machinery. Diatoms inhabit almost all water environments, from marine to ultra-oligotrophic freshwater bodies and from pack ice to thermal springs (Round et al. 1997). During their evolution from ca. 240 mya (Medlin 2009), many species of diatoms arose and disappeared. The present number of diatom species is reported to be up to 10^5-10^6, including cryptic ones (Round et al. 1997). Based on cell wall symmetry and other morphological features, diatoms can be separated into several large groups, listed here in the order of their divergence: radial and bipolar centrics, araphids, and raphid pennates, with only the last group believed to be monophyletic (Sims et al. 2006).

Culture growth and DNA isolation
A culture of S. acus was isolated from the phytoplankton of Listvennichny Bay of Lake Baikal. An axenic culture was subsequently obtained in the Limnological Institute by a series of single-cell passages combined with antibiotic and detergent treatments (Shishlyannikov et al. 2011). Diatom cells were cultivated using sterile technique in 15-liter glass bottles with DM medium (Thompson et al. 1988) up to a density of 3-4 × 10^4 cells/mL. Diatom cells were collected on a filter with 5 μm pores and washed with sterile medium. Total DNA was isolated as in Ravin et al. (2010).

Sequencing and assembly
The chloroplast genome of S. acus was sequenced using a GS FLX instrument (Roche/454 Life Sciences, Branford, CT, USA). A shotgun genome library was constructed using the GS FLX Titanium General Library Preparation Kit. Shotgun reads from two FLX runs were assembled with MIRA 3.0.4, and two chloroplast-specific contigs were identified by their similarity to other chloroplast genomes. We also prepared a mate-pair genomic library using the Mate Pair Library Prep Kit v2 (Illumina, CA, USA). Briefly, genomic DNA was sonicated, followed by ligation of circularization adapters. Fragments in the size range of 1.5-4 kbp were eluted from a preparative agarose gel and amplified with eight PCR cycles using a proof-reading polymerase. After circularization, the linear fragments were digested with DNase I, followed by fragmentation of the circular DNA with ultrasound. Paired-end adapters were ligated to fragments carrying the circularization adapters for sequencing with a Genome Analyzer IIx instrument (Illumina, CA, USA). Finally, mate-pair DNA fragments were PCR-amplified with 17 cycles using a proof-reading enzyme. One lane of a GAIIx run yielded 18 million 2 × 52 bp reads. These reads were mapped to the chloroplast-specific contigs with MIRA 3.0.4 to improve the quality of the 454-derived sequences. The remaining two gaps were closed by combining the overlapping parts of the high-quality contigs in Consed v. 20. The complete cpDNA sequence of S. acus is available in GenBank (JQ088178).

cpDNA genome annotation
Gene content was determined by BLAST similarity searches (Altschul et al. 1997) against the non-redundant database of the National Center for Biotechnology Information. ORFs were located using Artemis 12.0; tRNA-coding sequences were identified by tRNAscan-SE 1.23 (Lowe and Eddy 1997). Small and large ribosomal RNA subunit genes were identified with the RNAmmer software (Lagesen et al. 2007) and by comparing S. acus cpDNA with rRNA genes from chloroplast genomes of other diatoms. Among the ORFs, protein-coding genes were manually annotated on the basis of similarity derived from BLAST searches. Transfer-messenger RNA and signal recognition particle RNA genes were predicted by ARAGORN (Laslett & Canback 2004) and SRPscan (Regalia et al. 2002), respectively.
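The ORF-location step can be illustrated with a minimal sketch. It is not the procedure implemented in Artemis; it only scans all six reading frames for stretches that run from a start codon to the first in-frame stop codon. The function names, the set of accepted start codons, and the `min_codons` cutoff are assumptions made for illustration.

```python
# Illustrative six-frame ORF scan (not the Artemis algorithm).
# Assumed start codons: ATG plus the alternative starts GTG and ATT.
START_CODONS = {"ATG", "GTG", "ATT"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_orfs(seq, min_codons=30):
    """Return (strand, start, end) tuples for ORFs on both strands.

    Coordinates are 0-based, half-open, and always given on the forward
    strand. The ORF length counted against min_codons includes the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, start, end))
                        else:
                            # map reverse-strand coordinates back to forward
                            orfs.append((strand, n - end, n - start))
                    start = None
    return orfs


# Toy sequence (invented): ATG + three lysine codons + TAA, padded by 2 bp.
toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
toy_orfs = find_orfs(toy, min_codons=3)
```

In practice, candidate ORFs found this way would still need the BLAST-based similarity check described above before being annotated as genes.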

Search for nuclear-encoded genes with chloroplastic targeting
To look for specific genes in the nuclear genome of S. acus, we searched for similar sequences in a bulk set of the assembled contigs using tblastn. To confirm plastid targeting, the sequences were analyzed for the presence of a bipartite signal peptide with the HECTAR tool (Gschloessl et al. 2008).

Phylogenetic analysis of atpF amino-acid sequences
Sequences closely related to the S. acus atpF gene were initially retrieved from GenBank on the basis of BLAST hits against the non-redundant NCBI sequence database. Several distant sequences of red algae, cryptomonads, and haptophytes were used in the subsequent analysis as an outgroup. The sequences were aligned with MUSCLE (Edgar 2004) under default settings. The alignment was analyzed using the PhyML 3.0 package (Guindon & Gascuel 2003) with the LG amino acid substitution model (Le & Gascuel 2008) and among-site rate variation approximated by a discrete gamma distribution with four rate categories (LG+Γ model). The initial tree was constructed using BioNJ. Subsequent tree topologies were searched by subtree pruning and regrafting (SPR). Bootstrap analysis was performed with 100 replicates. The consensus network was computed in the SplitsTree program (Huson et al. 2006) with a threshold of 0.1 and edge weights representing split counts. A similar procedure was used to reconstruct the phylogenetic network based on rbcL amino-acid sequences. Alignments of the atpF and rbcL sequences are available on request.
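The idea behind a thresholded consensus network can be sketched as follows: every bootstrap tree contributes its bipartitions (splits), and a split is kept when it occurs in more than the chosen fraction of trees, with its count used as the edge weight. This is a simplified illustration, not the SplitsTree implementation; the nested-tuple tree encoding, the function names, the strict "greater than" comparison, and the toy trees are all assumptions.

```python
from collections import Counter


def leaves(tree):
    """Return the set of leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree, all_taxa):
    """Collect the non-trivial bipartitions induced by a tree's internal nodes."""
    ref = min(all_taxa)  # reference taxon used to pick a canonical side
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_taxa - side
        if len(side) > 1 and len(other) > 1:
            # store the side that does not contain the reference taxon
            found.add(other if ref in side else side)
        for child in node:
            walk(child)

    walk(tree)
    return found


def consensus_splits(trees, threshold=0.1):
    """Return {split: count} for splits present in more than `threshold` of trees."""
    all_taxa = leaves(trees[0])
    counts = Counter()
    for tree in trees:
        counts.update(splits(tree, all_taxa))
    n = len(trees)
    return {s: c for s, c in counts.items() if c / n > threshold}


# Toy bootstrap sample (invented): 20 trees over five taxa.
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
t3 = (("A", "D"), ("B", ("C", "E")))
kept = consensus_splits([t1] * 16 + [t2] * 3 + [t3], threshold=0.1)
```

With a low threshold such as 0.1, mutually incompatible splits can both be retained, which is what turns the consensus into a network rather than a single tree.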

Results and Discussion
The chloroplast genome of S. acus was sequenced and assembled using the whole-genome approach described under Sequencing and assembly. This cpDNA is compared with other chloroplast genomes of diatom origin to integrate the newly available data into the overall picture of diatom plastid genomics. The chloroplast genome of S. acus is 116 251 bp in size and maps as a circular molecule. It has a canonical quadripartite structure with two inverted repeats, IRa and IRb, which contain the ribosomal RNA gene loci and separate the small (SSC) and large (LSC) single-copy regions (Figure 1). The plastid genome of S. acus is somewhat smaller than those of other diatom algae sequenced to date, whereas other general features of the genome, such as GC content and coding capacity, are very similar among cpDNAs of the diatom/dinotom group.
The gene set of S. acus cpDNA consists of 160 genes (Table 1). There are three rRNAs composing the ribosomal loci in the IRs, 27 transfer RNAs, which are sufficient for messenger RNA translation inside the organelle, and 128 protein-coding genes. We have also found a transfer-messenger RNA gene, ssra, which is believed to play a role in trans-translational termination (Roche & Sauer 1999), and a signal recognition particle RNA gene, ffs, which participates in the transmembrane transport of nuclear-encoded proteins with plastid localization. The protein-coding gene set includes 44 ribosomal protein genes, 44 photosynthesis-associated genes, and 40 genes for other proteins.
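The gene tallies quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the dictionary keys are labels chosen here for illustration.

```python
# Gene counts as stated in the text; the key names are illustrative labels.
gene_classes = {
    "rRNA": 3,
    "tRNA": 27,
    "small_RNA": 2,        # transfer-messenger RNA and SRP RNA genes
    "protein_coding": 128,
}
protein_subclasses = {
    "ribosomal_proteins": 44,
    "photosynthesis_associated": 44,
    "other_proteins": 40,
}

total_genes = sum(gene_classes.values())               # expected: 160
total_protein_coding = sum(protein_subclasses.values())  # expected: 128
```

Both sums agree with the totals reported for the genome: 160 genes overall and 128 protein-coding genes.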
Gene complements of diatom-related cpDNAs share a core set of 150 common genes. This includes three rRNA genes, 27 transfer RNAs, and 120 protein-coding genes. However, there are differences in gene content, which are outlined in Table 2. Specifically, the S. acus plastid genome contains 10 genes that are absent from at least one of the diatom-related cpDNAs (Table 1).
S. acus plastid protein-coding genes use the standard plastid/bacterial genetic code (code table 11). In S. acus cpDNA, there are five genes with alternative start codons: rbcS, ccsA, rps8, and rpl23 use GTG, whereas ATT is used in secY instead of ATG. The most frequent stop codon is TAA, which is used in 117 genes. Ten genes end with TAG and one with TGA. All diatom/dinotom plastid genomes are compact and have no intron sequences. Intergenic spacers do not exceed 120 bp.
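Start and stop codon usage of this kind can be tallied directly from the annotated coding sequences. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the input is a plain dictionary of gene name to CDS string, and the four toy sequences are invented rather than taken from S. acus. Genes whose last codon is not one of TAA, TAG, or TGA are reported separately so that annotation problems stand out.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def tally_start_stop(cds_by_gene):
    """Count first and last codons over a set of coding sequences.

    Returns (start_counts, stop_counts, suspicious), where `suspicious`
    lists genes whose final codon is not a recognized stop codon.
    """
    starts, stops, suspicious = Counter(), Counter(), []
    for gene, cds in cds_by_gene.items():
        cds = cds.upper()
        if len(cds) < 6 or len(cds) % 3:
            raise ValueError(f"{gene}: CDS length must be a multiple of 3")
        starts[cds[:3]] += 1
        stops[cds[-3:]] += 1
        if cds[-3:] not in STOP_CODONS:
            suspicious.append(gene)
    return starts, stops, suspicious


# Invented toy sequences with hypothetical gene names.
toy_cds = {
    "geneA": "ATGAAACCCTAA",
    "geneB": "GTGAAACCCTAG",
    "geneC": "ATTAAACCCTGA",
    "geneD": "ATGAAACCCTTA",  # ends in a sense codon, so it is flagged
}
starts, stops, suspicious = tally_start_stop(toy_cds)
```

A useful sanity check on real data is that the stop codon counts sum to the number of protein-coding genes.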
As in other diatom genomes, there is no intergenic spacer between the rpl14 and rpl24 genes in S. acus cpDNA. Another shared feature is the presence of the same three cases of overlapping genes in S. acus cpDNA: sufC-sufB (1 bp), rpl4-rpl23 (8 bp), and psbD-psbC (53 bp) (Oudot-Le Secq et al. 2009). However, the atpD and atpF genes do not overlap, in contrast to other plastid genomes of diatoms and dinotoms. This feature is a clear synapomorphy of S. acus cpDNA and is probably a result of either relaxed constraints or extensive selection forces acting on the atpF gene in this diatom. To further investigate the reason for the low similarity of the atpF gene, we performed the phylogenetic analysis described under Phylogenetic analysis of atpF amino-acid sequences. The phylogenetic network reconstruction showed that the S. acus atpF gene belongs to the diatom/dinotom group of sequences (Figure 2A). The analysis failed to reveal the order of divergence within the diatom lineage. We performed a phylogenetic network reconstruction based on rbcL amino-acid sequences from a similar set of taxa to see whether this polytomy could be explained by poor taxon sampling (Figure 2B). As in the previous case, diatom divergence is not resolved by the rbcL network. However, according to the rbcL data, S. acus falls into the same clade as other Synedra-specific sequences. Additional taxon sampling is still required for a solid understanding of atpF evolution in diatoms. Nevertheless, S. acus atpF is unlikely to have been acquired by horizontal gene transfer from a distant lineage, since the closest atpF sequences still belong to the diatom/dinotom group.
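Whether two adjacent genes overlap, abut, or are separated by a spacer follows directly from their coordinates. The sketch below assumes 1-based inclusive (start, end) coordinates for two neighbouring genes on the same strand; the function name and all coordinates are invented for illustration and are not the S. acus annotation.

```python
def overlap_bp(upstream, downstream):
    """Signed overlap between two adjacent genes on the same strand.

    Each gene is a 1-based inclusive (start, end) pair. A positive value is
    the overlap length in bp, 0 means the genes abut with no spacer, and a
    negative value is minus the length of the intergenic spacer.
    """
    return upstream[1] - downstream[0] + 1


# Invented coordinates chosen to reproduce each kind of junction.
toy_pairs = {
    "overlap_53bp": ((1000, 2056), (2004, 3400)),
    "overlap_1bp": ((100, 855), (855, 2300)),
    "no_spacer": ((5000, 5368), (5369, 5600)),
    "spacer_9bp": ((7000, 7500), (7510, 8000)),
}
junctions = {name: overlap_bp(a, b) for name, (a, b) in toy_pairs.items()}
```

Applied to a real annotation, a positive value at the atpD-atpF junction would indicate the overlap seen in other diatom plastid genomes, and a non-positive value its absence.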
As noted above, protein-coding gene sets in the plastid genomes of diatom origin share 120 common genes. It is well known that plastid genes tend to undergo a sequential process of transfer from the chloroplast to the nucleus, which is believed to unify the regulation of expression and eventually increase integration of the molecular cell machinery during the orchestrated response to changing environmental conditions (Lommer et al. 2010). In this respect, at least some of the genes listed in Table 2 are hypothesized to have been transferred to the nucleus in the course of evolution. Interestingly, we found a sign of such gene transfer in S. acus. The acyl-carrier protein gene acpp, which is known to be an important component of fatty-acid biosynthesis inside the plastid, is found only in the cpDNAs of O. sinensis and P. tricornutum. A BLAST search for acpp-related sequences in the bulk assembly of S. acus 454 shotgun reads revealed a contig with a similar gene. Further analysis showed that the S. acus nuclear acpp gene contains a bipartite plastid-localization signal, which is responsible for directing the translated polypeptide to the chloroplast stroma across the four membranes surrounding the plastid. Given the paradigm of diatom divergence (Sims et al. 2006) coupled with the origin of dinotom cpDNAs (Imanian et al. 2010), a transfer of the plastid acpp gene to the nuclear host genome can hardly be considered a single event that occurred in an ancient diatom ancestor. It should rather be regarded as a series of independent gene transfers in the respective lineages of diatoms.
The small RNA genes detected in S. acus cpDNA are present in all plastid genomes of diatom origin except for that of O. sinensis (Table 2). Most likely, the absence of the ssra and ffs genes from the cpDNA of Odontella is the result of a recent evolutionary event that occurred after the divergence of this lineage from the other groups of diatoms under analysis.

Conclusion
During the last decade, several complete organellar and nuclear genomes of diatoms have been sequenced. This encouraged the establishment of the first model diatoms, namely T. pseudonana and P. tricornutum, and supported the emergence of functional genomics of diatom algae. In terms of evolutionary biology, the availability of genomic data allows us to apply a rich repertoire of phylogenetic methods on a whole-genome scale: to identify rearrangement events and cases of horizontal gene transfer, to perform large-scale phylogenetic analyses using multiple genes, and so on. However, additional data are required for a deep and solid understanding of all aspects of diatom biology. The chloroplast DNA sequence of S. acus is a small step toward understanding diatoms, an extremely diversified and distinctive group of unicellular algae.
Table 1. Gene content of S. acus cpDNA
Table 2. Gene content differences between diatom plastid genomes