Random Local Clock and Molecular Evolution Studies on Corydalis (Papaveraceae s. l.)

Molecular evolutionary rate and dating studies have been carried out on nuclear ribosomal DNA (nrDNA ITS I and II) and non-coding plastid DNA (rps16 intron) nucleotide sequences from Corydalis and other related genera in the poppy family Papaveraceae s. l. Strict, relaxed, and local clock models have been compared. Although it has been suggested that the nrDNA ITS I and II regions evolved in concert, they may have evolved at different evolutionary rates under differential constraints. Based on random local clock and uncorrelated relaxed lognormal clock Bayesian analyses using BEAST v.1.6.2, the ITS II region may have evolved in a clock-like fashion.


Random Local Clock Analyses as a Tool
Recently, random local molecular clock (LMC) theory has been developed, allowing rates to vary among branches of a lineage while remaining constant within clades that share a local clock (Drummond & Suchard, 2010; Drummond, Suchard, Xie, & Rambaut, 2012). This has depended on the introduction of sophisticated statistical parameter testing approaches (Drummond & Suchard, 2010; Drummond, Rambaut, & Suchard, 2011a; Drummond et al., 2012). Nuclear ribosomal DNA ITS I (internal transcribed spacer I) and ITS II (internal transcribed spacer II), and plastid DNA rps16 intron regions have been studied for phylogeny and molecular dating of flowering plants by various researchers (Baldwin et al., 1995; Couvreur, Forest, & Baker, 2011; Eiserhardt et al., 2011; Hong & Jury, 2011; Scheunert & Heubl, 2011). The nrDNA ITS I and II regions (less than 300 bp each) have been considered to evolve in concert in flowering plants (Baldwin et al., 1995), and the cpDNA rps16 region is expected to evolve faster than coding gene regions such as the rbcL gene (Couvreur et al., 2011).
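As a rough illustration of the random local clock idea, the sketch below propagates rates down a small tree: a branch inherits the rate of its parent unless a rate-change indicator is switched on, in which case the inherited rate is scaled by a multiplier. The tree shape, indicator values, multipliers, and the function name `branch_rates` are hypothetical, and this is a simplified picture of the model of Drummond & Suchard (2010), not the BEAST implementation.

```python
# Minimal sketch of rate inheritance under a random local clock.
# Tree shape, indicators, and multipliers are made up for illustration.

def branch_rates(parent, indicator, multiplier, root_rate=1.0):
    """Return the rate on each branch, keyed by the child node.

    parent     : dict child -> parent node (the node "root" has no entry)
    indicator  : dict child -> 0 or 1; 1 means a new local clock starts here
    multiplier : dict child -> positive scale factor, used when indicator is 1
    """
    rates = {}

    def rate_of(node):
        if node == "root":
            return root_rate
        if node not in rates:
            inherited = rate_of(parent[node])
            scale = multiplier[node] if indicator[node] else 1.0
            rates[node] = inherited * scale
        return rates[node]

    for node in parent:
        rate_of(node)
    return rates


# Hypothetical tree: root -> (A, B), A -> (a1, a2); B is a tip.
parent = {"A": "root", "B": "root", "a1": "A", "a2": "A"}
indicator = {"A": 1, "B": 0, "a1": 0, "a2": 0}        # one rate shift, on branch A
multiplier = {"A": 2.0, "B": 1.0, "a1": 1.0, "a2": 1.0}

rates = branch_rates(parent, indicator, multiplier)
for node in sorted(rates):
    print(node, rates[node])
```

In this toy case the single shift on branch A is shared by its descendants a1 and a2, so the whole clade runs on one local clock while branch B keeps the root rate.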

The Medicinally Important Genus Corydalis as an Evolutionary Rate Model System
Oriental traditional herbal medicine practitioners have for centuries used the roots (tubers) of a number of Corydalis species as potent pain-killers (Kim et al., 2011). The genus Corydalis belongs to the poppy family Papaveraceae s. l. (Damerval & Nadot, 2007; Lengyel, Gove, Latimer, Majer, & Dunn, 2010; Dar, Koul, Naqshi, Khuroo, & Malik, 2011). More than 200 species are included in the genus, which is distributed mostly in the Northern Hemisphere, having its origin in the ancient Laurasia region (Suau et al., 2005; Kundu, 2008; Dar et al., 2011). The potent herbal medicinal properties of Corydalis (Singh et al., 2003; Suau et al., 2005; Wangchuk, Bremner, Samten, Rattanajak, & Kamchonwongpaisan, 2010) invite attention in terms of phylogeography and molecular evolution to establish how the plants have diversified and what their molecular evolutionary rates are.
The stem and crown group ages of Papaveraceae have been estimated at 114-121 and 106-119 million years (Myr), respectively, from analyses of the chloroplast coding gene rbcL (Anderson, Bremer, & Friis, 2005). Even though the phylogeny of Papaveraceae has been studied previously using coding and non-coding DNA regions such as rps16 intron, ITS I & II, atpB, rbcL, and trnK restriction sites (Hoot et al., 1997; Lidén, Fukuhara, Rylander, & Oxelman, 1997; Damerval & Nadot, 2007), no research has been done to establish divergence time and evolutionary rate estimations of the genus Corydalis at inter- and infra-specific levels compared to other sister groups within the family Papaveraceae s. l. Calibration dates using external constraints (Bromham, 2003) are becoming an important factor in the estimation of molecular dates of lineages. Different estimates may arise by employing different calibration dates using fossils, or nucleotide or amino acid sequences (Bromham & Penny, 2003). In the current work, the important oriental medicinal plants from Corydalis and other closely related genera have been studied using non-coding chloroplast and nuclear DNA nucleotide sequence data to estimate evolutionary diversification times (speciation events) and rates. The following questions were also addressed in this study: 1) If the nrDNA ITS I and II regions evolve in concert, do they share the same or similar evolutionary rates? 2) Are there any rate differences between nuclear ribosomal DNA and non-coding chloroplast DNA nucleotide sequences?

DNA Data Sets
For the BEAST analyses, non-coding chloroplast (rps16 intron) and nuclear ribosomal DNA (ITS I and ITS II) sequences of species from Corydalis and other related genera in Papaveraceae s. l. were downloaded from the GenBank website (Table 1). A total of 41 taxa were included in the analyses, and 19 (ITS I), 19 (ITS II) and 22 (rps16 intron) taxa were tested for phylogeny-based Bayesian inferences using BEAST (Drummond et al., 2007). Sequence alignments were obtained by using the ClustalW program (Thompson, Higgins, & Gibson, 1994) available in MEGA 5.0 software (Tamura et al., 2011). Nucleotide sequence lengths of 268 (ITS I), 263 (ITS II), and 898 (rps16) base pairs were evaluated to generate the BEAUti XML files (Drummond et al., 2007). Separate Markov chain Monte Carlo (MCMC) analyses were carried out on the ITS I data using either all outgroups (including Hypecoum imberbe, Sarcocapnos enneaphylla, Ichtyoselmis macrantha, Lamprocapnos spectabilis, and Papaver rhoeas; in total 23 taxa), or a reduced set of only two outgroups (Hypecoum imberbe and Sarcocapnos enneaphylla; in total 19 taxa). Similarly, with ITS II data either three outgroups were included (Hypecoum imberbe, Sarcocapnos enneaphylla, and Papaver rhoeas) or just two (Hypecoum imberbe and Sarcocapnos enneaphylla, and thus identical to the reduced data set for ITS I) (Tables 1-2).

Bayesian Inference Analyses
For the MCMC (Markov chain Monte Carlo) Bayesian inference analyses, the BEAST v 1.6.2 software package (Drummond et al., 2011) was installed and utilized. For molecular clock estimations, strict and random local clock alternatives were compared. For a priori age inference, the age of Corydalis nobilis was set at 26 million years (Myr) and that of Hypecoum imberbe at 50 Myr (Figure 1; Lidén et al., 1997; Anderson et al., 2005; Damerval & Nadot, 2007) for both ITS I and ITS II Bayesian analyses. For the rps16 sequence data, Hypecoum imberbe was used for calibrating age estimation (Figure 1; Lidén, Fukuhara, & Axberg, 1995; Lidén et al., 1997; Anderson et al., 2005; Damerval & Nadot, 2007). Tracer v 1.5 was used for posterior parameter estimations and for other outputs of BEAST analyses, and FigTree v 1.3.1 was used to display the annotated Maximum Clade Credibility tree (MCC, TreeAnnotator v 1.6.2) (Figure 1; Drummond et al., 2007). To combine log files, LogCombiner v 1.6.2 (Drummond et al., 2007) was run on the results of at least three replicate BEAST runs. To obtain effective sample sizes (ESS) above 200, the MCMC chain length was set to 10,000,000 steps, of which the first 1,000,000 were discarded as burn-in.
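To make the burn-in and ESS criteria concrete, the following sketch discards the first 10% of a trace (matching 1,000,000 of 10,000,000 steps) and estimates the effective sample size from the autocorrelation of the remaining samples. The estimator shown (truncating the autocorrelation sum at the first non-positive lag) is a common textbook simplification and is assumed here for illustration; it is not claimed to be the exact procedure used by Tracer. The function names and the simulated traces are hypothetical.

```python
import random

def discard_burn_in(trace, fraction=0.1):
    """Drop the leading fraction of samples (10% mirrors 1,000,000 of 10,000,000)."""
    return trace[int(len(trace) * fraction):]

def effective_sample_size(trace):
    """Estimate ESS as N / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        raise ValueError("constant trace: ESS is undefined")
    acf_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0.0:          # stop at the first non-positive autocorrelation
            break
        acf_sum += rho
    return min(float(n), n / (1.0 + 2.0 * acf_sum))

# Simulated traces: one well mixed, one strongly autocorrelated ("sticky").
random.seed(1)
n_steps = 2000
iid = [random.gauss(0.0, 1.0) for _ in range(n_steps)]
sticky = [0.0]
for _ in range(n_steps - 1):
    sticky.append(0.9 * sticky[-1] + random.gauss(0.0, 1.0))

ess_iid = effective_sample_size(discard_burn_in(iid))
ess_sticky = effective_sample_size(discard_burn_in(sticky))
print("well-mixed ESS:", round(ess_iid))
print("sticky ESS:", round(ess_sticky))
print("sticky chain passes ESS > 200:", ess_sticky > 200)
```

A poorly mixing chain yields far fewer effectively independent samples than its nominal length, which is why a fixed chain length can satisfy the ESS threshold for one data set and fail it for another.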
In order to define confidence levels, 95% highest posterior density (HPD) intervals were measured (Drummond et al., 2007). Two DNA substitution models, HKY (Hasegawa-Kishino-Yano, 1985; Yang, 1994) and GTR (General Time Reversible; Waddell & Steel, 1997; Gatto et al., 2006), were tested using the BEAUti v 1.6.2 program (Drummond et al., 2011). The site heterogeneity option GI (Gamma + Invariant Sites) or G8 (gamma + 8 shape parameters; Rannala & Yang, 2007) under the substitution models was employed to permit different rates at different sites (Yang, 1994; Drummond et al., 2007) for strict molecular clock analyses. For the molecular clock variation analyses, strict vs random local clock models were compared using BEAUti v 1.6.2 (Drummond et al., 2011b). The random local clock option was expected to provide information on the relative evolutionary rates (fast vs slow) in different clades in lineages of the data while keeping rates constant within the shared clades in phylogenetic trees (Douzery et al., 2002; Drummond & Suchard, 2010; Drummond et al., 2011b). The uncorrelated relaxed lognormal option was expected to provide information on the degree of clock-likeness of the data (Drummond et al., 2007).
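For readers unfamiliar with HPD intervals, the sketch below shows one standard way to obtain a 95% HPD interval from posterior samples: sort the samples and take the narrowest window that contains 95% of them. The function name and the sample values are hypothetical, and this is an illustrative sketch rather than the code used by Tracer or TreeAnnotator.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples")
    # Number of samples that must fall inside the interval
    # (the small epsilon guards against floating-point round-up).
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

# Hypothetical posterior sample of a node age (Myr) with one extreme draw.
ages = list(range(1, 20)) + [1000]
low, high = hpd_interval(ages)
print("95% HPD:", low, "to", high)
```

Unlike an equal-tailed interval, the HPD interval excludes the isolated extreme draw here because the narrowest 95% window lies entirely within the bulk of the sample.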

Estimated Evolutionary Rates among the Groups
The evolutionary rate and variance estimations are shown in Table 2. The rate estimation is based on a total of 19 taxa and 263 base pairs (sites) for the joint ITS I and ITS II reduced data set MCMC analyses (Table 1). The results of the strict molecular clock analysis show that the mean rates of change of nrDNA ITS I, ITS II, and cpDNA rps16 intron sequences of Corydalis were 1.70 × 10⁻⁸, 9.24 × 10⁻⁸, and 2.52 × 10⁻⁸ substitutions per site per Myr, respectively (Table 2). Thus, for the reduced data set, ITS II evolves at a rate 1.5-5 times faster than the ITS I and rps16 intron regions, based upon strict molecular clock analysis using the HKY base substitution model option. When the complete data set with all 5 outgroups (see Methods section) was used for ITS I analyses, the mean rate (clock rate) was 19.06 × 10⁻⁸ substitutions per site per Myr (Table 2) in the strict model (HKY + G + I, i.e., the HKY base substitution option with the gamma-distributed site heterogeneity option including invariant sites). Under the strict model, the mean rate for the full ITS I nucleotide sequence data set is thus about 10 times faster than for the reduced data set: 19.06 × 10⁻⁸ vs 1.70 × 10⁻⁸ substitutions per site per Myr (Table 2). The current study therefore suggests that adding more distant outgroups may influence the rate estimations for ITS I and probably also ITS II. The ITS I (total 19 taxa) and rps16 regions performed badly (data not shown) in the uncorrelated relaxed lognormal clock analyses employing GTR or HKY base substitution options, generating effective sample sizes below the threshold (200). Generating longer MCMC chains may result in extended analysis times, depending on the sample (taxon) sizes, but involving more parameters could also make analyses complicated (http://groups.google.com/group/beast-users). The reason for the poor performance of relaxed clock analyses in both ITS I and rps16 nucleotide sequences of Corydalis and related genera may be related to the mixing time of the posterior distribution of parameters. However, the outcome for the ITS II region (reduced data set) was better in both uncorrelated relaxed lognormal clock and random local clock analyses (Table 2) with the same MCMC chain length (10,000,000 generations), giving mean rates of 18.16 × 10⁻⁸ and 16.44 × 10⁻⁸ substitutions per site per Myr, respectively (Table 2). For ITS II analyses, the relaxed and local clock models (GTR) generated approximately the same rate as each other, whereas the strict clock model (HKY) gave a lower rate (Table 2).
For ITS II analyses, both relaxed and local clock models (GTR) generated rates approximately twice that of the strict clock model (18.16 × 10⁻⁸ and 16.44 × 10⁻⁸ vs. 9.24 × 10⁻⁸ substitutions per site per Myr, respectively; Table 2). In both relaxed lognormal (uncorrelated) clock and random local clock models (with GTR substitution) of ITS II region analyses, the coefficient of variation (Drummond et al., 2006) value was much smaller than 1.0, indicating that the nrDNA ITS II region may evolve in a clock-like fashion in Corydalis and other related genera (Table 2).
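The fold-differences discussed in this section can be checked directly from the mean rates quoted in the text. The short sketch below does only that arithmetic; the dictionary labels and the helper name `fold` are introduced here for illustration, and the values are those cited from Table 2 in units of 10⁻⁸ substitutions per site per Myr.

```python
# Mean rates as quoted in the text (units: 1e-8 substitutions per site per Myr).
rates = {
    "ITS I strict (reduced)": 1.70,
    "ITS II strict (reduced)": 9.24,
    "rps16 strict": 2.52,
    "ITS I strict (full)": 19.06,
    "ITS II relaxed lognormal": 18.16,
    "ITS II random local": 16.44,
}

def fold(numerator, denominator):
    """Ratio of two of the quoted mean rates."""
    return rates[numerator] / rates[denominator]

comparisons = [
    ("ITS II strict (reduced)", "ITS I strict (reduced)"),
    ("ITS II strict (reduced)", "rps16 strict"),
    ("ITS I strict (full)", "ITS I strict (reduced)"),
    ("ITS II relaxed lognormal", "ITS II strict (reduced)"),
    ("ITS II random local", "ITS II strict (reduced)"),
]
for a, b in comparisons:
    print(f"{a} / {b} = {fold(a, b):.2f}")
```

On these figures, ITS II is about 5.4 times faster than ITS I and about 3.7 times faster than rps16 under the strict clock, the full ITS I data set gives a rate about 11 times that of the reduced set, and the relaxed and local clock rates for ITS II are roughly 1.8 to 2.0 times the strict clock rate.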

Discussion
Based on the results reported here, nuclear ribosomal DNA (nrDNA) ITS I and II regions may undergo rather independent evolutionary pathways, which is in contrast to a previous report (Baldwin et al., 1995) that these two non-coding regions of nrDNA evolved in concert in flowering plants. At the nucleotide level in Corydalis and other related genera (Papaveraceae s. l.), the ITS II region evolved approximately 5 times faster than did the ITS I region (Table 2). The clades found in this study to share local molecular clocks show that some clades in a lineage may evolve faster than others, indicating that a 'rate shift' is present in the lineages (Figure 1).
The evolutionary rate of the nrDNA ITS II region is more than 3 times higher than that of the non-coding cpDNA rps16 intron region in this study, indicating that these regions are under different constraints, which is expected as they originate from different organelles (nuclear vs chloroplast). The ITS II region also may have evolved in a clock-like fashion, as evidenced by the low coefficient of variation values (<1.0) shown in Table 2.
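The coefficient of variation used here as a measure of clock-likeness is the standard deviation of the branch rates divided by their mean, so a value of 0 corresponds to a strict clock and larger values to greater among-branch rate heterogeneity. The sketch below illustrates the calculation with hypothetical branch rates; the use of the population standard deviation is an assumption made for this example and may differ in detail from how BEAST reports the statistic.

```python
import statistics

def coefficient_of_variation(branch_rates):
    """Standard deviation of branch rates divided by their mean.

    Uses the population standard deviation (an assumption for this sketch).
    """
    mean = statistics.mean(branch_rates)
    return statistics.pstdev(branch_rates) / mean

# Hypothetical relative branch rates for two imaginary data sets.
nearly_clock_like = [1.0, 1.1, 0.9, 1.05, 0.95]
strongly_heterogeneous = [0.1, 0.2, 3.0, 0.05, 5.0]

cv_low = coefficient_of_variation(nearly_clock_like)
cv_high = coefficient_of_variation(strongly_heterogeneous)
print("nearly clock-like:", round(cv_low, 3))
print("strongly heterogeneous:", round(cv_high, 3))
```

Under the criterion applied in this study, the first set (value well below 1.0) would be read as approximately clock-like, whereas the second (value above 1.0) would not.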
As Bromham (2003) points out, more sophisticated statistical methods should be designed and developed to avoid using controversial and disparate age calibrations from either fossils or molecular data. This would improve the accuracy of molecular dating or evolutionary rate estimation.
In conclusion, the rates of molecular evolution at the nucleotide level in nrDNA ITS I and ITS II regions and non-coding plastid DNA may show variations in different organisms. MCMC-based, likelihood-driven Bayesian inference methods are being developed rapidly with more efficient and sophisticated statistical parameter settings, and thus it might be expected that a better understanding of the operation of the molecular clock in a wide range of organisms will be achieved in the near future.
to Hasegawa-Kishino-Yano (1985) model; ** refers to the clock rate in the strict clock model, and to the mean rates in relaxed or local clock models; ITS I † indicates the data obtained by using a total of 19 taxa (see Materials and Methods section);

Figure 1. The MCC tree generated from ITS II nucleotide sequence information (full data set of 21 taxa; 263 sites). The rates are color-coded: low (blue); intermediate (black); and high (red). The local clocks were assumed to be shared among the clades with the same color. The posterior probability limit was set to greater than 0.1, and branches above this limit are labeled with 95% HPD (highest posterior density) ranges in brackets at the nodes. The filled black stars indicate calibration sites for age estimation. The arrows with numbers (1-8) and letters F (fast) or S (slow) refer to evolutionary 'rate shifts' at the nodes. The unit of the black bar scale at the bottom indicates 9.0 x 10 7 substitutions per site per Myr.

Table 1. Details of the taxa used in this study and their ITS I, ITS II and rps16 GenBank/EMBL accession numbers

Table 2. Evolutionary rate and variance estimations for Corydalis