Assessment of DNA Barcoding for the Identification of Chenopodium murale L. (Chenopodiaceae)

Chenopodium murale L. (Chenopodiaceae) is an erect annual herbaceous weed. The species threatens ecosystems worldwide because it impairs the growth and development of other plants by reducing the biological nitrogen-fixing ability. We evaluated two plastid barcoding genes of C. murale, rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit) and matK (maturase K), for the success of PCR amplification, the differential inter-specific divergences, and the ability of a single gene or the combination of rbcL and matK to discriminate C. murale as an individual species. Online nucleotide database searches using the individually produced rbcL and matK sequences primarily identified the specimen as C. murale with 100% sequence similarity. The tree inferred from matK alone showed better resolution than the trees inferred from rbcL alone or from the combined rbcL and matK sequences. We found that the matK sequence alone had high discrimination efficiency for identifying C. murale as well as the other 11 species of the genus Chenopodium examined (C. album, C. ambrosioides, C. bonus-henricus, C. ficifolium, C. foliosum, C. glaucum, C. polyspermum, C. rubrum, C. simplex, C. urbicum and C. vulvaria).


Introduction
Chenopodium murale L. (nettle-leaved goosefoot, Chenopodiaceae) is an erect annual herbaceous weed native to Eurasia (Holm et al., 1997). Invasive plant species such as C. murale are a threat to ecosystems worldwide, as this annual herbaceous weed affects the growth and development of other plants by reducing the biological nitrogen-fixing ability (Batish et al., 2007). Predicting the invasive potential of introduced species remains difficult because they mostly become invasive where closely related species are absent from the native flora (Schaefer et al., 2011). C. murale is one of the fast-growing annual plants of the family Chenopodiaceae and is widespread across different habitat types. The species is reported to have a negative association with many species, even with plants that have similar ecological requirements, and is considered a pest in agro-ecosystems (El-Khatib et al., 2004).
Recently, DNA barcoding has been used to discriminate noxious invasive plant species from their non-invasive relatives (Van-De-Wiel et al., 2009). A DNA barcode is one or more short gene sequences taken from a standardized portion of the genome to identify species, with the ultimate goal of quick and reliable species-level identification across all forms of life, including animals, plants and microorganisms (Kress & Erickson, 2008). The Consortium for the Barcode of Life (CBOL) Plant Working Group recommended the two-locus combination of ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) and maturase K (matK) as the standard plant barcode, based on assessments of recoverability, sequence quality and levels of species discrimination (CBOL Plant Working Group, 2009). Several investigators have recently used rbcL and matK sequences for barcoding or species identification (Asahina et al., 2010; Starr et al., 2009) as well as for phylogenetic analysis (Kuo et al., 2011; Manen et al., 2004; Tamura et al., 2004). However, universal barcode markers may not work in all cases (Roy et al., 2010). The first practical problem in barcoding plants is obtaining sufficiently clean DNA for multi-locus sequencing, as isolated plant DNA contains PCR inhibitors (Aras et al., 1993; Temiesak et al., 1993; Vanijajiva et al., 2005). The second problem is technical: primer universality and sequence quality and complexity remain debatable for barcoding land plants from different regions (Fox et al., 1992; Schneider & Schuettpelz, 2006). In particular, success in PCR is a prerequisite for barcoding, as this technique is used exclusively to amplify the target sequence. Achieving high PCR success rates therefore continues to be important, particularly for environmental sampling studies, where primer bias and non-universality dramatically skew results (Soininen et al., 2009).
Plants of arid environments are adapted to endure harsh climatic conditions and thus possess different survival characteristics and molecular diversity (Bokhari et al., 1990; Kamal et al., 2010). The universal barcode markers (matK and rbcL) need to be evaluated across a broader spectrum of taxa because of morphological and geographical variation and reticulate evolution in plant species. The survival characteristics and diversity of plants in Saudi Arabia differ because these plants grow in a harsh environment, and the barcodes of most arid plants remain to be determined. Molecular variation in arid plants may plausibly affect the PCR amplification of barcoding markers (Bafeel et al., 2011). The genus Chenopodium has been characterized using the non-coding trnL-F (cpDNA) and nuclear ITS regions (Bazan et al., 2012). We are not aware of any literature on barcoding of C. murale or the genus Chenopodium using the matK and rbcL genes, except Schaefer et al. (2011), which mainly addressed the naturalization process of invasive plants and was not specific to C. murale. Gene-sequence-based molecular characterization (barcoding) of C. murale may enhance the current morphology-based identification system: with molecular techniques, identification is possible at different life stages (seeds and seedlings) or from fragments of plant material, which will help in controlled inventories and ecological surveys of this invasive weed. In this study, the usefulness of the rbcL and matK gene sequences for barcoding was assessed using three criteria: the success of PCR amplification, the differential inter-specific divergences, and the ability of a single gene or the combination of these genes to discriminate this weed from closely related species in the genus Chenopodium.

Materials and Methods
This study comprised two specimens (C21a and C21b) collected in Riyadh, Saudi Arabia. The plant specimens were identified morphologically at the herbarium of King Saud University. A DNeasy Plant Mini Kit (Qiagen) and an automated DNA extraction instrument (QIAcube, Qiagen) were used for DNA isolation. Amplification of the plastid rbcL and matK regions was conducted following Bafeel et al. (2011). Sequences were determined directly using the dideoxynucleotide chain-termination method with a DNA sequencer (ABI PRISM 3130xl; Applied Biosystems/Hitachi). Sequences were submitted to the DDBJ/EMBL/GenBank databases (accession numbers JQ988068, JQ988069 and JQ988070). BLAST (Basic Local Alignment Search Tool) and the recently developed BOLD (Barcode of Life Data Systems) searches were applied to the produced sequences using the available online databases. Closely related sequences, together with all currently available sequences of species in the genus Chenopodium, were retrieved from the DDBJ/EMBL/GenBank nucleotide databases and aligned using CLUSTAL X (version 1.81). Phylogenetic analyses were conducted using MEGA5.
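The pair-wise similarity values reported in the Results can be understood as the percentage of identical bases between two aligned sequences. The following is a minimal Python sketch of that calculation, not the procedure actually used in this study (which relied on CLUSTAL X and MEGA5). The function names, the toy sequences and the choice to skip columns containing a gap in either sequence are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap-containing columns are skipped per pair; the paper does
    not state how gaps were handled.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def similarity_table(alignment):
    """Percent identity for every unordered pair of names in an alignment."""
    names = sorted(alignment)
    return {
        (x, y): pairwise_identity(alignment[x], alignment[y])
        for x, y in combinations(names, 2)
    }


def concatenate(aln_1, aln_2):
    """Join two single-gene alignments end to end for the shared names."""
    shared = aln_1.keys() & aln_2.keys()
    return {name: aln_1[name] + aln_2[name] for name in shared}


# Hypothetical toy alignments (NOT real rbcL or matK data).
toy_rbcl = {"sp_A": "ATGGCTAAAG", "sp_B": "ATGGCTAAAG", "sp_C": "ATGGCTAAAC"}
toy_matk = {"sp_A": "TTGACCGA-T", "sp_B": "TTGACTGAAT", "sp_C": "TTGGCTGAAT"}

for label, aln in [
    ("toy rbcL", toy_rbcl),
    ("toy matK", toy_matk),
    ("toy rbcL + matK", concatenate(toy_rbcl, toy_matk)),
]:
    for pair, value in similarity_table(aln).items():
        print(f"{label}: {pair[0]} vs {pair[1]} = {value:.1f}%")
```

In the toy data, sp_A and sp_B are identical in the first marker but differ in the second, which mirrors the situation where one locus cannot separate two species and another can.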

Results
Database searches were conducted using the individually produced rbcL and matK sequences of the plastid region of the plant specimen. BLAST and BOLD searches using the rbcL sequence alone primarily identified the specimen as C. murale (JN891260) with 100% sequence similarity. The BLAST search using matK alone also identified the specimen as C. murale (JN894392) with 100% sequence similarity. In contrast, the newly developed BOLD database search misidentified the specimen as C. simplex with a sequence similarity of >98%. The rbcL sequence of the studied specimen (C21a, C. murale) and that of C. simplex shared 100% pair-wise sequence similarity; hence, the specimen could not be assigned to a single species using rbcL (Table 1). The matK sequences of the studied specimens (C21a and C21b), on the other hand, showed 100% pair-wise sequence similarity with a single species, C. murale, so the specimen could be assigned to a single species without difficulty (Table 2). Plastid matK showed the lowest average pair-wise sequence similarity (94.5%) and distinguished all the species of the genus Chenopodium examined. Pair-wise matK sequence similarities among the individual species ranged from 91.5% (C. ambrosioides with C. ficifolium) to 99.6% (C. foliosum with C. bonus-henricus). The plastid rbcL sequence could not distinguish C. album from C. ficifolium or C. murale from C. simplex, because the sequences were identical. In contrast, the matK sequence distinguished all the species: pair-wise matK sequence similarity was 99.0% between C. album and C. ficifolium and 98.1% between C. simplex and C. murale (Table 2). The combined rbcL and matK sequence of the studied specimen also showed 100% pair-wise sequence similarity with a single species, C. murale (Table 3). Pair-wise sequence similarities for the combination of rbcL and matK ranged from 93.8% (C. ambrosioides with C. ficifolium; C. ficifolium with C. glaucum) to 99.5% (C. foliosum with C. bonus-henricus) (Table 3). The overall average pair-wise sequence similarity was higher for rbcL alone than for matK alone or for the combination of rbcL and matK: across all the Chenopodium species, the averages were 98.0% (rbcL) > 96.0% (rbcL + matK) > 94.5% (matK).
The tree inferred from matK alone showed better resolution than the trees constructed from rbcL alone or from the combined rbcL and matK sequences (Figures 1-3). The matK tree placed the specimen with the single species C. murale in a clade supported by a 100% bootstrap value, and could therefore identify the specimen unequivocally as C. murale. The matK tree also placed all the species of the genus Chenopodium separately (Figure 2). The tree constructed from rbcL alone placed the studied specimen with both C. murale and C. simplex; however, this clade was supported by a comparatively low bootstrap value (62%). The rbcL tree also failed to separate C. album from C. ficifolium (Figure 1). The tree constructed from the combined rbcL and matK sequences placed the specimen with C. murale and, like the matK tree, placed all the species of the genus Chenopodium separately (Figure 3).
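The discrimination argument above reduces to a simple rule: two species cannot be told apart by a marker if their pair-wise similarity for that marker is 100%. The short Python sketch below applies that rule to the values quoted in this section only; the dictionary layout and the function name `indistinguishable_pairs` are illustrative assumptions, and the full matrices of Tables 1-3 are not reproduced here.

```python
def indistinguishable_pairs(similarity, threshold=100.0):
    """Return the species pairs whose similarity reaches the threshold."""
    return sorted(pair for pair, value in similarity.items() if value >= threshold)


# Pair-wise similarities (%) quoted in the Results text; not the full tables.
reported = {
    "rbcL": {
        ("C. murale", "C. simplex"): 100.0,
        ("C. album", "C. ficifolium"): 100.0,
    },
    "matK": {
        ("C. murale", "C. simplex"): 98.1,
        ("C. album", "C. ficifolium"): 99.0,
        ("C. ambrosioides", "C. ficifolium"): 91.5,
        ("C. foliosum", "C. bonus-henricus"): 99.6,
    },
}

# Overall mean pair-wise similarities (%) quoted in the Results text.
overall_mean = {"rbcL": 98.0, "rbcL + matK": 96.0, "matK": 94.5}

for marker, table in reported.items():
    unresolved = indistinguishable_pairs(table)
    print(marker, "unresolved pairs:", unresolved if unresolved else "none")

# A lower mean similarity implies more variable sites available for discrimination.
ranking = sorted(overall_mean, key=overall_mean.get)
print("markers from most to least divergent:", ranking)
```

A strict 100% threshold matches the way the text treats identical sequences; a study comparing many individuals per species would more likely compare intra-specific and inter-specific distances instead.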

Discussion
Both the rbcL and matK universal primers successfully amplified the target regions of this plant species by PCR. In some cases, the presence of certain metabolites is reported to interfere with plant DNA isolation procedures and downstream reactions such as PCR amplification (Khanuja et al., 1999). Genetic techniques for species identification based on sequence similarity or phylogenies from multiple genes (rbcL and matK for plants; CBOL Plant Working Group, 2009) or a single gene (CO1 for animals; Hebert et al., 2003) have been rapidly gaining wide use. BLAST, nearest-distance (sequence similarity) and phylogenetic tree-based methods have been used as bioinformatics tools alongside molecular techniques for the identification of plants (Liu et al., 2012a; Liu et al., 2012b; Pang et al., 2012). In this study, database searches using both rbcL and matK primarily identified the query specimen as C. murale, with the exception of the BOLD search with the matK sequence, which misidentified the specimen. There are very few records of plant species in the current GenBank/BOLD databases (Ratnasingham & Hebert, 2007), so a query may not return an authentic match; it does, however, give an indication of the probable nearest species, depending on the availability of reference sequences in that database. Plastid rbcL is the most commonly sequenced gene for phylogenetic studies of plants (Schuettpelz et al., 2006) because the matK gene is difficult to amplify (Bafeel et al., 2011) and to sequence (Hollingsworth, 2011) in some groups of plants. However, the problem with the rbcL gene is that it shows the lowest discriminatory power at the species level (10% for rbcL, 31.25% for matK, 63.6% for trnH-psbA, 76.9% for ITS; Ren et al., 2010). Assignment of taxa is therefore problematic for some plant species (Zuccarello & Lokhorst, 2005) because of this low discriminatory power (Chen et al., 1999; Gadek & Quinn, 1993; Les et al., 1997), as demonstrated by this study. Pair-wise rbcL sequence similarities were 100% between C. murale and C. simplex as well as between C. ficifolium and C. album; therefore, it was difficult to discriminate these species. In our study, matK alone showed the lowest average pair-wise sequence similarity compared with rbcL alone and with the combination of rbcL and matK.
Nevertheless, matK is not easy to amplify in some groups of plants, and additional work on primer development has been suggested (CBOL Plant Working Group, 2012). For example, generating matK sequences for ferns proved problematic because this part of the chloroplast genome underwent strong restructuring during the evolution of the fern clade (Duffy et al., 2009). Therefore, rbcL and trnL-F (instead of matK) were used for the two-locus barcoding of European ferns (De-Groot et al., 2011).

Conclusion
Our results show the extent to which a single barcoding gene (matK or rbcL) or the combination of these genes can discriminate species in the genus Chenopodium. We found that the matK sequence alone had high discrimination efficiency, not only for C. murale but also for all the other Chenopodium species studied. This region could therefore serve as a valuable barcode for identifying Chenopodium species. The short matK sequence is an informative and potentially powerful molecular tag for identifying the weed C. murale as well as other species in the genus Chenopodium.

Figure 1. Maximum Likelihood (ML) phylogenetic tree inferred from the rbcL gene sequence alone, showing the relationship of the specimen (C21a) to other members of the genus Chenopodium. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The corresponding GenBank accession numbers are given in parentheses.

Table 1. Pair-wise similarity between rbcL gene sequences of the plant species/specimen under the genus Chenopodium. Identical sequence similarities are typed in bold. Overall mean: 98.0%.

Table 2. Pair-wise similarity between matK gene sequences of the plant species/specimen under the genus Chenopodium. Identical sequence similarities are typed in bold. Overall mean: 94.5%.

Table 3. Pair-wise similarity between the combined rbcL and matK gene sequences of the plant species/specimen under the genus Chenopodium. Identical sequence similarities are typed in bold. Overall mean: 96.0%.