DNA Sequence Characteristics and Phylogenetics of Putative Imprinted Genes on Bovine Chromosome 29

Cattle are an important livestock species and a major genetic resource for food security, agriculture and livelihoods. Over 60% of bovine genes have homologs across mammalian species, which provides a molecular basis for comparative genomic analysis. Genomic imprinting has been implicated in a variety of biological functions, so the identification of new imprinted genes, or the verification of known ones, in livestock species is of high agricultural and biomedical importance. Fourteen putative imprinted genes on bovine chromosome 29 (Bta 29), together with their human (Hg 11) and mouse (Mm 7) orthologs, were computationally characterized with respect to CpG islands (CGI), transcription factor binding elements and sequence motifs. Phylogenetic analysis was conducted across the three species for each of the genes found to have a promoter CGI. Promoter CGI were identified in ASCL2, TSSC4, CDKN1C, KCNQ1, PHLDA2 and NAP1L4. The promoter CGI were enriched with CpG-containing transcription factor binding sites. In general, cattle were more closely related to human than to mouse, and natural selection was the force driving the evolutionary change among the three species. Protein kinase motifs involved in phosphorylation were identified in the amino acid sequences of ASCL2, TSSC4, PHLDA2 and NAP1L4. Our results suggest post-translational regulation of imprinting and indicate that the predicted promoter CGI can be assayed to determine the molecular function, gene expression and DNA methylation status of the bovine putative imprinted genes.


Introduction
Genomic imprinting is widely studied because of its involvement in various biological processes such as brain function and behaviour (Garfield et al., 2011), tumorigenesis (Lim & Maher, 2010), control of intra-uterine growth and birth weight (Schulz et al., 2010), and reprogramming during embryo and nuclear transfer as well as in somatic cell cloning (Kedia-Mokashi et al., 2011). It provides a mechanism to distinguish between the paternal and maternal genomes and also regulates biological processes. More than 100 imprinted genes have been experimentally confirmed in mammals, and studies indicate that up to 1,300 (Gregg et al., 2010) or even 2,000 additional genes (Nikaido et al., 2003) could still be imprinted. Most of these imprinted genes were identified in human and mouse. Very few imprinted genes have been identified in livestock species (cattle, sheep, pig and rabbit) as being associated with economically important traits such as milk yield, fat and meat deposition, fetal development, growth and carcass traits. Although the bovine genome is the best characterized livestock genome, with high sequence coverage and the highest percentage of annotated genes, fewer than two dozen imprinted genes have been experimentally validated (Imumorin et al., 2012). Imprinted genes are characterized by certain genetic and epigenetic features. A remarkable feature is that they are physically linked in clusters with other imprinted genes and show unique patterns of sequence conservation (Hutter et al., 2010). The DNA sequence environment of imprinted genes is usually rich in CpG islands (CGI), repetitive elements and transcription factor binding sites (TFBS) (Paulsen et al., 2000; Neumann et al., 1995). These features are being used to analyze known and putative imprinted genes (Khatib et al., 2007; Luedi et al., 2007). Recently, a large cluster of imprinted genes was mapped to bovine chromosome 29 (Imumorin et al., 2012). Bovine chromosome (Bta) 29 is the equivalent of human chromosome (Hg) 11 and mouse chromosome (Mm) 7, which contain the highest number of imprinted genes. In the current study, we selected a total of 14 genes from the imprinting cluster on Bta 29 and carried out comparative genomic analysis with their human and mouse orthologs to gain a better understanding of the imprinting sequence features, which will aid the experimental validation of their respective imprinting status.

In silico Sequence Retrieval
The fourteen orthologous genes selected from the imprinting clusters on Bta 29, Hg 11 and Mm 7 were H19, IGF2, INS, TH, ASCL2, TSPAN32, CD81, TSSC4, KCNQ1, CDKN1C, SLC22A18, PHLDA2, NAP1L4 and OSBPL5. A structured query of these genes in the Otago catalogue of imprinted genes (http://www.otago.ac.nz/IGC) was performed to identify their imprinting status across the three species (Table 1). Genomic, transcript and protein reference sequences (RefSeq) for each gene were retrieved from GenBank (NCBI).
Source: Otago catalogue of imprinted genes.

Phylogenetic Analysis
The genomic RefSeq was used to determine the nucleotide variations, the coding sequence (CDS) was used to test for the type of selection, and the protein sequence was used to build the phylogenetic trees as well as the distance matrix. Using pairwise sequence comparisons (i.e. cattle/human, cattle/mouse and human/mouse), the rates of non-synonymous substitutions (dN) and synonymous substitutions (dS), the dN/dS ratio and the neutrality index (NI) were determined from the CDS. These estimates were obtained using the DnaSP 5.0 programme (Librado & Rozas, 2009). Protein sequences were used to estimate the evolutionary distance and subsequently infer the phylogenetic relationships. The phylogenetic trees were constructed using the Neighbor-Joining method in the MEGA5.2 software (Tamura et al., 2007), selecting the pairwise distance model for amino acid substitutions. A bootstrap test with 1000 replications was performed, and the evolutionary distances (p-distance matrix) formed the nodal parameter used in defining the clades.
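As an illustration of the distance step, the sketch below computes a p-distance (the proportion of aligned sites at which two sequences differ) for each species pair. It is a minimal sketch and not the MEGA implementation: the function names, the pairwise-deletion handling of gaps and the toy peptide sequences are all assumptions made for the example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of aligned, gap-free sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: skip sites with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned):
    """Map each unordered pair of names to its p-distance."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


# Hypothetical aligned peptides, for illustration only (not data from the study).
toy_alignment = {
    "cattle": "MKTAYIAK",
    "human": "MKTAYLAK",
    "mouse": "MRTSYLAK",
}
matrix = distance_matrix(toy_alignment)
for pair, dist in matrix.items():
    print(pair, dist)
```

A matrix of this kind is the input that a Neighbor-Joining procedure clusters into a tree; bootstrap support would then come from recomputing the matrix on resampled alignment columns.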

DNA Sequence Features
The CGI predicted by each of the programmes for the putative imprinted genes on Bta 29 and their human and mouse orthologs were classified into promoter, intragenic and gene-terminal CGI (Figure 1). The repetitive elements were analyzed to compare the frequency and distribution of the short interspersed nuclear elements (SINE), particularly the Alu (Arthrobacter luteus) repeats. Table 2 shows that there were repetitive sequences in all 14 genes across the three species, except for OSBPL5 in cattle and H19 in human. There were no Alu repeats in the repetitive elements of any of the putative imprinted genes on Bta 29.
Note. BM = bases masked; ALU = Arthrobacter luteus, expressed as a percentage of the SINE; SINE = short interspersed nuclear elements; HG = human; MM = mouse.
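The text does not state which criteria the CGI-prediction programmes applied. As a hedged illustration only, the sketch below uses the widely cited Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6); these thresholds, the function names and the test sequences are assumptions and may differ from the settings used in the study.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c_count = s.count("C")
    g_count = s.count("G")
    cpg_count = s.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c_count + g_count) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    if c_count and g_count:
        obs_exp = (cpg_count * n) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return gc_content, obs_exp


def looks_like_cgi(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC-content and obs/exp thresholds."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp


# Synthetic sequences, for illustration only.
cpg_rich = "CG" * 100
at_rich = "AT" * 100
print(cpg_stats(cpg_rich), looks_like_cgi(cpg_rich))
print(cpg_stats(at_rich), looks_like_cgi(at_rich))
```

Real CGI finders slide a window along the sequence and merge qualifying windows; this sketch only evaluates a single candidate region.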
The promoter CGI across the six genes were enriched with the CpG-containing E2F, ZF, EGR, KROX, SP1, AP2 and YY1 transcription factor binding sites (consensus sequence), but there were no TATA boxes. Evolutionarily conserved domains were identified in all six genes, but site-specific motifs were found only in ASCL2, TSSC4, PHLDA2 and NAP1L4 (Table 3). These motifs were found within the conserved protein domain of each of the respective genes. The motifs were active sites for protein kinases involved in phosphorylation, which is an important epigenetic mechanism for post-translational modification of gene expression.
Note. BHLH = myogenic basic helix-loop-helix; IT = ion transport; CDI = cyclin-dependent kinases inhibitor; PH = pleckstrin homology; NAP = nucleosome assembly protein; PKC = protein kinase C; cAMP = cyclic adenosine monophosphate; CK2 = casein kinase 2; * = amino acid residue position.

Molecular Evolution
Genetic diversity was analyzed across the three pairwise datasets. Transition bias (Ts > Tv), a general property of DNA sequence evolution, was observed in ASCL2, TSSC4 and CDKN1C (Table 4). In CDKN1C, all the pairwise comparisons showed transition bias, whereas in ASCL2 and TSSC4 the phenomenon occurred in only two of the pairwise comparisons. The transition/transversion rate ratio, as estimated by the distance-based and maximum likelihood methods, was less than 2 (Table 5). The pairwise comparison between cattle and human had the lowest nucleotide diversity (Pi) across the six genes, except in NAP1L4 (Table 4).
Note. Ts = transitions; Tv = transversions; Pi = nucleotide diversity.
Note. P = transitional sites; Q = transversional sites; α = instantaneous transition rate; β = instantaneous transversion rate; HKY = Hasegawa-Kishino-Yano.
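To make the Ts/Tv terminology concrete, the sketch below counts observed transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse) between two aligned nucleotide sequences. It is a raw count on hypothetical sequences, not the model-based rate ratio reported in Table 5, and the function name is an assumption made for the example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count observed transitions and transversions between aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical sites, gaps and ambiguity codes are ignored
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # change within the same chemical class
        else:
            transversions += 1  # change between purine and pyrimidine
    return transitions, transversions


# Hypothetical aligned fragments, for illustration only.
ts, tv = count_ts_tv("ACGTACGT", "GCGTATGA")
print(ts, tv, "transition bias" if ts > tv else "no transition bias")
```

Because multiple substitutions at the same site hide earlier changes, observed counts like these tend to understate the true ratio for divergent sequences, which is why model-based estimators such as HKY are used.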
The dN/dS rate ratio was less than 1 (dN/dS < 1) for all the pairwise comparisons in ASCL2, TSSC4, CDKN1C and NAP1L4. KCNQ1 had dN/dS > 1 in all the pairwise comparisons, while PHLDA2 had dN/dS < 1 in all the pairwise comparisons except cattle/human (Table 6). The phylogenetic trees (Figure 2) showed that the out-group (Zebrafish/Red Jungle Fowl) was classified separately from the mammals, which is consistent with the traditional classification. The results showed that cattle were more closely related to human than to mouse in all six genes except NAP1L4.

Discussion
The analysis of the repetitive elements showed that the mammalian-wide interspersed repeats (MIRs) accounted for the SINE transposons, as there were no Alus in any of the 14 putative imprinted genes in cattle. This is expected, since Alus are primate-specific SINEs (Liu et al., 2009). Imprinted loci have been reported to contain fewer SINE transposon-derived sequences than non-imprinted loci, and a direct relationship between SINEs and imprinting has been proposed (Greally, 2002). NAP1L4 had an unusually high percentage of SINEs in human. According to Greally (2002), this is characteristic of the sequence transition region TSSC3/NAP1L4, which is flanked by regions of increased SINE content. The mouse orthologs were observed to be CpG-poor, which supports earlier reports that about 20% of mouse orthologs of human genes do not always have CGI as a result of evolutionary pressure towards conservation (Antequera, 2003; Illingworth et al., 2010).
The predicted CGI were assigned to promoter, intragenic and gene-terminal categories as described by Bock et al. (2006). Intragenic CGI, also described as 'orphan CGIs' by Illingworth et al. (2010), are said to play a role in transcriptional initiation and dynamic expression during development. The abundance of these orphan CGI in cattle and human therefore suggests a regulatory function associated with the various isoforms of the putative imprinted genes. The identification of promoter CGI in ASCL2, TSSC4, KCNQ1, CDKN1C, PHLDA2 and NAP1L4 is significant because promoter CGI are the major sequence characteristic of imprinted genes (Paulsen et al., 2008). In addition, genes with promoter CGI often function as housekeeping genes (Weber et al., 2007). Three of these genes (KCNQ1, CDKN1C and PHLDA2) have been experimentally validated as imprinted in human, while the other three are imprinted in mouse (Morison et al., 2005). Recently, PHLDA2 was reported to be imprinted in cattle (Sikora et al., 2012). This suggests that ASCL2, TSSC4, KCNQ1, CDKN1C and NAP1L4 may also be imprinted in cattle and that their promoter CGI may be functionally involved in differential gene expression.
Our results support earlier studies reporting that several TFs contain CpGs in their recognition sequence and that promoter CGI lack TATA boxes (Deaton & Bird, 2011). According to Landolin et al. (2010), the enrichment of promoter CGI with CpG-containing transcription factor binding sites is characteristic of imprinted genes. In this study, we identified several regions of sequence conservation in which core TF binding sites (Sox2, Nanog, Oct4) were found. Although most of the identified TFs were present within UTR and intronic regions, these may be potential sites for differential methylation (Hansen et al., 2012). A cross-matching of the identified conserved intergenic regions in five of the putative imprinted genes (INS, TH, ASCL2, TSSC4 and PHLDA2) corresponds to the imprinted gene clusters on Bta 29. This suggests that the core TFs within these intergenic regions may provide additional regulatory signals for the respective imprinting control centers (IC) of the gene clusters (Paulsen et al., 2008). The observed transition bias in ASCL2, TSSC4 and CDKN1C indicates that, during the speciation of these genes, transition base substitutions were favoured over transversions in order to conserve the chemical nature of the proteins (Wakeley, 1996).
According to Zhao et al. (2006), base substitution is the main cause of gene variation, diversity and evolution of species. For all six genes, the estimates of the mutational transition/transversion rate ratio were less than two (< 2), which, according to Wang et al. (2012), suggests that mutations within these genes in the mammalian homologs (cattle, human and mouse) have reached saturation.
In this study, all six genes except KCNQ1 had a dN/dS ratio significantly less than one (dN/dS < 1; Z-test, p < 0.05) and deviated significantly from the neutral theory (NI < 1; Fisher's exact test, p < 0.05). This demonstrates that the evolution of ASCL2, TSSC4, CDKN1C, PHLDA2 and NAP1L4 has been driven by natural (negative) selection and not random drift. The constraining of dN is a way by which natural selection prevents potential changes to the underlying amino acids, thereby stabilizing the expression of the respective gene products (Wolf et al., 2009). According to Schaffner and Sabeti (2008), diet, climate and disease are the most significant forces driving the conservation of amino acid residues in mammalian populations.
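The conventional reading of the dN/dS ratio can be sketched as follows: a ratio below 1 points to purifying (negative) selection, a ratio near 1 to neutral evolution, and a ratio above 1 to positive selection. This is only a minimal sketch of that rule of thumb; it does not reproduce the Z-test or the neutrality index computed in DnaSP, and the function names, tolerance and input values are hypothetical.

```python
def dn_ds_ratio(dn, ds):
    """Return the dN/dS ratio; dS must be positive for the ratio to be defined."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    return dn / ds


def classify_selection(dn, ds, tolerance=1e-9):
    """Label the mode of selection implied by the dN/dS point estimate."""
    ratio = dn_ds_ratio(dn, ds)
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    if ratio < 1.0:
        return "purifying (negative)"
    return "positive (diversifying)"


# Hypothetical substitution rates, for illustration only (not values from Table 6).
print(classify_selection(0.05, 0.50))
print(classify_selection(0.60, 0.30))
print(classify_selection(0.20, 0.20))
```

A point estimate alone does not establish selection; the statistical tests cited in the text are what support the claim that the ratio differs significantly from 1.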
The phylogenetic trees for ASCL2, TSSC4, CDKN1C, KCNQ1 and PHLDA2 were consistent with earlier studies in which bovine proteins were reported to share more homology with human than with mouse (Tellam et al., 2009). According to Tellam et al. (2009), alterations in the organization of specific gene families in the bovine lineage could account for the peculiar genome similarities and differences relative to other mammalian species. This suggests that all six putative imprinted genes, except NAP1L4, may have undergone cattle-specific changes that are indicative of evolutionary adaptations to the immediate environment, disease challenges (Elsik et al., 2009), reproductive functions (Rodriguez-Osorio et al., 2009), and growth and development (Ulzun et al., 2009). The conserved site-specific motifs in ASCL2, TSSC4, PHLDA2 and NAP1L4 were sites for protein kinases involved in phosphorylation. The role of protein kinases in phospho-regulation has been compared to the transcription regulatory activity of TFs. According to Mair (2009), just as TFs regulate genes by recognizing specific DNA sequences, protein kinases only phosphorylate proteins that contain particular amino acid motifs.
The specific role of each of these protein kinases (cAMP, PKC, CK2) in glycogen regulation, muscle development and cellular regulatory mechanisms within the respective domains of the queried genes could be an important evolutionary source of phenotypic variability (Beltrao et al., 2009). The absence of any site-specific motif in the conserved domains of the bovine putative imprinted genes KCNQ1 and CDKN1C, as well as in their human and mouse orthologs, suggests that the imprinting mechanism of these genes lies solely within their DNA sequences. That is, phospho-regulation may not be involved in the epigenetic regulation of these two genes.

Conclusion
The in silico characterization of the imprinted genes can be used to predict molecular function or gene expression. The six bovine putative imprinted genes identified to have promoter CGI can be further assayed to determine their DNA methylation status. This study confirms that, at the proteomic level, the bovine genome shares more homology with human than with mouse, which is consistent with the National Human Genome Research Institute's assessment of cattle as an excellent model species for biomedical research. Our study reports the in silico phospho-regulatory mechanism of imprinting in ASCL2, TSSC4, PHLDA2 and NAP1L4. This post-translational modification will require further experimental validation, especially in genes that have defied the DNA methylation hypothesis for genomic imprinting.

Figure 1. CGI categories across the finders for cattle, human and mouse

Table 1. Sizes (kb) and imprinting status of the Bta 29 genes and their orthologs in human and mouse

Table 2. Percentage of repetitive sequences in the Bta 29 imprinted gene cluster and its orthologs in human and mouse

Table 3. Evolutionarily conserved domains and motifs

Table 4. Phylogenetic analysis of the six putative imprinted bovine genes

Table 5. Estimates of the mutational transition/transversion rate ratio

Table 6. Synonymous (dS) and nonsynonymous (dN) nucleotide substitution rates
Note. NI = neutrality index.