Identifying Significant Biological Markers in Klotho Gene Variants Across a Wide-Ranging Taxonomy

Biological aging is marked by progressive, degenerative physiological change that damages tissues and organs. Errors in biopolymers accumulate over time, and mitochondrial dysfunction, telomere attrition, and wider genomic instability lead to an altered state of intercellular communication. In this investigation, I identify and examine critical biomarkers in genetic variants of KLOTHO (a transmembrane protein implicated in the genetic regulation of age-related disease) among organisms whose lifespans vary widely and that span a broad taxonomic range. Specifically, I investigate the correlation between lower- and higher-frequency α-amino acid compositions of Klotho proteins across grouped organisms, and I demonstrate several techniques in comparative sequence analysis for inferring evolutionary relatedness.


Introduction
Promising new research in life extension has led to remarkable advances in our basic understanding of the molecular mechanisms associated with aging. Within the last two decades, much of this research has focused on identifying potential genomic candidates for longevity and on explaining how individual genes can affect the biological process of aging. One protein in particular, KLOTHO, has been the subject of several recently published papers. A handful of studies now suggest that genetic variants of KLOTHO [encoded by the KL gene] are associated with human aging and tumor suppression, and experiments on model organisms involving KLOTHO variants have shown improved cognition and slowed age-related decline (Dubal et al., 2014). In one such case, Dubal et al. (2014) demonstrated that systemic overexpression of Klotho in transgenic mice enhanced cognition and increased lifespan by an average of approximately 25 percent, whereas Klotho-deficient mice manifested a syndrome resembling accelerated human aging and displayed extensive and accelerated arteriosclerosis (Dubal et al., 2014).
For reasons that are not yet fully understood, Klotho-associated mechanisms "change cellular calcium homeostasis, by both increasing the expression and activity of TRPV5 and decreasing that of TRPC6" (Kurosu et al., 2005). Moreover, altered mineral-ion homeostasis could be a cause of premature aging-like phenotypes (Kurosu et al., 2005). To gain a more comprehensive understanding of the molecular functions of KLOTHO, we should begin by examining KL-derivative patterns across a wide evolutionary spectrum rather than limiting ourselves to a single taxon. Because lifespans vary widely among species, a comparative approach could help us identify a set of unique signatures in the molecular variation patterns of KLOTHO. In turn, this may provide a meaningful reference for a more systematic investigation. Throughout this paper, I seek to address these areas thoroughly.
It should be noted that this paper does not propose a solution to longevity, nor does it seek to draw parallels among the many biological features that allow organisms to persist in semi-optimal states. Instead, my focus is on identifying and examining critical biomarkers, or sequence motifs, in genetic variants of KLOTHO among organisms whose lifespans vary widely and that span a broad taxonomic range. Here, I investigate the correlation between lower- and higher-frequency α-amino acid compositions of Klotho proteins across grouped organisms, and I demonstrate several techniques in comparative sequence analysis for inferring evolutionary relatedness.

Sequence Selection and Group Categorization
Biological aging is marked by progressive, degenerative physiological change that causes [irreversible] damage to tissues and organs. Errors in biopolymers accumulate over time; mitochondrial dysfunction, telomere attrition, and wider genomic instability lead to an altered state of intercellular communication. More broadly, degrees of longevity vary widely among organisms. Only a tiny percentage of species exhibit characteristics that run counter to the universal mechanisms of biological aging. Turritopsis dohrnii and Polycelis felina are two such examples of eukaryotic organisms that appear to exhibit "limitless telomere regenerative capacity fueled by a population of highly proliferative adult stem cells" (Tan et al., 2012). Several species of marine crustaceans share similar traits that slow biological aging.
Individual longevity is partly heritable; genes can explain up to 35 percent of the variation in lifespan (American Federation for Aging Research, 2012). As noted earlier, recent studies indicate that Klotho gene variants may be candidates for the genetic regulation of age-related disease. For this reason, I selected the KL gene as a prime candidate for analysis and evaluation. Using homologous Klotho protein sequences as the primary source for comparative study, I grouped and categorized six distinct KLOTHO sequences [in FASTA format]. The six protein sequences used to curate the experimental dataset were obtained from the NCBI Protein database.
The accession references for each individual sequence are detailed in the sections below.
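Because the six sequences were handled in FASTA format, a minimal Python sketch of reading such a file may help. This is an illustration, not the pipeline used in this study: it keys each record by the first word of its header (for NCBI Protein records this is usually the accession.version), and the example records are hypothetical placeholders rather than real Klotho entries.

```python
def parse_fasta(text):
    """Return {identifier: sequence} for every record in a FASTA-formatted string."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            # The first word of the header is used as the identifier.
            current = fields[0] if fields else f"record{len(records) + 1}"
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current].append(line.upper())  # sequences may span many lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical placeholder records (not real accessions or Klotho sequences).
example = """>seq1 hypothetical placeholder record
MPAREPG
ALKK
>seq2 another placeholder
mlkta
"""
sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq), seq)
```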

Regarding group type a
Two of the three type a organisms exhibit extraordinarily long lifespans. Heterocephalus glaber (the naked mole-rat) is highly resistant to cancer, maintains youthful vascular function and a youthful cellular oxidant-antioxidant phenotype for relatively longer, and is better protected against aging-induced oxidative stress than shorter-living rodents (Buffenstein, 2009; Csiszar et al., 2007; Pérez et al., 2009). Danio rerio (zebrafish) can regenerate its fins, skin, and heart (Jopling, 2010; Sun et al., 2009). Jopling et al. (2010) demonstrated that zebrafish can fully regenerate the heart after amputation of up to 20 percent of the ventricle. Although very little is known about the average lifespan of green sea turtles, Chelonia mydas reaches sexual maturity between 20 and 50 years of age (United States Fish and Wildlife Service, 2005).

Performing Pairwise Sequence Alignment
Following group categorization, I used UGENE's pairwise sequence alignment (PSA) tool to estimate similarity among my six protein sequences [length: 1,037]. Initially, I produced five pairwise sequence alignments by pairing each Klotho protein sequence against Homo sapiens [type b] and then grouping the results in numerical sequence [see below]. For this operation, I selected the Hirschberg algorithm. Hirschberg's algorithm finds the same optimal global alignment as the Needleman-Wunsch algorithm, maximizing the sum of pairwise scores minus gap penalties, but requires only linear rather than quadratic memory (Chao & Zhang, 2008). It can be derived from the Needleman-Wunsch algorithm by observing the following (Hirschberg, 1975): if $Z$ is the optimal alignment of $(X, Y)$ and $X = X^{l} + X^{r}$ is an arbitrary partition of $X$, then there exists a partition $Y = Y^{l} + Y^{r}$ of $Y$ such that
$$\mathrm{NW}(X, Y) = \mathrm{NW}(X^{l}, Y^{l}) + \mathrm{NW}(X^{r}, Y^{r}).$$
Applying this observation recursively, with $X$ split at its midpoint each time, yields the optimal alignment while storing only one row of the score matrix at a time.
When calculating the highest number of consistent patterns in a local alignment, gap penalty scores are often disregarded (Lassmann & Sonnhammer, 2005). However, because I am interested in obtaining a global alignment rather than identifying common subsequences, I applied a gap-opening penalty of 10 and a gap-extension penalty of 2. The resulting PSA provided a first view of sequence similarity. As noted below, the numerically grouped PSA outputs suggest a high degree of homology among three of the six sequences and between two others, which I categorized as group type a and group type b, respectively. PSA thus supports the placement of sequences among type a and type b organisms (Chao & Zhang, 2008).
For multiple sequence alignment (MSA), I selected Kalign, an accurate and fast MSA algorithm (Lassmann & Sonnhammer, 2005). Kalign estimates sequence distances with the Wu-Manber approximate string-matching algorithm, which is based on Levenshtein (edit) distance. This strategy enables Kalign to estimate sequence distances faster and more accurately than other popular iterative methods. Comparisons by Lassmann and Sonnhammer (2005) show that Kalign is about 10 times faster than ClustalW and, depending on alignment size, up to 50 times faster than other iterative methods, while also delivering better overall accuracy (Lassmann & Sonnhammer, 2005).
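The partition property that Hirschberg's algorithm relies on can be checked numerically. The following Python sketch is an illustration, not UGENE's implementation: it computes only the last row of the Needleman-Wunsch score matrix (linear memory) and uses it to choose where to split Y when X is split at its midpoint. It assumes a linear gap penalty and toy match/mismatch scores (hypothetical values, not the affine 10/2 gap penalties or substitution matrix used in this study), and it stops at the split step instead of recursing to recover the full alignment.

```python
def nw_last_row(x, y, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch scores of x against every prefix y[:j], using O(len(y)) memory."""
    prev = [j * gap for j in range(len(y) + 1)]  # aligning "" with y[:j]
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0] * len(y)
        for j in range(1, len(y) + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,  # align x[i-1] with y[j-1]
                         prev[j] + gap,      # gap in y
                         cur[j - 1] + gap)   # gap in x
        prev = cur
    return prev


def hirschberg_split(x, y):
    """Split x at its midpoint; return (k, score) where y[:k] pairs with the left half."""
    mid = len(x) // 2
    left = nw_last_row(x[:mid], y)               # left[k]  = NW(x_l, y[:k])
    right = nw_last_row(x[mid:][::-1], y[::-1])  # right[j] = NW(x_r, y[len(y) - j:])
    n = len(y)
    return max(((k, left[k] + right[n - k]) for k in range(n + 1)),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    x, y = "MPAREPGAL", "MPARPGSAL"  # toy peptide fragments, not real Klotho data
    k, score = hirschberg_split(x, y)
    # The best split score equals the full Needleman-Wunsch score.
    print(k, score, nw_last_row(x, y)[-1])
```

Because the best split reproduces the full score, the algorithm can recurse on the two halves independently, which is where the memory saving comes from.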
Kalign is noted for its fast execution times, so this procedure required minimal computational resources. First, I opened UGENE's multiple sequence alignment tool and imported the six protein sequences in FASTA format. Because protein alignments are sensitive to parameter settings, I adjusted Kalign's gap penalty scores slightly over successive runs until an optimal global alignment was obtained.
Each run produced a 3,092 base-pair alignment, followed by a phylogenetic diagram. I then repeated these operations after converting the protein sequences into nucleotide sequences.
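Kalign's distance estimation, as described above, rests on Levenshtein (edit) distance. The Python sketch below computes that metric directly with the standard dynamic-programming recurrence and scales it to a 0-1 dissimilarity. This is an illustration only: Kalign itself uses the faster Wu-Manber approximate-matching algorithm rather than full edit distances, and the normalization by the longer sequence length is a hypothetical choice.

```python
def levenshtein(a, b):
    """Minimum number of single-residue insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer length (hypothetical normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


print(levenshtein("MPAREPGAL", "MPARPGSAL"), normalized_distance("MPAREPGAL", "MPARPGSAL"))
```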

Reverse KLOTHO Protein-DNA Translation for Phylogenetic Reconstruction
Protein sequences are generally more conserved over evolutionary time than the DNA sequences that encode them. Selective constraints on protein function and structure conserve protein sequences over much longer periods than individual codons (Martin & Palumbi, 1993). Most evolutionary changes to protein domain architecture occur at the amino and carboxyl termini, in the form of domain insertions, repetitions, and deletions (Marsh & Teichmann, 2010). We must also consider the possibility that convergent evolution can produce apparent similarity between proteins that are evolutionarily unrelated but perform similar functions and have similar structures (Bastien, 2008). For these reasons, substitutions at the DNA level, including multiple substitutions at a single base, may reflect mutational history more accurately than protein sequences do (Martin & Palumbi, 1993).
The challenges of using protein sequences to infer divergence events led me to consider a secondary option to protein-protein comparison for phylogenetic reconstruction: cross-checking the original diagram(s) produced by protein-protein comparison against a nucleotide-based equivalent. Using my custom API-based translator (Protein to DNA Bio Translator), I reverse-translated each sequence from protein into a workable [and theoretical] dataset made exclusively of KL-derivative nucleotide (DNA) sequences. The dataset is necessarily theoretical: because most amino acids are encoded by more than one codon, reverse translation cannot recover an organism's actual coding sequence.
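To make the ambiguity of reverse translation concrete, here is a minimal Python sketch. It is not the Protein to DNA Bio Translator used in this study; the one-codon-per-amino-acid back-translation table is a hypothetical choice. The forward table is the standard genetic code, which lets the sketch confirm that each chosen codon translates back to the intended residue and count how many synonymous codons each amino acid has.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical back-translation table: ONE representative codon per amino acid.
# Real genes use many synonymous codons, so the output is only a theoretical sequence.
BACK_TABLE = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT", "*": "TAA",
}


def reverse_translate(protein):
    """Map each residue to its representative codon (raises on ambiguous codes such as X)."""
    try:
        return "".join(BACK_TABLE[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"no codon defined for residue {err.args[0]!r}") from None


def translate(dna):
    """Forward-translate complete codons with the standard code."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Number of synonymous codons per amino acid: the source of reverse-translation ambiguity.
DEGENERACY = Counter(CODON_TABLE.values())
print(reverse_translate("MKLVWAR"), DEGENERACY["L"], DEGENERACY["M"])
```

The round trip protein → DNA → protein is exact, but the DNA produced is only one of many sequences that encode the same protein (leucine alone has six codons), which is why the resulting nucleotide dataset is described as theoretical.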
Following conversion, I imported the six DNA sequences into UGENE and, after running the MSA, generated an alternate radial phylogenetic diagram to cross-confirm the original(s) obtained by protein-protein comparison. I applied PHYLIP's neighbor-joining method with the F84 distance model to the 3,092 base-pair alignment; this procedure also required bootstrap resampling to evaluate the support for each node. Lastly, the resulting phylogenetic trees do not assume a molecular clock; they are unrooted.
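For reference, the F84 model corrects observed differences for multiple substitutions while allowing unequal base frequencies and a transition/transversion bias. The Python sketch below applies the standard closed-form F84 distance to two aligned DNA sequences, using the observed proportions of transition (P) and transversion (Q) differences. It is a simplified illustration, not PHYLIP's dnadist: it skips gapped or ambiguous sites, estimates base frequencies from the pair unless they are supplied, and does not handle zero purine or pyrimidine frequencies. With equal base frequencies the formula reduces to the Kimura two-parameter distance, which serves as a sanity check.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def f84_distance(s1, s2, freqs=None):
    """F84 distance between two aligned DNA sequences.

    Sites containing gaps or ambiguous bases are skipped. If freqs (a dict of
    base -> frequency) is not given, it is estimated from both sequences.
    """
    sites = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    if freqs is None:
        counts = {base: 0 for base in "ACGT"}
        for a, b in sites:
            counts[a] += 1
            counts[b] += 1
        freqs = {base: n / (2 * len(sites)) for base, n in counts.items()}
    pa, pc, pg, pt = (freqs[base] for base in "ACGT")
    pr, py = pa + pg, pc + pt  # purine and pyrimidine frequencies
    A = pc * pt / py + pa * pg / pr
    B = pc * pt + pa * pg
    C = pr * py
    diffs = [(a, b) for a, b in sites if a != b]
    n_ts = sum(1 for a, b in diffs if {a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    P = n_ts / len(sites)                 # proportion of transition differences
    Q = (len(diffs) - n_ts) / len(sites)  # proportion of transversion differences
    # math.log raises ValueError if the sequences are too divergent (saturation).
    return (-2 * A * math.log(1 - P / (2 * A) - (A - B) * Q / (2 * A * C))
            + 2 * (A - B - C) * math.log(1 - Q / (2 * C)))


print(f84_distance("ACGTACGTAC", "GCGTACGTAA"))
```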
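PHYLIP's neighbor-joining method tends to produce reliable trees when sequence divergence is low, regardless of dataset size. An accurate and statistically consistent polynomial-time algorithm, neighbor-joining does not assume that all lineages evolve at the same rate (an important property, since proteins evolve at different rates). It constructs a tree by successively clustering lineages and setting branch lengths as lineages join; for a set of n taxa, n - 3 joining iterations are required, and each iteration computes a Q-matrix over every pair of the taxa (or nodes) that remain (Felsenstein, 1981). For illustration, the standard neighbor-joining formulas are given below, where d(i, j) is the distance between taxa i and j, n is the number of taxa currently in the matrix, f and g are the pair being joined, and u is the new node.
Q-matrix:
$$Q(i, j) = (n - 2)\,d(i, j) - \sum_{k=1}^{n} d(i, k) - \sum_{k=1}^{n} d(j, k)$$
Pair to node (distances):
$$\delta(f, u) = \frac{1}{2}\,d(f, g) + \frac{1}{2(n - 2)}\left[\sum_{k=1}^{n} d(f, k) - \sum_{k=1}^{n} d(g, k)\right], \qquad \delta(g, u) = d(f, g) - \delta(f, u)$$
Taxa to node (distances):
$$d(u, k) = \frac{1}{2}\left[d(f, k) + d(g, k) - d(f, g)\right]$$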
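The neighbor-joining procedure described above can be sketched in Python as follows. This is a minimal illustration, not PHYLIP's neighbor program: it assumes a complete symmetric distance dictionary for at least three taxa, labels internal nodes with hypothetical names ("node1", "node2", ..., "center"), and omits bootstrapping. Each iteration computes the Q-matrix, joins the minimizing pair, assigns pair-to-node branch lengths, and updates taxa-to-node distances; after n - 3 iterations the last three nodes are attached to one central node.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the edges (node, node, branch_length) of an unrooted neighbor-joining tree.

    names: taxon labels; dist: dict mapping frozenset({x, y}) -> distance for every pair.
    """
    d = dict(dist)
    nodes = list(names)
    edges = []

    def get(x, y):
        return d[frozenset((x, y))]

    step = 0
    while len(nodes) > 3:  # n - 3 joining iterations in total
        n = len(nodes)
        row_sum = {x: sum(get(x, y) for y in nodes if y != x) for x in nodes}
        # Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k); join the minimum.
        f, g = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * get(p[0], p[1]) - row_sum[p[0]] - row_sum[p[1]])
        d_fg = get(f, g)
        # Pair-to-node branch lengths.
        d_fu = 0.5 * d_fg + (row_sum[f] - row_sum[g]) / (2 * (n - 2))
        d_gu = d_fg - d_fu
        step += 1
        u = f"node{step}"
        edges += [(f, u, d_fu), (g, u, d_gu)]
        # Taxa-to-node distances for every remaining taxon k.
        for k in nodes:
            if k not in (f, g):
                d[frozenset((u, k))] = 0.5 * (get(f, k) + get(g, k) - d_fg)
        nodes = [k for k in nodes if k not in (f, g)] + [u]
    # Three nodes remain: attach them to a single central node.
    a, b, c = nodes
    edges += [
        (a, "center", 0.5 * (get(a, b) + get(a, c) - get(b, c))),
        (b, "center", 0.5 * (get(a, b) + get(b, c) - get(a, c))),
        (c, "center", 0.5 * (get(a, c) + get(b, c) - get(a, b))),
    ]
    return edges


if __name__ == "__main__":
    # Toy five-taxon distance matrix (illustrative, not Klotho data).
    labels = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    pairs = {frozenset((labels[i], labels[j])): matrix[i][j]
             for i in range(5) for j in range(i + 1, 5)}
    for edge in neighbor_joining(labels, pairs):
        print(edge)
```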

Analyzing α-Amino Acid Compositions
A protein's molecular weight, which reflects its length and residue composition, is fundamentally important to its biochemical characterization and function. Moreover, the α-amino acid composition of a protein may contribute to its overall structural quality. If certain α-amino acids are optimal for protein structure, natural selection should have acted over evolutionary time to increase their frequency (Anthis et al., 2013). As noted by Mannakee and Gutenkunst (2012), catalytic domains in proteins evolve faster, while non-catalytic domains evolve more slowly. This may also suggest that networks typically evolve under stabilizing selection (Mannakee & Gutenkunst, 2012).
In addition to phylogenetic reconstruction, this investigation places strong emphasis on discerning meaningful patterns, or sequence motifs, in lower- and higher-frequency α-amino acid compositions. By comparing composition frequency percentages between group type a and group type b, I hope to identify potentially significant molecular markers in evolutionarily conserved chemical properties of Klotho that may have increased or decreased over evolutionary time, both within a particular group and across a wider evolutionary spectrum. This procedure incorporates several methods, such as α-amino acid residue calculations for determining molecular weight and frequency percentage.
A number of efficient web applications are well suited for α-amino acid composition analysis. In this phase of my investigation, I use the Composition/Molecular Weight Calculation tool (University of Delaware, 2014) to obtain α-amino acid residue counts and approximate residue charge, and Protein Calculator (Anthis et al., 2013) to determine atomic compositions. I then use the resulting figures in a comparative analysis to evaluate the overall α-amino acid distributions.
Total α-amino acid counts vary slightly because the protein sequences differ in length, which would also explain discrepancies in PSA similarity estimates. Amino acid counts are quantified as the number of residues of each type in an individual protein sequence. Each count is then divided by the total number of residues in the sequence to determine its frequency percentage. The total molecular weight of each sequence is likewise calculated as the sum of its individual residue weights.
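The counting procedure above can be sketched in a few lines of Python. This is an illustration rather than the University of Delaware or Protein Calculator tools: it uses average residue masses from standard reference tables (the cited calculators may use slightly different values) and, as is conventional, adds one water molecule for the free termini when summing residue weights. The sequence in the usage line is a hypothetical fragment.

```python
from collections import Counter

# Average residue masses in daltons (free amino acid minus one water), from standard
# reference tables; the calculators cited in the text may use slightly different values.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def composition(seq):
    """Return {amino acid: frequency percentage} for a protein sequence."""
    seq = seq.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in sorted(counts)}


def molecular_weight(seq):
    """Approximate average molecular weight (Da): sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


fragment = "MPAREPGALKK"  # hypothetical fragment, not a real Klotho sequence
print(composition(fragment), round(molecular_weight(fragment), 2))
```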

Reconstructing a Phylogeny Based on Klotho Gene Variants
The resulting base-pair alignments yielded nearly identical trees. At first glance, type a and type b organisms are each placed together within their respective groups, with the four mammalian candidates at one end of the diagram(s) and the remaining non-mammalian candidates at the other [see Figure 1]. When cross-checked against morphology-based classification, the corresponding nodes on both diagrams appear in an order consistent with established taxonomy. At one end, the three organisms belonging to the order Rodentia are placed together, in agreement with their scientific classification; the fourth mammalian species (Homo sapiens) falls in fairly close proximity, followed by the remaining candidates.
The phylogenies illustrated in this study depict gene families within paralogy regions consistent with the early evolution of vertebrates. Because similar KL-derivative proteins occur across Chordata, from fish and reptiles to mammals, we may reasonably infer that Klotho family proteins belong to ancient paralogy lineages. While retaining the same basic molecular functions, Klotho protein variants may have undergone only small modifications over evolutionary history (originating at the nucleotide level). As the next section demonstrates, these small variations are also observed among my candidate sequences. Overall, the current MSA models further support the group categorization. The extent of evolutionary relatedness among group type a and group type b candidates, including branch lengths in decimal format, is detailed below.

Minimal Degrees of Variation in Evolutionarily Conserved Chemical Properties of Klotho
The chemical properties of a protein's α-amino acids determine its biological activity and give proteins a vast array of chemical versatility (University of Arizona, 2014). Natural selection tends to preserve α-amino acid residues that play important functional roles. According to Dokholyan et al. (2002), proteins may retain critical chemical properties (or molecular signatures) that contribute, at lower or higher frequencies, to protein stability through their interactions with other α-amino acid residues. We might therefore expect to observe a large percentage of evolutionarily conserved α-amino acids across entire protein families, with only minimal variation among widely separated taxonomic groups.
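One simple way to quantify conservation in an alignment is sketched below in Python. This per-column measure is an illustrative assumption, not the method of Dokholyan et al. (2002) or an analysis performed in this study: for each column it reports the fraction of sequences carrying the most common residue (gaps never count as the conserved residue), plus the share of fully conserved columns. The three-sequence alignment is a hypothetical toy example.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """For each alignment column, the fraction of sequences carrying the most common residue.

    Gaps are never counted as the conserved residue, so gapped columns score lower.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must contain sequences of equal length")
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = Counter(ch for ch in column if ch != gap)
        top = residues.most_common(1)[0][1] if residues else 0
        scores.append(top / n_seqs)
    return scores


def fully_conserved_fraction(alignment, gap="-"):
    """Share of columns in which every sequence has the same (non-gap) residue."""
    scores = column_conservation(alignment, gap)
    return sum(score == 1.0 for score in scores) / len(scores)


toy_alignment = ["MKA-L", "MKSAL", "MRA-L"]  # hypothetical, not Klotho data
print(column_conservation(toy_alignment), fully_conserved_fraction(toy_alignment))
```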
Indeed, in the context of evolutionary conservation, only three of the twenty α-amino acids in this study have an average disparity ratio greater than 0.093 percent. Comparative analyses of α-amino acid compositions highlight only three strongly distinct molecular signatures indicative of each group. As Figure 2 and Figure 3 illustrate, group type a features elevated levels of L-Lysine together with lower quantities of L-Alanine and L-Arginine [as compared to group type b], with average disparity ratios of ±1.8 percent (L-Lysine), ±1.6 percent (L-Alanine), and ±1.34 percent (L-Arginine). The maximum disparity ratio, greater than 3.1 percent, occurs for L-Lysine [type a]. Increased levels of L-Lysine may be significant for optimal structure and folding, and could suggest that natural selection acted to increase the frequency of this residue.
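The disparity ratio is not formally defined in the text. One plausible reading, assumed here purely for illustration, is the difference between the mean frequency percentage of an amino acid in group type a and in group type b. The Python sketch below ranks amino acids by that difference; the input values are invented placeholders, not the measurements reported in Figure 2 or Figure 3.

```python
from statistics import mean


def group_disparity(compositions, groups):
    """Signed difference in mean frequency percentage (type a minus type b) per amino acid.

    compositions: {organism: {amino acid: percent}}; groups: {organism: "a" or "b"}.
    Returns a dict ordered by absolute difference, largest first.
    """
    amino_acids = sorted({aa for comp in compositions.values() for aa in comp})
    diffs = {}
    for aa in amino_acids:
        group_a = [comp.get(aa, 0.0) for org, comp in compositions.items() if groups[org] == "a"]
        group_b = [comp.get(aa, 0.0) for org, comp in compositions.items() if groups[org] == "b"]
        diffs[aa] = mean(group_a) - mean(group_b)
    return dict(sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True))


# Invented placeholder percentages for three hypothetical organisms.
toy = {"org1": {"K": 6.0, "A": 5.0}, "org2": {"K": 7.0, "A": 5.0}, "org3": {"K": 4.0, "A": 6.0}}
labels = {"org1": "a", "org2": "a", "org3": "b"}
print(group_disparity(toy, labels))
```

A positive value means the residue is more frequent in group type a; a negative value means it is more frequent in group type b.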
It is worth noting that, in humans, L-Alanine and L-Arginine are produced internally via biosynthetic pathways, whereas L-Lysine must be ingested as lysine or lysine-containing proteins (plants synthesize L-Lysine from aspartic acid) (Beals, 1999). Although dietary L-Lysine has no direct bearing on a study of comparative sequence analysis, various other studies suggest that increased dietary L-Lysine plays a role in calcium absorption, a function that is also linked to KLOTHO (Tan et al., 2012). As noted above, Klotho is involved in calcium homeostasis: "Klotho stimulates calcium reabsorption in the distal convoluted tubule by deglycosylating and stabilizing the epithelial calcium channel TRPV5 on the surface of cellular membrane" (Robinson, 2012). Calcium absorption, tissue regeneration, and other immunity-related health benefits have also been linked to increased levels of dietary L-Lysine (Falco, 1995).
mechanisms and the protein functions that play a decisive role. Although much about protein evolution remains unresolved, much can be learned by reconstructing ancient paralogy lineages through comparative analysis of modern proteins. With this in mind, I believe that my results may provide a meaningful reference for a more systematic investigation.

Table 1. Similarity probabilities among type a and type b sequences (PSA)