The Power-Law-Tail in the Distribution of the Nucleotides of Genomes Was Related to the Complexity of Organism: New Classification of Organisms

  •  Masaharu Takeda    


We propose a new index for the classification of organisms (cells) based on the appearance frequency of the four nucleotides (bases) in various genomes. In a double-logarithmic plot of L (the distance from a base to the next base, x-axis) versus F (the frequency of a base at L, y-axis), the values for each of the four bases in a single strand of DNA were expressed as y = ae^(-bx) for L = 1 to 15, and as y = Ux + W (the power-law-tail) for L of 16 bases or more. The a-, b- and U-values (U being the slope) of the four bases were determined by the GC-content (%) and the size (nt) of the genome. Moreover, within one organism, each value was the same for A as for T, and for G as for C. The power-law-tail should be characteristic of the genomes of a given species, of the eukaryotes, and of the prokaryotes. Compared with the prokaryotic genomes, the eukaryotic genomes were essentially composed of a greater number of bases with multiple long power-law-tail regions. In the prokaryotes, the base distribution was partitioned at L = 20, and the U-values (the base distribution in the power-law-tail region) of the archaea were closer to those of the eukaryotes than the U-values of the eubacteria were. Thus, the power-law-tail of the genomic DNA should arise from the structural features of the cells, i.e., the size, the GC-content and other characteristics of the genomic DNA. These results indicate that the power-law-tail is specific to the complexity of the organism in each individual genome, and might serve as a new index for cells.
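
The analysis described above can be illustrated with a short Python sketch. It is not the author's program, and it rests on several assumptions: L is taken to be the distance from a base to the next occurrence of the same base, the short-range fit is done as a straight line in ln F versus L, and the tail fit as a straight line in log10 F versus log10 L. The function names (gap_histogram, linear_fit, fit_base) and the split parameter are hypothetical, and the demonstration sequence is random rather than a real genome.

```python
import math
import random
from collections import Counter


def gap_histogram(seq, base):
    """Count distances L between consecutive occurrences of `base`.

    Assumption: L is the distance to the next occurrence of the SAME base.
    """
    positions = [i for i, ch in enumerate(seq) if ch == base]
    return Counter(q - p for p, q in zip(positions, positions[1:]))


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_base(seq, base, split=15):
    """Fit F = a * exp(-b * L) for L <= split and a log-log line for L > split.

    Needs at least two distinct observed L values on each side of `split`.
    """
    hist = sorted(gap_histogram(seq, base).items())
    short = [(L, F) for L, F in hist if L <= split]
    tail = [(L, F) for L, F in hist if L > split]

    # Short range: ln F = ln a - b * L
    slope, intercept = linear_fit(
        [L for L, _ in short], [math.log(F) for _, F in short]
    )
    a, b = math.exp(intercept), -slope

    # Tail: log10 F = U * log10 L + W
    U, W = linear_fit(
        [math.log10(L) for L, _ in tail], [math.log10(F) for _, F in tail]
    )
    return {"a": a, "b": b, "U": U, "W": W}


if __name__ == "__main__":
    # Hypothetical input: a random sequence, NOT a real genome.
    rng = random.Random(0)
    demo = "".join(rng.choice("ACGT") for _ in range(200_000))
    for base in "ACGT":
        print(base, fit_base(demo, base))
```

Applied to a real single-strand genome sequence, the same calls would give one set of a-, b-, U- and W-values per base, which could then be compared between A and T, between G and C, and across organisms. A random sequence with equal base frequencies has geometrically distributed gaps, so it serves only to check that the code runs, not to reproduce the reported results.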

This work is licensed under a Creative Commons Attribution 4.0 License.