Inferring Human Phylogenies Using Three CODIS STR Markers (CSF1PO, TPOX and TH01)


  •  Nuzhat Akram    
  •  Shakeel Farooqi    

Abstract

Over the past several decades, polymorphic genetic loci have been discussed for their utility in human phylogenetic inference. Short tandem repeat (STR) loci have shown promising results for this purpose. Unfortunately, allele frequency data for polymorphic loci are largely confined to a few populations, so the number of shared loci declines as the number of populations increases. We hypothesize that even a small number of STR loci can be used efficiently for phylogenetic purposes if an appropriate theoretical and statistical strategy is employed. This strategy provides a feasible and cost-effective method for choosing appropriate STR loci for phylogenetic studies. To test the hypothesis, an empirical study was conducted using allele frequency data for three STR loci, CSF1PO, TPOX and TH01, across 98 human populations drawn from the literature (references are available at http://dnaa.bravehost.com/index.html and http://www.cstl.nist.gov/strbase/population/Omnipop). The choice of markers was based on locus polymorphism, high heterozygosity, low mutation rate, fewer artifacts and independence between the loci. Three measures were used to estimate genetic distances between the populations: Cavalli-Sforza’s chord distance (DC), Nei’s genetic distance (DA) and Nei’s standard genetic distance (DST). The coefficient of variation (CV) was calculated for each distance measure across 100 datasets obtained by resampling the original dataset. CV was in the order DST > DA > DC. Because DC showed the lowest CV, a consensus tree based on DC was constructed using the Neighbour Joining (NJ), Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and Maximum Likelihood (ML) methods. NJ and UPGMA received more statistical support, that is, higher bootstrap values, than ML (NJ > UPGMA > ML). A validation study was performed using (A) Principal Component Analysis, (B) comparison with trees reported for other molecular markers and (C) STR genotyping of five Pakistani subpopulations.
The results strongly supported our hypothesis that the three STR markers CSF1PO, TPOX and TH01 can successfully delineate ethnic, geographic and linguistic differentiation between populations.
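
For readers unfamiliar with the three distance measures named above, the following is a minimal sketch of how they are commonly defined for two populations, together with a CV helper. It is not the authors' pipeline: the function names, the dictionary layout and the allele frequencies are hypothetical, and the abstract does not specify the resampling scheme, so only the CV calculation over a list of already-computed distances is shown.

```python
import math
from statistics import mean, stdev

# Each population is a dict: locus name -> list of allele frequencies.
# Both populations must list the same alleles in the same order per locus.

def _shared(x, y):
    """Sum over alleles of sqrt(x_i * y_i) for a single locus."""
    return sum(math.sqrt(a * b) for a, b in zip(x, y))

def chord_distance(pop_x, pop_y):
    """Cavalli-Sforza chord distance: (2 / (pi * r)) * sum_loci sqrt(2 * (1 - shared))."""
    r = len(pop_x)
    total = sum(
        math.sqrt(max(0.0, 2.0 * (1.0 - _shared(pop_x[l], pop_y[l]))))
        for l in pop_x
    )
    return 2.0 / (math.pi * r) * total

def nei_da(pop_x, pop_y):
    """Nei's DA distance: 1 - (1 / r) * sum_loci shared."""
    r = len(pop_x)
    return 1.0 - sum(_shared(pop_x[l], pop_y[l]) for l in pop_x) / r

def nei_standard(pop_x, pop_y):
    """Nei's standard distance: -ln(Jxy / sqrt(Jx * Jy)), J averaged over loci."""
    jx = mean([sum(a * a for a in pop_x[l]) for l in pop_x])
    jy = mean([sum(b * b for b in pop_y[l]) for l in pop_x])
    jxy = mean([sum(a * b for a, b in zip(pop_x[l], pop_y[l])) for l in pop_x])
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical allele frequencies, for illustration only (not from the study).
pop_a = {"CSF1PO": [0.3, 0.4, 0.3], "TPOX": [0.5, 0.5], "TH01": [0.2, 0.3, 0.5]}
pop_b = {"CSF1PO": [0.2, 0.5, 0.3], "TPOX": [0.6, 0.4], "TH01": [0.3, 0.3, 0.4]}

for name, fn in [("DC", chord_distance), ("DA", nei_da), ("DST", nei_standard)]:
    print(name, round(fn(pop_a, pop_b), 4))
```

In a comparison like the one described, each function would be evaluated on every resampled dataset and `coefficient_of_variation` applied to the resulting values for each measure; a lower CV indicates a more stable distance estimate.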



This work is licensed under a Creative Commons Attribution 4.0 License.