Severity of Infestation Levels of Tunga Penetrans in Central, Kenya: A Bayesian Cumulative Logit Model

Tungiasis is a neglected parasitic disease that significantly affects communities, especially in developing countries. This study developed a Bayesian severity of the jigger infestation model and its spatial counterpart. Putative determinants leading to different levels of infestation and the most affected areas were to be identified through the model. We collected data through a cross-sectional study with a multi-stage sampling design. A structured questionnaire was administered in each household to capture variables used for modelling jigger infestations. The severity of jigger infestation categorized for each individual was modelled against all the other predictor variables. It was also integrated with spatial data to determine the spatial distribution pattern of jigger infestation. A Bayesian multinomial logistic regression model was used to assess the association between various predictors and different infestation levels. Specifically, an ordered Bayesian Severity Hierarchical (OBSH) categorical model was obtained. This model was categorical based on the Counties (1 − Nyeri, 2 −Murang′a and 3 − Kiambu). Results from this model showed that for a one-unit decrease in the poverty index at level 1 (individuals categorized as poor) there was about a 69% increase in the severity of jigger infestation. A one-unit increase in the percentage of clay in the soil increased the odds ratio of the severity of jigger infestation by a factor of 11.21 while a high percentage of nitrogen in the soil lowered the severity of infestation. Severity of jigger infestation reduced from the baseline, Nyeri County to Kiambu County. It also increased with increasing altitude due to a decrease in nitrogen levels.


Introduction
Tunga Penetrans, is the etiological agent of Tungiasis, a seriously debilitating disease that is prevalent in Central America, South America, India, and tropical Africa as established by Darvin et al., 2018. This disease affects multiple hosts such as humans, cattle, dogs, pigs, cats, horses, mules, sheep, rats and mice (Fein et al., 2001;Escamilla-Martinez et al., 2008;Bouree et al., 2009;Sanusi, 1989). The key drivers of Tungiasis transmission include social neglect, inadequate health care and poor housing conditions, especially in poor-resource communities (Mwangi et al., 2015). The main ecological habitats of Tunga Penetrans are warm, dry soil and sand of beaches, stables, and stock farms (Gibbs, 2009). According to Gibbs, 2009, humans are infected by female Tunga Penetrans that anchor themselves to the host skin using their biting mouthparts and burrows into the epidermis. The primary typical areas affected include the plantar surface of the foot, the intertriginous regions of the toes, and the periungual regions as Muehlen et al. (2003) showed. However, ectopic sites such as the hands, elbows, thighs, and gluteal region may also be infected (Heukelbach et al., 2002). Whereas earlier studies conducted in the area were more on the social detail (Mwangi et al., 2015;Nyangacha et al., 2019;Nsanzimana et al., 2019;Keiyoro et al., 2016), a mathematical perspective was therefore required to infuse some modelling facet. This would enable finding a better solution to the menace by first establishing the key factors that affect the transmission of Tunga Penetrans. The spatial distribution patterns of jigger infestation with their severity distribution paradigm could help deal better with this peril. The intersecting areas of spatial statistical methods, geographical information systems (GIS) and remote sensing (RS) has witnessed the development of software tools. These tools could enhance the determination of the severity of jigger infestation leading to better large scale implementation of jigger infestation control programmes (Clements et al., 2006 andLawson et al., 2003).
WinBUGS, a major software tool well versed with modelling spatially referenced small area health data was used. Uncertainty could be incorporated into the modelling process for the observed data and any structured or unstructured random variables through WinBUGS (Lawson et al., 2003). Thus modelling of data was done using the hierarchical Bayesian ordered logistic regression models. Such modelling was through exploiting WinBUGS modern computational advances such as Gibbs sampling.
intrinsic Conditional autoregressive (iCAR) models were incorporated to model spatial variation. This was through the incorporation of the standard areal model such as the Besag and Besag − York − Mollie (BY M), models by Besag et al., 1991, in form of the structured random effects as expressed by Bakka, 2018. Disease severity refers to the presence and extensiveness of disease in the body as defined by Finlayson et al., 2004. Jigger infestation severity in this work was taken to be the percentage of relevant host tissues or organs covered by a lesion or damaged through such infestation. Thus severity was measured as a categorical variable under four levels. This research, therefore, incorporated a hierarchical Bayesian cumulative logit model to establish the extensiveness of jigger infestation with or without the spatial structure. Through incorporating the spatial structure of the data in a hierarchical Bayesian cumulative logit model, maps together with posterior distribution estimates from the outputs obtained led to informed and objective decision making about targeted areas for disease control through uncertainty assessments. Though documented studies on the comparison on the severity of jigger infestation in these 3 Counties is not found, this study found such severity of infestation to be higher in Nyeri County than Kiambu County while increasing with increase in altitude.

Data Collection
Data collection was through a cross-sectional survey with a multi-stage cluster sampling design. Twelve Counties with the endemic transmission of jiggers were purposefully selected. Among these 12 Counties, 3 (Nyeri, Murang a and Kiambu) were randomly selected. Two sub-counties were then randomly selected from each county. In Nyeri, we selected Mukurweini and Nyeri South Sub-counties, in Murang a, Kandara and Gatanga Sub-counties while in Kiambu, Gatundu South and Githunguri Sub-counties were selected (see Figure 1). The population was spread over a large geographical area. Using ArcGIS version 10.4, each of the six sub-counties was sub-divided into approximately 4 square kilometre quadrats or sites (points or patches). One homestead was chosen randomly (see Figure 2) from each quadrat. The number of homesteads selected was 99. Data on jigger infestation was obtained from each member (those found at the time of sampling) of the selected homesteads. The total number of human individuals sampled were 199.

Variables for Modeling Jigger Infestation
Multidimensional Poverty Index (MPI) is designed to measure acute poverty globally, regionally, nationally and across geographical regions (Alkire et al., 2011). A very versatile methodology where different sets of deprivations were defined measuring poverty level vis-a-vis three dimensions using ten indicators as shown by Alkire & Santos, 2010 was described. Thus MPI could be adjusted easily to incorporate alternative indicators, cutoffs and weights as appropriate in regional, national, or sub-national contexts where such indicators are quite effective as key points as established by Setboonsarng, 2005. Representing the number of persons as n, number of dimensions under consideration as d, then letting y = [y i j ] be n × d matrix of achievements gives the entry y i j ≥ 0 as the achievement of individual i = 1, · · · , n in dimension j = 1, · · · , d. Each y i * , a row vector, indicates individual i s achievements, while y * j , a column vector, establishes the distribution of dimension j achievements across the set of individuals as Alkire & Foster, 2009 indicated. Assuming d to be given and fixed, and allowing n to range across all positive integers, then the domain of matrices under consideration becomes Y = y ∈ R nd + : n ≥ 1 . Let z j > 0 denote the cutoff below which a person is considered to be deprived in dimension j , and z be the row vector of dimension -specific cutoffs.
An identification method by Foster, 2009 andBourguignon &Chakravarty, 2003 using an identification function mapping from person i's achievement, vector y i ∈ R and cutoff vector Z ∈ R 2 to an indicator variable in such a way that Focusing on the dimensional shortfalls, the union method of identification describes person i as multidimensionally poor if deprived in at least one dimension and alternatively, the intersection approach, identifies person i, as being poor only if deprived in all dimensions as established by Alkire & Foster, 2009. Due to shortfall in each, a trade-off of both the two methodologies would be to use an intermediate cutoff level for c i that lies somewhere between the two extremes, 1 and d such that for k = 1, · · · , d the identification method becomes Thus, International Journal of Statistics and Probability Vol. 10, No. 5;2021 Maps of 6 Sub-counties in the 3 Counties where d i is number of dimensions in which i is deprived.
ρ k is dependent on both within dimension cutoffs z j and the across dimension cutoff k, hence referring it as the dual cutoff method of identification.
A questionnaire was used to capture variables for computing multidimensional poverty index, (MPI). Using 3, poverty index was determined through excel. Reliability of the index was determined by computing Cronbach s alpha (α = 0.710) coefficient using SPSS to confirm consistency. POV HH, for poverty index calculated was a categorical variable whereby, if the household calculated score c, was less than 1 3 , then the household was not deprived, hence not poor. Other variables captured during sampling include Geographical coordinates (altitude, latitude and longitude) of each sampled household. Soil samples were taken from five different locations within each homestead and mixed to make a pooled sample. Approximately 100 grams of soil were collected from each sampling location. These soil samples were packed and taken to Kenya Agricultural and Livestock Research Organization (KALRO), Nairobi for analysis. Soil types, pH, texture grades, and macronutrients (e.g., total nitrogen, phosphorus and potassium) were each determined. Additional data on precipitation, temperature and humidity for the three main centres of the three-targeted counties were obtained from the central collection point, Nairobi Meteorological (Kenya's headquarters) Department. These Countylevel variables were taken for the last eight years from the date of the last data collection and averaged values computed.
For each family member, images were taken upon requesting the head of the household, irrespective of whether the person had been infested with jiggers or not, with the main parts of interest being feet, toes, fingers, ankles, hands and heels. A scale of measuring levels of infestation was computed from these images by counting black spots of the infested part with the codes 1, 2, 3 and 4 given on an increasing rate of severity (see Figure 3). The first level of infestation had "0" black spots, followed by "1 -10" black spots, then "11 -20" black spots with the fourth level, "over 20" black spots. Severity of Infestation (S EVT Y) for each person was established.

Ethics Approval and Consent to Participate
Ethical approval for this study was provided by the National Commission for Science, Technology and Innovation (NA-COSTI) in Kenya (Reference number: NACOSTI/P/15/6946/6847). Persons aged ≥ 18 years provided oral consents for participation in the study. For minors aged < 18 years, oral consents for participation in the study were obtained from their guardians/parents.

Severity of Jigger Infestation Model
Let i, (i = 1, · · · , n) be an index to represent the i th individual in this study. Also, let s, (s = 1, · · · , S ) be an index to represent the levels of Severity of Infestation. The index s took the values Not-infested , (s = 1) , Moderately Infested , (s = 2) , Highly Infested , (s = 3) , and Severely Infested , (s = 4). This led to the standard ordered response logit model as shown by Eluru et al., 2008 andKockelman &Young-Jun, 2002 scholarly works.
6 could equivalently be expressed as a bayesian cumulative logit model (BCLM), as given by Gelman & Hill (2012), with both models following ordered probit model specification like presented by Kockelman & Young-Jun, 2002.
From 6 assumption that the latent variable y * i jk follows a standardized logistic distribution as expressed by Qiu, Z., Song, P. X. K., & Tan, M. (2002), then we have a very important model as it pertains to the sampling scheme implemented and coded in WinBUGS for both severity of jigger infestation with and without the spatial counterpart.
Mapping from any of the above Ordinal Logit Models (OLM), the latent regression to the observed ordinal responses y i jk introduces a censoring function, Pr(y i jk = 1|β X i jk ) = Pr(y * i jk ≤ τ 1 ) (9) = Pr(β X i jk + i jk ≤ τ 1 ) and Pr(y i jk = 4|β X i jk ) = Pr(y * i jk > τ 3 ) f or the last category (11) = Pr(β X i jk + i jk > τ 3 ) a Bayesian hierarchical severity model having the i th person covariates x i , the j th Ward covariates x j and the k th County covariates x k were fitted through the MCMC simulation approach in WinBUGS package.
Thus the cumulative probability δ i jk,s = Pr(y i jk ≤ s) represents the probability that the severity of jigger infestation was not as severe than category s, s = 1, 2, 3, 4 where the X i jk was clearly established from our data (see Appendix A). This could be modeled as 12 with the ζ i , being the spatially structured random effect term added to produce a Bayesian hierarchical severity model as presented by Qiu et al., 2002;  • β x i jk as given in (Appendix A) with x i jk represented as x i incorporated all the variables.
• u i was the unstructured random effect term specified as an independent normal distribution with an expected mean of zero and variance of σ 2 u • ζ i was the spatially structured random effect term whose conditional distribution, ζ i |ζ (i) , was given as: with where w i j = a i j n i with a i j = 1, areas i and j are neighbors 0, otherwise T hus a ii = 0 ⇒ w ii = 0; c ii = 0 with c i j indicating the spatial proximity and parameter φ controlling the properness of the distribution as the scholarly work of Blangiardo & Cameletti, 2015 indicate. Considering W to be a matrix of generic elements w i j , and S = diag(s 1 , · · · , s n ), we ensured that the distribution for the spatially structured random effect ζ was proper, by having the covariance matrix (I n − φW) −1 S 2 positive definite. λ 1 ≥ λ 2 ≥ · · · ≥ λ n being the ordered eigenvalues of W, then the values of φ must lie between λ −1 n and λ −1 1 . This led to the proper conditional autoregressive specification (CAR) of ζ being a multivariate Normal random variable given as: where the mean vector becomesζ = ζ 1 · · ·ζ n , I n , is the n × n identity matrix, and S 2 = diag(s 2 1 , · · · , s 2 n ). Correlation between areas i and j depends only on φ and W as demonstrated by Assuncao & Krainski, 2009. φ proves to be a bit hard, hence estimation of such proper conditional autoregressive specification is not greatly used in disease mapping. Fixing φ = 1, simplifies the above formulation, leading to the covariance matrix to be non positive definite with the conditional distribution for ζ i having the specification known as intrinsic conditional autoregressive (iCAR). Such, coupled with the exchangeable random effect, u i , (i.e. the area-specific effect modelled as exchangeable) forms the so-called Besag − York − Mollie (BY M) model, by Besag et al., 1991. Due to the non-positive definition of the covariance matrix, there is no proper joint distribution for ζ as Blangiardo & Cameletti, 2015 reiterates, as it would be possible to add any constant to each ζ i without changing the distribution, which could be rectified by fixing a constraint such as n i=1 ζ i = 0. Thus assuminḡ ζ i = 0 f or each i, the conditional distribution now becomes which is determined through an intrinsic conditional autoregressive (iCAR) distribution that is widely used in spatial statistics.
The vectors in 9, 10 and 11 contain the appropriate conditional probabilities of each of the four ordinal outcomes, for each respondent, i. Indicating to Bayesian use of Gibbs sampler (BUGS) that there existed a categorical response with the vector of conditional probabilities for each possible response, y i , then a link was made between the model and the observed responses through the categorical density.

Pr(θ) = Pr(β) Pr(τ)
A vague (diffuse) normal priors for β was assumed in the data as shown by Jackman, 2009 andQiu et al., 2002, where β ∼ N(b 0 , B 0 ) resulting from a probit link. The prior for τ was found to obey the ordering constraint implied by the model International Journal of Statistics and Probability Vol. 10, No. 5;2021 as per the censoring scheme in 8 which mimics Jackman, 2009 scholarly work. This was accomplished through a simple arrangement of the priors for τ so that noting that Bayesian hierarchical model would require prior distributions for each of the model parameter. Prior density was arbitrarily chosen. For β, a normal or a multivariate normal prior would be sufficient as indicated by Jackman, 2009. The density for τ, Pr(τ), was given as normal. Other densities like uniform on some specific interval would also suffice. On the quantities g j , Jackman, 2009 presented the Exponential distribution as the best to keep an ordering constraint on τ. A truncated normal density with µ = 0 and a large variance of the form σ 2 = 1000 2 priors on the interval (0, ∞) for g j s could still be used. It's arbitrarily chosen since it is supported only on the non-negative half of the real line. Any distribution restricted to the non-negative half of the real line will suffice (like a non-overlapping zero interval on a uniform density, the gamma density or even a log-normal density).
By choosing v = 0, a reasonably large scalar for D and a small positive quantity for d, then we obtain a vague prior over the thresholds. Incorporating severity of jigger infestation model, 6, into 12 introduced both the unstructured and spatially structured random effect terms. Such resulted in the Spatial Bayesian Severity Hierarchical (S BS H) Models for the severity of jigger infestation we run in WinBUGS.

Results and Discussion
Basing summaries on frequencies calculated from N = 100, 000 simulations followed by an equal number of iterations, then N = 200, 000 samples were obtained with acceptable MC errors of less than 5% of the posterior standard deviations. Convergence was monitored through visual examination of trace plots of the samples for each chain as Liu & Zhu, 2017 demonstrated. The ratio of between and within chain variances of the Gelman-Rubin convergence statistic converged to 1.0 as indicated by Spiegelhalter et al., 2003. Different Bayesian logistic models were subjected to the deviance information criterion (DIC) statistic, where the ordered Bayesian severity hierarchical (OBSH) model 1 captured all personal, ward and County covariates with all the predictor variables well standardized. OBSH 2 had all the County predictors, age and gender removed since they were found nonsignificant in OBSH 1. OBSH 3 had the altitude removed from OBSH 2. Further removal of the soil pH (SOpH) predictor variable led to OBSH Random (with CNTY as a random variable) and OBSH Categorical (with CNTY as a categorical variable). The DIC values for different OBSH models were as given in Table 1. OBSH Categorical model was run in WinBUGS (see Appendix B and C for both model and the output, respectively). A negative coefficient in the poverty index, POV HH, meant an increase in severity of jigger infestation. POV HH, a categorical variable was calculated such that its value was 0, if the household was not deprived, hence not poor and 1, otherwise. An odds ratio of 0.3064 for POVHH reported, meant for a one-unit decrease in the poverty index at level 1, approximately 69% increase of severity to jigger infestation was registered as shown in Table 2.
This implied a positive association between the poverty index and the severity of jigger infestation. Its 95% Bayesian CI being a shorter interval (Clements et al., 2006), implying a high level of certainty in predicting that poverty level affects the severity of jigger infestation.
Percentage of clay in the soil contributed to 11.21 times increase in severity of jigger infestation with quite a wide 95% Bayesian CI. A one-unit increase in the Potassium percentage in the soil increased the odds ratio of jigger infestation severity by a factor of 1.725. Thus, Potassium percentage as one of the primary nutrients in the soil increased jigger infestation severity by about 72.5%. From Nyeri, the baseline County to Kiambu County, the severity of jigger infestation reduced by approximately 77%. The percentage of the severity of infestation was found to be 72.0% and 46.2% comparing County 1 and County 3 respectively as seen in Table 3. The threshold parameters, g, representing simple cut-points, though notorious for being highly correlated with one another, displayed some good mixing as displayed by the correlation matrix and the autocorrelation function.

Spatial Analysis
Four Spatial Bayesian Severity Hierarchical (S BS H) Models were run in WinBUGS. Their DIC and ψ values were tabulated for assessment. S BS H Model 1 incorporated all the personal, Ward and County predictor variables in a multilevel Bayesian logistic model. With the highest DIC value amongst the models run, none of the County predictor variables was significant together with age and gender. All the non-significant variables were left out in forming S BS H Model 2, a better fit than the first. The soil pH, SOpH variable was left out in the last two models. CNTY was used both as a random effect and categorical effect in the standardized S BS H Random Model and S BS H Categorical Model models respectively. Their fit was almost the same as shown from their DIC values in Table 4. S BS H Random Model was used in reporting for the spatial severity of jigger infestation since it was the best fit. This model had the lowest number of effective parameters (parsimonious) as shown by its pD value overall. Comparing the ψ values of S BS H Random Model and S BS H Categorical Model, it was clear that the CNTY effect affected spatial variation significantly. Using CNTY as a random effect ended up having a higher spatial variation than when using it as a categorical effect, giving another reason for using the standardized S BS H Random Model. The value, ψ = 0.3584 implied that slightly more than a third of the total variation was as a result of moving from one Ward to the next. ψ value near 1 showed that the spatial variation dominated, and was negligible when close to 0 as shown in Lawson et al., 2003 work. The WinBUGS code for the standardized S BS H Random Model is as shown in Appendix D. A unit increase in the poverty index increased the odds ratio level of 1 versus 0 by a factor of 0.2712. This led to the severity of jigger infestation increase of 73% with a short 95% Bayesian CI, meaning that predictions could sufficiently be made with some high degree of certainty moving from one ward to the next as reiterated by Clement, et al., 2006. Clay percentage in the soil very highly contributed to the severity of jigger infestation. For every one per cent increase in the amount of clay in the soil, the severity of jigger infestation increased 33 times with a very low degree of certainty in prediction within the area under study. The results of the posterior means and credible intervals of variables of the S BS H Random Model were as tabulated in Table 5. An increase in the total percentage levels of potassium and nitrogen as macronutrients in the soil meant both an increase and decrease in the severity of jigger infestation. From both, a high level of certainty in predicting the exact areas affected by each moving from one Ward to the next was possible. For every one-unit increase in altitude, the severity of jigger infestation increased by 193%. Though very little or none has been done on the area linking soil macronutrients to the severity of Tunga penetrans, Xianjin et al., 2016 andSundqvist, 2013 showed that foliar and litter nitrogen decrease in montane tropical forests or even wet tropical regions. This explains the fact that as altitude increase, the amount of nitrogen in the soils decrease, ultimately increasing the severity of Tunga Penetrans.

Conclusion
From both the OBSH Categorical model and the standardized S BS H Random Model, an increase in poverty level and a high percentage of clay and potassium in the soil led to more severe infestation. Increasing the nitrogen percentage in the soil lowered the severity of infestation. Moving from one Ward to the next, predictions could be made with some high degree of certainty considering poverty levels.