Geographically Weighted Regression Modeling for Analyzing Spatial Heterogeneity on Relationship between Dengue Hemorrhagic Fever Incidence and Rainfall in Surabaya, Indonesia

Geographically weighted regression (GWR) modeling has been extended to evaluate spatial heterogeneity on the relationship between dengue hemorrhagic fever (DHF) incidence and rainfall in Surabaya, Indonesia. We employed monthly data in 2010 as repeated observation for each subdistrict in Surabaya, subdistrict was then considered as spatial unit. Problem of temporally correlated errors in this modeling was solved by means of data transformation. The GWR model was compared with global regression model using some statistical criteria. The GWR model reveals that the relationship between the DHF incidence and the rainfall is significantly varied in every subdistrict. The influence of the rainfall on the DHF incidence was greater over southeastern subdistricts than other subdistricts in Surabaya. This result holds an important consequence on policy making for regulation in preventing DHF infection according to local characteristic climate in a certain region.


Introduction
Dengue hemorrhagic fever (DHF) which is defined as a disease caused by infection of dengue virus, has become a serious health problem for societies in many tropical and sub tropical countries.Approximately 50 millions people are infected by the dengue virus in almost 100 countries every year (World Health Organization [WHO], 2009).In Indonesia, the desease that is transmitted by Aedes aegypti mosquito was firstly discovered in Surabaya and Jakarta in 1968 (Aiken & Leigh, 1978).During last decade, Surabaya is claimed as the DHF endemic region in Indonesia (Mulyatno, Yamanaka, Yotopranoto, & Konishi, 2012).Various programs have been accomplished to prevent and control the desease, however the DHF incidence in Surabaya remains relatively high.The high of the DHF case requires a serious effort to block the spread of DHF epidemic desease by means of analyzing risk factor which is associated with dengue virus infection.Some studies concerning risk factors of dengue virus infections on DHF infected societies have been broadly conducted.Phuong et al. (2008) reported that job and surrounding environment were the two main factors affecting DHF incidence.Meanwhile, Allwinn (2011) found that visiting to DHF endemic region would make one infected by the virus.Other risk factors discovered by the reports were presence of mosquito nest, condition of water container, and inhabitant mobility (Soghaier et al., 2013).However, the studies merely to the point or limit of exploring risk factors according to global viewpoint, in which the inter-variable relationship is assumed to be spatially constant for all regions.Consequently, the conclusion does not describe any local characteristic.
Further study based on spatial consideration is originated by the presence of confusing results about DHF transmission.A study in South Thailand revealed negative correlation between DHF incidence and rainfall (Thammapalo, Chongsuwiwatwong, McNeil, & Geater, 2005), meanwhile a study in Central Thailand showed positive correlation between them (Wiwanitkit, 2005).But, Picardal and Elnar (2012) performed a study in Central Visayas (Philippines) did not show any correlation between the two variables.According to these findings, we hypothesize that the relationship between DHF incidence and rainfall in a certain region is spatially different.
One of new relatively methods to examine spatial risk factor is geographically weighted regression (GWR).This statistical method adjusts frame of global to local regression model, which enables regression parameter estimation for every single point of the spatial unit (Fotheringham, Brunsdon, & Charlton, 2002).Relationship between independent variable (X) and dependent variable (Y) in GWR model is different for all regions (Leung, Mei, & Zhang, 2000).The GWR method has been employed by Lin andWen (2011), Hsueh, Lee, andBeltz (2012), and Khormi and Kumar (2011) in analyzing DHF case.Those studies engaged spatial data at a certain time (cross-sectional data) in GWR modeling.
The objective of this study is to extend the frame of GWR modeling in analyzing spatial heterogeneity that is correlated with DHF incidence in Surabaya by three ways.First, we combined both spatial and temporal data in the modeling.In this case, we employed monthly data as repeated observation for each spatial unit.Second, we solved problem of temporally correlated error which may occur during the modeling.Third, we evaluated spatial heterogeneity from the relationship between the DHF incidence and the rainfall in Surabaya.The acquired GWR model was compared with global regression model by means of some statistical tests.The result of this study will be practically useful for authorities in regulating program to prevent the DHF infection based on local characteristic climate in a certain region.

Study Area
The study was conducted in Surabaya, the capital city of East Java province in Indonesia.During last decade, Surabaya is one of the highly risk of the DHF endemic cities in Indonesia (Health Office of Surabaya City, 2011, unpublished).Surabaya locates in northeast Java island around 7°11'-7°21' south latitude and 112°36'-112°54' east longitude with area of 326.36 km 2 (Statistics of Surabaya City, 2011).Surabaya is the second biggest city in Indonesia having population of 2.6 millions on 2010 cencus.As a region with tropical climate, Surabaya has high atmospheric temperature, and different rainfall as well as atmospheric pressure along the year.
The subdistrict, the 'kecamatan' in Indonesian, was used as the spatial unit in this study.16 of 31 subdistricts in Surabaya were chosen for modeling.In general, those subdistricts lie on low continental with high about 3-6 m over the sea surface, except in south region, the high of the subdistricts may reach about 25-50 m over the sea surface (Statistics of Surabaya City, 2011).

Data Collection
Data for number of DHF sufferers in every subdistric every month during 2010 was booked from Health Office of Surabaya City.These monthly data were considered as repeated observation for each subdistrict, one subdistrict is defined as one spatial unit.We also obtained data for number of population in every subdistrict from Statistics of Surabaya City.The DHF incidence in every subdistrict each month was attained by dividing the number of the DHF sufferers by the number of the population for every certain subdistrict, and multiplying it by 100,000.The DHF incidence was chosen as dependent variable (Y) in the regression modeling.
Climate factor, in this study is the rainfall, was chosen as the independent variable (X).This climate factor was taken from three meteorological stations, i.e.Maritim Perak I, Maritim Perak II, and Juanda International Airport meteorological stations.The number of data for every variable was n = 192 (16 subdistricts times 12 months).

Statistical Methods
Relationship between DHF incidence and rainfall can be analyzed by comparing the results of global regression and GWR modelings.If the GWR model is more reliable than the global regression model then relationship between the two variables is spatially different for each subdistrict.On the other side, the global regression model describes constant relationship for all study regions.
The regression modeling was firstly executed by ordinary least squares (OLS) method in examining the global relationship between DHF incidence (Y) and rainfall (X).The regression model is defined by including subdistric indices and time of occurrences: where ;  is error assumed to be normal distribution with zero mean and known standard deviation.Estimation of autocorrelation (  ) was extracted from OLS regression modeling through the origin: (2) (Myers, 1990).To solve problem of correlated error in  , a data transformation was performed: for 1, 2, ,12 t   in every single subdistrict i (we defined 0 t  for December 2009).Further regression was conducted on the transformed data.As regression modeling for time series data, an initial diagnosis was done to examine the autocorrelation by means of Durbin-Watson d-statistic.If the value of d is less than lower bound d L , then autocorrelation is present.If the value of d is higher than upper bound d U , then there will be no autocorrelation.If d L ≤ d ≤ d U then decision is inconclusive (Myers, 1990).Significance of the OLS regression model is expressed by t-statistic, sum of squared errors (SSE), Akaike information criterion (AIC), and coefficient of determination (R 2 ) (Myers, 1990;Fotheringham et al., 2002).
The spatial heterogeneity from the relationship between DHF incidence (Y) and rainfall (X) will be analyzed by GWR model.Unlike the OLS regression model, the GWR model enables regression parameter estimation in different subdistricts (Fotheringham et al., 2002).The GWR model we employed was the model which contained temporally correlated error: where 0 ( , )  are GWR coefficients in subdistrict i ; location point of subdistrict i is defined by latitude and longitude coordinates ( , )  is error which is assumed having normal distribution with zero mean and known standard deviation.Resolving correlated error problem on  can be done by transforming the data as expressed in Equation ( 3) and autocorrelation  is refined with the OLS regression model to compare the two models.
The first step in the GWR modeling is defining the latitude and longitude coordinates ( , )   i i u v for each subdistrict.These geographycal coordinates was used to specify Euclidean distance between observed data in subdistrict i in t th month and observed data in subdistrict j in l th month: Month indices t and l in Equation ( 5) do not influence the distance between subdistricts.The distance in Equation ( 5) is termed as fundamental background in weighting the data to estimate the parameter of the GWR model.The closer the distance between the estimated point to subdistrict, the larger the weight of the data during the parameter estimation.In this study, the weighting of data was conducted by means of Gaussian function: where 0 b  is bandwidth constant in which its designating was done by cross-validation method (Fotheringham et al., 2002).Gaussian function is an exponential function that gives weight 1 for data where the parameter of subdistrict is being estimated and weight with undeviating decrease value for data in other subdistricts, the weight continues to reduce as the distances among subdistricts increases.Gaussian where  is weight data for subdistrict j in l th month to estimate the parameter in subdistrict i in t th month.Every observed data has one weighted matrix ( , ) in estimating the parameter.By algebraic matrix approach, the estimation of the parameter 0 1 ˆ( , ) ( ( , ), ( , )) (Fotheringham et al., 2002).All observed data in a certain subdistrict has the same estimated parameter.It is due to the reality that the data weighting only engage the distances among the subdistricts.Meanwhile the monthly data is defined as the repeated observation in a subdistrict.
The GWR modeling was firstly conducted on the untransformed data.As we have discussed, the GWR model was calibrated to monthly data, and the initial diagnosis was done to examine the autocorrelation by means of Durbin-Watson statistic.If autocorrelation is present in the error then the further GWR model was fitted on the transformed data.Significance test of the GWR model was conducted by hypothesis test which proposed by Brunsdon, Fotheringham, and Charlton (1999).The null hypothesis in this test represents ( , ) k u v  function which is unchanged in all coordinates ( , )  u v in the study area for 0 k  and 1 k  .If there is no proof to reject the null hypothesis, then the OLS global regression model will be acceptable enough to describe the data.In the meantime, the test for variation of each set of k  across the study area will be completed by 3 F -statistic that was initiated by Leung et al. (2000).The significance of the GWR model is also represented by sum of squared errors (SSE) and coefficient of determination (R 2 ).Comparison of the GWR model and global regression model will be conducted by Akaike information criterion (AIC).According to this criterion, the regression model which posses the smallest AIC value is then defined as the best one (Fotheringham et al., 2002).

Result of OLS Regression
The regression modeling using OLS method on the untransformed data gives Durbin-Watson statistic with d = 0.690 in which the value is less than the lower bound d L = 1.758 (α = 0.05).The result shows that autocorrelation is present in the error.Estimation of autocorrelation by applying Equation (2) provides ˆ0.652 . The OLS estimation on the transformed data by employing Equation ( 3) presents global regression model as follows Equation (1): where Y is the DHF incidence and X is the rainfall.The regression model is globally suitable for all study regions.Diagnosis on the error of this model provides statistic d = 1.940, for which the value is higher than the upper bound d U = 1.778 (α = 0.05).This is indicating that there is no autocorrelation in the error.All P-values of the regression coefficient in the model were smaller than 0.05 (see Table 1).Hence, the rainfall influences the DHF incidence in Surabaya. .Further analysis on the model reveals that the error approaches normal distribution.

Result of GWR Model
The GWR estimation on the untransformed data provides Durbin-Watson statistic with d = 0.900 in which the value is less than the lower bound d L = 1.758 (α = 0.05).This result reveals that the model has correlated errors, so we need data transformation.To compare the global regression and the GWR models, the further modeling employed ˆ0.652     .Based on the cross-validation criteria (see Fotheringham et al., 2002), the GWR modeling on the transformed data provides optimum bandwidth b = 0.053.Thus, Gaussian function is expressed as: Table 2 represents the summary of estimated parameter (regression coefficient) for the obtained GWR model.All the coefficient values of the global regression lies in interval for coefficient of the GWR model.) also increases slightly higher than the OLS regression model.Table 3 represents summary of calculation F-statistic which was employed to examine the significance of the GWR model (see Brunsdon et al., 1999).The smaller of the P-value from 0.05 reveals that the GWR model is more reliable than the global regression model.It also means that the relationship between the DHF incidence and the rainfall in Surabaya is influenced by a geographical factor. .This value is again smaller than the global regression model.It indicates that the GWR model is more relevant in describing the data than the OLS regression model.Further examination shows that the parameter 1  varied significantly in every subdistrict (see Table 4).This can be seen from the P-value of the coefficient 1  with value smaller than 0.05.Meanwhile, the coefficient 0  remains constant for all study regions.

Discussion
This study reveals that the DHF incidence in Surabaya has a correlation with the rainfall.The higher the rainfall, the more the risk of the spreading DHF.The result strengthen the previous studies that the rainfall is concluded as a risk factor in the spread of the dengue (Aiken & Leigh, 1978;Wiwanitkit, 2005;Hii, Zhu, N. Ng, L. Ng, & Rocklöv, 2012).In principal, rainfall creates the area or nest of the dengue vector mosquito wider and wider.Rainfall alters numerous artificial containers into habitats for the mosquito.Breeding habitats of Aedes aegypti mosquito usually take place in unclosed water container, flower vase, tin can, or tire around the citizen environment (WHO, 2009).The mosquito prefers to lay its egg on the wall of the water container slightly over the water surface.In a favourable environment, the egg can survive desiccation in several months (Hopp & Foley, 2001).The egg will become a mosquito larva after immersing in water for couple of days.Therefore, the rainfall can increase rapidly the population density of Aedes aegypti mosquitoes.
The GWR modeling indicates that the relationship between the DHF incidence and the rainfall in Surabaya is spatially different.It implies that in some subdistricts higher DHF incidence is associated with higher rainfall, whereas in some subdistricts DHF incidence has little correlation with rainfall.After grouping subdistricts by region, the influence of the rainfall on the DHF incidence in subdistricts near southeastern of Surabaya is larger than other subdistricts (Figure 1).Since the rainfalls were different from one subdistrict to another (Kane & Yusof, 2013), an understanding about spatial varying relationship is very useful in policy making for preventing the DHF infection.
Figure 1.Spatial map of GWR coefficient of the rainfall influence on DHF incidence in Surabaya, Indonesia.Subdistricts given by white color are not included in modeling because of incomplete data One of the statistical criteria which is employed in regression modeling is coefficient of determination (R 2 ).The coefficient represents the percentage of variation in the DHF incidence (Y) that is explained by the rainfall (X) in the model.Both GWR and OLS regression models provides low R 2 .It fairly indicates that there exists other factors affecting the DHF incidence in Surabaya, like environment (Phuong et al., 2008;Khormi & Kumar, 2011), citizen mobility (Allwinn, 2011), human density and mosquito abundance (Lin & Wen, 2011), social and economical dynamics (Khormi & Kumar, 2011), transportation (Hsueh et al. 2012), and the condition of the water container (Soghaier et al., 2013).

Conclusion
The rainfall is correlated with the DHF incidence in Surabaya, Indonesia.The relationship between the DHF incidence and the rainfall varies in each subdistrict.The citizen who lives in southeastern of Surabaya takes higher risk of dengue infection when the rainfall is relatively high.

Table 1 .
Test of global regression coefficient

Table 2 .
Summary for coefficient of GWR modelDiagnosis on the error of this GWR model provides Durbin-Watson statistic d = 1.913 in which the value is higher than the upper bound d U = 1.778 (α = 0.05).It is then indicated that there is no autocorrelation in the error.Further evaluation on the model reveals that the error approaches normal distribution., which is smaller than the global regression.It clearly shows that the GWR model has estimation Ŷ closer Y compared to the global regression.The coefficient of determination from the GWR model ( 2 44.88%

Table 3 .
Comparison of GWR and OLS global regression models

Table 4 .
Significance test for variation of each set of 0