Comparing Weighted Markov Chain and Auto-Regressive Integrated Moving Average in the Prediction of Under-5 Mortality Annual Closing Rates in Nigeria

In developing countries, childhood mortality rates are not only affected by socioeconomic, demographic, and health variables, but also vary across regions. Correctly predicting childhood mortality rate trends can provide a clearer basis for formulating health policy to reduce mortality. This paper describes and compares two prediction methods, the Weighted Markov Chain (WMC) model and the Autoregressive Integrated Moving Average (ARIMA) model, in order to establish which method can better predict the annual child mortality rate in Nigeria. The data for the study were Childhood Mortality Annual Closing Rates (CMACR) for Nigeria from 1964 to 2017. The CMACR is a random value that changes over time (annually), so we can analyze the mortality closing rate and predict the range of change in the next state. WMC, a method based on Markov theory, uses states and their transition procedures to describe a changing random time series, while ARIMA is a generalization of the Autoregressive Moving Average (ARMA) model. The findings indicate that the ARIMA model predicts CMACR for Nigeria better than WMC. The WMC entered a loop after two iterations, and we could not use it effectively to predict future values of CMACR.


Introduction
Childhood mortality reflects Social Economic Status (SES) and Quality of Life (QoL). In developing countries, mortality rates are not only affected by social, economic, demographic, and health variables, but also vary across regions and districts (Odimegwu & Mkwananzi, 2016). Globally, nearly 7 million children die each year before they reach their fifth birthday. India (24%) and Nigeria (11%) together account for more than one-third of all under-five deaths (UNICEF, 2014; You et al., 2011; International, 2004). Childhood mortality matters to the economic status of a country, and so do many other childhood morbidities. Accurate prediction of these future patterns is of great value to these countries in planning to alleviate the burden of these health conditions. Many new prediction models are now available in the literature that may provide more accurate prediction of time series data. However, model performance still needs to be compared and contrasted. One such model is the Weighted Markov Chain (WMC). Nigeria has experienced a gradual decline in Child Mortality Annual Closing Rates (CMACR), but the extent to which this decline will continue into the future, and the model that can better make this prediction, are not clear. The aims of this paper, therefore, are to describe and compare two statistical models, the Weighted Markov Chain (a relatively new model) and the Auto-regressive Integrated Moving Average (an established model), for predicting time series data using the World Bank dataset of annual childhood (aged <5) mortality rates in Nigeria between 1964 and 2017 (The World Bank, 2018). Of importance in the dataset is that there are elements of seasonality and trends in it. Our focus for both models is to forecast the possibility of a future state of under-5 mortality in Nigeria based on historical data of childhood mortality.
However, the two methods take different approaches to transforming the value of mortality to the range of dynamic variables.

The Weighted Markov Chain (WMC)
The WMC method is based on Markov theory and mainly studies states and their transition procedures, describing the dynamic change process of a random time series (Zhou, 2014). WMC differs from the traditional Markov Chain (MC) prediction method in that a weight is computed for the initial state and it is less dependent on historical data (Zhou, 2014). Researchers using WMC consider the accuracy and practicability of its predicted values to be significant, and as such recommend its study and promotion (Peng et al., 2010; Zhou, 2014). In previous studies, some researchers used this model on its own for prediction, and some combined it with other mathematical models (such as fuzzy mathematical models and linear time series models) (Peng et al., 2010; Zhou, 2014). An earlier study reported that WMC was able to forecast Chinese stock closing prices, and that the result obtained was closer to the actual value than the result derived using the traditional Markov Chain model (Zhou, 2014). In another study, which predicted the exchange rate ratios between the US Dollar and the Euro and between the US Dollar and the Swiss Franc, the prediction errors using WMC were as low as 1.4% to 3.0% (Kordnoori et al., 2015). However, Sadeghifar et al. (Shahdoust et al., 2015) compared WMC with two other time series models, Holt Exponential Smoothing (HES) and Seasonal Auto-regressive Integrated Moving Average (ARIMA), in the prediction of hepatitis B (HB) monthly incidence rates in Hamadan Province, western Iran, and the result showed that HES most accurately predicted HB incidence rates (Shahdoust et al., 2015).

Auto-regressive Integrated Moving Average (ARIMA)
The ARIMA model, also known as the Box-Jenkins model, is widely regarded as the most efficient forecasting technique in social science and is used extensively for time series analysis (Adebiyi et al., 2014). ARIMA is a generalization of the ARMA model, whose basic assumption is that the time series is stationary. If that is not the case, Box and Jenkins proposed differencing the series to achieve stationarity by incorporating 'I', hence 'Integrated' (Opare, 2015). ARIMA therefore captures standard structures in time-series data and, as a result, provides a simple yet powerful method for making time-series predictions (Adebiyi et al., 2014). We can use the ARIMA model where the data exhibit evidence of non-stationarity and where initial differencing steps can be applied to eliminate the non-stationarity (Chetty & Narang, 2017a; Katchova, 2013). Researchers have also used ARIMA to forecast future disease burden. In China, when used on the incidence of Haemorrhagic Fever with Renal Syndrome (HFRS), it fit the fluctuations in HFRS frequency (Adebiyi et al., 2014). For animal infectious diseases, the incidence predicted by the ARIMA model was consistent with the actual incidence of Newcastle disease (Liu et al., 2011).
The focus of this present study is not to establish that these models (WMC and ARIMA) can predict under-5 mortality rate in Nigeria better than the models currently being used by UNICEF and its allies (The World Bank, 2018), but rather to identify the strengths and weaknesses of this relatively new prediction model (WMC) in comparison with another well-established time series prediction model (ARIMA) on a data set that is presumed to be seasonal and non-stationary. Given the above, the study hypothesized that the Weighted Markov Chain (WMC) model predicts the time series data more accurately than Auto-regressive Integrated Moving Average (ARIMA) while using World Bank Child Mortality Annual Closing Rates (CMACR) in Nigeria.

Data
The data used for this study are from the UNICEF/World Bank annual report for Nigeria under-5 mortality (U5M) from 1964 to 2017 (UNICEF, 2018). The Child Mortality Annual Closing Rate (CMACR) of under-5 is a random value changing over time (annually). It is the probability per 1,000 live births that a new-born child will die before the fifth birthday (The World Bank, 2018). UNICEF/World Bank sourced the mortality data from country-specific vital registration systems and from estimates based on sample surveys and censuses (The World Bank, 2018). The computations used all the available parameters that reconcile differences across countries so that comparisons can be made (The World Bank, 2018). The table also revealed that Nigeria has continued to witness a steady decline in the median estimates of under-5 deaths from 1990 to 2017. Between 1990 and 1999, the decline was slower. However, between 2000 and 2015, the period when the Millennium Development Goals were in operation, under-5 deaths declined sharply. So, from the data, we can analyze the mortality closing rate and predict the range of change in the next state. The 50 annual reports from 1964 to 2013 were purposively selected and used in the analysis to derive the WMC and the ARIMA models, while we used 2014 to 2017 (Zhou, 2014) to confirm prediction accuracy.

Weighted Markov Chain
The approach adopted here is as proposed in Peng et al. (Peng et al., 2010), modified and used in (Kordnoori et al., 2015; Shahdoust et al., 2015; Zhou, 2014). The steps were as follows:

i. We computed the number of clusters (i.e., the number of groups into which similar data points can be combined) to divide the data into, using the k-means clustering method. The mean and mean square error of the data values can also be used (Zhou, 2014).

ii. We divided the data values $\{X_n\} = X_1, X_2, \ldots, X_n$ into the chosen number of clusters (say n), such that $f_{ij}$ are the elements of a shift matrix counting moves that start in cluster i and end in cluster j.

iii. In order to make use of WMC, we carried out 'Markov property' verification of the known data time series (Zhou, 2014). The marginal probability estimates, which are the marginal frequencies (the sum of the elements along each column of the shift matrix) divided by the total, are determined by:

$$p_{\cdot j} = \frac{\sum_{i=1}^{n} f_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij}} \quad (2)$$

The test statistic, which for large n follows a chi-square distribution with $(n-1)^2$ degrees of freedom, is given as:

$$\chi^2 = 2 \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} \left| \ln \frac{p_{ij}}{p_{\cdot j}} \right| \quad (3)$$

where $p_{ij}$ is the transition probability from cluster i to cluster j. The data values $\{X_n\}$ are said to conform to the Markov property provided $\chi^2 > \chi^2_{\alpha}\left((n-1)^2\right)$.

iv. Compute each order autocorrelation coefficient $r_k$ of the data values $\{X_n\}$ and the corresponding weight of the Markov Chain for the various steps using:

$$r_k = \frac{\sum_{i=1}^{n-k} (x_i - \bar{x})(x_{i+k} - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad (4)$$

and

$$w_k = \frac{|r_k|}{\sum_{k=1}^{m} |r_k|} \quad (5)$$

where m is the maximum order of prediction inquiry; $r_k$ is the k order autocorrelation coefficient; $x_i$ is the CMACR for the i-th year; $\bar{x}$ is the mean of the data values $\{X_n\}$; n is the length of the reference sample data values; and $w_k$ is the weight of the Markov Chain with a step of k orders.

v. The Prediction of the CMACR Range. Here, we combine the initial state $S_i$, taken as the corresponding state of the CMACR in a past year, with the row vector of its corresponding transition probability matrix, which results in the state transition probability vector $P_i^{(k)}$, $i \in E$, for that year (Kordnoori et al., 2015) (6). We obtain the m-order weighted state transition probability as follows:

$$P_i = \sum_{k=1}^{m} w_k P_i^{(k)} \quad (7)$$

Then, we predict the state of the data values $\{X_n\}$ in the future, using the data values for 2012 and 2013 as the initial states (S) and the corresponding transition probability matrices to predict the data values for 2014. The forecasted state of the data values $\{X_n\}$ by WMC is the state that attains $\max\{P_i, i \in E\}$.

We used the R Statistical Programming Package version 3.5.3 (R Version 3.5.3, 2019) for the analysis of WMC.
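The WMC steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R implementation: the function names, the toy series, and the 0/1 state coding (0 for 'rate below', 1 for 'rate above') are hypothetical, the chi-square statistic is assumed to take the form 2 Σ f_ij |ln(p_ij / p_.j)| that is common in the WMC literature, and cells with zero counts are skipped by convention.

```python
import math

def transition_counts(states, n_states, step=1):
    """Shift matrix: count moves from state i to state j, `step` periods apart."""
    f = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states[:-step], states[step:]):
        f[a][b] += 1
    return f

def row_normalize(f):
    """Convert a count matrix into a row-stochastic transition matrix."""
    p = []
    for row in f:
        total = sum(row)
        p.append([c / total if total else 0.0 for c in row])
    return p

def markov_chi_square(f):
    """Assumed statistic: 2 * sum_ij f_ij * |ln(p_ij / p_.j)| (zero cells skipped)."""
    n = len(f)
    total = sum(map(sum, f))
    col_marginal = [sum(f[i][j] for i in range(n)) / total for j in range(n)]
    p = row_normalize(f)
    stat = 0.0
    for i in range(n):
        for j in range(n):
            if f[i][j] > 0:
                stat += f[i][j] * abs(math.log(p[i][j] / col_marginal[j]))
    return 2.0 * stat

def autocorr(x, k):
    """Order-k sample autocorrelation coefficient r_k."""
    mean = sum(x) / len(x)
    num = sum((x[i] - mean) * (x[i + k] - mean) for i in range(len(x) - k))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def wmc_predict(x, states, n_states, m):
    """Weighted next-period state probabilities using steps 1..m."""
    r = [abs(autocorr(x, k)) for k in range(1, m + 1)]
    weights = [v / sum(r) for v in r]
    probs = [0.0] * n_states
    for k, w in enumerate(weights, start=1):
        p_k = row_normalize(transition_counts(states, n_states, step=k))
        start = states[-k]  # state observed k periods before the target year
        for j in range(n_states):
            probs[j] += w * p_k[start][j]
    return weights, probs

# Hypothetical declining series; 200.6 is the block boundary reported in the paper.
x = [220, 215, 210, 205, 202, 198, 190, 180, 170, 160, 150, 140]
states = [1 if v >= 200.6 else 0 for v in x]

f = transition_counts(states, n_states=2)
chi2 = markov_chi_square(f)          # compare with 3.84 (1 degree of freedom, 5% level)
weights, probs = wmc_predict(x, states, n_states=2, m=2)
predicted_state = max(range(2), key=lambda j: probs[j])
```

On a steadily declining series such as this one, the 'below' state never transitions back to 'above', so every further iteration predicts the same state with probability one.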

Auto-regressive Integrated Moving Average (ARIMA)
The ARIMA model requires that the time series data are either stationary or can be transformed into a stationary series (Chetty & Narang, 2017a; Opare, 2015). The prediction equation is a linear regression in which the independent variables are lags of the dependent variable and/or lags of the error terms. ARIMA combines, through integration ('I'), two lag series: the stationary part known as 'auto-regressive (AR)' and the forecast errors part known as 'moving average (MA)'. When an ARMA (p,q) process carries a d-order differencing, it is known as ARIMA (p,d,q) (Zaiontz, 2012). Here p is the number of autoregressive terms, d is the degree of differencing, and q is the number of moving average terms. The mathematical formulation of the ARIMA model is well established in the literature. In general, with the lag operator denoted B, ARIMA (p,d,q) is given as (Deljac et al., 2011):

$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d Y_t = \left(1 + \sum_{j=1}^{q} \theta_j B^j\right)\varepsilon_t$$

where $Y_t$ is the original series, $\phi_i$ and $\theta_j$ are the autoregressive and moving average coefficients, and $\varepsilon_t$ is the error term. In order to use ARIMA, the original series must be transformed into stationary data through differencing at different levels.
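As a small illustration of how differencing removes a trend, the sketch below applies first differencing d times to a series. The function name `difference` and the cubic toy series are hypothetical and chosen only because a cubic trend becomes constant after exactly three rounds of differencing, the order used later in this study.

```python
def difference(series, d=1):
    """Apply first differencing d times, i.e. compute (1 - B)^d applied to the series."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces the series with consecutive changes and shortens it by one.
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# Hypothetical series with a cubic trend: 0, 1, 8, 27, ...
cubic = [t ** 3 for t in range(8)]
third = difference(cubic, d=3)   # constant after three rounds of differencing
```

Each round of differencing costs one observation, so a series of length n leaves n - d values for model fitting.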
The steps involved in formulating the ARIMA model, as stated in (Opare, 2015), include:

1. Model Identification: We carried out a visual inspection of the graphical plot to check the stationarity of the series. Stationarity means the existence of a constant mean and variance.

2. Model Estimation: We used the Augmented Dickey-Fuller (ADF) test to test the null hypothesis that the data have a unit root and are, therefore, non-stationary (Opare, 2015). ADF accounts for correlation in the error terms by adding lags. Two values are most important for confirmation: Z(t) and the MacKinnon p-value for Z(t) (Chetty & Narang, 2017a). For stationarity, Z(t) should be a sizeable negative number, and the p-value should be significant at least at the 5% level (Chetty & Narang, 2017b; Opare, 2015). To achieve this, we added lags and subsequently took differences (Chetty & Narang, 2017b).

3. Model Diagnostic checks: We carried out a Portmanteau test for white noise to determine whether the residuals follow a white noise process (Stata, 2017). Furthermore, we plotted the 'fitted' values against the actual values on a two-way graph with standard errors as scattered points (Chetty et al., 2018).

4. Forecasting: In general, the forecast equation for the differenced series is:

$$\hat{y}_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

where $y_t$ is the differenced series, $\phi_i$ and $\theta_j$ are the autoregressive and moving average coefficients, and $\varepsilon_t$ is the error term. Where differencing has been introduced, it must be reversed to obtain the original series $Y_t$. However, the software does this automatically. In general, the transformation is done using this substitution (Opare, 2015):

$$y_t = (1 - B)^d Y_t$$

For a 3rd differencing, for instance, the reverse equation will be:

$$Y_t = y_t + 3Y_{t-1} - 3Y_{t-2} + Y_{t-3}$$

The chosen ARIMA (p,d,q) model was used to forecast values for the period 2014 to 2017. We used Stata 14 SE for academic users (Stata Corporation, 2014) for the computations.
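Reversing the differencing, which the software performs automatically, can be sketched as follows. This is an illustrative Python version based on the binomial expansion of (1 - B)^d, not the routine Stata uses internally; the function names and the toy numbers are hypothetical.

```python
from math import comb

def undifference_next(history, diff_forecast, d):
    """Recover the next level Y_t from a forecast y_t of the d-times differenced series.

    Uses y_t = sum_{k=0..d} (-1)^k C(d, k) Y_{t-k}, solved for Y_t.
    """
    if len(history) < d:
        raise ValueError("need at least d past levels")
    lagged = sum((-1) ** k * comb(d, k) * history[-k] for k in range(1, d + 1))
    return diff_forecast - lagged

def undifference_path(history, diff_forecasts, d):
    """Convert a sequence of differenced forecasts into forecasts of the original series."""
    levels = list(history)
    for y in diff_forecasts:
        levels.append(undifference_next(levels, y, d))
    return levels[len(history):]

# Hypothetical levels 0, 1, 8, 27, 64 (a cubic), whose third difference is always 6.
history = [0, 1, 8, 27, 64]
forecasts = undifference_path(history, [6, 6], d=3)
```

For d = 3 the lagged term expands to -3Y_{t-1} + 3Y_{t-2} - Y_{t-3}, so each recovered level depends on the three preceding levels, including earlier forecasts once the observed history runs out.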

Measure of Comparison
The Mean Absolute Percentage Error (MAPE) is a widely used statistical measure of forecast accuracy (Stephanie, 2017). The fact that MAPE is computed as a percentage makes it easy to interpret (ForecastPRO, 2011). In this study, we apply MAPE as a measure of accuracy by converting the errors between the actuals and the estimates into percentages (Opare, 2015). The lower the MAPE, the more accurate the forecast.

$$\text{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \quad (13)$$

where $A_t$ is the actual value, $F_t$ is the forecast value, and n is the number of forecast periods.
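A minimal Python sketch of the MAPE computation is given below. The function name and the sample values are hypothetical and are not the study's forecasts.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the absolute errors expressed as a share of each actual value.
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical values: absolute errors of 10% and 5% average to 7.5%.
example = mape([100.0, 200.0], [110.0, 190.0])
```

Because each error is divided by the actual value, MAPE is undefined when an actual value is zero, which does not arise for mortality rates of the magnitude studied here.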

WMC Model in Forecasting (CMACR) in Nigeria
Changes in Nigeria's CMACR were divided into two state blocks according to k-means cluster analysis, 'rate below' and 'rate above', with the corresponding state space E = {1, 2}. Table 2 showed the results. To verify the Markov property and justify the use of WMC, the chi-square statistic of 68.98 was higher than the table value of 3.84 at the 0.05 level of significance, so the annual closing CMACR conformed to the 'Markov property'; therefore, WMC prediction theory can be applied (Zhou, 2014). Table 3 displayed the results of the computation of each order autocorrelation coefficient r_k of the sequence parameter values and the corresponding weights. Table 4 displayed the result of the prediction of CMACR for 2014 using 2012 and 2013 as the initial years. Table 4 showed that max {p_i} = 0.537, indicating that the CMACR annual closing rate in 2014 was in cluster 1 (i.e., 'Rate below'), which met the block interval of X < 200.6 with a probability of 53.7%. From the historical data, the actual closing rate was 111.6, consistent with the prediction.
We also performed a similar iteration in which the state in 2014 served as one of the initial states for predicting the annual closing rate in 2015. The result showed that max {p_i} = 1.000, indicating that the CMACR closing rate in 2015 was also in cluster 1 (i.e., 'Rate below'), which met the block interval of X < 200.6 with a probability of 100%. From the historical data, the actual closing rate was 107.5, consistent with the prediction.

So, the CMACR series cannot have a constant mean and variance; hence, the primary assumption of stationarity cannot be confirmed (Adebiyi et al., 2014; Katchova, 2013; Liu et al., 2011; Opare, 2015).

Augmented Dickey-Fuller (ADF) Test
The results are shown in Table 5 below. The test statistic Z(t) (-7.90) is a sizeable negative number, and the corresponding p-value is also significant (0.0001), so the null hypothesis that the series has a unit root was rejected. Therefore, we considered the third differencing for further study. In Fig 2, only the 2nd lag is outside the shaded portion (acceptable region), so we set AR at 2. This AR (2) means that the AR model needs only two autoregressive terms (i.e., AR of order 2).

Figure 3. Diagram for PACF
Similarly, in Figure 3, for the 3rd differencing correlogram of PACF, the 2nd lag falls outside the shaded region, so we set MA at 2. So far, we have identified only one plausible model, ARIMA (2, 3, 2); therefore, there may be no need to choose between models using Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC).

Model Adequacy
The portmanteau Q statistic is 26.81 with a p-value of 0.26, so we cannot reject the null hypothesis, indicating that the residuals do follow a white noise process. Furthermore, after plotting the 'fitted' values against the actual values on a two-way graph with standard errors as scattered points (Chetty et al., 2018), it can be seen clearly that the predicted values of CMACR do not differ from the actual values.
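For readers who want to see how a portmanteau statistic of this kind is formed, the sketch below computes the Ljung-Box form, Q = n(n + 2) Σ r_k² / (n - k). The source does not state which variant was used, so the Ljung-Box form is an assumption here, and the function name and the alternating toy residuals are hypothetical.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box portmanteau statistic over lags 1..max_lag (assumed variant)."""
    n = len(residuals)
    mean = sum(residuals) / n
    den = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Lag-k sample autocorrelation of the residuals.
        num = sum((residuals[t] - mean) * (residuals[t - k] - mean) for t in range(k, n))
        r_k = num / den
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

# Hypothetical residuals that alternate in sign, so they are clearly not white noise.
residuals = [1.0, -1.0] * 10
q_stat = ljung_box_q(residuals, max_lag=1)
```

The statistic is compared with a chi-square critical value (3.84 for one lag at the 5% level); a large Q, as in this toy case, rejects white noise, whereas the study's p-value of 0.26 does not.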

Discussion
This study used WMC and ARIMA to predict the CMACR for Nigeria under-five mortality rates. For WMC, we classified CMACR into 2 clusters after having examined the performance of other numbers of clusters (3, 4, 5, 6, 7). The Markov Chain property was checked and found to be adequate to apply WMC. However, because of the nature of the data (non-stationary data), WMC could not produce a reasonable conclusion because it entered a loop after the second iteration (after predicting for two periods), which caused the predicted values to remain in cluster one. Also, because of the open-ended interval created by using two clusters in WMC, the exact value of the forecasts could not be determined; therefore, accurate prediction becomes difficult. This finding agrees with Sadeghifar et al. (Shahdoust et al., 2015). So, WMC is weak in its ability to model effectively a data set that includes trends and seasonality. ARIMA, on the other hand, proved to be a better and more straightforward approach. The original data were non-stationary. We used third-order differencing to bring them into stationarity. The chosen model, ARIMA (2, 3, 2), gave a near-perfect fit with a MAPE of 1.28%. So, because of the findings above, the hypothesis that WMC predicts CMACR better than ARIMA was rejected. Therefore, we conclude that ARIMA can predict the CMACR in Nigeria more accurately than WMC. Interestingly, the values predicted with ARIMA show that the CMACR will continue to decrease up to 2022, after which it will increase such that the rate as of 2030 will be 139.4. However, the limitation of this finding is that the ARIMA model is weak in its ability to predict over the medium-term and long-term range (Deljac et al., 2011). The reason is that ARIMA converges toward the mean of the auto-regressive part of the series (Huijskens, 2016), so we cannot rely on the 2030 forecast.

Recommendations for Future Study and Policy Implementation
More studies on WMC are needed to resolve the pitfall identified in this study. However, the implication here for policy-makers or agencies in charge of monitoring the under-five mortality rate in Nigeria is that, if the current approaches are continued, the country will be unlikely to achieve the Sustainable Development Goals (SDGs) for 2030.