Statistical Modeling and Forecast of the Corona-Virus Disease (Covid-19) in Burkina Faso

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by a new virus that had never been identified in humans before. The virus causes respiratory illness with symptoms such as cough, fever and, in the most severe cases, pneumonia. COVID-19 spreads mainly through contact with an infected person when they cough or sneeze, or through droplets of saliva or nasal secretions. The virus first appeared in December 2019 in Wuhan, China. In less than four months, it spread to more than 210 countries around the world. Africa recorded its first case of COVID-19 on 14 February in Egypt, and the first confirmed case in sub-Saharan Africa was in Nigeria.


Introduction
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by a new virus that had never been identified in humans before. The virus causes respiratory illness with symptoms such as cough, fever and, in the most severe cases, pneumonia. COVID-19 spreads mainly through contact with an infected person when they cough or sneeze, or through droplets of saliva or nasal secretions. The virus first appeared in December 2019 in Wuhan, China. In less than four months, it spread to more than 210 countries around the world. Africa recorded its first case of COVID-19 on 14 February in Egypt, and the first confirmed case in sub-Saharan Africa was in Nigeria.
In Burkina Faso, the first cases appeared on 9 March. As of 9 April, Burkina Faso was one of the West African countries most affected by the pandemic, with 443 cases, including 146 recoveries and 19 deaths.
After the first cases of COVID-19 were declared in Burkina Faso, both the leaders and the people of the country were troubled, and schools were closed one week later. Foreign radio and TV channels predicted millions of confirmed cases in Africa. Faced with these figures, we became concerned about the case of Burkina Faso.
Since the start of the pandemic, scientists around the world have carried out studies in various fields and in several countries. See for instance [Ivorra and Ramos, 2020], [Chen et al., 2020], [Jia et al., 2020], [Tang et al., 2020], [Liu et al., 2020], [Chen et al., 2020], [Maleki et al., 2020] and [Khan and Gupta, 2020]. [Ivorra and Ramos, 2020] studied the validation of forecasts obtained with the Be-CoDiS mathematical model. [Liu et al., 2020] sought to understand the dynamics of COVID-19 by accounting for unreported cases, and developed a compartmental model to predict the behavior of the disease. [Chen et al., 2020] developed a Bats-Hosts-Reservoir-People transmission network model to simulate the potential transmission from the infection source to humans, from which they computed the basic reproduction number R_0. [Maleki et al., 2020] used time series to forecast the confirmed and recovered cases of COVID-19. More precisely, they used the family of autoregressive time series models based on two-piece scale mixture normal distributions, called TP-SMN-AR models, to analyze the real data of confirmed and recovered COVID-19 cases. [Khan and Gupta, 2020] adopted univariate time series models to predict the number of COVID-19 infected cases expected in the upcoming days in India. The ARIMA and the Nonlinear AutoRegressive Neural Network (NAR) models were used in their work.
In the present paper we review several approaches to mathematical modeling of the COVID-19 disease and develop these ideas further with an emphasis on the analysis of the dynamics of the cumulative number of confirmed cases and estimation of the parameters of the models. We focus on models which use fewer parameters, rather than a detailed description of the disease.
We use these models to predict the cumulative number of confirmed cases of COVID-19. More precisely, we use ARIMA models to fit the available data and then predict the cumulative number of confirmed cases. The remainder of this paper is organized as follows. In Section 2, we introduce the ARIMA and ExponenTial Smoothing (ETS) models. We start by defining the information criteria on which we base our choice of model. Then, we obtain our first prediction model with the auto.arima function of the R package forecast. In Subsection 2.3, we look closely at the ETS model and compare it with the ARIMA model obtained previously. In Subsection 2.4, we build an ARIMA model using the Box-Jenkins method. In Section 3, we make forecasts using the chosen model. We end with a conclusion.

ARIMA Models, Exponential Smoothing Model
In this section, we use the AutoRegressive Integrated Moving Average (ARIMA) model obtained through the auto.arima function of the R package forecast, together with the exponential smoothing method, to predict the cumulative number of cases of COVID-19. Next, we construct our own ARIMA model and again make forecasts. ARIMA and exponential smoothing models are the most widely used approaches to time series forecasting and provide complementary approaches to the problem. The motivation for using them is that the infection chain of COVID-19 is autocorrelated and has a certain trend. Exponential smoothing models focus on describing the trend and seasonality in the data, while ARIMA models focus on describing the autocorrelations in the data.

Information Criterion (IC)
Modeling growth often involves comparing several models with different equations on the same data set. This comparison allows us to choose the model that best fits the data ("goodness of fit"). To compare these models, we look at information criteria such as the Akaike Information Criterion (AIC) (cf. [Burnham et al., ]), the Bayesian Information Criterion (BIC) and the corrected AIC (AICc). These criteria measure the quality of a proposed statistical model. When a statistical model is estimated, its likelihood can be increased by adding one or more parameters. The AIC, BIC and AICc penalize models as a function of their number of parameters in order to satisfy the criterion of parsimony. We then choose the model with the lowest information criterion, thus keeping only the parameters of main interest. The criteria are written as follows:
$$\mathrm{AIC} = -2\log(L) + 2(p + q + k + 1),$$
$$\mathrm{AICc} = \mathrm{AIC} + \frac{2(p + q + k + 1)(p + q + k + 2)}{T - p - q - k - 2},$$
$$\mathrm{BIC} = \mathrm{AIC} + \left[\log(T) - 2\right](p + q + k + 1),$$
with k = 1 if there is a drift and k = 0 otherwise. T is the total number of observations, p is the order of the autoregressive part, q the order of the moving average part and L the maximum value of the likelihood function of the model (see [Akaike, 1974]). Of two models, the better is the one with the smaller information criterion. It is also possible to compare the residuals of the different models and choose the one whose residuals are the least variable.
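As a minimal sketch of how the three criteria are computed, the function below evaluates AIC, AICc and BIC from a maximized log-likelihood, assuming the standard ARIMA definitions used with the forecast package, in which the parameter count is p + q + k + 1. The function name and argument names are illustrative choices, not part of the paper.

```python
import math


def information_criteria(log_lik, p, q, k, T):
    """Return (AIC, AICc, BIC) for an ARIMA model.

    log_lik : maximized log-likelihood, log(L)
    p, q    : autoregressive and moving average orders
    k       : 1 if the model has a drift, 0 otherwise
    T       : number of observations
    """
    n_par = p + q + k + 1  # the extra 1 counts the innovation variance
    aic = -2.0 * log_lik + 2.0 * n_par
    # Small-sample correction; denominator equals T - p - q - k - 2
    aicc = aic + 2.0 * n_par * (n_par + 1) / (T - n_par - 1)
    bic = aic + (math.log(T) - 2.0) * n_par
    return aic, aicc, bic
```

For two candidate models fitted to the same series, the one with the smaller returned value of the chosen criterion is preferred. The AICc is only meaningful when T is larger than p + q + k + 2.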

Automatic ARIMA Modelling
Automatic ARIMA modelling relies on the Hyndman-Khandakar algorithm. For more details about the algorithm, see ([Akaike, 1974]). The function auto.arima of the R package forecast combines unit root tests, minimization of the AICc and Maximum Likelihood Estimation to obtain an ARIMA model that fits the available data.
To choose the ARIMA model that best fits the data, we explore several models and choose based on the information criteria. In the selected model, Y_t describes an ARMA(2,2) process.
To estimate the parameters of the model, we use the maximum likelihood estimation (MLE) method. The aim of this method is to find the parameter values that maximize the probability of obtaining the available data. Table 2 provides the estimates of the parameters. The residual diagnostics confirm that the residuals are white noise.

Exponential Smoothing Model
Exponential smoothing appeared around 1959 (cf. [Brown, 1959]) and has motivated some successful forecasting methods.
Here, based on the information criterion, we model the cumulative number of confirmed cases by Holt's linear method with additive errors. For this model the equations are given by
$$y_t = L_{t-1} + B_{t-1} + \varepsilon_t,$$
$$L_t = L_{t-1} + B_{t-1} + \alpha \varepsilon_t,$$
$$B_t = B_{t-1} + \beta \varepsilon_t,$$
where y_t is the observed value and ε_t the additive error at time t, L_t is the level (or the smoothed value) of the series at time t, B_t is the trend component, and α, β are the smoothing coefficients of the model, subject to the constraints 0 < α < 2 and 0 < β < 4 − 2α (see [Akaike, 1974], chapter 10).
To estimate the smoothing parameters, we use the MLE method and obtain the following system.
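The recursions of Holt's linear method with additive errors can be sketched in a few lines. The code below assumes the usual error-correction form, in which the level is updated by α times the one-step error and the trend by β times the same error. The function name, the initial states and the parameter values are hypothetical; in the paper they come from maximum likelihood estimation in R.

```python
def holt_additive(y, alpha, beta, level0, trend0, horizon):
    """Run Holt's linear recursions with additive errors.

    Returns (fitted, forecasts): the one-step-ahead fitted values for y
    and the point forecasts for the next `horizon` periods.
    """
    level, trend = level0, trend0
    fitted = []
    for obs in y:
        pred = level + trend          # one-step-ahead forecast
        err = obs - pred              # additive error
        fitted.append(pred)
        level = pred + alpha * err    # L_t = L_{t-1} + B_{t-1} + alpha * e_t
        trend = trend + beta * err    # B_t = B_{t-1} + beta * e_t
    # Forecasts extend the last level along the last trend
    forecasts = [level + h * trend for h in range(1, horizon + 1)]
    return fitted, forecasts
```

On a perfectly linear series the errors are all zero, so the forecasts simply continue the line. On real data, α and β control how quickly the level and the trend react to forecast errors.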

Choosing Our Own Model
According to ([Hyndman and Athanasopoulos, 2018]), the automatic ARIMA modeling technique uses a variation of the Hyndman-Khandakar algorithm, which combines unit root tests, minimization of the AICc and MLE to obtain an ARIMA model. Our purpose in this subsection is to follow a general procedure for forecasting with an ARIMA model. The modeling procedure used below is based on the one in ([Hyndman and Athanasopoulos, 2018] p. 321), which is summarized in Figure 5.
Plot of the data. The curve in Figure 6 shows the evolution of the cumulative number of COVID-19 cases in Burkina Faso. On this graph, the number of cases increases steadily. Figure 7 shows the scatterplots of the COVID-19 series. We can notice the randomness of the data, but no clear seasonality. Figure 8 shows the autocorrelation function of the cumulative number of confirmed cases of COVID-19. The autocorrelations for small lags tend to be large and decrease slowly as the lag increases. Therefore the time series has a trend. Moreover, the data are strongly positively autocorrelated.
Box-Cox transformation. In this paragraph, we transform the data using the Box-Cox transformation. Indeed, the Standard Normal Homogeneity Test (SNHT), the Buishand test and the von Neumann test confirmed that the data are not homogeneous: the variance is not constant over time.
The analysis of the Box-Cox transformed series reveals that it does not follow a normal distribution. Moreover, the three tests, namely the KPSS test, the Phillips-Perron test and the augmented Dickey-Fuller (ADF) test, show that the process is non-stationary. Table 4 gives the results of the stationarity tests.
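For readers unfamiliar with the Box-Cox transformation, the sketch below implements the standard one-parameter form and its inverse. The paper does not report the value of the transformation parameter, so `lam` is a hypothetical argument here, and the function names are illustrative.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: log(y) if lam == 0, else (y**lam - 1) / lam."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inv_box_cox(values, lam):
    """Inverse transform, used to bring forecasts back to the original scale."""
    if lam == 0:
        return [math.exp(w) for w in values]
    return [(lam * w + 1.0) ** (1.0 / lam) for w in values]
```

Values of `lam` below 1 compress large observations more than small ones, which is why the transformation helps stabilize a variance that grows with the level of the series.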
Differencing. In this paragraph, we look at the differencing of the Box-Cox transformed process. The first difference of the process is non-stationary, but the difference of order 2 is stationary. Figure 10 shows the twice-differenced Box-Cox process, on which we can observe the stationarity.
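Differencing of order d simply applies the first-difference operator d times. The following sketch shows this on a short hypothetical series with a quadratic trend, for which two differences are needed to obtain a constant series; the function name is an illustrative choice.

```python
def difference(series, d=1):
    """Apply the first-difference operator d times to a sequence."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces the series by y_t - y_{t-1}
        out = [b - a for a, b in zip(out, out[1:])]
    return out


quadratic = [1, 4, 9, 16, 25]        # hypothetical data with a quadratic trend
once = difference(quadratic, 1)      # still trending
twice = difference(quadratic, 2)     # constant
```

Each pass shortens the series by one observation, so a twice-differenced series of T observations has T − 2 values.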

Figure 9. Box-Cox of the cumulative number of confirmed cases (horizontal axis: time; vertical axes: Total cases and Box-Cox(Total cases)).
International Journal of Statistics and Probability, Vol. 9, No. 6
In Figure 11, the ACF of the residuals from the ARIMA(1,2,1) model indicates that the residuals are white noise.
Moreover, the white noise analysis of the residuals shows that the process is normally distributed and stationary (cf. Table 5).
We can therefore say that the differenced Box-Cox process is a Gaussian white noise.
Analysis of the ACF and PACF. We now analyze the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the twice-differenced Box-Cox process. Figure 12 shows both functions. They are suggestive of an ARMA(1,1) model for the twice-differenced series, that is, an ARIMA(1,2,1) model for the Box-Cox transformed series.
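To make the two diagnostic functions concrete, the sketch below computes the sample ACF directly from its definition and derives the PACF from it with the Durbin-Levinson recursion. This is a plain illustration with hypothetical function names, not the routine used to produce Figure 12.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf_from_acf(acf):
    """Partial autocorrelations for lags 1..len(acf)-1 (Durbin-Levinson)."""
    pacf = []
    phi_prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(acf)):
        num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf
```

For a pure AR(1) process the PACF cuts off after lag 1 while the ACF decays geometrically. A mixed ARMA(1,1) process shows decay in both functions, which is the kind of pattern one looks for when reading such plots.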

Forecast From the Chosen Model
Now that we have our model, we use it in this section to predict the cumulative number of confirmed cases. Figure 13 gives an overview of the forecast for the twenty days from 3 May.
Figure 13. Forecast for 15 days from ARIMA(1,2,1).
Moreover, Table 6 gives the forecasts for the thirty days from 3 May in terms of confidence intervals.
Remark 3. The order of differencing d has an effect on the prediction intervals: the higher the value of d, the more rapidly the prediction intervals increase in size. This should be taken into account when choosing the model used for prediction.
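The effect described in Remark 3 can be illustrated on the simplest case, an ARIMA(0,d,0) model with no AR or MA terms. This is a deliberate simplification of the ARIMA(1,2,1) model used here, chosen because its forecast error variance has a closed form. The h-step forecast error is a weighted sum of future innovations, and the weights are obtained by integrating a unit impulse d times. The function name and the innovation standard deviation `sigma` are hypothetical.

```python
import math


def forecast_std(d, horizon, sigma=1.0):
    """Forecast standard errors for horizons 1..horizon under ARIMA(0,d,0)."""
    # Impulse response of white noise: a single 1 followed by zeros
    psi = [1.0] + [0.0] * (horizon - 1)
    for _ in range(d):
        # One level of integration is a running cumulative sum
        total = 0.0
        integrated = []
        for w in psi:
            total += w
            integrated.append(total)
        psi = integrated
    stds = []
    var = 0.0
    for w in psi:
        var += w * w  # variance accumulates the squared weights
        stds.append(sigma * math.sqrt(var))
    return stds


std_d1 = forecast_std(1, 10)  # grows like sqrt(h)
std_d2 = forecast_std(2, 10)  # grows much faster
```

With d = 1 the weights are all 1, so the standard error grows like the square root of the horizon. With d = 2 the weights are 1, 2, 3, ..., so the standard error grows roughly like the horizon to the power 3/2, and an approximate 95% interval of half-width 1.96 times the standard error widens accordingly.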
Remark 4. We notice that, on the one hand, the automatic ARIMA model fits the training data slightly better than the ETS model, while the ETS model in turn outperforms the ARIMA(1,2,1) model on the training data. On the other hand, the ARIMA(1,2,1) model provides more accurate forecasts on the test set than the automatic ARIMA model ARIMA(3,2,2), which in turn outperforms the ETS model. Table 7 below illustrates this.
Likewise, when we use time series cross-validation to compare the three models based on the Mean Squared Error, the ARIMA(1,2,1) model has the lowest tsCV statistic, followed by the automatic ARIMA model and finally the ETS model.
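Time series cross-validation can be sketched as a rolling-origin evaluation: the model is refitted on each growing prefix of the series and scored on the next observation. The code below is a minimal illustration that uses two simple stand-in forecasters (naive and drift) rather than the ARIMA and ETS models compared in the paper. All function names and the `min_train` parameter are hypothetical.

```python
def ts_cv_mse(series, forecaster, min_train=3):
    """Mean squared one-step-ahead error over a rolling forecast origin."""
    sq_errors = []
    for end in range(min_train, len(series)):
        train = series[:end]                 # data available at the origin
        pred = forecaster(train)             # forecast of series[end]
        sq_errors.append((series[end] - pred) ** 2)
    return sum(sq_errors) / len(sq_errors)


def naive_forecast(train):
    """Forecast the next value as the last observed value."""
    return train[-1]


def drift_forecast(train):
    """Extend the straight line joining the first and last observations."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return train[-1] + slope


cases = [1, 2, 3, 4, 5, 6]                   # hypothetical linear series
mse_naive = ts_cv_mse(cases, naive_forecast)
mse_drift = ts_cv_mse(cases, drift_forecast)
```

The model with the smaller cross-validated mean squared error is preferred. Unlike a single train and test split, every observation after the first `min_train` ones is used once as a test point.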

Conclusion
The main contribution of this paper is the daily prediction of the cumulative number of confirmed cases using several time series models. The ARIMA(1,2,1) model gives better predictions than the others.
It is important to point out that we have not developed new statistical methods, but have used existing simple ones to show their usefulness and practicability.