Forecasting Hydropower Generation in Ghana Using ARIMA Models

In this study, an Autoregressive Integrated Moving Average (ARIMA) model was used to forecast Ghana's Akosombo dam level and hydropower generation through the end of 2022. The data used for this study span January 2010 to December 2019. Based on the final ARIMA model, power generation is forecast to decrease from 398 Megawatts/hr in December 2019 to approximately 374 Megawatts/hr by December 2022. Similarly, the water level of the Akosombo dam is predicted to decrease marginally from 264.8 ft in December 2019 to approximately 255.19 ft by December 2022. The Volta River Authority (VRA) and managers of electricity production in Ghana are encouraged to be proactive in expanding energy production by turning more to renewable energy sources. In the coming years, as they seek to provide sustainable electricity for their customers, investment decisions should be directed towards protecting the Volta River from drying up due to human and climatic activities, as well as towards expanding the energy mix.

Author Contributions: software, S.A.S.; validation, S.A.S. and A.A.; formal analysis, S.A.S. and A.A.; investigation, S.A.S.; resources, S.A.S. and A.A.; data curation, S.A.S.; writing (original draft preparation), S.A.S., A.A. and A.A.; writing (review and editing), S.A.S. and A.A.; supervision, S.A.S.; project administration, S.A.S. All authors have read and agreed to the published version of the manuscript.

However, despite the 39.9% of electricity derived from hydro, there has been a significant decline in hydropower generation since 2014. Ashong (2016) indicated that inadequate water inflow into the hydro dams as a result of low rainfall has been the main reason for the decline in hydropower generation, and hence the major cause of the unstable state of renewable energy in Ghana.
Several models are used for time series forecasting. This study mainly applies the Autoregressive Integrated Moving Average (ARIMA) model for predicting power generation and dam level. ARIMA models depend on past values to predict the future. The ARIMA model consists of three components (p, d, q), where p is the order of the AR process, d is the order of differencing, and q is the order of the MA process. The ARIMA model is one of the techniques most widely used by researchers because of its reliability (Debnath & Mourshed, 2018; Ediger & Akar, 2007). Sarpong (2013) found that the ARIMA model is adequate for forecasting. In addition, El Desouky and Elkateb (2000) reported that the ARIMA model yields smaller forecasting errors.
In previous studies, Owusu et al. (2018) found that electricity generation will decline if alternative power sources are not urgently considered by the Government. Boadi and Owusu (2019), in their study on climate change and its effect on hydropower in Ghana using monthly data from 1970 to 2010, concluded that 21% of Ghana's unstable electricity supply was due to shortfalls in the water level at the Akosombo hydroelectric power station. Michieka et al. (2021) found that a long-run positive shock in temperature increases electric power production. According to the Asian Development Bank (2007), drought causes more shortages, resulting in outages and insufficient cooling water, which ultimately decrease hydropower production.
Moreover, Ediger and Akar (2007) forecasted primary energy demand by fuel using an ARIMA model. The forecast showed that primary energy demand would decrease between 2005 and 2020. Mite-León and Barzola-Monteses (2018) used ARIMA models to forecast hydropower generation in Ecuador. Their study showed a monthly increase in hydropower generation in Ecuador. Kabo-bah et al. (2016) found that regular low inflow of water into the Akosombo dam affects power generation.
However, most previous energy studies focused on renewable resources, on predicting energy consumption and, in particular, on overall energy production (Dind et al., 2018; Katani, 2019; Kaur & Ahuja, 2017; Sarkodie, 2017; Wu et al., 2017). Others focused on factors affecting hydropower generation (Kabo-bah et al., 2016; Michieka et al., 2021). Notwithstanding the above, hydropower generation forecasting in developing countries such as Ghana has received very little attention. Energy production forecasts are of great importance to operators of the electrical system and to decision makers, because they help to define better policies and manage risks.
Previous studies have also used different models to forecast hydropower generation. Owusu et al. (2018) used polynomial regression. Zolfaghari and Golabi (2021) combined the adaptive wavelet transform (AWT), long short-term memory (LSTM) and random forest (RF) algorithms (AWT-LSTM-RF) to predict the electricity production of a hydroelectric power plant. Dmitrieva (2015) combined neural networks, SVM and ARIMA models to forecast hydropower plant production. Mite-León and Barzola-Monteses (2018) used ARIMA with a seasonal component to predict hydropower generation in Ecuador.
From 2003, the Energy Commission of Ghana decided on an annual increase in power supply of 0.9 to 1.8% due to increasing population and economic activity (EC, 2013). It is clear from the above that knowing the future trends in hydropower generation and in the water level of Ghana's longest-serving source of electricity (the Akosombo dam) is crucial to overcoming power supply challenges.
This study therefore attempts to forecast Ghana's hydropower generation as well as the water level of the Akosombo dam using the Autoregressive Integrated Moving Average (ARIMA) technique. Much of the existing literature on ARIMA forecasting models ignores the analysis of forecasting errors (Koutroumanidis et al., 2009; Ömer Faruk, 2010; Khashei & Bijari, 2011). In this study, however, error analysis is performed using the Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE) to evaluate the forecasting accuracy of the selected model. The Volta River Authority (VRA) and managers of the Electricity Company of Ghana may find this study useful in planning for the coming years as they seek to provide sustainable electricity for their customers. This study may also promote the need for intervention programs to protect the Volta River from drying up due to human and climatic activities.

Materials and Methods
The study used two secondary univariate time series, power generation and dam level, which span January 2010 to December 2019. Since the data were measured over time and are uniformly spaced, we adopted the Box-Jenkins strategy (Shumway et al., 2000). Time series forecasting with ARIMA models can be performed in four basic steps, namely Identification, Estimation, Diagnosis and Forecasting (Box et al., 2015), ending with a specific model that satisfies the underlying conditions as closely as possible and produces accurate forecasts.

Autoregressive Integrated Moving Average Process (ARIMA)
The ARIMA model is a type of Box-Jenkins time series analysis that depends on past values to predict the future (Devi et al., 2013). The modelling combines autoregressive and moving average processes applied to a differenced (integrated) series. The ARIMA (p, d, q) model has three main parts: the Autoregressive (AR) part of order p, which explains the present value of a series as a function of its p past values; the Moving Average (MA) part of order q, which indicates that the output variable depends linearly on the current and q past values of a white-noise error term; and the differencing (d) part, which indicates that the data values have been replaced with the difference between each value and its previous value, applied d times.
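The Box-Jenkins methodology applies the maximum likelihood principle in parameter estimation. Using a modified form of the Mite-León and Barzola-Monteses (2018) modelling approach, the ARIMA (p, d, q) model is expressed as:
$$y'_t = \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$
where $y'_t$ is the series with a difference order (d), $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$ are the model parameters, and $\varepsilon_t$ represents i.i.d. white noise.
In the ARIMA (p, d, q) model, differencing of order d is what makes a nonstationary time series stationary.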
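As an illustration only, the differencing step and its inverse can be sketched in Python as follows. The function names and the dam-level numbers are hypothetical and are not taken from the study's data or software.

```python
def difference(series, d=1):
    """Apply differencing d times: each pass replaces y_t with y_t - y_{t-1}."""
    values = list(series)
    for _ in range(d):
        values = [values[i] - values[i - 1] for i in range(1, len(values))]
    return values


def undifference(last_observed, differenced_forecasts):
    """Rebuild level forecasts from first-difference forecasts (d = 1 only)."""
    levels = []
    current = last_observed
    for step in differenced_forecasts:
        current = current + step
        levels.append(current)
    return levels


# Hypothetical dam levels in feet (illustrative numbers, not the study data).
levels = [264.0, 263.0, 265.0, 268.0, 266.0]
first_diff = difference(levels, d=1)     # month-to-month changes
second_diff = difference(levels, d=2)    # changes of the changes
```

Forecasts produced on the differenced scale have to be cumulated back onto the last observed level, which is what `undifference` does for the d = 1 case used in this study.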

Unit Root Evaluation
Testing for a unit root is a basic preliminary step for any time series analysis, because ARIMA modelling assumes a stationary series. This study made use of time plots of the data and the Augmented Dickey-Fuller (ADF) statistical test to evaluate the stationarity of the two series. The ADF test was based on the assumption that the series can be approximated by an autoregressive process of order 1 (Mite-León and Barzola-Monteses, 2018). The ADF test is performed under the null hypothesis that the series has a unit root. The regression equation of the ADF test is given by:
$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t$$
where $y_t$ is the observed time series, $\alpha$ is a constant, $\beta$ is the coefficient of the time trend, and $p$ is the order of the AR process.
If $\gamma = 0$, the series is a random walk, and if $-1 < 1 + \gamma < 1$, the series is stationary.
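The logic of the test can be illustrated with a simplified Dickey-Fuller regression. The sketch below is an assumption-laden reduction of the ADF equation: it keeps only the constant and the lagged level (no time trend and no lagged differences), uses simulated series rather than the study data, and all names are hypothetical. For this constant-only case the large-sample 5% critical value is approximately -2.86.

```python
import math
import random


def dickey_fuller_stat(series):
    """Return (gamma_hat, t_stat) from OLS of dy_t on a constant and y_{t-1}."""
    x = series[:-1]                                        # lagged levels y_{t-1}
    dy = [series[i] - series[i - 1] for i in range(1, len(series))]
    n = len(x)
    x_bar = sum(x) / n
    dy_bar = sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    gamma = sxy / sxx                                      # slope on y_{t-1}
    alpha = dy_bar - gamma * x_bar                         # intercept
    residuals = [di - alpha - gamma * xi for xi, di in zip(x, dy)]
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)      # residual variance
    t_stat = gamma / math.sqrt(sigma2 / sxx)
    return gamma, t_stat


random.seed(0)
shocks = [random.gauss(0.0, 1.0) for _ in range(300)]

# Simulated stationary AR(1) series with coefficient 0.5.
stationary = [0.0]
for e in shocks:
    stationary.append(0.5 * stationary[-1] + e)

# Simulated random walk built from the same shocks.
random_walk = [0.0]
for e in shocks:
    random_walk.append(random_walk[-1] + e)

gamma_s, t_s = dickey_fuller_stat(stationary)
gamma_rw, t_rw = dickey_fuller_stat(random_walk)
```

A t-statistic well below the critical value, as expected for the simulated stationary series, leads to rejection of the unit-root null hypothesis; for the random walk the estimated coefficient is expected to sit close to zero. A full ADF test would add the trend and lagged-difference terms and use the corresponding critical values.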

Model Identification
Before fitting the ARIMA model, the Autoregressive (AR) order p and the Moving Average (MA) order q were identified using the PACF and ACF plots respectively. According to Box and Pierce (1970), the ACF and the PACF are correlogram functions that help to determine the degree of association between successive values of the series and give an idea of the possible parameters of the ARIMA model. Following the identification method of Polprasert et al. (2021), the ACF and PACF were plotted for the stationary series, and the p and q values were chosen based on the truncating or trailing nature of each function. Truncation means that the ACF or PACF drops to zero (0) after a certain lag, and trailing means that the ACF or PACF shrinks slowly towards zero (0). If the PACF is truncating and the ACF is trailing, then p equals the truncation order, q equals 0, and it can be concluded that the sequence fits an AR model. If the PACF of the stationary series is trailing and the ACF is truncating, then q equals the truncation order, p equals 0, and it can be concluded that the sequence fits an MA model. If both the PACF and the ACF are trailing, then p is read from the PACF and q from the ACF, taking the lag after which each function becomes negligible, and the sequence fits an ARMA model.
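The identification rules above can be sketched in Python. This is a minimal illustration, not the software used in the study: the sample ACF is computed directly, the PACF is obtained with the Durbin-Levinson recursion, and the hypothetical helper `truncation_order` uses the common approximate 95% band of ±1.96/√n as a heuristic for "zero".

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf_from_acf(rho):
    """Partial autocorrelations from autocorrelations (Durbin-Levinson)."""
    result = [1.0]
    phi_prev = []
    for k in range(1, len(rho)):
        if k == 1:
            phi_kk = rho[1]
            phi_curr = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                        for j in range(k - 1)]
            phi_curr.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi_curr
    return result


def truncation_order(values, n_obs):
    """Largest lag whose value lies outside the approximate 95% band."""
    band = 1.96 / math.sqrt(n_obs)
    significant = [lag for lag, v in enumerate(values)
                   if lag > 0 and abs(v) > band]
    return max(significant) if significant else 0


# Theoretical ACF of an AR(1) process with coefficient 0.5: it trails off,
# while the PACF cuts off after lag 1, pointing to p = 1 and q = 0.
ar1_acf = [1.0, 0.5, 0.25, 0.125]
ar1_pacf = pacf_from_acf(ar1_acf)
```

In practice the ACF would be computed from the differenced series with `acf(...)` and passed to `pacf_from_acf`; the theoretical AR(1) values are used here only because the expected pattern is known exactly.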

Model Estimation
After the orders (p, d, q) of the ARIMA model have been identified, the model is estimated to obtain its coefficients. Maximum likelihood estimation is used in this study: every candidate model suggested at the identification stage is fitted to the series to obtain estimates of its coefficients.
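To give a feel for what estimation involves, the sketch below fits the simplest possible case, a zero-mean AR(1) model. It relies on the fact that, with Gaussian errors, maximising the likelihood conditional on the first observation is equivalent to least squares. This is a deliberately reduced stand-in for the full ARIMA likelihood used in the study, and the function name and data are hypothetical.

```python
def fit_ar1(series):
    """Conditional least-squares estimate of phi in y_t = phi * y_{t-1} + e_t.

    Returns (phi_hat, sigma2_hat), where sigma2_hat is the mean squared residual.
    """
    pairs = [(series[t - 1], series[t]) for t in range(1, len(series))]
    num = sum(prev * curr for prev, curr in pairs)
    den = sum(prev * prev for prev, _ in pairs)
    phi = num / den
    residuals = [curr - phi * prev for prev, curr in pairs]
    sigma2 = sum(e * e for e in residuals) / len(residuals)
    return phi, sigma2


# Noise-free illustrative series that halves at every step, so phi is 0.5.
phi_hat, sigma2_hat = fit_ar1([8.0, 4.0, 2.0, 1.0])
```

Models with MA terms or higher orders, such as those considered in this study, have no closed-form solution of this kind and are fitted by numerical optimisation of the likelihood.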

Model Selection
After successful estimation, the Akaike Information Criterion (AIC) and the statistical significance of the model terms are assessed to determine the best model for each series. The ARMA components are expected to be significant at the 5% level of significance, and the preferred model should return the minimum AIC value. The AIC is mathematically expressed as:
$$AIC = -2\ln(L) + 2k$$
where $L$ is the maximum likelihood value and $k$ is the number of parameters to be estimated.
When these conditions are satisfied, that model is then selected as the best model for the series.
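The AIC part of this selection rule can be sketched as follows. The candidate models, log-likelihood values and parameter counts are hypothetical and serve only to show the arithmetic; the significance check on the coefficients is not covered here.

```python
def aic(log_likelihood, k):
    """AIC = -2 ln(L) + 2k, taking ln(L) (the maximised log-likelihood) as input."""
    return -2.0 * log_likelihood + 2.0 * k


# Hypothetical candidates: name -> (maximised log-likelihood, number of parameters).
candidates = {
    "ARIMA(1,1,1)": (-250.0, 2),
    "ARIMA(2,1,1)": (-247.5, 3),
    "ARIMA(2,1,2)": (-247.2, 4),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best_model = min(scores, key=scores.get)   # smallest AIC wins
```

In this made-up example the third model has the highest likelihood but is penalised for its extra parameter, so the second model is preferred.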

Model Diagnostics
The Box-Jenkins methodology also provides a way to assess the goodness of fit of the selected model. After estimation, the selected model is expected to satisfy the following conditions: the residuals should be white noise; the ARMA process should be covariance stationary, so all inverse AR roots must lie inside the unit circle; and the ARMA process should be invertible, so all inverse MA roots must lie inside the unit circle. The study employs the Ljung-Box test to check whether the residuals are white noise. The test statistic is expressed as:
$$Q = n(n+2)\sum_{k=1}^{m} \frac{\hat{r}_k^2}{n-k}$$
where $Q$ asymptotically follows a Chi-square distribution with $h = m - p - q$ degrees of freedom, $p$ and $q$ are the orders of the AR and MA processes, $n$ is the sample size, $\hat{r}_k$ is the estimated autocorrelation of the residual series at lag $k$, and $m$ is the number of lags to be tested.
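A minimal sketch of the Ljung-Box statistic is given below. The residual values and the ARMA orders in the example are hypothetical; the p-value, which requires the Chi-square distribution function, is left to a statistics library and is not computed here.

```python
def ljung_box_q(residuals, m):
    """Ljung-Box Q = n(n+2) * sum_{k=1..m} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, m + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0  # lag-k autocorrelation
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


def ljung_box_df(m, p, q):
    """Degrees of freedom h = m - p - q for residuals of an ARMA(p, q) fit."""
    return m - p - q


# Tiny, strongly alternating residual series (illustrative only).
q_stat = ljung_box_q([1.0, -1.0, 1.0, -1.0], m=1)

# Hypothetical ARMA(1, 1) fit tested at m = 6 lags.
h = ljung_box_df(m=6, p=1, q=1)
```

A large Q relative to the Chi-square distribution with h degrees of freedom indicates remaining autocorrelation; note that m must exceed p + q for the degrees of freedom to be positive.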

Model Forecasting and Evaluation
Once the selected model has been verified, it is used to predict power generation and dam level over the next 36 months. After forecasting, this study employs the Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE) to evaluate the forecasting accuracy of the selected model. The RMSE and MAPE are given by:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}, \qquad MAPE = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
where $\hat{y}_i$ is the predicted value for the $i$th observation, $y_i$ is the observed value for the $i$th observation, and $n$ is the number of non-missing residuals.
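Both accuracy measures are straightforward to compute. The sketch below uses hypothetical observed and predicted values, not the study's series, and assumes that no observed value is zero (MAPE is undefined otherwise).

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(observed)
    return 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted))


# Hypothetical observed and predicted values (illustrative only).
observed = [100.0, 200.0]
predicted = [110.0, 190.0]

error_rmse = rmse(observed, predicted)   # in the units of the series
error_mape = mape(observed, predicted)   # in percent
```

RMSE is expressed in the units of the series (for example, feet for dam level), whereas MAPE is unit-free, which makes it easier to compare across the two series.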

Graphs of the Series
It can be observed from Figures 1 and 2 that both series show a change in mean over time, which suggests that both series are nonstationary. It can also be observed from the correlogram plots (Figures 3 and 4) that the ACF declines very slowly, which again suggests that both series are nonstationary. This can be confirmed using the ADF test. Tables 1 and 2 display the results of the ADF test for stationarity for power generation and dam level respectively. The ADF test confirms that both series were nonstationary at their levels and became stationary after first differencing. This supports the use of an ARIMA (p, d, q) model with d = 1 to estimate our models and make predictions.

Model Diagnostics
It is advisable to check the goodness of fit of the selected model, to see whether it adequately fits the data, before forecasting is performed.

Dam Level Model Diagnostics
Table 6. Ljung-Box results

Figure 8. AR/MA root results
The inverse AR and MA roots lie inside the unit circle, which shows that the ARMA process is covariance stationary and invertible. Also, all the p-values of the Ljung-Box Q-statistics are greater than the 5% level of significance; therefore, we fail to reject the null hypothesis and conclude that the residuals are white noise. This confirms that the ARIMA (10,1,1) model adequately fits the data.
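The root condition can be illustrated on a small case. The sketch below checks stationarity for an AR(2) process only, where the inverse roots of the AR polynomial 1 - φ1 z - φ2 z² are the solutions of λ² - φ1 λ - φ2 = 0. The coefficients are hypothetical; the higher-order models in this study would require a general polynomial root finder.

```python
import cmath


def ar2_inverse_roots(phi1, phi2):
    """Inverse roots of 1 - phi1*z - phi2*z^2, i.e. roots of l^2 - phi1*l - phi2."""
    disc = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    return (phi1 + disc) / 2.0, (phi1 - disc) / 2.0


def is_stationary_ar2(phi1, phi2):
    """True when both inverse roots lie strictly inside the unit circle."""
    return all(abs(root) < 1.0 for root in ar2_inverse_roots(phi1, phi2))


# Hypothetical coefficient pairs.
real_roots_ok = is_stationary_ar2(0.5, 0.3)      # real inverse roots, both < 1 in modulus
complex_roots_ok = is_stationary_ar2(1.0, -0.5)  # complex pair with modulus about 0.71
explosive = is_stationary_ar2(1.2, 0.0)          # one inverse root equals 1.2
```

The same check applied to the MA polynomial gives the invertibility condition.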

Model Forecasting in the Next 36 Months
Since the selected models have passed the diagnostic stage, power generation and dam level are predicted as shown below. In Figure 9, the vertical axis represents the quantity of power generated in megawatts, while the horizontal axis denotes the time in months in which the power was generated. It can be observed that there ...
[Figure 9: observed power generation (PowerGen) and forecast (POWERGENF), quarterly axis, 2019 to 2022]

Forecasting Evaluation
The tables below display the evaluation results of our forecasting models for hydropower generation and dam level respectively.
[Figure: observed dam level (DamLevel) and forecast (DAMLEVELF), quarterly axis, 2019 to 2022]
International Journal of Statistics and Probability, Vol. 11, No. 5; 2022 (http://ijsp.ccsenet.org)
It can be observed that both tables present small error values. This suggests that the selected models for hydropower generation and dam level provide good forecasting accuracy.

Discussion
The objective of the study was to obtain appropriate ARIMA models to forecast hydropower generation and dam level, the latter having a significant impact on hydropower generation. The Box-Jenkins method was employed to obtain a suitable model for each series. The study made use of two univariate time series, recorded monthly, for power generation and dam level. The study first looked at hydropower and its significance to the development of a country, then at irregularities in dam levels caused by climate and environmental conditions, and lastly at some of the models that have been used for forecasting hydropower generation. Graphs of both series were obtained, and they showed a change in the mean of the two series, which indicates that the series are nonstationary.
The dam level rose and fell across the sample period, following a downward pattern from 2011 to 2015 and an upward pattern from 2016 to 2019. There was also a sharp decline in power generation from 2014 to 2015. The ACF plots for both series decline slowly, which suggests nonstationarity in both series. The ADF test also showed that power generation and dam level are nonstationary. The series were transformed by taking first differences to obtain stationarity. The correlograms of both series suggested different orders of p and q for the AR and MA processes. After different estimations, ARIMA (8,1,3) and ARIMA (10,1,1) fulfilled all the model selection criteria for power generation and dam level respectively.
To validate the models, hypotheses about the residuals were tested. First, the Ljung-Box test was used to determine whether the residuals of the selected models are white noise. The test returned p-values greater than the 5% significance level up to six lags for both power generation and dam level. Also, from the root statistics, all the inverse AR and MA roots fell inside the unit circle, which shows that the selected models were stationary and invertible respectively. With this validation, forecasts were made.
The results for Ghana's hydropower generation and Akosombo dam level are depicted in Figure 9 and Figure 10 respectively. The fitted models were first used to reproduce the observed series (2010-2019), and on that basis the future series were forecast, as presented in Table 7 and Table 8 for hydropower generation and dam level respectively. Hydropower production underpins the sustainability of a country, and Ghana has suffered from power outages since 2014. Jude et al. (2011) found that a decrease in hydropower generation also decreases energy consumption, which in turn decreases economic growth. According to Ashong (2016), the decline in hydropower generation is mainly due to inadequate inflow of water into the hydro dams as a result of low rainfall. This signifies that climate variability and environmental conditions are the main factors that affect water levels and the generation of hydropower (Michieka et al., 2021; Miescher, 2021).
The forecasting evaluation results for hydropower generation and dam level are displayed in Tables 9 and 10 respectively.
From Table 9, an RMSE of 1.16 means that the typical (root-mean-square) distance between the observed series and the predicted values is 1.16. A MAPE of 0.4% means that the values predicted by the ARIMA (8,1,3) model deviate from the observed series by 0.4% on average. Also, from Table 10, an RMSE of 3.85 means that the typical distance between the observed series and the predicted values is 3.85. A MAPE of 1.45% means that the values predicted by the ARIMA (10,1,1) model deviate from the observed series by 1.45% on average. This suggests that the selected models are statistically sound for making future forecasts. In a similar study, Sarkodie (2017) used RMSE and MAPE to analyze forecasting errors in predicting electricity consumption in Ghana.

Conclusions
The ARIMA models have revealed that, following the trend of past values of hydropower generation and water level at the Akosombo dam, both variables will trend downwards in future. Based on the forecasting results obtained, Ghana will experience a slight decrease in hydropower generation despite the increase in water level that will occur, as the results showed. This implies that new hydro plants should be introduced to utilize the excess water to produce more electricity for the country. The Volta River Authority (VRA) and managers of electricity production in Ghana are also encouraged to be proactive in expanding energy production by turning more to renewable energy sources. In the coming years, as they seek to provide sustainable electricity for their customers, investment decisions should be directed towards protecting the Volta River from drying up due to human and climatic activities, as well as towards expanding the energy mix. The Government of Ghana should devote funding to support scientific research in renewable energy and energy-harvesting technologies to ensure that enough energy is available for citizens and industries. Future research is encouraged to use new variables to study patterns of power outages, construct economic models and make predictions. This will improve decision making in the energy sector and help sustain the growth of the economy.

Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy reasons.