Evaluation of Hydrological Data Collection Challenges and Flood Estimation Uncertainties in Nigeria

In recent years, flooding has become a recurring problem in many regions including Nigeria, owing to changing climatic conditions as well as anthropogenic factors such as poor land use management and urbanization that aggravate flood impact. Effective flood management and mitigation require hydrological data, so in many developing regions gauging stations are established, and gauge readers are recruited and trained to collect and transmit such data to designated hydrological or water resource management agencies. This study focuses on understanding the challenges associated with hydrological data collection in Nigeria, using the Ogun-Osun River Basin as a typical case, while analytically assessing how these challenges result in uncertainties that propagate onto the flood frequency estimates used to inform flood management decisions. The findings reveal that (i) capacity and institutional gaps; lack of maintenance of hydrological infrastructure and surrounding landscape; poor data management architecture; and flood events that destroy hydrological equipment and inundate roads, thereby restricting access to collected data during peak floods, are some of the challenges associated with hydrological data collection in developing regions; and (ii) these conditions result in gaps in, and shortened length of, the annual maximum hydrological time series required for flood frequency estimation, consequently leading to under- or overestimation of low and high flood quantiles such as the 1-in-2-year and 1-in-100-year floods, to levels of 0.67 m and 0.9 m respectively for the Ogun-Osun River Basin. This study recommends that the Ogun-Osun River Basin Development Authority improve data collation and management and adopt new technologies such as radar or sonar, to ensure sustainable and improved hydrological data collection, management, transferability and usability for flood management.


Introduction
Floods are one of the most devastating natural hazards, and have increased in frequency, magnitude and impact in recent decades (Aerts et al., 2014; Di Baldassarre et al., 2010), owing to changing climatic conditions and anthropogenic factors such as poor land use management and urbanization (Lavender & Matthews, 2009). Reliable flood information is required by the government and other stakeholders alike to inform the deployment of flood countermeasures to mitigate flood impact (Padi, Baldassarre, & Castellarin, 2011). Typically, networks of Hydro-Meteorological (HYDROMET) gauging stations are established for systematic data collection (Herschy, 2008; Hipel, 1995), distributed across locations of interest for continuous and long-term data collection. Nevertheless, operating these stations is challenging, especially in developing regions, as factors such as poor financing by the government (Starrett et al., 2010), poor institutions, lack of commitment, lack of capacity, and logistical and technical challenges (Ampadu, Chappell, & Kasei, 2013; Olayinka, Nwilo, & Emmanuel, 2013) hamper seamless data collection. These challenges result in an inadequate number of hydrological gauging stations, declining functionality, and gaps in available data, which, if left unchecked, would result in uncertain flood estimates and consequently poor flood management decisions. This paucity of data is particularly severe in developing countries, further limiting these nations' capacity to mitigate and cope with the impact of flooding on people, environment, infrastructure and socio-economic activities (Komi, Neal, Trigg, & Diekkrüger, 2017).
In the Ogun-Osun River Basin, the study area for this research, a citizen observatory approach is typically employed, whereby local residents, referred to as Gauge Readers (GR), are recruited and trained to collect and record daily water level readings; the records are then transmitted intermittently to a designated hydrological data collation officer of the river basin authority (Bashiru, 2015). As in many developing regions, river measurements are collected manually using staff gauges and later converted to discharge using an established rating curve (Herschy, 2008). Because collection is manual, measurement equipment can be damaged and/or access roads inundated during the peak of floods, thereby impeding continuous data collection (Dano Umar et al., 2011; Olayinka et al., 2013). These challenges result in measurement errors, and in gaps in and shortened length of historical hydrological time series data, and are known to contribute to flood frequency estimate uncertainty, especially for the standard 1-in-100-year flood estimates widely used for flood management planning and for the design of hydraulic structures such as dykes and levees to mitigate flood impact (Feaster, 2010). These measurement (aleatory) uncertainties are further exacerbated by procedural (epistemic) uncertainties that could result from the subjective nature of determining the optimal probability distribution and parameters (such as shape, scale and location) required in the flood frequency estimation process (Di Baldassarre, Laio, & Montanari, 2012; Laio, Di Baldassarre, & Montanari, 2009).
This study seeks to understand the challenges associated with hydrological data collection in a typical developing region, with the specific objectives of: 1) Developing knowledge of the factors that contribute to the challenges associated with hydrological data collection in developing regions; and 2) Analytically assessing how these factors contribute to data uncertainty that consequently propagates onto flood model outcomes and decisions.

Study Area
The study area (Figure 1), the Ogun-Osun River Basin (OORB), is located in western Nigeria (6°30′-8°20′N latitude and 3°23′-5°10′E longitude), and encompasses four states (Ogun, Osun, Oyo and Lagos) within a 66,264 km² area. The basin is drained by two major tributaries, Ogun and Osun, and other minor tributaries including the Yewa, Ibu, Ona, Sasa and Ofiki Rivers. The climate of the OORB is influenced by tropical continental and maritime air masses (Adeaga, Oyebande, & Depraetere, 2006), with an annual rainfall of 1400 mm to 1500 mm; mean annual air temperature between 25.7°C and 30°C; and relative humidity varying from 37% to 85% for dry and wet seasons respectively (Adeleke et al., 2015). The OORB experiences recurring flooding, caused by factors such as intense precipitation; poor urban planning and waste management; and failure of upstream hydraulic systems, resulting in socio-economic, infrastructural, ecological and environmental impacts (Jinadu, 2015; Komolafe, 2015). The five (5) locations under investigation for this study are along the Yewa River and include Ajilete, Ebute-Igbooro, Eggua, Idogo and Ijaka-Oke, and are managed by the Ogun-Osun River Basin Development Authority (OORBDA).
Figure 1. Study area showing OORB, constituent states and gauging station locations along the Yewa River

Qualitative and Quantitative Data Collection and Analysis
A combined qualitative and quantitative approach is adopted in this study, where locally recruited hydrological data collection officers (gauge readers) were interviewed as part of a hydrographic survey and data collection campaign in January 2015. The five (5) gauge readers interviewed were responsible for the Ajilete, Ebute-Igbooro, Eggua, Idogo and Ijaka-Oke hydrological gauging stations located along the Yewa River, one of the major rivers in Ogun State, Nigeria. Direct quotations of gauge readers are presented in italics in Section 4 (Analysis and Discussions). Also, field observation notes and photos were taken during the visit, and historical hydrological data acquired from the OORBDA were analysed using FLIKE software (Kuczera, 1999) to corroborate interviewee disclosures and examine the uncertainties associated with gaps in the hydrological data collected from the gauging stations presented in Table 1.

Missing Data Imputation and Flood Frequency Analysis
To evaluate the effect of missing data on flood frequency estimates, two datasets are developed and used for flood frequency analysis: (i) one with missing data removed and (ii) one with missing data filled using multiple imputation implemented in R using the Amelia package (Honaker, King, & Blackwell, 2011). The Multiple Imputation missing data infilling methodology uses the Markov Chain Monte Carlo approach, which estimates missing values by randomly sampling from a distribution of plausible values derived from multiple simulations undertaken using mean and standard error parameters similar to those of the original dataset, under the assumption of a normal distribution (van Buuren, 2007). This approach quantifies the uncertainty in the simulation process and reduces the false precision associated with single imputation (Li, Stuart, & Allison, 2015). Flood frequency analysis (FFA) was undertaken using FLIKE software (Kuczera, 1999) by fitting a predetermined probability distribution, the Generalized Extreme Value (GEV), to the annual maximum flow time series for Idogo (the case site used for this analysis, having the shortest historical record due to missing records). The GEV probability distribution is selected for simplicity, based on its widespread use in various regions (Izinyon & Ehiorobo, 2014; Smith, Villarini, & Baeck, 2011; Villarini & Smith, 2010) and for consistency with previous studies in the area investigated (Awokola & Martins, 2001; Ewemoje & Ewemooje, 2011). Typically, a suitability analysis can be undertaken to determine the probability distribution that best fits the time series data (Laio et al., 2009).
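The idea of filling gaps several times, rather than once, can be illustrated with a deliberately simplified sketch. This is not the Amelia algorithm used in the study: it only draws each missing value from a normal distribution fitted to the observed values, repeats the fill `m` times, and pools the resulting means. The flow values, the function names and the choice of `m` are all hypothetical, and a fuller treatment would also propagate uncertainty in the fitted mean and standard deviation.

```python
import random
import statistics


def impute_normal(series, m=5, seed=0):
    """Return m completed copies of `series`.

    Entries equal to None are replaced by draws from a normal distribution
    whose mean and standard deviation come from the observed values.
    """
    rng = random.Random(seed)
    observed = [x for x in series if x is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    completed = []
    for _ in range(m):
        filled = [x if x is not None else rng.gauss(mu, sigma) for x in series]
        completed.append(filled)
    return completed


def pooled_mean(completed):
    """Pool the per-copy means; also return the between-imputation variance."""
    means = [statistics.mean(c) for c in completed]
    return statistics.mean(means), statistics.variance(means)


# Hypothetical annual maximum flows (m^3/s); None marks a missing year.
annual_max = [12.1, None, 15.4, 9.8, None, 14.0, 11.3, 13.7]
copies = impute_normal(annual_max, m=5, seed=42)
estimate, between_var = pooled_mean(copies)
```

The between-imputation variance is what single imputation hides: it indicates how much the pooled estimate depends on the particular values drawn for the missing years.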
The GEV formula is expressed in equation 1 as follows: where τ, α, and k represent the location, scale and shape parameters of the distribution function, presented in Table 2.
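For readers who need the functional form, one common way of writing the GEV cumulative distribution function with location τ, scale α and shape k is sketched below. The sign convention for k is an assumption here (software packages differ, and some use a shape parameter of opposite sign), so this should be read as an illustration rather than as the exact expression used in the study.

```latex
% GEV cumulative distribution function (assumed sign convention for k)
F(q) = \exp\left\{ -\left[ 1 - \frac{k\,(q - \tau)}{\alpha} \right]^{1/k} \right\}, \qquad k \neq 0

% Limiting (Gumbel) case
F(q) = \exp\left\{ -\exp\left[ -\frac{q - \tau}{\alpha} \right] \right\}, \qquad k = 0

% Quantile for return period T, obtained by setting F(q_T) = 1 - 1/T
q_T = \tau + \frac{\alpha}{k} \left[ 1 - \left( -\ln\left( 1 - \frac{1}{T} \right) \right)^{k} \right], \qquad k \neq 0
```

Under this convention a negative k gives a heavier upper tail, which is why high return period quantiles such as the 1-in-100-year flood are sensitive to the fitted shape parameter.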

Challenges Associated with Hydrological Data Collection in Typical Developing Regions
River discharge data is a fundamental input (initial and boundary condition) required for flood modelling. River water levels within the study area are typically measured using staff gauges, then converted to discharge using established rating curves that plot water levels against discharge (Di Baldassarre et al., 2012; Herschy, 2008); see Supplementary Figure 1 for the Idogo rating curve. This data collection approach results in measurement and extrapolation uncertainties (Baldassarre & Montanari, 2009; Haque, Rahman, & Haddad, 2014). Also, during peak flood seasons, access to remote areas for data collection is usually restricted due to inundation, and in some cases hydrological measurement equipment is damaged by high-intensity floods (Olayinka et al., 2013). The Eggua GR recalled how the 2007 flood event damaged the gauging station, stating that "the flood 27th of July 2007, destroyed the gauge station", with water levels reaching a peak of approximately "4.2 metres". These disclosures were consistent with a recent study in the region (Adelekan, 2011), as well as records from the Dartmouth Flood Observatory Global Active Archive of Large Flood Events (Note 3), where the 1997, 1999 and 2007 flood events were reportedly caused by heavy rainfall and affected parts of Ogun State, damaging infrastructure and displacing approximately 5,000 persons in 2007.
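The stage-to-discharge conversion described above is often represented by a power-law rating curve. The sketch below assumes that form, Q = a(h - h0)^b, which is a common choice but is not confirmed as the form of the Idogo curve; the coefficients `A`, `B` and `H0` and the stage readings are hypothetical.

```python
def stage_to_discharge(stage_m, a, b, h0=0.0):
    """Power-law rating curve Q = a * (h - h0) ** b.

    stage_m : staff gauge reading in metres
    h0      : cease-to-flow stage in metres; discharge is zero at or below it
    """
    depth = stage_m - h0
    if depth <= 0:
        return 0.0
    return a * depth ** b


# Hypothetical coefficients, NOT the fitted Idogo rating curve.
A, B, H0 = 2.0, 1.5, 0.2

# Hypothetical daily staff gauge readings (m).
daily_stage = [0.1, 1.2, 4.2]
daily_discharge = [stage_to_discharge(h, A, B, H0) for h in daily_stage]
```

Because the exponent is greater than one in this sketch, a small reading error at high stage translates into a larger discharge error than the same error at low stage, which is one way measurement and extrapolation uncertainty enters peak flow estimates.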
In other instances, high magnitude flood events inundated road networks, thereby restricting the movement of manual data collectors. The GRs for Ajilete, Ijaka-Oke and Ebute Igboro also recalled that water levels overtopped roads and bridges, resulting in the absence of peak flow data. The Idogo GR narrated how the "1997 and 1999 flood events resulted in river overtopping the bridge, thus restricting passage of goods and persons", while the Ijaka-Oke GR recalled that "...The water level is enlarged during the rainy season", thereby restricting movement: ".... when the water comes, nobody can come from Ayetoro to this place, ...we are leaving at Ijaka, no way to move from Ijaka to anywhere else..., water close the road, there is no sign to move anywhere..." and "...if I'm not swimming, there is no way to go to another place...".
Lack of financial support, technical deficiency, obsolete equipment/infrastructure and poor institutions have also been identified as some of the factors responsible for hydrological data sparsity in Nigeria (Ertuna, 1995; Izinyon & Ehiorobo, 2014; Olayinka et al., 2013; Olomoda, 2012). Furthermore, Maxwell (2013) and Ononiwu (1994) attributed data inconsistency to poor hydrological data management systems and lack of standards, which result in data unreliability, fabrication and data format inconsistencies. The GR at Ijaka-Oke disclosed the need for an increase in his salary, stating "...I need the government to increase my salary", and the GR at Eggua similarly lamented non-payment for some time. Also, given that some of the GRs could not communicate in English while others could, it was evident that the level of education or exposure varied amongst the gauge readers, although this question was not specifically asked. However, some of the GRs, like the one at Ebute Igboro, have been collecting data for the OORBDA since the inception of the gauging station in "1983/1984"; hence there is some consistency in the data collection process, and in capacity and knowledge development over time.
Additionally, Maxwell (2013) and Olayinka (2012) argued that even when data is available, custodians store data in paper formats, thus limiting transferability, applicability and long-term/sustainable data availability. A sample dataset collected from the Ogun-Osun River Basin Authority is presented in Figure 2, showing the typical paper-based data collation and storage format used by the agency. Such datasets and formats are prone to data quality errors during imputation, and to typographical and data conversion errors during data transfer from paper to digital format (Beall, 2006). Figure 3 shows the gaps in the time series of annual peak flows (discharge), likely caused by the absence of data due to the factors highlighted above. Furthermore, field observations revealed restricted access to staff gauges, to degrees that could affect the effective reading of staff gauge measurements and consequently result in uncertainties that are aleatory in nature, i.e. due to measurement (Merz & Thieken, 2005). Figure 4

Analysis of Uncertainties Associated with Data Collection Challenges
Gaps in hydrological time series result in reduced data completeness and shortened length, thereby contributing to uncertainties in flood frequency estimates, especially for the standard 1-in-100-year flood estimate, which has been shown to be significantly affected by the length of historical hydrological records (Feaster, 2010). These uncertainties propagate through to hydrodynamic models and consequently lead to flawed flood management decisions (Gill, Asefa, Kaheil, & McKee, 2007). Figure 5 displays the number of hydrological records available for all five gauging stations from 1980/1982 to 2012 (30/32 years), revealing data gaps varying from 4 to 7 years.
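Record completeness of the kind summarised above can be checked with a few lines of code. The sketch below is a minimal illustration, assuming the annual maximum series is held as a mapping from year to value; the `records` dictionary and the function name `gap_summary` are hypothetical and do not reproduce the OORBDA data.

```python
def gap_summary(records, start_year, end_year):
    """Summarise gaps in an annual series.

    records : dict mapping year -> annual maximum value; a year that is
              absent or mapped to None is treated as missing.
    """
    years = range(start_year, end_year + 1)
    missing = [y for y in years if records.get(y) is None]

    # Length of the longest run of consecutive missing years.
    longest = 0
    run = 0
    previous = None
    for y in missing:
        run = run + 1 if previous is not None and y == previous + 1 else 1
        longest = max(longest, run)
        previous = y

    expected = end_year - start_year + 1
    return {
        "expected": expected,
        "missing": len(missing),
        "completeness": (expected - len(missing)) / expected,
        "longest_gap": longest,
    }


# Hypothetical record for one station (m^3/s).
records = {1982: 10.2, 1983: 11.0, 1984: None, 1985: None, 1986: 9.4, 1988: 12.3}
summary = gap_summary(records, 1982, 1988)
```

Reporting the longest consecutive gap alongside the overall completeness is a design choice: a single multi-year gap around a major flood can remove the largest peaks from the series even when overall completeness looks acceptable.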

1998) and for another, missing data is estimated using the Monte Carlo multiple imputation approach (Graham, Olchowski, & Gilreath, 2007). The results are presented in Table 3, Figure 6 and Figure 7, showing how gaps in the hydrological time series can lead to differences in flood estimates of up to 2 and 2.22 m³/s for the high return periods of 1-in-50-year and 1-in-100-year floods respectively, which implies, for instance, a 0.90 m addition to a river level of 5.43 m for a 1-in-100-year flood, based on the rating curve presented in Supplementary Figure 1. This increased water level can inundate roads, farmlands, and socio-economic and physical infrastructure. Also, flood estimates with missing data filled were consistently higher than those derived from datasets where gaps are removed, suggesting likely underestimation of flood quantiles when missing data exist.
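The comparison between the two datasets can be sketched by evaluating GEV quantiles for two parameter sets at several return periods. The quantile expression below assumes the parameterization in which a negative shape k gives a heavier upper tail; that sign convention, the function name `gev_quantile`, and both parameter sets are assumptions for illustration only and are not the values reported in Table 2 or Table 3.

```python
import math


def gev_quantile(T, tau, alpha, k):
    """Flood quantile for return period T (years) under a GEV distribution.

    tau, alpha, k are location, scale and shape. The sign convention for k
    is an assumption: here k < 0 corresponds to a heavy upper tail.
    """
    F = 1.0 - 1.0 / T          # non-exceedance probability
    y = -math.log(F)
    if abs(k) < 1e-9:          # Gumbel limit as k approaches zero
        return tau - alpha * math.log(y)
    return tau + (alpha / k) * (1.0 - y ** k)


# Hypothetical parameter sets, NOT the fitted Idogo values.
params_gaps_removed = {"tau": 10.0, "alpha": 2.0, "k": -0.1}
params_gaps_filled = {"tau": 10.5, "alpha": 2.2, "k": -0.1}

differences = {}
for T in (2, 50, 100):
    q_removed = gev_quantile(T, **params_gaps_removed)
    q_filled = gev_quantile(T, **params_gaps_filled)
    differences[T] = q_filled - q_removed
```

In this illustrative setup the difference between the two estimates grows with the return period, because a larger scale parameter is amplified in the upper tail; whether the same pattern holds for a real station depends on the fitted parameters.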

Conclusion and Recommendations
This study evaluated the challenges associated with hydrological data collection in a typical developing region based on gauge readers' testimonies, field observations and hydrological analysis, identifying capacity and institutional gaps; poor maintenance of hydrological equipment and surrounding landscape; poor data management architecture (collection, transmission, storage and format); and flood events that destroy hydrological equipment and inundate roads, thus restricting access to data collection during peak floods, as factors that hamper seamless and sustainable data collection. These challenges result in gaps in the hydrological time series essential to flood frequency estimation, particularly during peak flood events, and in shortened length of existing historical hydrological data, thereby causing aleatory uncertainty that propagates through flood modelling processes and can affect flood estimates, as demonstrated in this study. Furthermore, the use of gauge readers may not be sustainable in the long term, due to the lack of an established succession plan and capacity building programmes, and the current challenges of collecting data during peak flow seasons. Also, a standardized and more accurate approach to hydrological data collection is recommended, possibly through the adoption of new technologies such as radar and sonar automated systems for improved data collection and management, to enhance data transferability and usability in the Ogun-Osun River Basin.

Figure 5. Graph showing the number of hydrological records available from Yewa River gauging station from 1980 to 2017 (Source: OORBDA)

Table 2. Parameters of the GEV probability distribution function

Table 3. Flood frequency estimates at Idogo station, for data with gaps removed and another filled