Goodness-of-Fit of Reference Evapotranspiration to Gamma Probability Distribution

This paper aims to estimate, using the Penman-Monteith method, the probabilities of reference evapotranspiration (ET0) in millimeters, as well as their accumulated values for ten days (decendial), in Mossoró, northeast Brazil. The Meteorological Station of the Federal Rural University of Semi-Arid (UFERSA) provided the daily records of evapotranspiration. The construction of tables based on the approximation of the variable to the Gamma distribution allows the use of data without transformations. The probabilities were estimated with the Gamma distribution at probability levels of 1% to 95% over the 1970-2007 data period. The results of the chi-square and Kolmogorov-Smirnov tests at 10% probability (p ≥ 0.10) demonstrated the adequacy of the table construction process, providing essential support in the planning of agricultural activities in the region to obtain the maximum benefit from evapotranspiration data. The Gamma probability distribution best described the ET0 for scaling irrigation systems in the municipality. The maximum daily ET0 for irrigation projects in the Mossoró region is 10 mm, and the cumulative 10-day ET0 averages 80 mm.


Introduction
Meteorological parameters such as solar radiation, relative humidity, and precipitation influence agricultural production by affecting plant transpiration and soil evaporation. Evapotranspiration (ET) comprises the total water lost from the soil surface and plants through the joint processes of evaporation and transpiration (Ometto, 1981).
Thus, evapotranspiration describes the whole process of water transfer from the soil-plant system to the atmosphere. Evapotranspiration rates help in estimating the water demand of crops and, combined with the gain of water through precipitation, allow the water availability of a region to be determined, which makes them a parameter of great importance in plant ecology and agricultural planning.
The occurrence probabilities of estimated evapotranspiration are an essential tool for decision making in agricultural activities (Prela-Pantano et al., 2009). In irrigation projects, the criterion for choosing the probability level should be based on an economic analysis considering the losses associated with the reduction of quantity and quality of production due to water deficit, and the increase of system costs to satisfy higher levels of probability (Silva et al., 1998). Studies recommend ET0 values with a 75% probability for irrigation project scaling (Back, 2007).
Probability distribution models are commonly used to fit evapotranspiration data (Pereira & Frizzone, 1994). The nature of data guides the choice of the appropriate probability density functions. Provided that the representativeness of data is respected, parameter estimates for a given region can be extrapolated for general use, without prejudice to the accuracy of probability estimate (Catalunha et al., 2002).
The Kolmogorov-Smirnov test, for example, can verify the fit to Beta and Normal distributions of historical reference or potential evapotranspiration series accumulated at 5, 10, and 15 days and calculated by the Penman-Monteith method (Saad, 1990; Vellame, Queiroz, & Oliveira, 2012).
Setting the evapotranspiration value for scaling irrigation systems is quite complicated. Some models for dimensioning irrigation estimate the probability of occurrence of evapotranspiration and rainfall (Jensen, 1974). This approach enables accurate irrigation scaling based on levels of water deficit risk for the crop.
Probability levels should be selected based on economic criteria, taking into account the losses associated with the reduction of quantity and quality of production resulting from water deficit and the increase in system costs to satisfy higher levels of probability (Silva et al., 1998). Usually, high levels (80 to 90%) are only adopted for irrigated crops of high economic value and with shallow root systems (Jensen, Burman, & Allen, 1990). Crops under the irrigation conditions of the Center-South region of Brazil hardly need levels above 90%, and levels between 50 and 75% are usually used (Saad & Scaloppi, 1988). In most irrigated regions, these levels range between 75 and 80% (Doorenbos & Kassam, 1994).
The Log-Normal MM, Log-Normal MV, Gamma MM, and Gamma MV distribution functions also provide good fits to decendial ET0 values. However, the Beta distribution has better goodness-of-fit than other distributions in several grouped time intervals (Back, 2007; Saad et al., 2002; Densk & Back, 2015).
This work aims to build probability tables for the decendial ET0 (mm day⁻¹) using the theoretical Gamma probability distribution model for cumulative 10-day periods, and to estimate the return periods at various probability levels.

Material and Methods
A series of 38 years (1970 to 2007) of evapotranspiration data were obtained from the records of the Meteorological Station of the Federal Rural University of Semi-Arid (UFERSA) in Mossoró, RN (5°11′S and 37°20′W; 18 m altitude). According to Köppen's classification, the climate of Mossoró is hot and dry (BSwh'), with an annual average temperature of 27.5 °C and a relative humidity of 68.9% (Carmo Filho et al., 1991).
Reference evapotranspiration (ET0) values were accumulated over consecutive 10-day periods and then analyzed in R version 3.6.1 (2019) to verify the fit of the cumulative ET0 to the Gamma probability distribution model. Probability levels between 1% and 95% were used. Goodness-of-fit was assessed with the non-parametric Kolmogorov-Smirnov (KS), Wilcoxon (W²), Anderson-Darling (AD), and chi-square (χ²) tests at the 10% significance level, and with the logarithm of the maximum likelihood (V) (Catalunha et al., 2002; Assis et al., 2018).
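The accumulation step can be illustrated with a minimal Python sketch. It is not the authors' R code. The function name `decendial_totals`, the convention that the third period of a month takes all remaining days, and the constant 6.0 mm day⁻¹ series are assumptions made only for illustration.

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta


def decendial_totals(daily_et0):
    """Sum daily ET0 (mm) into 10-day totals keyed by (year, month, period).

    daily_et0 maps datetime.date -> ET0 in mm per day.
    Period 1 covers days 1-10, period 2 days 11-20, and period 3 the
    remaining days of the month (an assumed convention).
    """
    totals = defaultdict(float)
    for day, value in daily_et0.items():
        period = min((day.day - 1) // 10 + 1, 3)
        totals[(day.year, day.month, period)] += value
    return dict(totals)


# Hypothetical constant series of 6.0 mm per day for January 1970.
start = date(1970, 1, 1)
n_days = calendar.monthrange(1970, 1)[1]
series = {start + timedelta(days=i): 6.0 for i in range(n_days)}
january = decendial_totals(series)
```

Each key of `january` identifies one decendial period, so the 38 yearly values of a given period can then be collected and fitted separately.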
Thus, we tested whether the null hypothesis that the data can be represented by the Gamma probability model was accepted with high frequency. A continuous random variable x (0 < x < ∞) follows a Gamma distribution with parameters α > 0 and β > 0 if its density function is (Botelho & Morais, 1999):
$$f(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}, \qquad x > 0$$
The Gamma probability function has two parameters, shape α and scale β. The Gamma distribution approaches the normal distribution when α is greater than or equal to 100 (Thom, 1958). The scale parameter β indicates the degree of dispersion of the data.
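As a brief illustration of the two-parameter density, the sketch below evaluates f(x) = x^(α−1) e^(−x/β) / (β^α Γ(α)) with the Python standard library and returns the mean αβ and variance αβ² implied by the parameters. The function names and the parameter values α = 100 and β = 0.7 are hypothetical and are not taken from the fitted tables.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta; zero for x <= 0."""
    if x <= 0:
        return 0.0
    # Work in logarithms to avoid overflow for large shape values.
    log_f = ((alpha - 1.0) * math.log(x) - x / beta
             - alpha * math.log(beta) - math.lgamma(alpha))
    return math.exp(log_f)


def gamma_mean_variance(alpha, beta):
    """Mean (alpha * beta) and variance (alpha * beta^2) of the distribution."""
    return alpha * beta, alpha * beta ** 2


# Hypothetical parameters for a 10-day ET0 total in mm.
alpha, beta = 100.0, 0.7
mean, variance = gamma_mean_variance(alpha, beta)
density_at_mean = gamma_pdf(mean, alpha, beta)
```

With the scale expressed in mm, the mean carries the same unit as the accumulated ET0, which is why β is read as a dispersion measure.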
The estimation of parameters α and β of a gamma distribution involves complex and extensive calculations. Several methods are used in these estimates, such as least squares, the method of moments, and maximum likelihood. However, all have limitations, either because of mathematical difficulties or because they produce inefficient estimates. The least-squares method is not recommended for the gamma distribution because it presents many difficulties. The maximum likelihood and moments methods are the most commonly used, and maximum likelihood is preferred because of its better properties (Thom, 1958).
Some ways of estimating gamma distribution parameters have been developed, contributing, together with the flexibility of shapes, to their use in various areas (Haan, 1977). However, the maximum likelihood method remains the primary method for estimating gamma parameters, which must satisfy the condition (Catalunha et al., 2002).
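The maximum likelihood equations for the Gamma distribution have no closed-form solution for α. One widely used shortcut is the approximation attributed to Thom (1958), α ≈ (1 + √(1 + 4A/3)) / (4A) with A = ln(x̄) − mean(ln x), followed by β = x̄/α. The sketch below assumes this approximation; it is not necessarily the condition or the routine used in this study, and the sample values are hypothetical.

```python
import math


def gamma_fit_thom(sample):
    """Approximate maximum likelihood estimates (alpha, beta) for a Gamma model.

    Uses Thom's approximation; requires strictly positive, non-constant data.
    """
    n = len(sample)
    mean = sum(sample) / n
    mean_log = sum(math.log(v) for v in sample) / n
    a = math.log(mean) - mean_log  # non-negative by Jensen's inequality
    if a <= 0:
        raise ValueError("sample must contain at least two distinct values")
    alpha = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    beta = mean / alpha
    return alpha, beta


# Hypothetical 10-day ET0 totals (mm) for one decendial period.
sample = [62.1, 65.4, 70.2, 58.9, 66.7, 71.3, 63.8, 68.0]
alpha_hat, beta_hat = gamma_fit_thom(sample)
```

By construction the product of the two estimates reproduces the sample mean, which is a convenient sanity check on any fitted pair.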
If F(x) is the probability of occurrence of a reference evapotranspiration value less than or equal to x, then it can be estimated through the cumulative probability distribution function:
$$F(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)} \int_{0}^{x} u^{\alpha - 1} e^{-u/\beta}\, du$$
where F(x) = probability of occurrence of a value X ≤ x; X = continuous random variable representing ET0 values; Γ(α) = gamma function; α = shape parameter of random variable X; β = scale parameter of random variable X (mm); e = base of the natural logarithm (2.718...); x = amount of ET0 (mm); and u = integration variable.
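Probability tables of this kind need the cumulative function and its inverse. The following sketch, written with the Python standard library only, evaluates F(x) through the series expansion of the regularized lower incomplete gamma function and inverts it by bisection. The function names, tolerances, and example parameters are assumptions for illustration and do not reproduce the routines used in the study.

```python
import math


def gamma_cdf(x, alpha, beta, tol=1e-12, max_terms=10000):
    """F(x) for a Gamma(alpha, beta) variable via a power series."""
    if x <= 0:
        return 0.0
    z = x / beta
    term = 1.0 / alpha
    total = term
    for n in range(1, max_terms):
        term *= z / (alpha + n)
        total += term
        if term < tol * total:
            break
    log_prefactor = alpha * math.log(z) - z - math.lgamma(alpha)
    return min(1.0, total * math.exp(log_prefactor))


def gamma_quantile(p, alpha, beta):
    """Value x such that F(x) = p, found by bisection (0 < p < 1)."""
    lo, hi = 0.0, alpha * beta
    while gamma_cdf(hi, alpha, beta) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha, beta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters; value not exceeded with 75% probability.
x75 = gamma_quantile(0.75, 100.0, 0.7)
```

Calling `gamma_quantile` at each probability level between 1% and 95% would fill one row of a probability table for a given decendial period.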
The chi-square goodness-of-fit test (χ²) determines the compatibility of an observed data set with the expected values. For this, a null hypothesis is tested assuming a specific probability distribution (e.g., normal, log-normal, or gamma), and the parameters are estimated from the sample data. The hypothesis is tested by comparing the observed and expected frequencies in each frequency class using the χ² test statistic, given by:
$$\chi^{2} = \sum_{i=1}^{K} \frac{(Fo_i - Fe_i)^{2}}{Fe_i}$$
where K is the number of classes, Fo_i is the observed frequency, and Fe_i is the expected frequency under the null hypothesis according to the tested distribution (Campos, 1983). Critical values of χ² for a given significance level α are available in specific tables.
The statistic has v degrees of freedom, obtained by subtracting the number of estimated parameters and one from the number of frequency classes. Thus, v = K − p − 1, where K is the number of classes and p is the number of estimated parameters used to obtain the expected frequencies.
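The chi-square procedure described above can be sketched in a few lines of Python. The class frequencies are hypothetical (38 observations grouped into five classes), and the function names are chosen for illustration only. The closed-form critical value used here holds only for two degrees of freedom, where the chi-square distribution reduces to an exponential distribution with mean 2.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum over classes of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_classes, n_params):
    """v = K - p - 1."""
    return n_classes - n_params - 1


# Hypothetical frequencies for 38 years grouped into 5 classes.
observed = [4, 9, 13, 8, 4]
expected = [5.0, 8.0, 12.0, 8.0, 5.0]

chi2 = chi_square_statistic(observed, expected)
v = degrees_of_freedom(len(observed), n_params=2)  # Gamma: alpha and beta

# Exact critical value at the 10% level, valid only when v == 2.
critical_10pct = -2.0 * math.log(0.10)
accept_null = chi2 < critical_10pct
```

For other degrees of freedom the critical value would be read from a chi-square table, as stated in the text.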
The non-parametric Kolmogorov-Smirnov (KS) test is based on the difference between the empirical and theoretical cumulative probability functions and applies only to continuous distributions. This fit test evaluates the null hypothesis that the data come from a population described by an assumed probability distribution. The KS test was designed in response to the shortcomings of the chi-square test, whose results are accurate only for discrete distributions. The KS test also has the advantage of requiring no grouping of the data into classes, which removes the arbitrariness and the loss of information of the class grouping process (Campos, 1983). The KS test statistic can be expressed as shown in the following equation (Tiberius & Borre, 1999):
$$D = \max_{x} \left| F_N(x) - F_0(x) \right|$$
where F_N(x) is the empirical cumulative distribution function of the N observations and F_0(x) is the theoretical cumulative distribution function. The D value of the Kolmogorov-Smirnov test is compared with the tabulated value D_k, obtained from the sample size and the significance level α, to verify whether the null hypothesis is accepted or rejected, that is, whether F_N(x) and F_0(x) are equal or different (Tiberius & Borre, 1999).
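A minimal sketch of the KS statistic follows. It computes the largest gap between the empirical step function and a theoretical CDF passed in as a callable. The uniform example data are hypothetical, and the critical value uses the large-sample approximation 1.22/√N for the 10% level, which is an assumption here and only a rough substitute for the tabulated D_k, particularly when the parameters are estimated from the same sample.

```python
import math


def ks_statistic(sample, cdf):
    """D = max |F_N(x) - F_0(x)|, checked on both sides of each step."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


def ks_critical_10pct(n):
    """Asymptotic critical value at the 10% significance level."""
    return 1.22 / math.sqrt(n)


# Hypothetical check against a uniform distribution on [0, 1].
sample = [0.1, 0.3, 0.5, 0.7, 0.9]
d_stat = ks_statistic(sample, lambda x: min(max(x, 0.0), 1.0))

# Critical value for a 38-year series such as the one analyzed here.
d_crit_38 = ks_critical_10pct(38)
```

In practice the `cdf` argument would be the fitted Gamma cumulative function for the decendial period under test.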
Estimated decendial potential evapotranspiration values were acquired by the standard Penman-Monteith method, used by Pereira, Angelocci, and Sentelhas (2002), which fits well with the data from the Mossoró region.
The Penman-Monteith combination method for calculating the ET0 of a hypothetical grass crop with a uniform height of 0.12 m and a fixed surface resistance of 70 s m⁻¹ can be expressed, in its standard FAO form, by the equation below (Sediyama et al., 1996):
$$ET_0 = \frac{0.408\, S\, (R_n - G) + \gamma\, \dfrac{900}{t + 273}\, V_2\, (e_s - e)}{S + \gamma\, (1 + 0.34\, V_2)}$$
where R_n is the total daily net radiation (MJ m⁻² d⁻¹), G is the soil surface heat flux (MJ m⁻² d⁻¹), γ = 0.063 kPa °C⁻¹ is the psychrometric constant, t is the mean air temperature (°C), V_2 is the wind speed at a height of 2 m (m s⁻¹), e_s is the saturation vapor pressure (kPa), e is the actual water vapor pressure (kPa), and S is the slope of the vapor pressure curve at air temperature (kPa °C⁻¹), calculated by:
$$S = \frac{4098 \left[ 0.6108 \exp\!\left( \dfrac{17.27\, t}{t + 237.3} \right) \right]}{(t + 237.3)^{2}}$$
In hydrology and meteorology, the inverse of the probability of occurrence is called the return period or recurrence interval. The recurrence interval (Tr) can be interpreted as the average number of years within which the analyzed potential evapotranspiration is expected to be equaled or exceeded (Bertoni & Tucci, 2007). Thus, if a given hydrological quantity has a 5% probability of being equaled or exceeded (P = 0.05), its recurrence period is Tr = 1/P = 1/0.05 = 20 years. For example, a flood that is equaled or exceeded on average every 20 years has a recurrence period Tr = 20 years. In other words, this flood has a 5% probability of being equaled or exceeded in any given year. If a dam is designed to last only one year, the risk that a flood will exceed the design flow is equal to that annual probability. Dams that should last several years are exposed every year to a risk equal to the probability of occurrence of the design flow (Guimarães, 2011). The recurrence period can be obtained by the following equation:
$$T_r = \frac{1}{P}$$
where P is the probability of the variable under study being equaled or exceeded, whose values were 5%, 15%, 25%, 50%, 75%, and 95%.
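For the Penman-Monteith estimate of daily ET0 described above, a short Python sketch may help fix the units. It assumes the FAO-56 coefficients (0.408, 900, 0.34) and the FAO-56 expression for saturation vapor pressure, which may differ in detail from the formulation used by the authors. The net radiation and wind speed in the example are hypothetical, while the temperature (27.5 °C) and relative humidity (68.9%) are the annual means reported for Mossoró.

```python
import math


def saturation_vapor_pressure(t):
    """Saturation vapor pressure (kPa) at air temperature t (deg C)."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def slope_vapor_pressure_curve(t):
    """Slope S of the vapor pressure curve (kPa per deg C) at temperature t."""
    return 4098.0 * saturation_vapor_pressure(t) / (t + 237.3) ** 2


def et0_penman_monteith(rn, g, t, v2, es, e, gamma=0.063):
    """Daily ET0 (mm per day) with the assumed FAO-56 coefficients.

    rn, g: net radiation and soil heat flux (MJ m^-2 d^-1)
    t: mean air temperature (deg C); v2: wind speed at 2 m (m/s)
    es, e: saturation and actual vapor pressure (kPa)
    """
    s = slope_vapor_pressure_curve(t)
    radiation_term = 0.408 * s * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t + 273.0) * v2 * (es - e)
    return (radiation_term + aerodynamic_term) / (s + gamma * (1.0 + 0.34 * v2))


t_mean = 27.5                      # annual mean temperature in Mossoro
es = saturation_vapor_pressure(t_mean)
e = 0.689 * es                     # from the 68.9% mean relative humidity
et0 = et0_penman_monteith(rn=15.0, g=0.0, t=t_mean, v2=3.0, es=es, e=e)
```

Daily values produced this way are the inputs that are later accumulated into 10-day totals.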
To analyze the minimum values of any hydrological variable, the interpretation must be changed to the occurrence of values smaller than the value analyzed. That is, we must calculate the cumulative probability of the variable. The recurrence period, in this case, is the inverse of the probability of non-exceedance.
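The two readings of the recurrence period can be compared in a short sketch. The probability levels are those listed in the text; the function name and the dictionary layout are illustrative assumptions.

```python
def return_period(probability):
    """Recurrence period in years for an annual probability in (0, 1]."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / probability


levels = [0.05, 0.15, 0.25, 0.50, 0.75, 0.95]

# Maxima: the level is read as the probability of being equaled or exceeded.
for_exceedance = {p: return_period(p) for p in levels}

# Minima: the level is read as the cumulative (non-exceedance) probability,
# and the recurrence period is its inverse, as stated above.
for_non_exceedance = {p: return_period(p) for p in levels}

# A value that is not exceeded with probability F is exceeded with
# probability 1 - F, so the 75% table level corresponds to a 4-year
# recurrence of exceedance.
exceedance_at_75 = return_period(1.0 - 0.75)
```

The same function serves both cases; only the meaning of the probability passed to it changes.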

Results and Discussion
The non-parametric Kolmogorov-Smirnov (KS), Wilcoxon (W²), Anderson-Darling (AD), and chi-square (χ²) tests at the 10% significance level showed a high rate of acceptance of the Gamma distribution with its two estimated parameters. Thus, this theoretical distribution can represent the variation of decendial reference evapotranspiration (Table 1). Also, the logarithm of the maximum likelihood (V) shows a good fit of the data to the normal distribution (Tables 2 to 13). Other studies have obtained similar results (Costa Neto, 2002; Abumanssur, 2006; Blain & Brunini, 2007; Arraes et al., 2009). The Normal distribution model has good parsimony due to the simplicity of its equation, the low number of parameters to be estimated, ease of estimation, extensive use in statistical inference studies, and a good percentage of goodness-of-fit to the studied series.
Source: Historical reference evapotranspiration series in Mossoró, RN.
The results of the Kolmogorov-Smirnov tests were independent of the capability of the distribution to estimate observed frequencies and of the number of classes. The tabulated critical value depended only on the number of observations, which does not vary among types of distribution because it depends solely on the data series (Catalunha et al., 2002; Martins et al., 2010; Martins et al., 2011).
Estimating values through the Gamma distribution guides the researcher in the design of irrigation systems and supports statistical inference from parameter estimates, probabilistic predictions, and the comparison of accumulated ET0 and return periods through confidence intervals. Thus, it is possible to produce highly reliable estimates for a fixed estimation or sampling error, apply hypothesis tests and predictive regression models, measure variability or heterogeneity, estimate the degree of asymmetry and kurtosis of the responses, and evaluate risks inherent to ET0.
Table 14 shows that in the first ten-day period of January, the higher the probability or risk value, the lower the estimated reference evapotranspiration value. On the other hand, the data show that there is at least a 75% probability that the reference evapotranspiration will take the value of 6.90 mm day⁻¹. In the last ten-day period of December, there is a maximum probability of 75% that the reference evapotranspiration value does not exceed 462.79 mm day⁻¹. Thus, the use of the model can safely help the management of agricultural activities such as irrigation system scaling, crop water requirements, cost estimation, material acquisition, crop forecasting, production estimation, and present and future needs (Meyer, 2012; Doorenbos & Kassam, 1994; Hann, 1994; Lanna, 2001; Costa Neto, 2002; Bussab & Morettin, 2013).
The ET0 values multiplied by the respective crop coefficients serve as a parameter for scaling irrigation systems in the region of Mossoró, RN, Brazil. For example, the flowering period of maize has the maximum water demand, lasting about 20 days with a crop coefficient of 1.05. Under these conditions, adopting a probability level of 75% (4-year return period), the recommended evapotranspiration for scaling irrigation systems in the Dourados region was 6.18 mm day⁻¹ (5.89 × 1.05) (Doorenbos & Kassam, 1994).
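The arithmetic behind the Dourados example can be written out as a worked equation. The symbols ET_c (design crop evapotranspiration) and K_c (crop coefficient) are notation introduced here for illustration; the numbers are those quoted above.

```latex
ET_c = K_c \times ET_0 = 1.05 \times 5.89 \approx 6.18\ \text{mm day}^{-1},
\qquad
T_r = \frac{1}{1 - 0.75} = 4\ \text{years}
```

The second expression shows why a 75% non-exceedance level corresponds to a 4-year return period: the design value is equaled or exceeded with probability 0.25 in any given year.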
In Mossoró, for the ten days of maximum water requirement in October (the last decendial) and a 75% probability level (4-year return period), the recommended evapotranspiration for irrigation of a maize crop is 8.20 mm day⁻¹ (7.81 × 1.05). Therefore, in this case, there is a 75% probability that the value in the last ten days of October will not exceed 8.20 mm day⁻¹. In other words, in just one out of every four years the value will be at least 8.20 mm day⁻¹.
Reference evapotranspiration values estimated at 75% probability by the Gamma distribution vary among months of the year (Table 14). Irrigated agriculture uses the value of dependable precipitation, which determines the scaling evapotranspiration within a series of data at the 75% probability level (Saad & Moura, 1995; Doorenbos & Kassam, 1994). Probability levels represent the occurrence limits of values equal to or lower than those established. For example, for a cumulative period of 10 days in January, there is a 75% probability that the evapotranspiration value will not exceed 6.90 mm day⁻¹. In other words, the evapotranspiration will be equal to or greater than 6.90 mm day⁻¹ in only one of every four years.
The recurrence period increases as a function of evapotranspiration values (Table 14). That is, the potential evapotranspiration values estimated by the Gamma model increase with the probability of occurrence and remain quite similar among the decendial periods analyzed in all months.
Note. Recurrence Period (T): Average time measured in years when a given atmospheric, meteorological, or hydrological event must be equaled or exceeded at least once.

Conclusions
The Gamma probability distribution had an excellent fit to the 38-year historical series of reference evapotranspiration for consecutive 10-day periods. This result allows the model to be used in point estimates of the amount of reference evapotranspiration at different levels of probability, to estimate risk probabilities, to simulate this climate variable, to predict return periods, to compare similar atmospheric or agricultural phenomena, to simulate models of comparable events, and to obtain the inverse of the distribution function for different levels of probability.
Therefore, the Gamma probability distribution was the one that best described the ET0, which could model the behavior of this atmospheric variable and aid in the design of irrigation systems in the region. The maximum daily ET0 for irrigation projects in Mossoró is 10 mm. The ET0 accumulated over periods of 10 days should be, on average, 80 mm.