Comparative Study of the Non-Homogeneous Poisson Process Type-I Generalized Half-Logistic Distribution

In this paper, a Non-Homogeneous Poisson Process (NHPP) model is constructed based on the Type-I Generalized Half-Logistic Distribution (GHLD-I). Three methods for estimating the parameters of the NHPP GHLD-I model are considered for failure-occurrence time data, and the necessary likelihood equations are derived. Confidence intervals are studied, and upper and lower bounds for the parameters are constructed. The NHPP GHLD-I model is applied to four published data sets, and its performance is assessed using three evaluation criteria: the mean squared error (MSE), the sum of squared errors (SSE), and the variance.


Introduction
Modern technological progress depends on effective, highly accurate hardware and software. Statistical reliability modelling has been used intensively to quantify software system reliability; such models describe software failure behavior under different basic assumptions. Traditional reliability models, which are applied in the testing phase of the software development cycle, address four main tasks: finding a theoretical distribution that fits the failure time data well, assessing the future behavior of the time between software failures, predicting software system reliability, and determining when the software product is mature and ready to be released to the user [see: Lai and Garg (2012) and Barraza (2010)].
Several reliability models based on Non-Homogeneous Poisson Processes (NHPP) have been proposed in past years, and they have been used extensively and successfully to describe the software failure process [see: Goel and Okumoto (1979), Yamada et al. (1983), Zhang et al. (2003), and Teng and Pham (2004)]. In this paper, an NHPP software reliability model is constructed with the aim of giving a good representation of the uncertainty of a software system. The suggested NHPP model is built on the Type-I Generalized Half-Logistic Distribution (GHLD-I) proposed by [Kantam et al. (2013)] and is formulated for failure-occurrence time data. The proposed model is expected to be flexible, to describe growth phenomena well, and to be useful for modeling life data. It offers several sub-models obtained by changing the shape parameter, which makes it easier and faster to find the best-fitting model.
The parameters of the NHPP GHLD-I model are estimated using the Maximum Likelihood (ML), Non-Linear Least Squares (NLS), and Weighted Non-Linear Least Squares (WNLS) estimation methods, and the corresponding confidence intervals are constructed.
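The Cumulative Distribution Function (CDF) of the GHLD-I with scale parameter $\sigma$ and shape parameter $\theta$ is given by:
$$F(t) = 1 - \left[\frac{2e^{-t/\sigma}}{1+e^{-t/\sigma}}\right]^{\theta}, \quad t \ge 0,\ \sigma > 0,\ \theta > 0. \tag{1}$$
Using Equation (1), the Probability Density Function (PDF) can be obtained as follows:
$$f(t) = \frac{\theta\,(2e^{-t/\sigma})^{\theta}}{\sigma\,(1+e^{-t/\sigma})^{\theta+1}}. \tag{2}$$
Also, the reliability function can be obtained from Equation (1) as follows:
$$R(t) = 1 - F(t) = \left[\frac{2e^{-t/\sigma}}{1+e^{-t/\sigma}}\right]^{\theta}. \tag{3}$$
The hazard function can then be found using Equations (2) and (3) as follows:
$$h(t) = \frac{f(t)}{R(t)} = \frac{\theta}{\sigma\,(1+e^{-t/\sigma})}. \tag{4}$$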
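The four GHLD-I functions can be sketched in Python as follows. This is a minimal illustration that assumes the Type-I form in which the reliability function is $[2e^{-t/\sigma}/(1+e^{-t/\sigma})]^{\theta}$; the function names are hypothetical and are not taken from the original study.

```python
import math

# Assumption: GHLD-I reliability is [2 e^{-t/sigma} / (1 + e^{-t/sigma})]^theta, t >= 0.

def ghld1_reliability(t, theta, sigma):
    """Reliability R(t) = 1 - F(t)."""
    z = math.exp(-t / sigma)
    return (2.0 * z / (1.0 + z)) ** theta

def ghld1_cdf(t, theta, sigma):
    """CDF F(t) = 1 - R(t)."""
    return 1.0 - ghld1_reliability(t, theta, sigma)

def ghld1_pdf(t, theta, sigma):
    """PDF f(t) = theta (2z)^theta / (sigma (1 + z)^(theta + 1)), with z = e^{-t/sigma}."""
    z = math.exp(-t / sigma)
    return theta * (2.0 * z) ** theta / (sigma * (1.0 + z) ** (theta + 1.0))

def ghld1_hazard(t, theta, sigma):
    """Hazard h(t) = f(t) / R(t) = theta / (sigma (1 + z))."""
    z = math.exp(-t / sigma)
    return theta / (sigma * (1.0 + z))
```

Under this form the hazard is increasing in $t$ and bounded between $\theta/(2\sigma)$ at $t = 0$ and $\theta/\sigma$ as $t \to \infty$.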
The rest of this paper is arranged as follows: Section 2 presents the formulation of the NHPP GHLD-I model and describes the ML, NLS, and WNLS estimation approaches for this model. Section 3 describes the evaluation criteria used in our evaluation study. An application to real data is given in Section 4. Finally, Section 5 concludes the paper.

NHPP GHLD-I Model
This section presents the formulation of the suggested model and derives the equations needed for point and interval estimation.

Model Formulation and Characteristics
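In this section, the NHPP GHLD-I model is constructed by following [Lyu (2002)]:
$$m(t) = a\,F(t), \tag{5}$$
$$\lambda(t) = a\,f(t), \tag{6}$$
where $F(t)$ and $f(t)$ are, respectively, the CDF and PDF of the time to failure of an individual failure. If we also restrict attention to distributions of the finite-failure type, i.e., $\lim_{t\to\infty} m(t) < \infty$, we have that $\lim_{t\to\infty} m(t) = a$ since $\lim_{t\to\infty} F(t) = 1$. Thus $a$ represents the eventual number of failures that would be observed if the system could be observed over an infinite amount of time. Then, by using Equations (1), (2), (5) and (6), the mean value and failure intensity functions of the NHPP GHLD-I model can be obtained respectively as follows:
$$m(t) = a\left\{1 - \left[\frac{2e^{-t/\sigma}}{1+e^{-t/\sigma}}\right]^{\theta}\right\}, \tag{7}$$
$$\lambda(t) = \frac{a\,\theta\,(2e^{-t/\sigma})^{\theta}}{\sigma\,(1+e^{-t/\sigma})^{\theta+1}}, \tag{8}$$
where the parameter $a$ is interpreted as the number of initial faults in the software, $\sigma$ is the scale parameter and $\theta$ is the shape parameter of the NHPP GHLD-I model. By using Equation (7), the number of remaining errors of this model can be written as follows:
$$n_r(t) = a - m(t) = a\left[\frac{2e^{-t/\sigma}}{1+e^{-t/\sigma}}\right]^{\theta}. \tag{9}$$
By using Equations (8) and (9), the error detection rate can be obtained as:
$$d(t) = \frac{\lambda(t)}{a - m(t)} = \frac{\theta}{\sigma\,(1+e^{-t/\sigma})}. \tag{10}$$
We can obtain the mean time between failures (MTBF) of our suggested model using Equation (8) as:
$$\mathrm{MTBF}(t) = \frac{1}{\lambda(t)} = \frac{\sigma\,(1+e^{-t/\sigma})^{\theta+1}}{a\,\theta\,(2e^{-t/\sigma})^{\theta}}. \tag{11}$$
According to Equation (7), the conditional reliability function is:
$$R(x \mid t) = \exp\{-[m(t+x) - m(t)]\} = \exp\left\{-a\left(\left[\frac{2e^{-t/\sigma}}{1+e^{-t/\sigma}}\right]^{\theta} - \left[\frac{2e^{-(t+x)/\sigma}}{1+e^{-(t+x)/\sigma}}\right]^{\theta}\right)\right\}. \tag{12}$$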
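The model quantities of this section can be sketched in Python as follows. The sketch assumes the standard finite-failure construction $m(t) = aF(t)$, $\lambda(t) = af(t)$ with the GHLD-I reliability term $[2e^{-t/\sigma}/(1+e^{-t/\sigma})]^{\theta}$, and it takes the MTBF to be the instantaneous value $1/\lambda(t)$. All function names are hypothetical.

```python
import math

def g(t, sigma):
    """Half-logistic survival term 2 e^{-t/sigma} / (1 + e^{-t/sigma})."""
    z = math.exp(-t / sigma)
    return 2.0 * z / (1.0 + z)

def mean_value(t, a, theta, sigma):
    """Expected cumulative number of failures by time t: m(t) = a F(t)."""
    return a * (1.0 - g(t, sigma) ** theta)

def intensity(t, a, theta, sigma):
    """Failure intensity: lambda(t) = a f(t)."""
    z = math.exp(-t / sigma)
    return a * theta * (2.0 * z) ** theta / (sigma * (1.0 + z) ** (theta + 1.0))

def remaining_faults(t, a, theta, sigma):
    """Expected number of faults still undetected at time t: a - m(t)."""
    return a * g(t, sigma) ** theta

def detection_rate(t, a, theta, sigma):
    """Per-fault detection rate: lambda(t) / (a - m(t))."""
    return intensity(t, a, theta, sigma) / remaining_faults(t, a, theta, sigma)

def mtbf(t, a, theta, sigma):
    """Instantaneous MTBF, assumed here to be 1 / lambda(t)."""
    return 1.0 / intensity(t, a, theta, sigma)

def conditional_reliability(x, t, a, theta, sigma):
    """Probability of no failure in (t, t + x]: exp(-(m(t + x) - m(t)))."""
    return math.exp(-(mean_value(t + x, a, theta, sigma) - mean_value(t, a, theta, sigma)))
```

Note that the detection rate does not depend on $a$ and coincides with the GHLD-I hazard function.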

Maximum Likelihood Estimation (MLE) Method
Parameter estimation is of primary importance in software reliability prediction. ML estimation is one of the most important and widely used traditional estimation techniques. Its estimators have several desirable properties, including consistency, efficiency and asymptotic normality.
Suppose we have $n$ time instants at which the first, second, third, ..., $n$th failures of a software system are experienced. In other words, if $t_k$ is the total time to the $k$th failure, then $t_k$ is an observation of the random variable $T_k$, and $n$ such failures are successively recorded. The joint probability density of the failure time realizations $t_1, t_2, t_3, \ldots, t_n$ is:
$$L = e^{-m(t_n)} \prod_{k=1}^{n} \lambda(t_k). \tag{13}$$
The function given in Equation (13) is also called the likelihood function of the given failure data. The parameter values of an NHPP model that maximize $L$ are called maximum likelihood estimates, and the method is called the maximum likelihood (ML) estimation method.
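To estimate the three unknown parameters $a$, $\theta$ and $\sigma$ of the NHPP GHLD-I model by the ML method, based on the failure-occurrence times $t_k$ ($k = 1, 2, \ldots, n$; $0 \le t_1 \le t_2 \le \cdots \le t_n$), we substitute Equations (7) and (8) into Equation (13). Writing $G(t) = 2e^{-t/\sigma}/(1+e^{-t/\sigma})$ for brevity, the likelihood function is:
$$L(a,\theta,\sigma) = \left[\prod_{k=1}^{n} \frac{a\,\theta\,(2e^{-t_k/\sigma})^{\theta}}{\sigma\,(1+e^{-t_k/\sigma})^{\theta+1}}\right] \exp\left\{-a\left[1 - G(t_n)^{\theta}\right]\right\}. \tag{14}$$
Taking the logarithm of Equation (14) gives the log-likelihood function of our proposed model:
$$\log L = n\log a + n\theta\log 2 + n\log\theta - \frac{\theta}{\sigma}\sum_{k=1}^{n} t_k - (\theta+1)\sum_{k=1}^{n}\log\left(1+e^{-t_k/\sigma}\right) - n\log\sigma - a\left[1 - G(t_n)^{\theta}\right]. \tag{15}$$
To estimate the parameters $a$, $\theta$ and $\sigma$, the derivatives of Equation (15) with respect to $a$, $\theta$ and $\sigma$ are obtained as follows:
$$\frac{\partial \log L}{\partial a} = \frac{n}{a} - \left[1 - G(t_n)^{\theta}\right],$$
$$\frac{\partial \log L}{\partial \theta} = \frac{n}{\theta} + n\log 2 - \frac{1}{\sigma}\sum_{k=1}^{n} t_k - \sum_{k=1}^{n}\log\left(1+e^{-t_k/\sigma}\right) + a\,G(t_n)^{\theta}\log G(t_n),$$
$$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\theta}{\sigma^{2}}\sum_{k=1}^{n} t_k - \frac{\theta+1}{\sigma^{2}}\sum_{k=1}^{n}\frac{t_k\,e^{-t_k/\sigma}}{1+e^{-t_k/\sigma}} + \frac{a\,\theta\,t_n\,G(t_n)^{\theta}}{\sigma^{2}\left(1+e^{-t_n/\sigma}\right)}. \tag{16}$$
Equating the previous expressions to zero gives:
$$\hat{a} = \frac{n}{1 - G(t_n)^{\hat{\theta}}}, \qquad \left.\frac{\partial \log L}{\partial \theta}\right|_{(\hat{a},\hat{\theta},\hat{\sigma})} = 0, \qquad \left.\frac{\partial \log L}{\partial \sigma}\right|_{(\hat{a},\hat{\theta},\hat{\sigma})} = 0, \tag{17}$$
where $G(t_n)$ in the first expression is evaluated at $\hat{\sigma}$.
The value of the parameter $a$ can be obtained from the first expression of Equation (17) once the estimates of $\theta$ and $\sigma$ are available. Since the second and third expressions of Equation (17) are nonlinear, they have no analytic solution and must be solved numerically; the R programming language is used for this purpose.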
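The numerical ML step can be illustrated with a short Python sketch. The source states only that R was used; the code below is an assumption-laden stand-in that profiles out $a$ through $a = n/F(t_n)$ and replaces a proper nonlinear solver with a coarse grid search over $(\theta, \sigma)$. The function names and the toy failure times are hypothetical.

```python
import math

def cdf(t, theta, sigma):
    """Assumed GHLD-I CDF: 1 - [2 e^{-t/sigma} / (1 + e^{-t/sigma})]^theta."""
    z = math.exp(-t / sigma)
    return 1.0 - (2.0 * z / (1.0 + z)) ** theta

def log_likelihood(times, a, theta, sigma):
    """NHPP log-likelihood: sum of log lambda(t_k) minus m(t_n); times must be sorted."""
    n = len(times)
    total = n * (math.log(a) + math.log(theta) + theta * math.log(2.0) - math.log(sigma))
    for t in times:
        total -= theta * t / sigma + (theta + 1.0) * math.log1p(math.exp(-t / sigma))
    return total - a * cdf(times[-1], theta, sigma)

def profile_a(times, theta, sigma):
    """Closed-form maximizer of the log-likelihood in a for fixed theta and sigma."""
    return len(times) / cdf(times[-1], theta, sigma)

def fit_mle_grid(times, thetas, sigmas):
    """Coarse grid search (not the method of the source); returns (loglik, a, theta, sigma)."""
    best = None
    for theta in thetas:
        for sigma in sigmas:
            a = profile_a(times, theta, sigma)
            ll = log_likelihood(times, a, theta, sigma)
            if best is None or ll > best[0]:
                best = (ll, a, theta, sigma)
    return best

# Hypothetical failure times, for illustration only (not one of the four data sets).
toy_times = [3.0, 8.0, 15.0, 24.0, 36.0, 50.0, 70.0, 95.0, 130.0, 180.0]
ll, a_hat, theta_hat, sigma_hat = fit_mle_grid(
    toy_times, [0.5, 1.0, 1.5, 2.0], [20.0, 40.0, 80.0, 160.0]
)
```

In practice the grid result would only serve as a starting point for a gradient-based or derivative-free optimizer.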

Non-Linear Least Squares Estimation (NLSE) Method
The least squared sum in the Least Squares Estimation (LSE) method is defined by:
$$S(\Theta) = \sum_{i=1}^{n}\left[m(t_i;\Theta) - i\right]^{2}, \tag{18}$$
where $\Theta$ denotes the parameters of the NHPP model and $m(t;\Theta)$ is its mean value function. The estimates of $\Theta$ obtained by minimizing $S(\Theta)$ are called the OLS estimates and can be calculated with any non-linear regression technique. Usually, the Gauss-Newton method or the Levenberg-Marquardt algorithm is used to solve the minimization problem $\arg\min_{\Theta} S(\Theta)$. More specifically, we consider the above optimization problem formally from the viewpoint of regression analysis.
For the time epochs $\{\tau_i : i = 1, 2, \ldots, N(t)\}$, which are random variables with realizations $\{t_i : i = 1, 2, \ldots, n\}$, let $N(t)$ denote the cumulative number of faults detected by time $t$. It is easily shown from the time-scale transform of the NHPP that $\tau_i^{*} = m(\tau_i)$ forms a Homogeneous Poisson Process (HPP) with rate 1 and that
$$\mathrm{E}[m(\tau_i)] = i. \tag{19}$$
Given the relationship between the random variable $m(\tau_i)$ and its realization $m(t_i)$, it is straightforward to consider the following regression model:
$$m(t_i) = i + \varepsilon_i, \tag{20}$$
where $\{\varepsilon_i : i = 1, 2, \ldots\}$ are the error terms. In a fashion similar to the usual regression analysis, if the error terms are i.i.d. random variables with mean 0, the OLS problem is formulated to minimize:
$$\sum_{i=1}^{n}\varepsilon_i^{2} = \sum_{i=1}^{n}\left[m(t_i;\Theta) - i\right]^{2}. \tag{21}$$
Unfortunately, in the NHPP the error terms in Equation (20), $\{\varepsilon_i : i = 1, 2, \ldots\}$, are neither independent nor identically distributed. This means that the OLS estimation does not satisfy the assumptions of common linear regression analysis. Consequently, the resulting OLS estimates in the LSE cannot be regarded as representative [Ishii et al. (2012)], and the Non-Linear Least Squares Estimation (NLSE) is needed.
We can calculate the least squares estimates of the three unknown parameters $a$, $\theta$ and $\sigma$ of the NHPP GHLD-I model by minimizing the least squared sum. Using Equation (21), the least squared sum of our model is:
$$S(a,\theta,\sigma) = \sum_{i=1}^{n}\left\{a\left[1 - \left(\frac{2e^{-t_i/\sigma}}{1+e^{-t_i/\sigma}}\right)^{\theta}\right] - i\right\}^{2}. \tag{22}$$
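The least squares fit can be sketched in Python as follows, assuming the least squared sum compares the fitted mean value $m(t_i)$ with the failure index $i$. The text mentions Gauss-Newton and Levenberg-Marquardt; for brevity this sketch swaps in a grid search over $(\theta, \sigma)$ and uses the fact that, for fixed $(\theta, \sigma)$, the objective is quadratic in $a$ and has a closed-form minimizer. The function names are hypothetical.

```python
import math

def cdf(t, theta, sigma):
    """Assumed GHLD-I CDF: 1 - [2 e^{-t/sigma} / (1 + e^{-t/sigma})]^theta."""
    z = math.exp(-t / sigma)
    return 1.0 - (2.0 * z / (1.0 + z)) ** theta

def least_squared_sum(times, a, theta, sigma):
    """S = sum over i of (m(t_i) - i)^2 with m(t) = a F(t); i counts failures from 1."""
    return sum((a * cdf(t, theta, sigma) - i) ** 2 for i, t in enumerate(times, start=1))

def best_a(times, theta, sigma):
    """Minimizer of S in a for fixed theta, sigma: sum(i F_i) / sum(F_i^2)."""
    F = [cdf(t, theta, sigma) for t in times]
    num = sum(i * f for i, f in enumerate(F, start=1))
    den = sum(f * f for f in F)
    return num / den

def fit_nls_grid(times, thetas, sigmas):
    """Coarse grid search stand-in for Gauss-Newton / Levenberg-Marquardt."""
    best = None
    for theta in thetas:
        for sigma in sigmas:
            a = best_a(times, theta, sigma)
            s = least_squared_sum(times, a, theta, sigma)
            if best is None or s < best[0]:
                best = (s, a, theta, sigma)
    return best
```

Profiling out $a$ reduces the search from three dimensions to two, which keeps even a coarse grid affordable.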

Weighted Non-Linear Least Squares Estimation (WNLSE) Method
Some of the estimates obtained by NLSE are less precise than others, in the sense that their variances are relatively larger. Since the variances of the NLSE estimates are not equal, it is often necessary to adjust the NLSE in such a way that the mean value functions are weighted, i.e.,
$$S_W(\Theta) = \sum_{i=1}^{n} w_i\left[m(t_i;\Theta) - i\right]^{2}, \tag{23}$$
where $w_i$ ($i = 1, 2, \ldots, n$) are positive weights satisfying $\sum_{i=1}^{n} w_i = n$. The estimates obtained as solutions of $\arg\min_{\Theta} S_W(\Theta)$ are called the weighted non-linear least squares (WNLS) estimates; the NLS estimate is the special case of the WNLS estimate in which all weights are equal, i.e., $w_i = 1$ for all $i$. Because the variance of the random variable $m(\tau_i)$ in Equation (20) is unequal with respect to $i$, the error terms $\{\varepsilon_i : i = 1, 2, \ldots, n\}$ should be normalized to random variables with variance 1.
The choice of weighting function is an important concern when using the WNLSE method, and several weighting techniques can be considered [Sun et al. (2102)]. By analogy with linear regression analysis, Massey et al. [23] assume that the error terms are weighted by $w_i$ that are inversely proportional to their variances, i.e.,
$$w_i \propto \frac{1}{\mathrm{Var}[m(\tau_i)]} = \frac{1}{\mathrm{E}[m(\tau_i)]}, \tag{24}$$
which follows from the common property that the variance of an NHPP equals its mean value. It should be noted that the error terms are still not i.i.d. random variables, because our NHPP-based reliability model does not have the linear intensity assumed by Massey et al. [23]. Hence, the weight in Equation (24) is meaningful only when the error terms can be treated as approximately i.i.d. normal random variables [Ishii et al. (2012)].
The estimates of the unknown parameters $a$, $\theta$ and $\sigma$ of the NHPP GHLD-I model using the WNLSE method can be obtained by minimizing:
$$S_W(a,\theta,\sigma) = \sum_{i=1}^{n} w_i\left\{a\left[1 - \left(\frac{2e^{-t_i/\sigma}}{1+e^{-t_i/\sigma}}\right)^{\theta}\right] - i\right\}^{2}. \tag{25}$$
For our application three weighting functions are considered. The first, $w_i^{(1)}$, is the one suggested by Massey et al. [23], i.e., the inverse-variance weight of Equation (24) scaled so that the weights sum to $n$ (26). The other two, $w_i^{(2)}$ and $w_i^{(3)}$, are empirical weight functions:
$$w_i^{(2)} = \ldots \tag{27}$$
$$w_i^{(3)} = \ldots \tag{28}$$
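One way to read the inverse-variance weighting is sketched below in Python. It assumes that the variance of $m(\tau_i)$ equals $i$ (the $i$th arrival of a rate-1 HPP), so that the raw weight is $1/i$, and it rescales the weights to sum to $n$. This is an illustrative interpretation, not the exact weight formula of the source, and the two empirical weight functions are not reproduced. The function names are hypothetical.

```python
import math

def cdf(t, theta, sigma):
    """Assumed GHLD-I CDF: 1 - [2 e^{-t/sigma} / (1 + e^{-t/sigma})]^theta."""
    z = math.exp(-t / sigma)
    return 1.0 - (2.0 * z / (1.0 + z)) ** theta

def inverse_index_weights(n):
    """Weights proportional to 1/i (assumed Var[m(tau_i)] = i), rescaled to sum to n."""
    raw = [1.0 / i for i in range(1, n + 1)]
    scale = n / sum(raw)
    return [scale * r for r in raw]

def weighted_squared_sum(times, weights, a, theta, sigma):
    """S_W = sum over i of w_i (m(t_i) - i)^2 with m(t) = a F(t)."""
    return sum(
        w * (a * cdf(t, theta, sigma) - i) ** 2
        for i, (t, w) in enumerate(zip(times, weights), start=1)
    )
```

With all weights set to 1 the weighted sum reduces to the ordinary least squared sum, which matches the statement that NLS is a special case of WNLS.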

Interval Estimation Method
Point estimates of the model parameters generally differ from one estimation method to another. Furthermore, accurate parameter estimation requires a large amount of failure data, which might not be available. Interval estimation of the parameters of software reliability models can be used to address these problems.
A confidence interval is an interval of numbers containing the most plausible values for the distribution parameters. Let $\Theta$ denote the parameters of an NHPP model. To obtain confidence limits for the parameters of NHPP models, we calculate the Fisher information matrix. The Fisher information measures the amount of information that an observable random variable carries about the unknown parameters of the NHPP model that describes it. The inverse of the Fisher information matrix gives the asymptotic variances and covariances of the parameter estimates. The two-sided approximate $100(1-\alpha)\%$ confidence limits for the parameters of an NHPP model are:
$$\hat{\Theta} \pm z_{\alpha/2}\sqrt{\mathrm{Var}(\hat{\Theta})}, \tag{29}$$
where $\Theta$ denotes the parameters of the NHPP model and $\hat{\Theta}$ is the ML, NLS or WNLS estimator of these parameters. $z_{\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution [Hong et al. (1997)].
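In this section, we discuss the interval estimation of the parameters of the NHPP GHLD-I model for the ML, NLS and WNLS estimators. To obtain the confidence limits for the parameters $a$, $\theta$ and $\sigma$, we find the Fisher information matrix, from which the asymptotic variances and covariances of the ML, NLS and WNLS estimates are obtained. For the confidence intervals of the ML estimators, we define the Fisher information matrix as the matrix of negative second partial derivatives of the log-likelihood function shown in Equation (15):
$$I_{ML}(\Theta) = -\begin{bmatrix} \dfrac{\partial^{2}\log L}{\partial a^{2}} & \dfrac{\partial^{2}\log L}{\partial a\,\partial\theta} & \dfrac{\partial^{2}\log L}{\partial a\,\partial\sigma} \\ \dfrac{\partial^{2}\log L}{\partial\theta\,\partial a} & \dfrac{\partial^{2}\log L}{\partial\theta^{2}} & \dfrac{\partial^{2}\log L}{\partial\theta\,\partial\sigma} \\ \dfrac{\partial^{2}\log L}{\partial\sigma\,\partial a} & \dfrac{\partial^{2}\log L}{\partial\sigma\,\partial\theta} & \dfrac{\partial^{2}\log L}{\partial\sigma^{2}} \end{bmatrix}, \tag{30}$$
where each entry is evaluated at the ML estimates. To obtain confidence intervals for the NLS estimators, we define the Fisher information matrix $I_{NLS}(\Theta)$ as the matrix of negative second partial derivatives of the least squared sum function shown in Equation (22), arranged in the same $3 \times 3$ layout (31); the matrix $I_{WNLS}(\Theta)$ for the WNLS estimators is defined analogously from the weighted least squared sum. Using Equations (30) and (31), along with the $I_{ML}(\Theta)$, $I_{NLS}(\Theta)$ and $I_{WNLS}(\Theta)$ matrices, the asymptotic $100(1-\alpha)\%$ two-sided confidence limits for the parameters $a$, $\theta$ and $\sigma$ of the NHPP GHLD-I model are respectively given by:
$$\hat{a} \pm z_{\alpha/2}\sqrt{\mathrm{Var}(\hat{a})}, \qquad \hat{\theta} \pm z_{\alpha/2}\sqrt{\mathrm{Var}(\hat{\theta})}, \qquad \hat{\sigma} \pm z_{\alpha/2}\sqrt{\mathrm{Var}(\hat{\sigma})},$$
where $\mathrm{Var}(\hat{a})$, $\mathrm{Var}(\hat{\theta})$ and $\mathrm{Var}(\hat{\sigma})$ are respectively the diagonal elements of the asymptotic variance-covariance matrix of the estimates of the parameters $a$, $\theta$ and $\sigma$ of the NHPP GHLD-I model.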
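The interval construction for the ML case can be sketched in Python as follows. The sketch approximates the negative Hessian of a log-likelihood by central finite differences, inverts it to get an approximate variance-covariance matrix, and forms Wald-type limits. The finite-difference step, the small Gauss-Jordan inverse and all function names are illustrative assumptions; the source derives the second derivatives analytically.

```python
import math
from statistics import NormalDist

def negative_hessian(f, x, h=1e-4):
    """Observed information: negative Hessian of f at x by central differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            second = (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
            H[i][j] = -second
    return H

def invert(M):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]

def wald_intervals(estimates, information, alpha=0.05):
    """Two-sided 100(1 - alpha)% limits: estimate +/- z * sqrt(diagonal of inverse information)."""
    cov = invert(information)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return [(e - z * math.sqrt(cov[k][k]), e + z * math.sqrt(cov[k][k]))
            for k, e in enumerate(estimates)]
```

For the model in this paper, `f` would be the log-likelihood as a function of the parameter vector $(a, \theta, \sigma)$ and `x` the vector of ML estimates.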

Evaluation Criteria
The mean squared error (MSE), the sum of squared errors (SSE), and the variance are used as evaluation criteria in our application. These criteria measure the deviation between the actual and predicted values: the lower the criterion value, the better the model performance. The formulas of these criteria are shown in Table ( ) [... (2009)], where k denotes the number of model parameters.
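Two of the criteria can be sketched in Python as follows. The criteria table is not reproduced here, so the definitions are assumptions: SSE is taken as the sum of squared differences between actual and predicted cumulative failure counts, and MSE as SSE divided by $n - k$, which is one common convention that would explain the role of $k$. The variance criterion is omitted because its formula cannot be inferred from the text. The function names are hypothetical.

```python
def sse(actual, predicted):
    """Sum of squared errors between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted))

def mse(actual, predicted, k):
    """Assumed convention: SSE / (n - k), with k the number of model parameters."""
    n = len(actual)
    if n <= k:
        raise ValueError("need more observations than parameters")
    return sse(actual, predicted) / (n - k)
```

For the NHPP GHLD-I model, `k` would be 3 (the parameters $a$, $\theta$ and $\sigma$).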

Real Data Application
This section presents the real data sets used in our application and the results obtained by estimating the unknown parameters $a$, $\theta$ and $\sigma$ of the NHPP GHLD-I model with the three estimation methods described above, together with the resulting confidence intervals. The results of the evaluation criteria are also presented and discussed in this section.

Real Data Sets
In this section, four real data sets with sample sizes n = 34, 30, 136 and 41 are used to study the performance of the NHPP GHLD-I model.

Application Results
The estimates of the initial faults $a$, the scale parameter $\sigma$ and the shape parameter $\theta$ of the NHPP GHLD-I model under the ML, NLS and WNLS estimation methods, along with their SSE, MSE and variance for the four data sets, are reported in Tables (6)-(8). The corresponding confidence intervals of the model parameters are displayed in Table (9). Tables (10)-(13) present the prediction results for the last 16 failures based on the three estimation methods for the four real data sets.

Discussion of Results
The purpose of our application is to evaluate the performance of the NHPP GHLD-I model using three evaluation criteria and several real data sets. Three estimation methods are used to estimate the initial faults $a$, the scale parameter $\sigma$ and the shape parameter $\theta$ of the NHPP GHLD-I model. The 95% confidence intervals of the parameters are constructed for the four selected real data sets. By comparing the evaluation criteria values of the four real data sets under the ML, NLS and WNLS estimation methods, the following observations can be made:
• The NHPP GHLD-I model shows good predictive ability according to the three selected criteria on all four real data sets. It fits the fourth data set (n = 41) best, since all evaluation criteria take their lowest values for that data set.
• The ML estimation method shows the worst performance among the three estimation methods studied; however, its evaluation criteria values are lower for the fourth data set than for the other data sets.
• For the data sets studied, the NHPP GHLD-I model performs best with the NLS estimation method on all four real data sets, which gives the lowest values of all evaluation criteria among the estimation methods considered.
• The evaluation criteria values of the WNLS estimation method with weighting function $w^{(3)}$ are approximately the same as those of the NLS estimation method. In addition, in three comparison cases it has lower criteria values than the NLS estimation method, which indicates that this weighting function gives better prediction results than the other weighting functions considered.
• From the confidence intervals of the model parameters, the following points are concluded. For DS1, one estimator has the shortest expected length under the WNLS estimation method with weighting function $w^{(3)}$, while the shortest expected length for the other two estimators is obtained by the WNLS estimation method with weighting function $w^{(\ )}$. For DS2, one estimator has the shortest expected length under the NLS estimation method, while the shortest expected length for the other two estimators is obtained by the WNLS estimation method with weighting function $w^{(\ )}$. For DS3 and DS4, all parameter estimators have the shortest expected length under the WNLS estimation method with weighting function $w^{(\ )}$. Based on the interval estimation, the WNLS estimation method gives the best performance, while the ML estimation method gives the worst. The first- and second-best weighting functions are respectively $w^{(\ )}$ and $w^{(3)}$.

Conclusion
In this paper, a software reliability growth model of the NHPP type, based on the GHLD-I distribution, has been constructed. The unknown parameters of the proposed NHPP GHLD-I model have been estimated using the ML, NLS and WNLS estimation methods. In addition, confidence intervals of the model parameters, which are important for software reliability evaluation, have been obtained. The NHPP GHLD-I model has been applied to four real data sets to measure its performance on three evaluation criteria. Our numerical study illustrates the flexibility of the NHPP GHLD-I model and its contribution to the field of software reliability modeling.