The Parameter Optimization of the Filtered Derivative for Change Point Analysis

Let X = (X1, X2, . . . , XN) be a time series, that is, a sequence of random variables indexed by time t ∈ {1, 2, . . . , N}. We suppose that the parameters of X are piecewise constant: there exists a subdivision τ = (τ1 < τ2 < . . . < τK) such that the Xi form a family of independent and identically distributed (i.i.d.) random variables for i ∈ (τk, τk+1], k = 0, 1, . . . , K, where by convention τ0 = 0 and τK+1 = N. Previous works such as (Bertrand, 2000) control the probability of false alarms in order to minimize the probability of a type I error in change point analysis. The novelty of this work is to control the number of false alarms: we give a bound on the number of false alarms, and a necessary condition concerning the number of non-detections. On the other hand, since the filtered derivative (Basseville & Nikiforov, 1993) depends on parameters, namely the threshold and the window size, we show how to choose these parameters optimally. We compare the results of the filtered derivative with optimized parameters and of penalized least squares methods, in particular the adaptive method of (Lavielle & Teyssière, 2006).


Introduction
The problem of change detection is much studied in the literature. There exist two types of change point detection: on-line detection, or sequential analysis, and off-line detection, or change point analysis. For an updated overview, see the textbooks (Basseville & Nikiforov, 1993; Csorgo & Horváth, 1997), or (Huskovà & Meintanis, 2006b; Gombay & Serban, 2009). Change point analysis has many applications, for instance in health, medicine and civil engineering, while sequential analysis is used in fault detection, finance, surveillance and security systems. Many methods exist, but two are used most often: penalized least squares (PLS) (Lavielle & Moulines, 2000) and the filtered derivative (FD) (Basseville & Nikiforov, 1993). The computation of PLS requires a matrix of size O(n²), whereas FD is of order O(n). To improve the FD method, two refinements have been developed: the filtered derivative with p-value (FDpV) (Bertrand, Fhima, & Guillin, 2011) and the filtered derivative with false discovery rate (FDqV) (Elmi, 2014). For the PLS method, several authors (Lavielle & Moulines, 2000; Lavielle & Teyssière, 2006) have proposed choices of the penalization parameter to make the method perform well. For the FD method, no paper had addressed this question. Recall that the FD algorithm depends on a window and a threshold, and consequently its performance depends on the optimization of these parameters. In this work, we give reasonable choices for these parameters. The paper is organised as follows: Section 1 is the introduction; Section 2 describes the state of the art of change point detection and the measurement criteria; Section 3 recalls the methods of change point analysis; in Section 4, we discuss how to control the number of false alarms and the number of non-detections; Section 5 contains a numerical comparison of FD and the adaptive PLS method. Finally, the appendix contains all proofs, propositions and lemmas used in this work.

The Art of Change Points Detection
The following subsection describes the problem of abrupt changes and different criteria used in the literature.

Problem of Change Points Detection: The Model
• X = (X1, X2, . . ., XN) is a family of independent random variables indexed by time.
• Let us define the minimal distance between two consecutive change points by L0 = inf{|τk+1 − τk|, k = 0, . . ., K}, and
• the minimal absolute value of the shifts by δ0 = inf{|δk|, k = 1, . . ., K}, where δk = μk+1 − μk is the jump of the mean at the change point τk.
Let us also recall the cumulative distribution function of the standard Gaussian law, Φ(x) = (2π)^(−1/2) ∫_{−∞}^{x} e^{−u²/2} du, and its survival function Ψ(x) = 1 − Φ(x). Throughout this paper, we use the following simulation:
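To fix ideas, the simulation setting above can be sketched as follows. This is a minimal illustration with hypothetical segment means, change points and noise level, not the exact values used in the paper's simulation:

```python
import random

def simulate_signal(N, tau, mu, sigma, seed=None):
    """Simulate a piecewise-constant signal of length N with Gaussian noise.

    tau: interior change points 0 < tau_1 < ... < tau_K < N
    mu:  K+1 segment means, mu[k] applying on the segment (tau_k, tau_{k+1}]
    Returns (true mean signal, noisy observations).
    """
    rng = random.Random(seed)
    bounds = [0] + list(tau) + [N]
    signal, data = [], []
    for k in range(len(mu)):
        for _ in range(bounds[k], bounds[k + 1]):
            signal.append(mu[k])
            data.append(mu[k] + rng.gauss(0.0, sigma))
    return signal, data

# Hypothetical example: N = 1000, two change points, unit noise.
s, x = simulate_signal(1000, tau=[300, 700], mu=[0.0, 1.0, 0.0],
                       sigma=1.0, seed=42)
```

Here the minimal gap is L0 = 300 and the minimal shift is δ0 = 1.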

Criterion
To measure the quality of the parameter estimation, we use the integrated square error (ISE), defined as ISE = Σ_{i=1}^{N} (s_i − ŝ_i)², where s = (s_1, . . ., s_N) is the true mean signal and ŝ its estimate. Since the result of one replication is not significant, we make M = 1,000 replications and use the mean integrated square error (MISE), that is, the average of the ISE over the replications.
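These criteria can be computed directly; the sketch below assumes the ISE is the sum of squared differences between the true mean signal and its estimate, averaged over replications for the MISE:

```python
def ise(signal, estimate):
    """Integrated square error between the true mean signal and its estimate."""
    return sum((s - e) ** 2 for s, e in zip(signal, estimate))

def mise(replications):
    """Mean ISE over M replications, each given as a (signal, estimate) pair."""
    return sum(ise(s, e) for s, e in replications) / len(replications)

# Toy usage: two replications with constant estimation errors of 1 and 3.
m = mise([([0.0], [1.0]), ([0.0], [3.0])])  # (1 + 9) / 2
```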

PLS-Method
For the PLS method, we search for the change point instants minimizing the contrast function J(τ) = Σ_{k=0}^{K} Σ_{i=τ_k+1}^{τ_{k+1}} (X_i − μ̂_k)², where μ̂_k is the empirical mean of X on the segment (τ_k, τ_{k+1}]. Two cases are studied: • When K is known, (Bai & Perron, 1998) use dynamic programming to estimate the change instants and the corresponding mean values.
• When K is unknown, several authors (Lavielle & Moulines, 2000; Lavielle & Teyssière, 2006; Birgé & Massart, 2007) have proposed different values of the penalization parameter to make this method perform well.
In Lavielle and Moulines (2000), a choice of the penalization parameter is proposed; its drawback is to over-estimate the number of change points.
In Birgé and Massart (2007), a penalty is proposed which applies only to time series with constant variance. Lavielle and Teyssière (2006) give an adaptive method for estimating the number of change points in the following manner, with Q(K) the contrast function and (e_K) a sequence of independent random variables following the standard Gaussian law:
− We evaluate the probability that Q(K) follows this model.
− The estimated number of change points is the largest value of K whose corresponding p-value is smaller than a given threshold.
For an illustration, we draw the following figure (blue with red crosses: the contrast function Q(τ̂_K); green: the penalized contrast pen(K)).
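The minimization above can be carried out by dynamic programming, as in (Bai & Perron, 1998). The following sketch uses a naive O(K n²) implementation with prefix sums for the segment costs; the function names and the penalized selection rule cost + βk are illustrative:

```python
def best_segmentations(x, kmax):
    """Dynamic programming for least-squares segmentation.

    Returns cost[k][j]: minimal sum of squared residuals when splitting
    x[:j] into k segments (1 <= k <= kmax), plus back-pointers to the
    last split point, from which the change instants can be recovered.
    """
    n = len(x)
    # Prefix sums give each segment cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def seg_cost(a, b):  # cost of segment x[a:b] fitted by its mean
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / (b - a)

    INF = float("inf")
    cost = [[INF] * (n + 1) for _ in range(kmax + 1)]
    back = [[0] * (n + 1) for _ in range(kmax + 1)]
    for j in range(1, n + 1):
        cost[1][j] = seg_cost(0, j)
    for k in range(2, kmax + 1):
        for j in range(k, n + 1):
            for t in range(k - 1, j):
                c = cost[k - 1][t] + seg_cost(t, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, t
    return cost, back

def penalized_choice(cost, n, beta):
    """Pick the number of segments minimizing cost + beta * k."""
    return min(range(1, len(cost)), key=lambda k: cost[k][n] + beta * k)
```

When K is known, one reads off cost[K][n]; when it is unknown, penalized_choice implements the generic penalized criterion.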

FDpV-Method
The FDpV method has two steps: 1) Step 1 detects potential change points. For this we use the filtered derivative (Basseville & Nikiforov, 1993), defined as FD(t, A) = μ̂(t + 1, t + A) − μ̂(t − A + 1, t), where μ̂(a, b) = (b − a + 1)^(−1) Σ_{i=a}^{b} X_i is the classical empirical mean of X on [a, b] and A is the window size.
Without noise, the function j → FD(j, A) presents a hat centered at j = τ: the top of the hat corresponds to the true change point, its height is exactly the size of the jump in the mean, and its spread is 2A. When the signal is random, the true means μ on the right and left windows (τ − A, τ + A) are replaced by their estimates μ̂, which fluctuate around μ. In order to reduce the noise due to sampling fluctuations, we filter the signal by replacing the true mean values to the right and to the left of each point j by their estimates on windows of length A, and we take the difference of these two quantities.
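The filtered derivative statistic can be computed in O(n) with prefix sums; the sketch below assumes the definition as the difference between the empirical means over the right and left windows of size A:

```python
def filtered_derivative(x, A):
    """FD(t, A): mean of the A points to the right of t minus the mean of
    the A points up to and including t, for t = A, ..., N - A."""
    N = len(x)
    prefix = [0.0] * (N + 1)
    for i, v in enumerate(x):
        prefix[i + 1] = prefix[i] + v
    fd = {}
    for t in range(A, N - A + 1):
        right = (prefix[t + A] - prefix[t]) / A
        left = (prefix[t] - prefix[t - A]) / A
        fd[t] = right - left
    return fd

# On a noiseless step of height 1 at t = 10, FD shows a hat of height 1.
fd = filtered_derivative([0.0] * 10 + [1.0] * 10, A=5)
```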
To detect change points, (Basseville & Nikiforov, 1993; Benveniste & Basseville, 1984) retain as potential change points the local extrema of |FD(·, A)| exceeding a threshold C1.
Remark 1 A natural question arises: how should one choose the optimal parameters A and C1 of the filtered derivative? The goal of this work is to answer this question.
2) Recently, Bertrand, Fhima, and Guillin (2011) remarked that false alarms remain at the end of Step 1, so in order to keep only the true change points as far as possible, they added a second step. They compare pairwise the estimated means of consecutive segments; in other words, for each potential change point they test the hypothesis H0: μ_k = μ_{k+1} against H1: μ_k ≠ μ_{k+1}. They then compute the p-value corresponding to each potential change point and choose a critical p-value p*, keeping only the points whose p-value is smaller than p*.
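Step 2 can be sketched as follows, assuming a two-sample Gaussian test with known standard deviation σ for each pair of adjacent segments (the actual test of Bertrand, Fhima and Guillin may differ, e.g. in its variance estimate):

```python
import math

def step2_pvalues(x, candidates, sigma):
    """For each potential change point, test H0: equal means on the two
    adjacent segments with a two-sample Gaussian test (known sigma)."""
    bounds = [0] + sorted(candidates) + [len(x)]
    pvals = []
    for i in range(1, len(bounds) - 1):
        left = x[bounds[i - 1]:bounds[i]]
        right = x[bounds[i]:bounds[i + 1]]
        m1, m2 = sum(left) / len(left), sum(right) / len(right)
        se = sigma * math.sqrt(1 / len(left) + 1 / len(right))
        z = abs(m1 - m2) / se
        # Two-sided p-value from the standard normal cdf.
        p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
        pvals.append((bounds[i], p))
    return pvals

def keep_significant(pvals, p_star=1e-6):
    """Keep only the points whose p-value is below the critical p*."""
    return [t for t, p in pvals if p < p_star]
```

With a true jump at t = 100 and a spurious candidate at t = 50, only the true point survives the test.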
Remark 2 In (Bertrand, Fhima, & Guillin, 2011), the critical p-value is chosen arbitrarily (p* = 10⁻⁶), so the problem of the optimal p-value remains open; we will address it in future work.

FDqV-Method
In (Elmi, 2014), we proposed a method for change point detection which also uses the filtered derivative as Step 1, but adds a Step 2 that allows us to detect as many of the true change points as possible. The difference between FDpV and FDqV is that the first uses a single hypothesis test to retain the true change points, while the second uses a multiple test. The power of FDqV is established in (Elmi, 2014).

The Choice of Parameters for Filtered Derivative Method
Every change point method depends on extra parameters, which have to be well chosen. The PLS method depends only on the penalization parameter β; different choices are possible, see Section 3 above. The filtered derivative method depends on two parameters, namely the window size A and the threshold C1. Both the FDpV and FDqV methods use the filtered derivative as Step 1, so they depend on the same extra parameters A and C1. Moreover, FDpV and FDqV add a Step 2 which depends on another extra parameter, the critical p-value p* or the q-value q*. In the following subsections, we discuss the different criteria and give a bound on the type II error.

Choice the Extra-Parameters of FD
As pointed out above, the quality of a change point method can be evaluated by two criteria: i) the absolute difference between the estimated and the true number of change points, |K̂ − K|; ii) the ISE or MISE. Both criteria lead us to prefer detecting extra potential change points over missing even one. Indeed, the non-detection of one change point strongly impacts the estimated mean values μ̂_k, and hence the ISE, but also the p-values p_k. Note that this phenomenon disappears when we restrict ourselves to the FD method with the number of change points as the only criterion.
So, the type II error, or probability of non-detection (PND), should be controlled at a level close to zero; moreover, the previous remark requires detecting the true change points at the right times, as pointed out in (Bertrand, 2000). For each true change point τ_k, we define the local probability of non-detection as PND_k = P(|FD(τ_k, A)| ≤ C1). With these notations, the global PND satisfies PND_global ≤ Σ_{k=1}^{K} PND_k. On the other hand, we define the probability of false alarm, or probability of type I error, as α(C1, A) = P(τ(A, C1) ≤ N), where τ(A, C1) is the first hitting time of the level C1 by the process |Γ(A, ·)|, the filtered derivative of the pure noise. However, the type I error is the probability of at least one false alarm and thus appears as a rough criterion, see (Bertrand, 2000).

The Type I and II Errors at Step 1 (Filtered Derivative)
In the following proposition, we give an upper bound for PND_global.
Proof. Following (Bertrand, 2000, p. 222, Prop. 3.2), we have a bound for each change point τ_k. Next, by remarking that the right-hand side of (7) is a decreasing function of δ_k and setting δ = inf_{k=1,...,K} δ_k, we deduce a uniform bound, which combined with (8) gives us the bound (6). This finishes the proof of Proposition 1.
K is unknown but fixed. Thus, we monitor the quantity β*(C1, A); for instance, we choose to set β*(C1, A) = 10⁻⁴. After the change of variables x = C1/δ and y = (δ/σ)√(A/2), the equation reads f(x, y) = ln(10⁻⁴), with f(x, y) = ln Ψ((1 − x)y) + 2 ln Φ((x − 1/3)y). This equation can be solved numerically, and we find the pairs (x, y) solving it. Since the map C1 → β*(C1, A) is decreasing, we obtain an implicit function A → C1(A). After having controlled the PND, we can control the PFA. We know (Bertrand, 2000, p. 221, Prop. 3.1) that for all ε > 0 there exists a constant M_ε bounding the type I error. For instance, we can set ε = 0.1, plug the implicit relationship between A and C1 into (9), and obtain a function A → α*(C1(A), A). A first idea is to vary the parameter A in order to find the value corresponding to a minimum of the map A → α*(C1(A), A). Unfortunately, this map is decreasing and reaches no minimum.
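The equation f(x, y) = ln(10⁻⁴) can be solved numerically; the sketch below uses bisection in x for a fixed y, relying on the monotonicity of f in x (the bracketing interval is an assumption and must contain the root for the chosen y):

```python
import math

def Phi(u):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def Psi(u):
    """Standard Gaussian survival function, 1 - Phi."""
    return 1.0 - Phi(u)

def f(x, y):
    return math.log(Psi((1.0 - x) * y)) + 2.0 * math.log(Phi((x - 1.0 / 3.0) * y))

def solve_x(y, target=math.log(1e-4), lo=1e-6, hi=1.0 - 1e-6):
    """Bisection on x -> f(x, y), which is increasing in x.
    Assumes f(lo, y) < target < f(hi, y) for the given y."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid, y) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For a given y = (delta / sigma) * sqrt(A / 2), recover x = C1 / delta.
x5 = solve_x(5.0)
```

Sweeping y then traces out the implicit curve A → C1(A).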

Necessary Condition of No-Detection
In this subsection, we draw three figures in order to choose a "good" window A. According to the figures below, A must satisfy A < L0/2: in the first drawing we detect all the change points, while in the others we only detect two change points.

Control of Number of False Alarms
In this subsection, we want to control the number of false alarms (NFA), and not only the PFA (probability of false alarm). First, we remark that the number of false alarms is always bounded by the corresponding number when there is no change. Indeed, let K̂ denote the number of change points selected at Step 1 (FD); then the number of false alarms is (K̂ − K). Let us point out that when there is no change, FD(A, t) = Γ(A, t) for all t; by using (Bertrand, 2000), this implies that (K̂ − K) ≤ K̂0, where K̂0 denotes the number of change points detected by FD when there is no change. For example, using the simulation of subsection 1.2, we can see that K̂0 = 3 (see the drawings below and count the points τ*). Theorem 1 Assume there is no change. Then, i) for every integer L ≤ N, the average number of false alarms is bounded; see the Appendix for the precise statement and the proof.

The Choice of Parameter A
As stressed above, the question of the parameters on which the FD method depends is important for its algorithm. In this work, we give criteria for the choice of reasonable parameters A and C1. In the preceding section, we established that in order to detect all the true change points we must have 2A < L0, with L0 := inf{|τk+1 − τk|, k = 1, . . ., K}; see also Figure 5.

The Choice of C 1
In (Bertrand, 2000), we must have C1 < δ0 with δ0 = inf{|δk|, k = 1, . . ., K}, where the δk are the sizes of the jumps in the mean. On the other hand, Theorem 1 gives a bound on the average number of false detections using the function Ψ. For N and L fixed, and A satisfying the condition above, we can choose an optimal C1. We remark that as C1 increases, the function Ψ decreases and consequently the average number of false alarms decreases; so one should choose C1 as large as possible while satisfying the condition given by Bertrand (2000).

Simulation
We use the simulation of subsection 1.2 and make various drawings with different values of C1 and A.

Comment
We notice from the drawings above that the number of false alarms and the number of non-detections vary according to the parameters A and C1. Thus, a choice of A and C1 minimizing both quantities (NND and NFA) is imperative. This is what we do next.

Numerical Estimation of NND and NFA
In this part, we want to estimate the NND and NFA. To this end, we run the filtered derivative method for A ranging from 30 to 500 and C1 from 0.1 to 1, and we choose Kmax = 20 (we suppose that the maximum number of change points is 20). For each pair (A, C1), we make 1,000 simulations in order to obtain the exact numbers of non-detections and false alarms. We can then deduce the NND and NFA for each pair, and we sum up the results in the following tables:
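The grid evaluation can be sketched as follows. The detection rule (one detection per excursion of |FD| above C1) and the matching tolerance A used to count non-detections and false alarms are modelling assumptions of this sketch:

```python
import random

def fd_stat(x, A):
    """Filtered derivative via prefix sums, for t = A, ..., N - A."""
    N = len(x)
    p = [0.0] * (N + 1)
    for i, v in enumerate(x):
        p[i + 1] = p[i] + v
    return {t: (p[t + A] - p[t]) / A - (p[t] - p[t - A]) / A
            for t in range(A, N - A + 1)}

def detect(x, A, C1):
    """Keep one detection per excursion of |FD| above C1 (its argmax)."""
    fd = fd_stat(x, A)
    hits, best = [], None
    for t in sorted(fd):
        if abs(fd[t]) > C1:
            if best is None or abs(fd[t]) > abs(fd[best]):
                best = t
        elif best is not None:
            hits.append(best)
            best = None
    if best is not None:
        hits.append(best)
    return hits

def nnd_nfa(true_tau, detected, A):
    """A true change point counts as detected if some detection lies
    within A of it; remaining detections are false alarms."""
    nnd = sum(1 for tau in true_tau
              if not any(abs(t - tau) <= A for t in detected))
    nfa = sum(1 for t in detected
              if not any(abs(t - tau) <= A for tau in true_tau))
    return nnd, nfa

def estimate(A, C1, true_tau, mu, sigma, N, M=200, seed=0):
    """Monte Carlo estimate of the average NND and NFA for one (A, C1)."""
    rng = random.Random(seed)
    bounds = [0] + list(true_tau) + [N]
    tot_nnd = tot_nfa = 0
    for _ in range(M):
        x = [mu[k] + rng.gauss(0.0, sigma)
             for k in range(len(mu))
             for _ in range(bounds[k], bounds[k + 1])]
        nnd, nfa = nnd_nfa(true_tau, detect(x, A, C1), A)
        tot_nnd += nnd
        tot_nfa += nfa
    return tot_nnd / M, tot_nfa / M
```

Looping estimate over a grid of (A, C1) values yields the NND and NFA tables.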

Monte Carlo Simulation
To compare the two methods (the filtered derivative with optimized parameters and the adaptive method of (Lavielle & Teyssière, 2006)), we use the simulation of subsection 1.2. For the FD method, the optimal parameters chosen are A_opt = 250 and C1_opt = 0.25.
The comparison criteria are the number of false alarms, the number of non-detections, and the mean square error. First, for one replication, we obtain: • for the adaptive method, NFA = 1, NND = 4, ISE = 28670 (see figure below). For M = 1,000 replications, we obtain: • for the FD method, MISE = 3094.07, with on average NFA = 2.429 false alarms and NND = 0.01 non-detections.

Numerical Conclusion
It is clear that the FD method with optimized parameters is better than the adaptive PLS method (Lavielle & Teyssière, 2006) for the mean integrated square error criterion. On the other hand, the FD method with optimized parameters misses fewer change points than the adaptive PLS method, but produces more false alarms. We stress that in forthcoming work we will add a Step 2 to the FD method with optimized parameters, in order to bring the number of false alarms to a level close to zero. In other words, we will optimize the FDqV (Elmi, 2014), whose q-value corresponds to the false discovery rate.

Conclusion
In this work, we gave reasonable parameters for the filtered derivative method. We obtained these parameters by simulation, but if we consider Theorem 1, fix L and N, and choose A according to (10), we can compute C1 theoretically. A direct theoretical calculation of A and C1 is very difficult and has no solution at this moment.
On the other hand, we can say that it is better to monitor the number of false alarms and the number of non-detections than to control the probability of false alarm and the probability of non-detection, as done in previous works.
A natural sequel is to do the same for the FDqV method, in order to keep as close as possible to the true number of change points. It would also be interesting to look for a way to adapt these results to weakly and strongly dependent time series.

Figure 1 .
Figure 1. The observed signal and the true signal

Figure 4 .
Figure 4. Signal reconstruction after Step 2 by the FDqV method

Figure 5 .
Figure 5. The graph corresponding to the type I error: z = α*(C1(A), A) as a function of y = (δ/σ)√(A/2)

Figure 7 .
Figure 7. First drawing: the observed signal (blue) and the true signal (red). Second drawing: the reconstructed signal (green)

Figure 8 .
Figure 8. The filtered derivative with different parameters A and C1

Figure 9 .
Figure 9. The adaptive method

Figure 10 .
Figure 10. The filtered derivative with optimized parameters

Table 1 .
Table of non-detections of change points