Analysis of Change Points With Bayes Factor, Thresholds, and CUSUM

In this work, change points are analyzed with three methods: the Bayes factor, thresholds, and cumulative sum (CUSUM) statistics. For the Bayes factor analysis, Poisson data were simulated; the threshold method was applied to a regression; and data from the National Institute of Statistics, Geography and Informatics (INEGI) of Mexico, together with coronavirus data, were used for the CUSUM method.


Introduction
When a change point is mentioned, the first question that comes to mind is: what is a change point? Chen and Gupta (2012) defined it as the place, or point in time t, in a sequence of observations {x_i}, i = 1, . . . , n, ordered with respect to time, such that the observations follow a distribution F_1 before that point and a distribution F_2 after it. That is, from the statistical point of view, the sequence of observations shows inhomogeneous behaviour.
Under both the classical and Bayesian approaches, the change point problem is considered one of the central problems of statistical inference, since it interrelates statistical control theory, hypothesis testing (detecting whether there is any change in the sequence of observed random variables), and estimation theory (estimating the number of changes and their corresponding locations).
Change point problems originally appeared in quality control and can generally be found in the mathematical modeling of various disciplines such as environmental science, epidemiology, seismic signal processing, economics, finance, geology, medicine, biology, and physics. According to Chen and Gupta (2012), the change point problem is, in general, posed as follows: let X_1, X_2, . . . , X_n be a sequence of independent random vectors (or variables) with probability distribution functions F_1, F_2, . . . , F_n, respectively. The change point problem consists of testing the null hypothesis H_0 of no change against the alternative H_a that there is at least one change point:

H_0 : F_1 = F_2 = . . . = F_n
vs
H_a : F_1 = · · · = F_{k_1} ≠ F_{k_1+1} = · · · = F_{k_2} ≠ F_{k_2+1} = · · · = F_{k_q} ≠ F_{k_q+1} = · · · = F_n,

where q and k_1, k_2, . . . , k_q must be estimated. Together, these hypotheses cover the inference aspects of change points: determining whether there is any change point in the process, and estimating their number and respective positions.
The objective of this work is to develop algorithms and programs to apply some procedures to detect change points, particularly the Bayes factor, the threshold and the cumulative sum statistics (CUSUM) methods.
In this work, some procedures were programmed in order to detect change points. The Bayes factor was used to detect temporal and spatiotemporal change points; the data for this were simulated from a homogeneous Poisson process. Multiple temporal and spatiotemporal change points were detected with the Bayes factor. Changes were also detected with the threshold method and with the cumulative sum (CUSUM) method. Finally, the Buishand range, Pettitt, and standard normal homogeneity tests were applied, which confirmed the change points detected by the cumulative sum method. The Bayes factor and threshold methods are based on the concepts presented in Altieri (2015); Taylor's (2000) article was used for the cumulative sum method.
In this work, the programs were developed with instructions from the INLA package, following Gómez (2020) and Blangiardo and Cameletti (2015), except for the standard normal homogeneity, Pettitt, and Buishand range tests, for which the R Trend package of Pohlert (2020) was used.
Simulated data from a Poisson process were used, as well as a cement database, INEGI data, and coronavirus data. The INEGI and cement data were taken from a regression example in Gómez (2020), and the coronavirus data were taken from the page coronavirus.gob.mx.

Bayes Factor
To detect the change points, the Bayes factor was used. The definition of the Bayes factor will be given below. The change points are determined according to the values in Table 1.
If there is a problem in which one must choose between two possible models based on an observed data set D, the plausibility of the two models M_1 and M_2, parameterized by parameter vectors θ_1 and θ_2, can be measured using the Bayes factor, which is defined as

B = P(D|M_1) / P(D|M_2),

where P(D|M_i) = ∫ P(D|θ_i, M_i) P(θ_i|M_i) dθ_i is called the marginal likelihood or integrated likelihood. This is similar to what is done in likelihood ratio tests, but now, instead of maximizing the likelihood, the Bayes factor performs a weighted average over the distribution of the parameters.
A value of B > 1 means that the data support M_1 more than M_2.
In the case of the Bayes factor, Jeffreys established an interpretation scale for B; this is shown in Table 1. Another way to consider the Bayes factor is as follows: suppose two hypotheses H_0 and H_1 with a priori probabilities f_0 = P(H_0) and f_1 = P(H_1).
After observing a sample, the a posteriori probabilities of the two hypotheses are α_0 = P(H_0|x) and α_1 = P(H_1|x). The Bayes factor in favor of H_0 is defined as

B = (α_0/α_1) / (f_0/f_1).

Thus, the Bayes factor represents the a posteriori plausibility divided by the a priori plausibility and reports the changes in our beliefs introduced by the data. It has the property of being almost objective and partially eliminates the influence of the a priori distribution.
As an example, suppose the simple contrast H_0 : θ = θ_0 vs H_1 : θ = θ_1. The a posteriori probabilities are α_i ∝ f_i f(x|θ_i), i = 0, 1, so the Bayes factor is

B = (α_0/α_1) / (f_0/f_1) = f(x|θ_0) / f(x|θ_1),

which coincides with the likelihood ratio, so that, in this case, the a priori distribution does not influence the Bayes factor.
Thus, the Bayes factor for a change point, when the data are divided into two segments, is given by the likelihood ratio

B = (Q_1 Q_2) / L_1,

where Q_1 is the likelihood of segment 1 and Q_2 is the likelihood of segment 2, under the alternative hypothesis, and L_1 is the likelihood under the null hypothesis.
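As an illustration, the ratio B = (Q_1 Q_2)/L_1 can be sketched for Poisson-distributed counts. This is only a simplified stand-in for the procedure used in this work: plug-in maximum-likelihood estimates of λ replace the marginal likelihoods computed with R-INLA, and the data series and split point below are hypothetical.

```python
import math

def pois_loglik(xs):
    """Plug-in Poisson log-likelihood of a segment, with lambda set to the segment mean."""
    lam = sum(xs) / len(xs)
    if lam == 0:
        return 0.0  # all-zero segment: likelihood is 1
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

def log_bayes_factor(xs, k):
    """log B for a change at k: product of segment likelihoods over the no-change likelihood."""
    return pois_loglik(xs[:k]) + pois_loglik(xs[k:]) - pois_loglik(xs)

data = [2, 3, 1, 2, 2, 3, 9, 8, 10, 9, 11, 8]  # hypothetical counts; rate jumps at index 6
print(log_bayes_factor(data, 6) > 0)           # True: evidence for a change at k = 6
```

A log B greater than 0 corresponds to B > 1, i.e., evidence in favor of the two-segment (change point) model.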

Homogeneous Poisson Process
To find the change points, a homogeneous Poisson process is first simulated, and the resulting data are then analyzed. The homogeneous Poisson process is defined as follows. Definition 1: A collection of random variables {N(t) : t ≥ 0}, defined on a probability space (Ω, F, P), is said to be a (homogeneous) Poisson process with intensity λ > 0 if it satisfies the following properties: i) N(0) = 0; ii) for all 0 ≤ s < t, the increment N(t) − N(s) has a Poisson distribution with mean λ(t − s); iii) for all 0 ≤ t_1 < . . . < t_n, n ≥ 1 (that is, for every finite set of times), the random variables N(t_n) − N(t_{n−1}), . . . , N(t_2) − N(t_1), N(t_1) − N(0), N(0) are independent. This is known as the property of independent increments.
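A process of this kind can be simulated by drawing i.i.d. Exponential(λ) inter-arrival times, as in the following sketch; the intensity, horizon, and seed are arbitrary illustrative values (the simulations in this work were done in R).

```python
import random

def simulate_hpp(lam, t_max, seed=1):
    """Arrival times of a homogeneous Poisson process with intensity lam on [0, t_max],
    built from i.i.d. Exponential(lam) inter-arrival times."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)
        if t > t_max:
            return times
        times.append(t)

arrivals = simulate_hpp(lam=2.0, t_max=50.0)
print(len(arrivals))  # N(50); close to lam * t_max = 100 on average
```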

Bisection Method
The bisection method was used to determine the change points with the Bayes factor. It consists of dividing the data in half and looking for a change point with the Bayes factor, using the likelihoods corresponding to the null and alternative hypotheses. Then the left side of the data is divided in half; subsequently, the right side is divided, and so on, going successively to the left and to the right, dividing each side in half and looking for change points with the Bayes factor.

Threshold Method
The following is an example of change points determined by the threshold method, explained in Altieri (2015), which consists of associating an a posteriori probability with the data and defining a threshold, such that, if the probability is less than that threshold, there is said to be a change point. An example of how to associate probabilities is given below, using the following multiple linear regression:

Y_i = β_0 + β_1 X_{1,i} + β_2 X_{2,i} + β_3 X_{3,i} + β_4 X_{4,i} + ε_i, i = 1, 2, . . . , n.

For this example, we took the cement database in Gómez (2020). In the model, Y_i represents the evolved heat of observation i and X_{j,i} is the proportion of component j in observation i. The parameter β_0 is an intercept and β_j, j = 1, . . . , 4, are coefficients associated with the covariates. Finally, ε_i, i = 1, 2, . . . , n, is an error term with a Gaussian distribution having zero mean and precision τ.

Cement Data
The cement database used is shown in Table 2.

Cumulative Sum Method
The cumulative sum method consists of the following:
1. The average is calculated: X̄ = (x_1 + x_2 + . . . + x_32)/32.
2. The first cumulative sum is calculated: S_1 = x_1 − X̄.
3. The remaining cumulative sums, of the form S_i = S_{i−1} + (x_i − X̄), are calculated for i = 2, . . . , 32.
Bootstrapping is also performed, but an estimator of the magnitude of the change is required first. An option that works well regardless of the distribution, and even with multiple changes, is S_diff, the difference between the maximum and the minimum of the S_i:

S_diff = max_i S_i − min_i S_i.

The magnitude of the change is S_diff. A positive value of S_diff indicates that there was a change from low to high, which in this application means that the number of traffic accidents changed; the latter is the problem addressed here. Next, a bootstrap analysis with a single routine is performed. The procedure is as follows: the data are reordered at random, the bootstrap cumulative sums S0_i and their difference S0_diff are computed, and it is checked whether S0_diff is less than S_diff. Let N be the number of bootstrap samples performed and let X be the number of bootstrap samples for which S0_diff < S_diff. Then the confidence that a change occurred is calculated as a percentage as follows: Confidence level = 100 (X/N) %.
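The steps above can be sketched as follows; the data series is hypothetical, and random reorderings of the observations implement the bootstrap, as the method describes.

```python
import random

def cusum(xs):
    """Cumulative sums S_i = S_{i-1} + (x_i - mean), starting from 0."""
    mean = sum(xs) / len(xs)
    sums, s = [0.0], 0.0
    for x in xs:
        s += x - mean
        sums.append(s)
    return sums

def sdiff(xs):
    """Magnitude of change: max S_i minus min S_i."""
    s = cusum(xs)
    return max(s) - min(s)

def change_confidence(xs, n_boot=100, seed=7):
    """Percentage of random reorderings whose S0_diff stays below the observed S_diff."""
    rng = random.Random(seed)
    observed = sdiff(xs)
    hits = 0
    for _ in range(n_boot):
        resample = xs[:]
        rng.shuffle(resample)          # bootstrap by reordering the observations
        if sdiff(resample) < observed:
            hits += 1
    return 100.0 * hits / n_boot

data = [3, 2, 4, 3, 2, 3, 12, 11, 13, 12, 11, 12]  # hypothetical upward shift halfway
print(change_confidence(data))                     # near 100: a change very likely occurred
```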

Traffic Accident Data
The cumulative sum method was applied to traffic accident data from 32 cities, corresponding to the states of the Mexican Republic. The data, which give the number of traffic accidents, were obtained from the INEGI page and are presented in Table 3. Coronavirus data corresponding to May 2020, obtained from the coronavirus.gob.mx page, were also used to detect change points with the cumulative sum method. Three groups of data were formed, corresponding to eight, seven, and five days, respectively. In each database, it was detected where the change from a lower to a higher number of contagion cases in the country occurred. The database separates infections of men and women; however, for the analysis of changes, the total contagion cases (men plus women) were considered. Table 4 shows the data used.

Analysis of Multiple Change Points With the Bayes Factor
In order to detect, analyze, and compare change points, a program was developed to detect five change points. A set of 60 data points with six different values of the parameter λ was used, obtained by simulating a Poisson process. This exercise is intended to detect the changes generated by the different values of λ. In addition, the results obtained with a uniform, a log-gamma, and a Gaussian a priori were compared.
A program was written in R to detect multiple change points in time. In total, for five change points, 60 data points were used; the smallest segment was four points. A homogeneous Poisson process with different values of λ was simulated, and the a posteriori distribution was approximated with the R-INLA package, which performs the approximation through Laplace approximations based on Taylor expansions. The Bayes factor and the bisection method were used to detect the change points. A uniform, a log-gamma, and a Gaussian a priori were used. Five change points are shown in Figure 1.
For the following problem, 60 data points are used and the smallest division reached was four data points, so there are fifteen divisions, as shown in Table 5.
The change points for a uniform a priori with λ = 2, 1, 4, 7, 6, 1, in data divisions or segments of 8, 7, 15, 15, 8, 7, are shown in Table 5; the change points are the first four and the seventh.
With the same change points and a log-gamma a priori with parameters 0.01 and 0.01, the result is that shown in Table 5. Compared to the uniform, the log-likelihoods of the log-gamma are smaller; however, they also detect the change points.
With a Gaussian a priori with zero mean and precision parameter 0.001, for the same number of change points as the uniform and the log-gamma, the five change points were detected; these are the first four and the seventh in Table 5. Note, however, that the log-likelihood of the seventh change point is less than those of the following positions.
The five change points in Table 5 were detected with the Bayes factor B, defined above. The values of B indicate the strength of the evidence in favor of the change point hypothesis; values less than 1 indicate that there is no change point. The values in bold, which correspond to the change points, were detected as follows: value 1 in the middle of the data, value 2 at datum 15, value 3 at datum 45, value 4 at datum 8, and value 7 at datum 53. As can be seen, the values at the change points are not equal to each other or to the rest. On the other hand, where no change point occurred, there were values equal to each other, for example, values 5 and 6, as well as 8, 10, and 14, and likewise 9, 11, 13, and 15; all these values therefore indicate the non-existence of change points. The strength of the evidence in favor of the alternative hypothesis, obtained by applying the bisection method for gradual detection of the change points, is strong for the first change point and decreases for the following ones, as can be seen in Table 5. The change points are shown in Figure 1.
With the same uniform model, six change points were detected for λ = 2, 1, 4, 7, 6, 1, 2, in data divisions or segments of 8, 7, 15, 15, 8, 4, 3. The change points detected are the first four, the seventh, and the last. These are shown in Tables 6 and 7.
With the log-gamma a priori, six change points were also detected: the first four, the seventh, and the last in Table 6. The difference with respect to the uniform is that the log-likelihood is smaller, as can be seen.
For the Gaussian a priori, six change points were also detected; their log-likelihood values are shown below. The results are similar to those of the uniform. The change points correspond to the first four, the seventh, and the last in Table 6. The difference between the Gaussian and the uniform is that the last change point has a log-likelihood smaller than the preceding values, where there is no change point. With the logit-beta a priori (with parameters 2 and 2), six change points were also detected, the same as for the uniform and the log-gamma: the first four, the seventh, and the last. No large differences are observed with respect to the uniform.
For the truncated normal a priori with mean 0 and precision 0.001, the likelihood is smaller than with the uniform. The behaviour is very similar to the log-gamma, and six change points were also detected: the first four, the seventh, and the last in Tables 6 and 7.
As in the previous problem, the observed change points are shown in Tables 6 and 7. Their values are not repeated among themselves or among the rest. Instead, the remaining values are repeated, for example, 5 and 6, as well as 8, 10, 12, and 14, and also 9, 11, and 13; therefore, all these values do not indicate change points, even when they are greater than 1. The change points are shown in Figure 2. They occur at divisions 8, 15, 30, 45, 53, and 57.
Programming was done in R. The program is presented in Appendix A as Program A1.
An algorithm was created for the problem of detecting six change points with the Bayes factor. It is presented below.
Algorithm 1: detects six change points.
Input: the INLA function returns the variables mp1, mp2, and mp, which contain data simulated with an AR1 autoregressive model and data from a Poisson family for datos A, datos B, and datos.
Output: the vector respuesta keeps the likelihoods of the 15 divisions.
Initialize: the vectors mp1, mp2, and mp, which contain the data simulated with the AR1 autoregressive model, are initialized. The vector datos is divided into positions 1 to 30 and 31 to 60, which are assigned to datos A and datos B.
i) The likelihood for datos A and datos B is calculated and stored in the vector respuesta.
ii) The vectors datos A and datos B are halved and the data are assigned to datos C and datos D.
iii) The likelihood for datos C and datos D is calculated.
iv) Steps i) and ii) are repeated for the new divisions of the data until a division of only two data points is reached.
v) The results obtained are recorded.
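The recursion just described can be sketched as follows. This is a simplified stand-in for Program A1, under stated assumptions: plug-in Poisson likelihoods replace R-INLA's marginal likelihoods, a split is recorded whenever log B > 0 (i.e., B > 1), recursion stops at a minimum segment of four points, and the data are a hypothetical series with a single jump.

```python
import math

def pois_loglik(xs):
    """Plug-in Poisson log-likelihood with lambda estimated by the segment mean."""
    lam = sum(xs) / len(xs)
    if lam == 0:
        return 0.0
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

def bisect_changes(xs, offset=0, min_seg=4, found=None):
    """Recursively halve the data, recording a change point whenever the
    two-segment likelihood beats the one-segment likelihood (log B > 0)."""
    if found is None:
        found = []
    if len(xs) < 2 * min_seg:        # stop once a half would be too small
        return found
    k = len(xs) // 2
    log_b = pois_loglik(xs[:k]) + pois_loglik(xs[k:]) - pois_loglik(xs)
    if log_b > 1e-9:                 # tolerance guards against rounding noise
        found.append(offset + k)
    bisect_changes(xs[:k], offset, min_seg, found)
    bisect_changes(xs[k:], offset + k, min_seg, found)
    return found

data = [1] * 16 + [8] * 16           # hypothetical series with a jump at index 16
print(bisect_changes(data))          # -> [16]
```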

Space-Time Analysis of Multiple Change Points With the Bayes Factor
For multiple change points in space-time, positions were assigned: the first in the center and the rest around it in ascending, counterclockwise order. The a posteriori distribution was approximated with the R-INLA package. The uniform a priori, the Bayes factor, and the bisection method were also used. Values were simulated from a homogeneous Poisson process; four random values of λ (between 1 and 15) were used for each position, so three change points were estimated. The values of λ are given in each row of Table 8.
The resulting change points are the first three of Tables 9, 10, 11, and 12. They occur at divisions 15, 30, and 45.
The behavior of the values is similar to that of the first problem presented in this work. The change points indicated in bold in Tables 9, 10, 11, and 12 are not repeated, unlike the rest. According to the strength of evidence of the Bayes factor B, values less than 1 indicate that there is no change point. Such values are observed in Table 10, position 6, value 3, and in Table 11, position 9, value 2. The Bayes factor method detects the change points well. Its weakness is that it only detects changes down to a division of four data points, but not when the amount of data is lower.
Programming was performed in R. It is shown in Appendix A as Program A2.
An algorithm was built to detect three space-time change points in 16 positions. It is shown below.
Algorithm 2: detects three space-time change points.
Input: the INLA function returns the variables mp1, mp2, and mp, which contain data simulated with an AR1 autoregressive model and data from a Poisson family for datos A, datos B, and datos.
Output: the matrix matrizrespuesta keeps the likelihoods of the 5 divisions in the 16 positions.
Initialize: the vectors mp1, mp2, and mp are initialized. These contain data simulated with an AR1 autoregressive model and data from a Poisson family for datos A, datos B, and datos; n = 16 is the number of positions.
for i in 1:n
The vector datos is divided into positions 1 to 30 and 31 to 60 and assigned to datos A and datos B.
i) The likelihood for each vector in datos A and datos B is calculated and stored in matrizrespuesta.
ii) The vectors datos A and datos B are halved and the data are assigned to datos C and datos D.
iii) The likelihoods for datos C and datos D are calculated.
iv) Steps i) and ii) are repeated for the new divisions of the data until a division of only two data points is reached.
v) The results obtained are recorded.
end for
return matrizrespuesta. This matrix contains the likelihoods.
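A compact sketch of Algorithm 2's outer loop is given below, under stated assumptions: Python stands in for the R/INLA program, Poisson variates are drawn with Knuth's method, plug-in likelihoods replace INLA's marginal likelihoods, and only the first bisection split of each 60-point series is computed.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def pois_loglik(xs):
    """Plug-in Poisson log-likelihood with lambda estimated by the segment mean."""
    lam = sum(xs) / len(xs)
    if lam == 0:
        return 0.0
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

rng = random.Random(3)
log_b_by_position = {}
for pos in range(16):                                  # 16 spatial positions
    lams = [rng.randint(1, 15) for _ in range(4)]      # four rates -> three true changes
    series = [rpois(lam, rng) for lam in lams for _ in range(15)]  # 60 data points
    k = len(series) // 2                               # first bisection split
    log_b_by_position[pos] = (pois_loglik(series[:k])
                              + pois_loglik(series[k:])
                              - pois_loglik(series))

print(len(log_b_by_position))  # one log Bayes factor per position
```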

Threshold Method
For the change point example, the variable X_1 in Table 2 of Section 2.2.1 was used. Probabilities of the a posteriori distribution were assigned to the data of X_1. A threshold of 3.041631e-07 was set, so data having a probability less than this value were considered change points.
Program A3 in Appendix A was developed in R. The results are given as zeros and ones: 0 0 1 1 0 1 0 0 0 1 0 1 1. The change occurs where the number 1 begins to appear more consistently, that is, where the a posteriori probability assigned to the data is less than the defined threshold, indicating that a change point exists. This happens at positions three and ten, with a threshold of 3.041631e-07.
An algorithm was created to detect multiple change points by the threshold method. It is presented below.
Algorithm 3: detects multiple change points by the threshold method.
Input: cement matrix, datos matrix, threshold, regression in ml.
Output: the vector menores contains zeros and ones; 1 means that the a posteriori probability is less than the threshold, and 0 means that it is greater.
Initialize: n = 13 is the amount of data.
For h in 1:n
The a posteriori probability is calculated for each datum.
End For
For h in 1:n
If the a posteriori probability is less than the threshold, a 1 is stored in the vector menores; otherwise, a 0 is stored.
End For
Print menores
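A minimal sketch of the threshold rule follows. The a posteriori probabilities in this work come from the R-INLA regression fit; here, as a hypothetical stand-in, a two-sided Gaussian tail probability of each residual plays that role, and the observations, fitted values, sigma, and threshold are all invented for illustration.

```python
import math

def gaussian_tail_prob(resid, sigma):
    """Two-sided Gaussian tail probability of a residual (a stand-in for the
    a posteriori probability computed with R-INLA)."""
    z = abs(resid) / sigma
    return math.erfc(z / math.sqrt(2))

def threshold_flags(y, y_hat, sigma, threshold):
    """1 where the probability falls below the threshold, else 0."""
    return [1 if gaussian_tail_prob(yi - fi, sigma) < threshold else 0
            for yi, fi in zip(y, y_hat)]

# hypothetical observations and fitted values for illustration
y     = [10.1, 9.8, 30.0, 10.2, 9.9, 28.5]
y_hat = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
flags = threshold_flags(y, y_hat, sigma=0.5, threshold=1e-6)
print(flags)  # -> [0, 0, 1, 0, 0, 1]: the two far-off points are flagged
```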

With Traffic Accident Data
The algorithm of Section 2.3 was applied to the traffic accident data of Section 2.3.1, Table 3. The minimum sum, −439.6875, was found at the datum in the seventh position. The maximum sum, the minimum sum, and their difference were: Smax = 183.875, Smin = −439.6875, Sdiff = 623.5625. The confidence level for a bootstrap of one hundred samples was 100 (X/N) % = 83%. It can be seen in Figure 3 that the change point is at value 7, where the data change from the smallest number of traffic accidents to a higher number. The algorithm was implemented as Program A4 in Appendix A.
An algorithm was created with the cumulative sum method, which detects a single change point. It is shown below.
Algorithm 4: detects a single change point.
Input: the matrix datos contains the data for the analysis.
Output: parciales are the cumulative sums of the data, rparciales are the bootstrap cumulative sums, and rSdif are the resampling differences.
Initialize: n = 33 is the number of data points; m = 100 is the number of resamples of the 33 data points.
If the resampling differences are less than Sdif, then the differences are stored in the vector respuesta.

With Coronavirus Data
The coronavirus data from Section 2.3.2, Table 4, were used. The infection data are shown in a bar graph in Figure 4, which shows how the number of infections decreases and then increases again. Increases, or changes, occurred on May 4, May 11, and May 18, for the first, second, and last group, respectively. This can be observed in the CUSUM cumulative sums graph shown in Figure 5. For each cumulative sum, the statistics and the percentage are those shown in Table 13. The R program applied for the analysis of the coronavirus data was the same as that used for the traffic accident data.
With the CUSUM method, only one change point is detected; however, changes are detected even when the database is small, as is the case with the coronavirus data.

Conclusion
Programs were developed for the Bayes factor, threshold, and CUSUM methods. For the Bayes factor method, the purpose was to detect the change points numerically and compare the results across the a priori distributions used. For the threshold and CUSUM methods, the objective was to detect the change points. Coronavirus, traffic accident, and regression (cement) databases were used, as well as a database simulated with a Poisson process.
The Pettitt, standard normal homogeneity, and Buishand range methods were also used; the R Trend package, which contains these methods, was applied to detect the change points.
It can be concluded that the threshold and CUSUM methods detect change points well. The difference is that the threshold method makes it possible to detect multiple change points in a regression.
The CUSUM method detects only one change point, as do the Pettitt, standard normal homogeneity, and Buishand range methods. Using the CUSUM method, changes are detected even when the database is small, as happened with the coronavirus data.
The Bayes factor method detected the change points well. Its weakness is that it can only detect changes down to a division of four data points; when the amount of data in a division is lower, change points are no longer detected.