Detection of Changes in a Multinomial Process

We consider a multinomial process in which the vector of category probabilities changes at some unknown integer time. We assume that the probability structure both before and after the change is known, and the problem is to decide, sequentially, whether the probability structure has changed. For a loss function consisting of the cost of late detection and a penalty for early stopping, we develop, using dynamic programming, the one and two step look ahead Bayesian stopping rules. We provide numerical results to illustrate the effectiveness of the detection procedures. We show that the two step ahead procedure is a slight improvement over the one step ahead procedure; however, the two procedures are very consistent in their stopping times.


Introduction
Detection of changes in the distribution of random variables is an active area of research with many important applications, and there are many published papers on the topic. In 1926, Shewhart created control charts for means and standard deviations. In 1954, Page created continuous inspection schemes, which are process control procedures designed to detect changes in the distributions of sequences of observed data. Chernoff and Zacks (1964) investigated Bayesian detection procedures for fixed sample sizes; see Zacks (1983) for a survey of papers on change-point problems. In 1978, Shiryaev studied optimal Bayesian procedures for detecting changes when the process is stopped as soon as a change is detected. Brown and Zacks (2006) investigated the detection of changes in a Poisson process. Zacks and Barzily (1981) extended the work of Shiryaev to the case of Bernoulli sequences and discussed optimal stopping rules based on dynamic programming.

Several recent papers deal with multivariate data. Brown (2008) investigated how to monitor a series of Poisson processes in several categories at once when, at some unknown time point, there is a change in each of several categories. In that paper the change-point varies from category to category, so one obtains k independent Poisson processes. In the present paper we change the assumptions and consider the case where the probability structure changes at the same time for every category; thus we have a multinomial process. While Zacks and Barzily studied Bernoulli sequences, we extend the number of categories and consider a multinomial sequence in k categories whose probability structure changes in several of the k categories. Robbins, Lund and Gallagher (2011) discussed categorical change-points and their application to climate change. Other applications include modeling a change in the likelihood that a customer will buy one of several brands of a product, or that a drug will have a certain effect. When there are two categories (success or failure), we recover the results of Zacks and Barzily; in this paper we generalize their results to an arbitrary number of categories. Cho (2009) investigated how to determine which of several categories has the largest probability.
In the current paper we use a Bayesian approach, placing Shiryaev's (1978) geometric prior on the change-point τ.
In Section 2, we derive the posterior probability that a change has already occurred by time n. In Section 3, we discuss the optimal stopping rule based on dynamic programming. We assume there is a cost associated with stopping early and a cost associated with late detection, and we develop stopping rules that minimize the expected cost. In Section 4, we give an explicit formulation of the two step ahead stopping rule. In Section 5, we develop more user-friendly calculations for the case where only one category's probability decreases. We conclude with a numerical example showing how the method works.

The Bayesian Framework
We observe a multinomial process with k categories. For each observation, the probability that the observation lands in category i is θ_{1,i}, for i = 1, ..., k. At some unknown time point τ, these probabilities change from θ_{1,i} to θ_{2,i} for each category. We arrange the categories so that the probabilities increase for the first k_1 categories and decrease for the last k_2 = k − k_1 categories; thus θ_{1,i} < θ_{2,i} for i ≤ k_1 and θ_{1,i} > θ_{2,i} for i > k_1. At each time point we record the category in which the observation occurs, and the observations up to time n are given by X_n = {X_1, ..., X_n}. Given the change-point τ = t, the joint pdf of (X_1, X_2, ..., X_n) is

f(X_n | τ = t) = f_1(X_1, ..., X_{t−1}) f_2(X_t, ..., X_n) for t ≤ n, and f(X_n | τ = t) = f_1(X_1, ..., X_n) for t > n.
Here f_1 and f_2 are the multinomial probability functions before and after the change-point. For i = 1, 2,

f_i(X_1, ..., X_r) = ∏_{j=1}^k θ_{i,j}^{T_{j,r}},

where T_{j,r} = Σ_{m=1}^r I{X_m = j} is the number of times the first r observations land in category j. To simplify notation, let f_i^{(r,s)} = f_i(X_r, ..., X_s) and f_i^{(r)} = f_i(X_1, ..., X_r). Since τ is unknown, we put Shiryaev's geometric prior distribution h on the change-point τ:

h(τ = 0) = π,  h(τ = t) = (1 − π) p (1 − p)^{t−1},  t = 1, 2, ....
Thus the posterior probability that a change has occurred by time n is

ψ_n(X_n, π) = P(τ ≤ n | X_n) = [π f_2^{(n)} + (1 − π) Σ_{t=1}^n p(1 − p)^{t−1} f_1^{(t−1)} f_2^{(t,n)}] / D_n(X_n),   (1)

where D_n(X_n) is calculated as

D_n(X_n) = π f_2^{(n)} + (1 − π) Σ_{t=1}^n p(1 − p)^{t−1} f_1^{(t−1)} f_2^{(t,n)} + (1 − π)(1 − p)^n f_1^{(n)}.   (2)

Let π_n = ψ_n(X_n, π) be the realization of the posterior probability that a change has occurred by time n.
Theorem 1. The posterior probability that a change happens by time n + 1, ψ_{n+1}, depends only on the posterior probability π_n that a change happened by time n and on the category where the next observation lands. Thus

π_{n+1} = ψ(π_n, X_{n+1}) = [(π_n + (1 − π_n)p) f_2(X_{n+1})] / [(π_n + (1 − π_n)p) f_2(X_{n+1}) + (1 − π_n)(1 − p) f_1(X_{n+1})].   (3)

Proof. The posterior probability function at time n + 1 is calculated according to (1), and D_{n+1}(X_{n+1}) is calculated according to (2). Splitting the sum in (2) at t = n + 1 and factoring out D_n(X_n), we obtain

D_{n+1}(X_{n+1}) = D_n(X_n) [(π_n + (1 − π_n)p) f_2(X_{n+1}) + (1 − π_n)(1 − p) f_1(X_{n+1})].

Using this substitution in (1) at time n + 1 yields (3). Therefore the posterior probability of change by time n + 1 depends only on π_n and what happens at the next observation.

International Journal of Statistics and Probability Vol. 1, No. 2; 2012
Suppose the next observation lands in category j. Since f_i(X_{n+1}) = θ_{i,j} for i = 1, 2, the posterior probability of a change by time n + 1 is

ψ(π_n, j) = [(π_n + (1 − π_n)p) θ_{2,j}] / [(π_n + (1 − π_n)p) θ_{2,j} + (1 − π_n)(1 − p) θ_{1,j}],  j = 1, ..., k.   (4)

The denominator of (4) is the predicted distribution of the next observation, P(X_{n+1} = j | π_n) = (π_n + (1 − π_n)p) θ_{2,j} + (1 − π_n)(1 − p) θ_{1,j}. Consequently, given π_n, the posterior probability π_{n+1} takes the value ψ(π_n, j) with probability P(X_{n+1} = j | π_n), j = 1, ..., k.   (5)

After observing the observations up to time n, the posterior probability that the change-point occurs at the future time n + i is h_n(τ = n + i | π_n) = (1 − π_n) p (1 − p)^{i−1}, i = 1, 2, ....

Optimal Stopping Rule
At each time point we calculate π_n, the posterior probability that a change has occurred by time n. Suppose there is a cost c_1 for stopping early and a cost c_2 per time unit that we stop late. Our goal is to develop a procedure that minimizes the expected loss. Without loss of generality, we assume c_1 = 1 and c = c_2/c_1; thus c is the cost per time unit of stopping late relative to the cost of stopping early. At each time we compare the cost of stopping with the cost of continuing. The cost of stopping at time n is 1 − π_n. The cost of continuing at time n is cπ_n + E(R_{n+1} | π_n), where R_{n+1} is the additional risk at time n + 1. Therefore the risk at time n is

R_n(π_n) = min {1 − π_n, cπ_n + E(R_{n+1}(π_{n+1}) | π_n)}.   (6)

According to (6), we opt to stop and declare a change whenever the risk of continuing is at least the risk of stopping. Equivalently,

R_n(π_n) = 1 − π_n + [cπ_n + E(R_{n+1}(π_{n+1}) | π_n) − (1 − π_n)]^−,   (7)

where [a]^− = min(a, 0). According to equation (7), we stop sampling and declare a change in the probability structure when

cπ_n + E(R_{n+1}(π_{n+1}) | π_n) − (1 − π_n) ≥ 0.   (8)

To find an expression for (8), we consider a truncated rule, utilizing the dynamic programming procedure developed by Zacks and Barzily (1981). We assume we must stop after observing n* observations if we have not stopped earlier. Let j = n* − n be the number of future observations allowed until we reach the truncation point. Let R_n^{(j)}(π_n) be the risk at time n when only j more observations are allowed, so that R_n^{(0)}(π_n) = 1 − π_n and

R_n^{(j)}(π_n) = 1 − π_n + [cπ_n + E(R_{n+1}^{(j−1)}(π_{n+1}) | π_n) − (1 − π_n)]^−,  j ≥ 1.   (9)

Lemma 1. E(π_{n+1} | π_n) = π_n + (1 − π_n)p.

Proof. We calculate E(π_{n+1} | π_n) using the posterior distribution of a change by time n + 1, given in (5):

E(π_{n+1} | π_n) = Σ_{j=1}^k ψ(π_n, j) P(X_{n+1} = j | π_n) = Σ_{j=1}^k (π_n + (1 − π_n)p) θ_{2,j} = π_n + (1 − π_n)p.
The conditional expectation in (9) is taken with respect to the posterior distribution of π_{n+1} defined in (5), for all j.

Corollary 1. lim_{j→∞} R_n^{(j)}(π_n) = R_n(π_n).

Proof. According to equation (9), R_n^{(j)}(π_n) is nonincreasing in j and bounded below by 0, so the limit exists and satisfies (7).

The risk looking one step ahead is

R_n^{(1)}(π_n) = 1 − π_n + [cπ_n + E(1 − π_{n+1} | π_n) − (1 − π_n)]^− = 1 − π_n + [(c + p)π_n − p]^−,   (10)

by Lemma 1. Therefore the one step ahead procedure is to stop sampling and declare a change when π_n > p/(c + p). We define π* = p/(c + p) to be the one step ahead boundary, and the one step ahead stopping random variable is N^{(1)} = min {n : π_n > π*}. We now consider j step ahead stopping rules. Define, for j ≥ 1, M_n^{(j)}(π_n) recursively as the expected risk of continuing at the next observation,

M_n^{(j)}(π_n) = E{[(c + p)π_{n+1} − p + M_{n+1}^{(j−1)}(π_{n+1})]^− | π_n},   (11)
and M_n^{(0)} ≡ 0. Therefore the risk looking j steps ahead is

R_n^{(j)}(π_n) = 1 − π_n + [(c + p)π_n − p + M_n^{(j−1)}(π_n)]^−.

The j step ahead procedure is to stop sampling and declare that a change has occurred the first time that π_n > π* − M_n^{(j−1)}(π_n)/(c + p). Define the j step ahead boundary function b_n^{(j)} as

b_n^{(j)}(π_n) = min {π* − M_n^{(j−1)}(π_n)/(c + p), 1}.

Therefore the j step ahead stopping random variable is N^{(j)} = min {n ≥ 1 : π_n > b_n^{(j)}(π_n)}.
Lemma 2. The boundary functions b_n^{(j)} are increasing functions of j.

Proof. It suffices to show by induction that the M_n^{(j)} are decreasing functions of j. By the definition (11) of M_n^{(j)}, we have M_n^{(1)} ≤ 0 = M_n^{(0)}. For j ≥ 2,

M_n^{(j)}(π_n) − M_n^{(j−1)}(π_n) = E{[(c + p)π_{n+1} − p + M_{n+1}^{(j−1)}(π_{n+1})]^− − [(c + p)π_{n+1} − p + M_{n+1}^{(j−2)}(π_{n+1})]^− | π_n}.   (12)

Let I_1 = {π_{n+1} : (c + p)π_{n+1} − p + M_{n+1}^{(j−1)}(π_{n+1}) < 0} and I_2 = {π_{n+1} : (c + p)π_{n+1} − p + M_{n+1}^{(j−2)}(π_{n+1}) < 0}. By the induction hypothesis, M_{n+1}^{(j−1)}(·) ≤ M_{n+1}^{(j−2)}(·); thus I_2 ⊂ I_1. We express (12) as

M_n^{(j)}(π_n) − M_n^{(j−1)}(π_n) = E{I(I_2)[M_{n+1}^{(j−1)}(π_{n+1}) − M_{n+1}^{(j−2)}(π_{n+1})] | π_n} + E{I(I_1 \ I_2)[(c + p)π_{n+1} − p + M_{n+1}^{(j−1)}(π_{n+1})] | π_n}.

The first term is negative by the induction hypothesis, and the second term is also negative by the definition of I_1.
Since the boundary functions are increasing functions of j and bounded above by 1, by the monotone convergence theorem the limit b_n^∞(π_n) = lim_{j→∞} b_n^{(j)}(π_n) exists. Thus the optimal stopping rule is to stop the first time that π_n > b_n^∞(π_n).
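As a numerical sanity check on Lemma 2, the recursion (11) can be evaluated directly. The sketch below (our own illustrative code, with hypothetical names) represents M^{(j)} as a nested function and confirms that the boundaries b^{(j)} increase with j; the cost is exponential in j, so it is only meant for small j.

```python
def psi(pi, j, theta1, theta2, p):
    """Posterior update (4) when the next observation lands in category j."""
    prior = pi + (1 - pi) * p
    num = prior * theta2[j]
    return num / (num + (1 - prior) * (1 - p) * theta1[j])

def predictive(pi, j, theta1, theta2, p):
    """Predicted probability that the next observation lands in category j."""
    prior = pi + (1 - pi) * p
    return prior * theta2[j] + (1 - prior) * (1 - p) * theta1[j]

def make_m(j_steps, theta1, theta2, p, c):
    """Build M^(j) from the recursion (11); M^(0) is identically 0."""
    if j_steps == 0:
        return lambda pi: 0.0
    m_prev = make_m(j_steps - 1, theta1, theta2, p, c)
    def m(pi):
        total = 0.0
        for j in range(len(theta1)):
            nxt = psi(pi, j, theta1, theta2, p)
            total += (min((c + p) * nxt - p + m_prev(nxt), 0.0)
                      * predictive(pi, j, theta1, theta2, p))
        return total
    return m

def boundary(pi, j_steps, theta1, theta2, p, c):
    """b^(j)(pi) = min{pi* - M^(j-1)(pi)/(c+p), 1}."""
    m = make_m(j_steps - 1, theta1, theta2, p, c)
    return min(p / (c + p) - m(pi) / (c + p), 1.0)
```

With j_steps = 1 the inner function is M^{(0)} ≡ 0 and the boundary reduces to π* = p/(c + p), as in the one step ahead rule.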

Explicit Formulation of the Two Step Boundary
In this section we calculate the two step ahead boundary function. First we calculate M_n^{(1)}(π_n). By equation (11), with M_n^{(0)} ≡ 0,

M_n^{(1)}(π_n) = E{[(c + p)π_{n+1} − p]^− | π_n} = Σ_{j=1}^k [(c + p)ψ(π_n, j) − p]^− P(X_{n+1} = j | π_n).   (13)

Since M_n^{(1)} ≤ 0, the two step boundary satisfies b_n^{(2)} ≥ π* with probability 1, so stopping under the two step rule requires π_n > π*. Thus we only consider the case where π_n > π*.
Since in an increasing category i ≤ k_1 we have θ_{1,i} < θ_{2,i}, it follows that for π_n > π*,

ψ(π_n, i) > π_n > π*,  so that  [(c + p)ψ(π_n, i) − p]^− = 0.   (14)

Therefore the sum in equation (13) can be taken over those categories for which the probabilities decrease.
P(X_{n+1} = i) is calculated using the predicted distribution of X_{n+1} defined in (4). From equations (4), (5) and (14), we calculate M_n^{(1)}(π_n) as follows:

M_n^{(1)}(π_n) = Σ_{j=k_1+1}^k [c(π_n + (1 − π_n)p)θ_{2,j} − p(1 − π_n)(1 − p)θ_{1,j}]^−.   (15)

The two step boundary is b_n^{(2)}(π_n) = min {π* − M_n^{(1)}(π_n)/(c + p), 1}. Therefore the two step ahead rule states that we stop sampling the first time that π_n > b_n^{(2)}(π_n).
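Under the reconstruction of (15) above, the two step boundary is computable in closed form; the sketch below uses our own names. Summing the bracket over all k categories with [·]^− is exact for any π_n, and for π_n > π* only the decreasing categories contribute, as (14) shows.

```python
def m1(pi, theta1, theta2, p, c):
    """M^(1)(pi) as in (15): sum of [c*prior*theta2_j - p(1-pi)(1-p)*theta1_j]^-."""
    prior = pi + (1 - pi) * p
    return sum(min(c * prior * t2 - p * (1 - pi) * (1 - p) * t1, 0.0)
               for t1, t2 in zip(theta1, theta2))

def b2(pi, theta1, theta2, p, c):
    """Two step ahead boundary b^(2)(pi) = min{pi* - M^(1)(pi)/(c+p), 1}."""
    return min((p - m1(pi, theta1, theta2, p, c)) / (c + p), 1.0)
```

Because M^{(1)} ≤ 0, the boundary b^{(2)} always lies between π* and 1.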

The Two Step Boundary When Only One Category Decreases
In the case where only one category's probability decreases, the two step ahead boundary simplifies, making the procedure more user-friendly. This has applications to situations where the last category is "Other" or "None of the above". For example, suppose a car dealership is anticipating an increase in the sales of all the vehicles on its lot. When a customer enters the dealership, the categories could be "Buys a sedan", "Buys a coupe", "Buys a truck" or "Does not buy a vehicle". If an increase occurs for all vehicles, the only category with a decrease is "Does not buy a vehicle". In this case, with category k the only decreasing category, equation (15) reduces to the single term

M_n^{(1)}(π_n) = [c(π_n + (1 − π_n)p)θ_{2,k} − p(1 − π_n)(1 − p)θ_{1,k}]^−.

Here b_n^{(2)} is itself a function of the posterior probability π_n, and we want to determine for which posterior probabilities π_n > π* − M_n^{(1)}(π_n)/(c + p). Solving this inequality, we obtain the constant boundary

B^{(2)} = [p(1 + (1 − p)θ_{1,k} − cθ_{2,k})] / [c + p + c(1 − p)θ_{2,k} + p(1 − p)θ_{1,k}].   (16)

Thus the two step ahead stopping variable N^{(2)} is defined as N^{(2)} = min {n : π_n > B^{(2)}}.
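The reconstructed closed form (16) can be checked directly against the value reported in the second numerical example of Section 6 (the function name is ours):

```python
def b2_single(theta1_k, theta2_k, p, c):
    """Equation (16): constant two step ahead boundary when only
    category k has a decreasing probability."""
    num = p * (1 + (1 - p) * theta1_k - c * theta2_k)
    den = c + p + c * (1 - p) * theta2_k + p * (1 - p) * theta1_k
    return num / den
```

With θ_{1,k} = 0.6, θ_{2,k} = 0.4, p = 0.01 and c = 0.06, this gives B^{(2)} ≈ 0.1575, matching the value reported in Section 6.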

Numerical Results
We now present numerical results to show how the procedure works. In both examples we observe a trinomial process. In Example 1, the probabilities of landing in the categories before the change are equal, 1/3 each. After the change, category 1 has a high probability of 0.8, while categories 2 and 3 each have a much lower probability of 0.1. The prior probability is p = π = 0.01, the cost is c = 0.06, and we assume the change takes place at τ = 10. The one step ahead boundary is π* = 0.1429. The results are given in Table 1; from this table we see that both the one step ahead procedure and the two step ahead procedure stop at time n = 12.

The second example contains a decrease in only the last category. The probabilities of landing in the categories are (0.3, 0.1, 0.6) before the change and (0.4, 0.2, 0.4) after the change. Using equation (16), we see that B^{(2)} = 0.1575. The results from this simulation are given in Table 2. Here the one step ahead procedure stops at time n = 9, which is a false alarm; the two step ahead procedure corrects for this and stops at n = 12. We also see that whenever π_n > b_n^{(2)}, the posterior probability also exceeds 0.1575. This illustrates that, in the case where all but one category increases, we can use equation (16) to simplify the calculations.

We then ran 1000 Monte Carlo simulations, using various choices of θ_{i,j}. In each simulation we assume three categories, a prior probability of π = p = 0.01, and a change-point at τ = 10. The results are given in Table 3. Comparing the costs of the one step and two step ahead procedures, we see that the two step ahead procedure is a slight improvement over the one step ahead procedure. The last column, marked "consistent", shows how often the two procedures stop at the same time: for over half of the simulations run, the two step ahead procedure stops at exactly the same time as the one step ahead procedure.
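A single simulated path of the kind summarized in Tables 1-3 can be sketched as follows. This is our own illustrative code, not the authors' simulation program; in particular, the convention that X_n follows the post-change probabilities for n ≥ τ is our assumption.

```python
import random

def update(pi, j, theta1, theta2, p):
    """Posterior update (4) after observing category j."""
    prior = pi + (1 - pi) * p
    num = prior * theta2[j]
    return num / (num + (1 - prior) * (1 - p) * theta1[j])

def b2(pi, theta1, theta2, p, c):
    """Two step ahead boundary via the closed form of M^(1) in (15)."""
    prior = pi + (1 - pi) * p
    m1 = sum(min(c * prior * t2 - p * (1 - pi) * (1 - p) * t1, 0.0)
             for t1, t2 in zip(theta1, theta2))
    return min((p - m1) / (c + p), 1.0)

def simulate(theta1, theta2, tau, p, c, pi0=0.01, horizon=500, seed=0):
    """Return the stopping times (n1, n2) of the one and two step ahead
    rules on one simulated path with change-point tau."""
    rng = random.Random(seed)
    pi_star = p / (c + p)
    pi, n1, n2 = pi0, None, None
    for n in range(1, horizon + 1):
        theta = theta2 if n >= tau else theta1      # assumed change convention
        u, x, acc = rng.random(), len(theta1) - 1, 0.0
        for j, q in enumerate(theta):               # draw the category of X_n
            acc += q
            if u < acc:
                x = j
                break
        pi = update(pi, x, theta1, theta2, p)
        if n1 is None and pi > pi_star:
            n1 = n
        if n2 is None and pi > b2(pi, theta1, theta2, p, c):
            n2 = n
        if n1 is not None and n2 is not None:
            break
    return n1, n2
```

Since b^{(2)}(π_n) ≥ π*, the two step rule can never stop before the one step rule on the same path, which is consistent with the consistency column of Table 3.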

Conclusions
We have generalized the detection procedure introduced by Zacks and Barzily to include multinomial sequences.
We have developed optimal stopping rules for detecting a change in the probability structure at an unknown time point, and we gave explicit calculations of both the one and two step ahead procedures. The two step ahead procedure gives a slight improvement in cost over the one step ahead procedure; however, our simulations show that more than half the time the two procedures are consistent, stopping at the same time. We also provide a user-friendly variation of the two step ahead stopping bound for the case where only one category's probability decreases.

Table 3. Comparison of the one and two step ahead procedures for various probability structures.