Modeling Event Clustering Using the m -Memory Cox-Type Self-Exciting Intensity Model

In the analysis of point processes or recurrent events, the self-exciting component can be an important factor in understanding and predicting event occurrence. A Cox-type self-exciting intensity point process is generally not a proper model because of its explosion in ﬁnite time. However, the model with m -memory is appropriate to analyze sequences of recurrent events. It assumes the most recent m events multiplicatively a ﬀ ect the conditional intensity of event occurrence. Aside from the interpretability, one advantage is the simplicity of the estimation and inference–the Cox partial likelihood can be applied and the resulting estimator is consistent and asymptotically normal. Another advantage is that the model can be applied to the analysis of case-cohort data via the pseudo-likelihood approach. The simulation studies support the asymptotic theory. Application is illustrated with analysis of a bladder cancer dataset and of an Australian stock index dataset, which shows evidence of self-excitation.


Introduction
Recurrent event data are encountered frequently in many areas of scientific endeavor, such as the modeling and predictions of earthquakes and other disastrous events, study of the patterns of neural firings in neuroscience, assessing the efficacy of cancer medications in suppressing the recurrence of tumors, and analysis of the risk of default on debt repayments by borrowers.Point processes are natural stochastic process models for the modeling and analysis of recurrent event data.Depending on the form of the data available and the research questions of interest, one of two types of point process models might be appropriate.If the data is in the form of a single long string of event recurrence times, it might be of interest to predict the next event recurrence time by exploiting potential dependence of the waiting times between events on past events or on exogenous covariates.Models of this type include the self-exciting point process (Hawkes, 1971;Ogata, 1978), the modulated renewal process (Cox, 1972;Oakes & Cui, 1994;Lin & Fine, 2009) and the autoregressive conditional duration models (Engle & Russell, 1998;Fernandes & Grammig, 2006).Another form of data, which appears most often in medical statistics, consists of multiple strings of event times and covariates for each string.The number of events in each string is typically small due to censoring, and some individuals might not have experienced a single event by the censoring time.For data in this form, the main interest in practice is to assess the effects of the covariates on the frequency of event recurrence.Examples of the models that suits this purpose include the Cox proportional intensities (CoxPI) model (Andersen & Gill, 1982) and the proportional means model (Lin et al., 2000;Wellner & Zhang, 2007).
In this paper we consider a model that suits the analysis of data in the multiple string form.We are motivated by the temporal clustering of event times observed in individual strings with multiple events.The temporal clustering of event times indicates potential self-exciting effect among the events, which, if not properly accounted for, can lead to erroneous inferences about the effects of the covariate.Although the CoxPI model does not explicitly account for the potential self-exciting effect and therefore is not directly suitable for data with signs of event clustering, its many well-known theoretical and computational advantages motivate us to build our model based on it.The aim is to explicitly incorporate a self-exciting feature in the model, while at the same time retaining as many advantages of the CoxPI model as possible.
The method to model event clustering in this paper is motivated by the aforementioned Hawkes self-exciting point process model, which is a simple point process N(t) with intensity process λ(t) in a self-exciting form, where ν > 0 is the background event intensity and g(•) ≥ 0 is the excitation function.The CoxPI model is a simple point process with intensity process given by where λ 0 (t) is a baseline intensity function, z(t) is a vector valued process of covariates, and β a vector of parameters.A naïve extension of the CoxPI model by including the integral term in (1) to the logarithm of the intensity, i.e., log does not lead to an appropriate model because such a model can easily be explosive; see Remark 1 below for an explanation.However, if we modify the integral term in (2) by restricting the contribution of past events on the current event intensity to the most recent m (< ∞) events, then the resulting model does not suffer from the explosion issue and still posses an explicit self-excitation feature.Such a model, which we call the m-memory Cox-type self-exciting intensity (CoxSEI(m)) model, shall be an appropriate model for recurrent event data with temporal clustering of event times.
The rest of this paper is organized as follows.In Section 2, we present the CoxSEI(m) model and the estimation procedure.In Section 3, we present some asymptotic properties of the estimators.In Section 4, we report the results of some simulation studies and analysis of a bladder cancer data set and an Australian stock index data set.Section 5 concludes with discussion.Technical proofs are relegated to the Appendix.All computation was done in R (R Core Team, 2014) with the aid of the package coxsei written by the authors.

The CoxSEI(m) Model and the Estimation Procedure
Consider a point process N(t) = ∞ j=1 1 {T j ≤t} , with t ∈ [0, ∞) and T j denoting the j-th event time.As a CoxSEI(m) point process, N(•) has a conditional intensity process given by where μ(t) is an unspecified baseline intensity, Z(t) is a possibly time-varying p-vector of covariates, β is a p-vector of regression coefficients which measures the effects of the covariates to the intensity on the log scale, and φ(t) is a self-exciting term depending on past events of the process, where denotes the set of indexes of the most recent m events in the past.The excitation function g is specified up to a parameter γ.Normally g is a positive decaying function, and the parameter γ regulates the decay rate.The decay of g implies that the more recent events have stronger direct effects on the current event intensity than the events in the more remote past.Typical examples of g include the exponential decay function g(t, γ) = exp(−γt) and the polynomial function g(t, γ) = (1 + t) −γ , with γ > 0 (e.g., Errais et al., 2010;Ogata, 1988).The parameter α measures the initial magnitude of the self-exciting effect.While a positive α implies the self-exciting effect is genuinely excitatory, a negative α would imply that the "self-exciting" effect is in fact inhibitory (Kopperschmidt & Stute, 2009).
Remark 1 We assume m to be a positive integer.If m = 0, the self-exciting component vanishes and the CoxSEI(m) model ( 3) reduces to a CoxPI model.If m = ∞, the CoxSEI(m) model becomes an infinite-memory Cox-type self-exciting process.In this case, the process will be explosive under fairly general conditions if α > 0. To see this, suppose the baseline intensity μ(•) is bounded away from 0 and ∞, g(t, γ) > 0 is decreasing in t, the covariate processes Z(•) are bounded, and the regression coefficients β are all finite.Write c = inf{μ(t) exp(Z(t) T β): t ≥ 0} > 0. Let ΔT 1 = T 1 , ΔT j = T j − T j−1 , j ≥ 2 denote the durations between events.For any fixed t > 0, there exists ε > 0 such that ε ∞ j=1 1/ j 2 < t.As a result, ΔT j ≤ t) ≥ Pr(ΔT j ≤ ε/ j 2 , j = 1, 2, . . .). (5) Clearly the probability on the right hand side of (5) can be written as For CoxSEI(m) processes with finite m, under mild regularity conditions, such as C1-C4 to be presented later, the intensity process λ(•) is bounded away from 0 and ∞ with probability one.As a result, it will not be explosive for sure (with probability 1).We shall only consider the CoxSEI(m) model with finite m.
Remark 2 Under the CoxSEI(m) model, certain Markov property can be derived for the process.Set T k = 0 for k ≤ 0 for notational convenience.Let ξ j (t) = T N(t−)− j+1 , 1 ≤ j ≤ m, be the times of the most recent m events before time t.Let ξ(t) = (ξ 1 (t), ..., ξ m (t)) T , t ≥ 0, be an m-vector continuous time process.It can be verified that given the covariates and ξ(t), ξ(s) and ξ(τ) with s < t < τ are conditionally independent.Therefore ξ(t) is a continuous time Markov process of dimension m, conditioning on the covariates.
Suppose we have n independent observations of the CoxSEI(m) process N(t) and the covariate process Z(t) until a censoring time C which is assumed to be independent of N(t) conditional on Z(t).Denote the observations by The estimation of the CoxSEI(m) model is along the same lines as that of the CoxPI model.The estimation of the parametric part relies on the Cox partial likelihood, and the estimation of the cumulative baseline intensity is motivated by the Breslow estimator (Breslow, 1972) as in the CoxPI model.Specifically, we note that given the history of the n subjects prior to time t and the observation that an event occurs at time t, the conditional probability that the event pertains to the i-th subject is The maximum partial likelihood estimator θ is defined as the maximizer of L(θ) over the parameter space Θ ⊂ R p+2 .The estimator of the cumulative baseline intensity function dt is similar to the Breslow estimator (Breslow, 1972) and is given by

Large Sample Properties of the Estimators
The following conditions are needed to prove the large sample properties of θ.Let θ 0 be the true value of θ in Θ.We use the symbols ∂ θ and ∂ 2 θθ T to denote the operators of finding first and second order partial derivatives with respective to θ.

C1. The covariate process Z(•) is bounded.
C2.The parameter space Θ is closed, bounded and connected, and contains θ 0 as an interior point.Moreover, C3.The excitation function g(t, γ) is positive, bounded, decreasing in t, and twice continuously differentiable in γ.The baseline intensity μ(•) is bounded and continuous.
C4.The matrix Σ(θ) is finite and positive definite and continuous at θ 0 where and Remark 3 C1 is commonly assumed in the literature and C2 is an identifiability condition.In C3, the monotonicity of g is practical but can be relaxed.C4 is essential to the asymptotic normality of θ.Unlike the classical Cox model, the global concavity of the log-partial likelihood does not automatically hold in the CoxSEI(m) process.
The large sample properties of θ and Û(•) are given in the following two propositions.
where Σ = Σ(θ 0 ), which can be consistently estimated by Remark 5 Similar to the efficiency of the Cox partial likelihood estimator in the proportional hazards model, it can be verified that Σ is the Fisher information matrix for θ and that, as a result, θ is a semiparametric efficient estimator of θ.
Proposition 6 Let μ 0 (•) and U 0 (•) be the true baseline intensity and baseline cumulative intensity functions respectively.Let r (m) (t, θ), R (m) (t, θ), m = 0, 1, 2, and I (θ) be as those defined by (A.1)-(A.7) in the Appendix.Assume C1-C4 hold.Then the process √ n{ Û(•) − U 0 (•)} converges weakly to a Gaussian process with mean zero and covariance function which can be estimated uniformly consistently by Remark 7 The large sample distribution of θ is approximately normal with mean θ 0 and variance I ( θ) −1 , and the distribution of Û(t) is approximately normal with mean U 0 (t) and variance If the baseline intensity function μ(•) rather than its integral is of interest, then we can estimate it using one of the many nonparametric methods available, such as kernel smoothing (Ramlau-Hansen, 1983) and the local polynomial method (Chen et al., 2011).To this end, we first note the intensity process of the aggregate process N • (t) has a multiplicative form { i∈R t Ψ i (t, θ)}μ(t).Since the nonparametric estimator of μ(t) with the exposure process i∈R t Ψ i (t, θ) fully known typically has a rate of convergence slower than √ n, while the plug-in estimator of the exposure process i∈R t Ψ i (t, θ) has a √ n rate, we can simply estimate μ(t) by assuming the estimated exposure process is the unknown true exposure process.
The proof of Proposition 4 is given in the Appendix.The proof of Proposition 6 is essentially the same as that of Theorem 3.4 and Corollary 3.5 in Andersen and Gill (1982), and is omitted.

Simulation
This section reports the results of a simulation study.The simulation model is CoxSEI(2) with baseline intensity μ(t) = 1 + 0.5 cos(2πt) and excitation function g(t) = α exp(−γt) = 0.7e −10t .The covariate process has three static components Z i , i = 1, 2, 3. Their design distributions are Uniform[0.5,1.5],Uniform[1.5,2.5] and Bernoulli(0.5),respectively.The regression coefficients associated with the Z i are β 1 = 0.2, β 2 = 0.4, β 3 = 0.6.The censoring variable is independently generated, following lognormal(0, 0.1).The sample size is n = 100.The simulation was repeated 100 times.The results are summarized in Table 1.It is seen that the estimates of the parameters β i , α and γ seem unbiased, and the estimates of the standard errors are close enough to the empirical ones.The empirical distributions of all estimates are very close to normal distributions, with the two-sided Kolmogorov-Smirnov tests of normality all having p-values much greater than 0.05.The estimates of the cumulative baseline intensity function are shown in the left panel of Figure 1, which are close to the true cumulative intensity function.The standard error estimates calculated from the variance estimator (7) are shown in the right panel of Figure 1 together with the empirical standard errors, from which we note the variance estimator (7) for the cumulative intensity estimator is slightly biased upward, but not by much.We therefore conclude that the proposed estimation procedure works well and conforms with the theory.To evaluate the effects of neglecting self-excitation on the estimation of the covariate effect, we fit the ordinary CoxPI model to the data generated from the CoxSEI(2) model.The results are shown in Table 2.The estimated covariate effects are clearly inflated and the standard errors are generally underestimated.This implies that application of the CoxPI model to recurrent event data without accounting for the potential self-exciting effect may lead to erroneous inference about the covariate effects.

Analysis of a Bladder Cancer Dataset
We illustrate the CoxSEI(m) model with two real-life examples.The first is a bladder cancer study reported by Byar (1980) and frequently used to illustrate event history data analysis methods (e.g.Wei et al., 1989;Therneau & Hamilton, 1997;Wellner & Zhang, 2007).A total of 118 patients with superficial bladder tumors were admitted to the study between November, 1971 andAugust, 1976.The tumors were removed transurethrally and patients were randomly assigned to one of three treatment groups: placebo, pyridoxine, and thiotepa.For patients who experienced tumor recurrence, the new tumors were removed at each visit.The initial number of tumors and the size of the largest initial tumor were recorded for each patient.The censoring time was the earlier of death due to bladder cancer or other causes and end of study.The follow-up time of all patients varies from 0 to 64 months, the number of recurrences experienced by the patients varies between 0 and 9 with mean 1.6 and variance 5.3; see Figure 2. The data is available from the R package survival (Therneau and original Splus→R port by Thomas Lumley, 2011).We have made slight modifications to the data by adding 0.5 to the two 0 censoring times and to the censoring times that equal the corresponding patient's last recurrence time.These modifications caused no appreciable difference to the numerical result of fitting the CoxPI model using the coxph function from the R package survival.3. It is noted that by the CoxPI model the treatment thiotepa has a statistically significant suppressing effect on tumor recurrence intensity in the presence of other covariates.However, in the CoxSEI(2) model, while thiotepa still seems to have a beneficial effect in the presence of other covariates and the self-exciting effect, the beneficial effect is much less conclusive with a p-value substantially greater than 0.05, even if a single-sided alternative is assumed.Since the estimated α parameter of the self-exciting term is positive and statistically highly significant, and the AIC suggests CoxSEI(m) with m > 0 fits much better to the data than the CoxPI model, it seems plausible to conclude the self-exciting effect among bladder tumor recurrences is genuine.
From a biological point of niew, it also seems natural to suspect the occurrence of a tumor and the ensuing surgery to remove it could damage the bladder tissue, rendering further tumor recurrences more likely to happen.The neglect of the self-exciting effect could have been the cause of inflated beneficial effect of thiotepa in the ordinary CoxPI model, which is similar to the false positives caused by fitting generalized linear models to overdispersed data without properly accounting for overdispersion.

Analysis of an Australian Stock Index Data Set
As an example where the baseline event intensity might also be of interest, we consider data on intra-day times of exceedance of a threshold value by the tick-by-tick return of an Australian stock index, the All Ordinaries Index.Our consideration of the index return exceedance process is motivated by Embrechts et al. (2011).During the period from 1 January 1996 to 3 June 2011 GMT, there were roughly 4,000,000 price moves of the All Ordinaries Index.The corresponding tick-by-tick log-returns varied in the range [−1.114, 1.103] × 10 −1 , with the maximum and minimum returns attained at 10:13:33.614and 10:14:01.295respectively on 28 Jun 2010.The 99th percentile of the returns was q (0.99) = 4.39 × 10 −4 .For the purpose of illustrating the CoxSEI(m) model, we only considered the intra-day times in year 2010 at which the returns exceeded q (0.99) .There were 3,131 such exceedances on 254 trading dates in 2010.We filtered out the data on the 24th and 31st December 2010 as the stock exchange closed early at 14:10 on these two days and the baseline event intensity near 14:10 on these days would be substantially higher than on regular trading days when the market closes at 16:10.Since the market dynamics of after hours trading is expected to be different from that of normal hours trading, we also excluded the data outside the normal trading hours, 10:00-16:10.This left us with 3,030 exceedances on 252 trading days.The daily number of exceedances varied between 0 and 66, with mean 12.02 and variance 111.14.
To apply the CoxSEI(m) model, we need the assumption that the return exceedance processes on different days are conditionally independent.A times series plot of the daily number of return exceedances showed quite strong serial correlation even after weekday and month of year were accounted for using a Poisson regression.However, if we fit an order 1 autoregressive time series model with weekday and month as external categorical covariate variables, then the Ljung-Box tests revealed no significant serial correlation among the residuals, with p-values > 0.05 up to lag 14 and > 0.01 up to lag 20.Therefore we assumed that the daily exceedance processes were conditionally independent given weekday, month and the number of exceedances on the previous trading day.We fitted CoxSEI(m) models with exponential excitation function g(t) = α exp(−γt) and different m values to the data.We then selected the value of m using the AIC.The unit used in measuring time is the hour.
The AIC value was 31766.2 when m = 0, and in the range [31435.5, 31634.7]when m ≥ 1, with the minimal value 31435.5 attained by m = 1.The parametric part of the results of fitting the CoxSEI(1) model are shown in Table 4, from which we note that the number of exceedances on the previous day (yesterday) has a highly significant positive effect on the current day exceedance intensity.This could be interpreted as an inter-day exciting effect among the return exceedances on the All Ordinaries Index.The month effect is significant with February, June, July and August seeing more and April seeing less exceedances than January.In the presence of other variables, the differences between March, May, September, October, November, December and January were not significant.The weekday effects do not seem to be individually significant.The parameter α is highly significant with a positive value, suggesting the existence of intra-day exciting effect among the return exceedances.The parameter γ is also highly significantly different from 0, indicating the self-exciting effect is decaying over time.The month effect we have observed on the return exceedance intensity is reminiscent of the January effect in financial returns observed in the US financial market.In view of the common theory which relates the January effect to the end of the fiscal year in US, we might also speculate that Australia's end of the fiscal year in June have contributed to the increased market volatility which is reflected by the increased intensity of the return exceedance process.Considering the potentially erroneous inference about the covariate effects that could have been caused by neglecting the self-exciting effects, it seems warranted to develop formal statistical tests to detect the existence of the self-exciting/inhibitory effect.While the likelihood ratio test seems a natural candidate test, the asymptotic null distribution of the likelihood ratio statistic is non-standard.The reason is that under the null hypothesis of no self-exciting effect or equivalently, α = 0 in the examples considered in this paper, the parameter γ is unidentifiable and the asymptotic normality of γ fails, and therefore, the asymptotic distribution of the likelihood ratio statistic under the null fails to be χ 2 .In an unreported simulation study, we have found that the empirical distribution of the likelihood ratio statistic deviates substantially from the χ 2 1 and χ 2 2 distributions.Continuing work concerning the asymptotic null distribution of the likelihood ratio test or concerning other tests is desirable.
In constructing the self-excitation term (4) in the CoxSEI(m) model, we have parametrized the effects of the recent m events on the current event intensity in the form of αg(t − T N(t−)+1− j , γ) rather than using m unstructured coefficients corresponding to the m events respectively.The consideration behind this choice is interpretability.With a decreasing function g, the individual excitation effects on the current event intensity associated with recent events are monotone with more recent events having more significant effects, which tends to agree with our intuition.In contrast, the unstructured coefficients approach could give rise to estimated coefficients with erratic patterns which are hard to interpret.
The CoxSEI(m) model considered in this work may appear to be a special case of the CoxPI model with a timedependent covariate j∈M(t) g(t−T j , γ).However this is generally not the case because of the nonlinear dependence of g on the unknown parameters γ.
In the real data examples, our choice of the parametric form of the excitation function is essentially arbitrary and we have not considered how to select the excitation function using any data-driven procedures.The main reason is that for the correct estimation of the covariate effects and the baseline intensity function, the specific choice of the excitation function is much less important than the inclusion of the self-excitation term in the model.However, further work concerning formal specification tests for the excitation function is clearly desirable.
From the viewpoint of explicitly accounting for potential self-exciting effects in intensity based regression analysis of recurrent event data, one can also consider the combination of the Hawkes self-exciting point processmodel with the Aalen additive intensity regression model (Aalen, 1980).Although care is needed in fitting such a model to guarantee the positivity of the intensity process and the accommodation of self-inhibitory effects might not be as easy, this additive model is arguably more intuitive and easier to interpret in specific contexts.Therefore such a model also deserves investigation, and shall be considered elsewhere.
Another advantage of the CoxSEI(m) model is that it can be applied to the analysis of data collected by the cost effective case-cohort design (Prentice, 1986), with inference based a pseudo-likelihood approach; for details, see the companion paper F. Chen and K. Chen (2014).

Figure 1 .
Figure 1.The estimates of the cumulative baseline intensity function (left) and of the standard errors (right) based on the simulated data

Figure 2 .
Figure 2. Bladder tumor recurrence (solid point) and censoring (end of line) times of the 118 bladder cancer patients We fitted the CoxSEI(m) model to the modified data with m = 0, 1, . . ., 9 and calculated the corresponding values of the Akaike information criterion (AIC), which is defined as minus twice the maximized log-partial likelihood value plus twice the number of parameters involved in the partial likelihood.With m = 0 the AIC value was 1626.5, while with m ≥ 1 the AIC values was in the range [1552.9,1571.6] with the minimal value achieved by m = 2, which suggests the CoxSEI(2) model gives the best fit to the data.The results of fitting the CoxPI and the CoxSEI(2) models are shown in Table3.It is noted that by the CoxPI model the treatment thiotepa has a statistically significant suppressing effect on tumor recurrence intensity in the presence of other covariates.However, in the CoxSEI(2) model, while thiotepa still seems to have a beneficial effect in the presence of other covariates and the self-exciting effect, the beneficial effect is much less conclusive with a p-value substantially greater than 0.05, even if a single-sided alternative is assumed.Since the estimated α parameter of the self-exciting term is positive and statistically highly significant, and the AIC suggests CoxSEI(m) with m > 0 fits much better to the data than the CoxPI model, it seems plausible to conclude the self-exciting effect among bladder tumor recurrences is genuine.
. codes: 0 *** 0.001 ** 0.01 * 0.5.0.1 1In Figure3we show the estimated cumulative baseline intensity function and a local linear estimate of the baseline intensity function using the method discussed in Remark 7. From the figure we note the baseline intensity of return exceedance at market open is substantially much higher than in the rest of the trading hours and the intensity during the morning hours are generally higher than in the afternoon hours.The very high return exceedance intensity at market open is to be expected considering that the occurrence of large and sudden price changes of the constituents of the index are likely to be due to the availability of price impacting information accumulated overnight when the local exchange is closed but many overseas exchanges are still running.The relatively high intensity during the rest of morning hours could be linked with the opening of Asian stock exchanges, such as the Malaysian and Singapore stock exchanges at 11:00 AEST (Australian Eastern Standard Time), the Hong Kong and Mainland China exchanges at 11:30 AEST.The opening times are to be postponed by an hour when Australia observes the Daylight Saving Time from the first Sunday of October to the first Saturday of April.

Figure 3 .
Figure 3.Estimated cumulative baseline intensity (left) and baseline intensity (right) of the All Ordinaries Index return exceedance process with point-wise 95% confidence limits 5. Discussion In this paper we have considered an extension of the CoxPI model called the CoxSEI(m) model for the analysis of recurrent event data that has the feature of temporal clustering of events experienced by the same individual.

Table 1 .
Results of the simulation-fitting the correct model value for the K-S test of normality 0.976 0.937 0.833 0.761 0.622

Table 2 .
Results of the simulation-fitting the CoxPI model to data generated by CoxSEI(2) models

Table 3 .
Results of fitting the CoxPI and CoxSEI(2) models to the bladder cancer data

Table 4 .
Parametric part of the results of fitting the CoxSEI(1) model to the all ordinaries index data