Qu Test for Structural Breaks in Quantile Regressions

The Qu (2008) test for changing coefficients in quantile regression is analyzed in order to investigate its diagnostic properties when appropriately decomposed. The goal is to reveal the diagnostic capabilities of this test: it can be implemented to detect the location of a break and to pinpoint its impact on each regression coefficient. In addition, it is possible to analyze how the impact of a break on a coefficient evolves as the quantile changes. To implement the Qu test as a diagnostic tool, the original definition is modified. The behavior of the modified test is investigated using real data and through a Monte Carlo study.


Introduction
There is a vast literature on tests for changing coefficients with unknown break points. A general overview of this topic is given in Perron (2006) while, to cite only some of the most recent contributions, Elliott and Muller (2007) focus on a single break in time series, Bai and Perron (2003) consider the estimation of multiple break models, Qu and Perron (2007) extend the analysis to systems of equations, and Bai (2010) analyses breaks in panel data. In particular, the present analysis focuses on the works by Qu (2008), Su and Xiao (2008), and Oka and Qu (2011), where tests of parameter stability in quantile regressions are proposed. The Qu (2008) test for structural breaks in quantile regression is here modified in order to investigate its diagnostic capability. We propose a transformation in the definition of the test function, in order to use it as a diagnostic tool to signal the position of a break and, possibly, its nature. Indeed, in a Monte Carlo study focusing on a simple linear regression model, we find that a location and/or a scale shift is individually signaled by each coefficient of the equation. Furthermore, the real datasets considered here show that a break does not have the same impact either on each coefficient or across quantiles.
In a simple linear equation analyzing consumption, the modified test function clearly shows when the break occurs and at what quantile it has been most effective: an early break affected mostly the median consumption, while a later break caused a marked change in the propensity to consume at the third quartile. A multivariate regression is then discussed using data from the international OECD-PISA test, which investigates students' scores in a reading proficiency test. The model analyzes the influence of curricula on reading ability. By appropriately ordering the sample, from north-center regions to southern ones in one case, or from girls to boys in a different ordering, the modified test function finds the change point. Focusing the analysis on one explanatory variable, in this example gender or region, makes the graphical patterns very smooth and the identification of the break point extremely clear.
The diagnostic propensity of the Qu test function is here investigated by computing the test individually for each regression coefficient (from now on 'individual Qu test') in real datasets and in a Monte Carlo study.
The paper is organized as follows: Section 2 describes the test function and the proposed modification, while Section 3 considers two real data examples showing how this modification of the Qu test provides relevant information on the behavior of the regression coefficients. The focus then moves to a small Monte Carlo study, which analyses the capability of the Qu test to detect a break in a controlled setting: Section 4 considers the behavior of the standard Qu test in the presence of shifts in the location of the regression model, while Section 5 looks at experiments with a break in the scale parameter. Section 6 considers the capability of the individual Qu test to locate the break point, and the final section draws the conclusions.

The Quantile Regression Test Function
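Consider the standard linear regression model, $y_i = \beta_1 + \beta_2 x_i + u_i$ for $i = 1, ..., n$. The quantile regression objective function at the chosen $\tau$ is given by:

$$\min_{\beta} \sum_{i=1}^{n} \rho_\tau(y_i - \beta_1 - \beta_2 x_i), \qquad \rho_\tau(u) = u\,[\tau - 1(u < 0)]$$

(Koenker & Bassett, 1978) (Note 1). The Qu test function is given by:

$$SQ_\tau = \sup_{\lambda \in [0,1]} \left\| (\tau(1-\tau))^{-1/2} \left[ H_{\lambda,n}(\hat\beta(\tau)) - \lambda H_{1,n}(\hat\beta(\tau)) \right] \right\|_\infty$$

where $0 \le \lambda \le 1$ represents the fraction of the analyzed sample, and $H_{\lambda,n}$ is the partial sum of the gradient, defined as:

$$H_{\lambda,n}(\hat\beta(\tau)) = (X'X)^{-1/2} \sum_{i=1}^{[\lambda n]} \mathbf{x}_i\, \psi_\tau(y_i - \mathbf{x}_i'\hat\beta(\tau))$$

and:

$$\psi_\tau(u) = \tau - 1(u \le 0)$$

with $\mathbf{x}_i = (1, x_i)'$ and $X$ the matrix stacking the rows $\mathbf{x}_i'$. $SQ_\tau$ is a fluctuation type test, based on sequentially evaluated sub-gradients $H_{\lambda,n}$. In case of a break, the gradient $H_{1,n}$ will significantly differ from the sub-gradient $H_{\lambda,n}$ for some $\lambda$. In the absence of a break, instead, any sub-gradient is a good proxy of the gradient. The test looks for the maximum value across all regression coefficients and across all quantiles. In an additional paper, Oka and Qu (2011) implement this same test in subsets of the sample in order to find multiple breaks.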
By searching for the maximum difference between $H_\lambda$ and $H_1$ across all coefficients and all quantiles, $SQ_\tau$ loses information and gives up any diagnostic capability. As an alternative, we propose to implement a fluctuation test individually for each regression coefficient. The individual fluctuation test allows us to check whether the break is partial or global, that is, whether the change affects every regression parameter to the same extent or only some of them in a relevant way. In addition, by comparing partial versus total gradients for a coefficient computed at different quantiles, it is possible to find the pattern of a coefficient across quantiles and also the pattern of the impact of a break on a coefficient at the different quantiles. Indeed, a break does not have the same relevance at all quantiles: for instance, a policy measure may have a large impact at lower quantiles and a small one at the higher quantiles, or vice versa. Knowledge of the effect of a break across quantiles can be relevant for policy makers.
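The decomposition described above can be illustrated with a minimal sketch. It assumes that the residuals from a quantile regression fit at a given τ are already available, that the score is the usual ψ_τ(u) = τ − 1(u ≤ 0), and that "individual" means taking the supremum over λ separately for each component of the scaled sub-gradient. The function name `individual_qu` is hypothetical, and the exact modification used in the paper may differ.

```python
import numpy as np

def individual_qu(X, resid, tau):
    """Per-coefficient fluctuation statistics (illustrative sketch only).

    X     : (n, p) regressor matrix, including a column of ones
    resid : (n,) residuals y - X @ beta_hat(tau) from a quantile regression
    tau   : quantile in (0, 1)
    Returns (stats, argmax): for each column, the sup over lambda of the
    scaled |H_lambda - lambda * H_1| and the 0-based index where it occurs.
    """
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float)
    n = X.shape[0]
    # Assumed quantile score: psi_tau(u) = tau - 1(u <= 0)
    psi = tau - (resid <= 0).astype(float)
    # (X'X)^{-1/2} via the symmetric eigendecomposition
    evals, evecs = np.linalg.eigh(X.T @ X)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Row k holds the sub-gradient H_{lambda,n} for lambda = (k + 1) / n
    H = np.cumsum(X * psi[:, None], axis=0) @ inv_sqrt
    lam = np.arange(1, n + 1) / n
    path = np.abs(H - lam[:, None] * H[-1]) / np.sqrt(tau * (1 - tau))
    return path.max(axis=0), path.argmax(axis=0)
```

Under these assumptions, the standard statistic at a given τ corresponds to the largest entry of the returned `stats`, while the individual entries and their maximizing positions carry the per-coefficient information.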

Two Real Data Examples
In this section the behavior of the individual Qu test is investigated using two real datasets. The first is a small sample of annual data on consumption and gross domestic product (GDP) in Italy, and it is used to investigate the Qu statistic in a univariate regression model. A larger dataset concerning the OECD-PISA test on student performance is used to examine a multivariate model. In both examples, the analysis of the individual Qu statistic, decomposed with respect to the different coefficients and the different conditional quantiles, provides interesting and otherwise disregarded results.

A Univariate Example: The Consumption Function
Time series of GDP and consumption, expressed in euros using the year 2000 chain deflator for the 1970-2009 sample, are analysed here (Note 2).
A univariate quantile regression model, using GDP as regressor and consumption as dependent variable, is estimated at the three conditional quartiles. The estimated regression coefficients, together with their Student-t values, are: The intercepts are not statistically significant, while the three slopes are significant, and the Qu statistic, equal to 1.502, leads to rejection of the null of no break at the 1% level, against a critical value of 0.942 (see Table 2 in Qu, 2008, for the case of one regressor in the equation). The Qu statistic reaches its highest value in 1980.
However, more details can be obtained through the analysis of the individual Qu statistic decomposed with respect to the different coefficients and the various quantiles. The value SQ = 1.502 is reached at the median by the slope coefficient. At the third quartile, the individual Qu test once again rejects the null of constant slope, while at the first quartile the null is not rejected. Figure 1 presents the patterns of the individual Qu test for the slope coefficient at the three conditional quartiles selected here. The graphs show the presence of two major breaks, one near 1980 and another around 1993. In particular, the Qu series for the first quartile and the median reach their maximum in 1980 and present a second peak in 1993. For the third conditional quartile, the strength of the two peaks is inverted, as the Qu series presents its maximum in 1993 and its second highest peak in 1980. Therefore the analysis of the Qu series, decomposed with respect to each coefficient and to the selected quartiles, reveals a pattern for the upper part of the conditional distribution that differs from the first two quartiles. The first two rows of Table 1 collect the results of the individual Qu test.
A possible explanation of the first break can be related to Italy joining the European Monetary System (EMS) in 1979, with a consequent alteration in the exchange rate management by the Italian central bank: this had an impact on prices, affecting consumption particularly at the median. In 1992 there was a currency crisis with a sharp devaluation of the national currency, and the Government abolished the automatic indexing system connecting wages and inflation, leading to a less inflationary price regime. The measures to curb inflation had a larger impact at the higher quartiles than elsewhere. Thus, while the former shock affects consumption mostly at the median, the second shock, although to a lesser extent, is predominant at the third quartile.

A Multivariate Example: The OECD-PISA Dataset
In this section the international OECD-PISA test on student performance is considered (Note 3). In particular, the score on the reading test of Italian students in 2009 is related to curricula, academic or technical track with respect to vocational schools. The analysis is carried out using a sample of size n = 29243. Figure 2 presents the patterns of the individually computed test at the first, second and third quartile for both the intercept and the two slopes. The graphs show interesting patterns. At the median, the estimated value of the test is 1.80 for the intercept, 3.12 for academic and 2.78 for technical track, implying that the academic track coefficient is the one presenting the larger changes across the sample. The null of no break is rejected at the 1% level, for a critical value of 0.991 in the case of two regressors in the equation (Table 2 in Qu, 2008). The values of the individual Qu test are 1.87 and 2.28 at the first quartile, and 2.38 and 2.28 at the third quartile, respectively for academic and technical track. The intercept is the most responsive to the break at the third quartile, reaching the value of 3.35. The standard Qu test would signal only this last result: a break at the third quartile with an estimated value of 3.35. No further hint is available on the behavior of the coefficients at the other quartiles, nor is there any indication about which of the regression coefficients is influenced most.
However, the analysis of the presence of changing coefficients can offer additional insights. So far the location of the break did not matter, since the OECD-PISA dataset does not involve time series. But there are additional country-related characteristics that can offer some insights. Italian data are generally affected by a regional discrepancy, linked to an economic lag of the southern regions with respect to the more industrialized north-center area. Therefore the data can be sorted with respect to a regional variable, where data of the northern regions are at the beginning of the dataset, followed by data of the southern schools. The individual Qu test with this ordering of the sample assumes values of 2.22 for the academic coefficient and 3.42 for the technical one, as computed at the median. This time it is the technical coefficient that is more affected by a regional break. The third section of Table 1 presents these results.
Figure 3 presents the patterns of the individual Qu tests, and it is quite evident where the turning point is. The dashed lines in each panel show the location of the regional change in the dataset. In these panels the turning point is quite near the shift from northern to southern data. The partial sum of the gradient increasingly diverges from the total sum when analyzing data from the north-center regions. When the southern regions start to enter the summation, the discrepancy between the partial and the total sum drops sharply. Figure 3 is very informative, clearly signaling the turning point and the sizeable impact of the regional discrepancy on all regression coefficients. The lag of the southern regions is particularly pronounced for technical track students. The turning point is also very evident in the statistic corresponding to the intercept throughout the three quartiles, thus showing that to some extent the reading score quartile differs between the two regions regardless of the track. The academic track coefficient instead anticipates the peak with respect to the sample shift, and this may be related to a different factor.
The comparison of Figures 2 to 4 is quite informative. Figure 2 presents peaks, but we cannot relate these peaks to any particular source. The ordering of the sample by a variable not included in the set of regressors can be interpreted as a sort of conditioning. In this example, ordering the sample by region (Figure 3) yields the most regular pattern in the modified Qu test function, particularly for the technical track coefficient, while the ordering by gender (Figure 4) seems to better explain the academic track coefficient changes. This would signal a strong regional component causing changing coefficients in the reading scores of technical track students and a gender effect in the performance of academic track students. In both cases the intercept has a very regular pattern, clearly signaling the change point.
Of course both region and gender can be introduced as dummy variables in the main equation, yielding the following estimates at the three quartiles (standard errors listed in the order of the coefficients):
First quartile: Reading score = 399.17 + 123.4 academic + 61.9 technical − 45.2 south − 36.9 boys (s.e. = 2.5, 2.7, 2.7, 2.1, 1.9)
Median: Reading score = 445.6 + 124 academic + 61.3 technical − 47.7 south − 30.6 boys (s.e. = 2.3, 2.2, 2.4, 2.1, 1.9)
Third quartile: Reading score = 493.9 + 118.2 academic + 56.4 technical − 45.8 south − 26.2 boys (s.e. = 2.3, 2.4, 2.7, 2.1, 2.1)
The two dummy variables are statistically significant, but this does not say anything about which track is more responsive to gender and to regional differences, that is, how each regression coefficient changes with respect to region or gender.
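The ordering step used for Figures 3 and 4 (sorting the sample by a variable that is not among the regressors, then reading the change position off the sorted sample) can be sketched as follows. The function name `order_by_group` and the 0/1 coding of the grouping variable are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def order_by_group(X, y, group):
    """Stable sort of the sample by a group label (illustrative sketch).

    Returns the reordered X and y plus the number of observations in the
    first group, i.e. the position where the group changes in the ordered
    sample (where the dashed line would be drawn).
    """
    group = np.asarray(group)
    # Stable sort keeps the original order within each group
    order = np.argsort(group, kind="stable")
    first_label = group[order][0]
    n_first = int(np.sum(group == first_label))
    return np.asarray(X)[order], np.asarray(y)[order], n_first
```

With a hypothetical coding of 0 for north-center and 1 for south, the returned position marks the regional change; the fluctuation statistic computed on the reordered sample can then be compared with that position.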

A Monte Carlo Experiment for the Location Shift
Following Qu (2008), the data generating process to evaluate the behavior of the test in case of a change in location is here adopted: where: $u_i \sim$ i.i.d. $N(0, 1)$ and $x_i \sim$ i.i.d. $\chi^2_3/3$; $n_b$ is the break point, set to $n_b = n/4$, $n_b = n/2$ and $n_b = 3n/4$; the sample size is $n = 200$; and $\delta = (-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2)$ defines the extent of the break. Negative values of $\delta$ are added to the experiments considered in Qu (2008), in order to check the behavior of the test in case of a decreasing shift in response to a break. All simulation results are computed over 10000 replicates, and the $SQ_\tau$ statistic is computed at the quantiles $\tau = (0.15, 0.3, 0.5, 0.7, 0.85)$.
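As an illustration of this simulation design, the sketch below draws one replicate with the stated error and regressor distributions. The model equation itself is not reproduced in the text above, so the specification $y_i = 1 + x_i + \delta\,1(i > n_b) + u_i$ used here is an assumption made only for illustration; the function name `simulate_location_shift` is likewise hypothetical.

```python
import numpy as np

def simulate_location_shift(n, n_b, delta, rng):
    """One replicate of a location-shift design (illustrative sketch).

    ASSUMED model, not confirmed by the text:
        y_i = 1 + x_i + delta * 1(i > n_b) + u_i
    with u_i ~ N(0, 1) and x_i ~ chi2(3) / 3 as stated in the text.
    """
    x = rng.chisquare(3, size=n) / 3.0
    u = rng.standard_normal(n)
    # Intercept shift applied to observations after the break point
    shift = delta * (np.arange(1, n + 1) > n_b)
    return x, 1.0 + x + shift + u

# Example: n = 200, break at n / 4, delta = -1.5
x, y = simulate_location_shift(200, 50, -1.5, np.random.default_rng(0))
```

Repeating the draw over the 10000 replicates and over the grid of δ and break points would give the inputs for the test statistic at each τ.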
The first step is to analyze whether the standard Qu test can pinpoint the position of a break. The density of $SQ_\tau$ in case of a change in location is shown in the plots on the left-hand side of Figure 5, which considers a break point at $n_b = n/4$ in the top section, at $n_b = n/2$ in the middle section, and at $n_b = 3n/4$ in the bottom section. In these graphs the densities are generally close together except for the experiments with a wider break, in particular for the negative values $\delta = (-2, -1.5)$, where both skewness and kurtosis diverge markedly. The right-hand side shows the position of $SQ_\tau$ with respect to the sample in the 10000 replicates, at the three different break points, $n_b = n/4, n/2, 3n/4$, respectively in the first, second and third section. Each of these graphs presents a spike at, or close to, the break point, thus suggesting that the statistic $SQ_\tau$ can be used to spot the position of a break. The spike is less evident for small values of $\delta$, $\delta = \pm 0.5$ and sometimes $\delta = \pm 1$, but is quite clear in the experiments with a wider break.
Table 2 considers the finite sample power for $\alpha = 0.05$, in section (a) for $n_b = n/4$, in section (b) for $n_b = n/2$, and finally in section (c) for $n_b = 3n/4$. The rows consider the selected values of $\delta$, while the columns report the results at the different quantiles $\tau$. The rejection rates for location changes are comparable to the results in Qu (2008) when $\delta$ is small, $\delta = 0.5$. In addition, the results show that $SQ_\tau$ performs better at the quantiles away from the median.

A Monte Carlo Experiment for the Scale Shift
To evaluate scale changes, as in Qu (2008), the following data generating process is considered: The sample size is set to $n = 200$ and the break points are at $n_b = n/4, n/2, 3n/4$, as for the location shift experiments.
In a first group of experiments the scale is increasing after the break, and the equation is evaluated at δ = (1, 2, 3, 4).In addition, a second group of experiments has been introduced with respect to Qu (2008), where the scale decreases after the break and the negative shifts assume the values δ = (−1, −2, −3, −4).
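A scale-shift replicate can be sketched in the same spirit. Since the model equation is not reproduced in the text, the heteroskedastic specification $y_i = 1 + x_i + (1 + \delta\, x_i\, 1(i > n_b))\, u_i$ below is purely an assumption for illustration, as are the function name `simulate_scale_shift` and the reuse of the error and regressor distributions from the location experiments.

```python
import numpy as np

def simulate_scale_shift(n, n_b, delta, rng):
    """One replicate of a scale-shift design (illustrative sketch).

    ASSUMED model, not confirmed by the text:
        y_i = 1 + x_i + (1 + delta * x_i * 1(i > n_b)) * u_i
    with u_i ~ N(0, 1) and x_i ~ chi2(3) / 3 (also assumed here).
    """
    x = rng.chisquare(3, size=n) / 3.0
    u = rng.standard_normal(n)
    # Indicator of observations after the break point
    after = (np.arange(1, n + 1) > n_b).astype(float)
    scale = 1.0 + delta * x * after
    return x, 1.0 + x + scale * u

# Example: n = 200, break at n / 2, delta = 3
x, y = simulate_scale_shift(200, 100, 3.0, np.random.default_rng(0))
```

Under this assumed form the conditional median is unaffected by the break, which would be consistent with a scale change being harder to detect at the median than at the outer quantiles.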
The behavior of $SQ_\tau$ in case of a scale change is depicted in Figure 6 which, on the left-hand side, presents the $SQ_\tau$ densities at the different break points and, on the right-hand side, presents the position of $SQ_\tau$ with respect to the sample in 10000 replicates. As in the previous figure, the plots refer to $n_b = n/4, n/2, 3n/4$ respectively in the top, middle and bottom section. The densities are more dissimilar and spread out than in the location shift experiments of Figure 5, in particular for large breaks, with $\delta = (\pm 3, \pm 4)$. The graphs on the right present a spike at, or close to, the break points. The spike is less pronounced when $\delta = \pm 1$, but is quite evident for larger values of $\delta$.
The finite sample power is reported in Table 3: the left-hand section for the break $n_b$ set to $n/4$, the middle section for $n_b = n/2$, and the right section for $n_b = 3n/4$. The rows of the three tables refer to the selected values of $\delta$, while the columns refer to the different conditional quantiles $\tau$ used to compute $SQ_\tau$.
Note: In bold the results of the experiments comparable to those in Qu (Table 6, 2008).

Further Diagnostic Analysis
Two additional tables show how many times the standard $SQ_\tau$ test correctly recognizes the break point. Table 4 and Table 5 report the results for, respectively, location and scale shift models. For a shift in location at the extreme quantiles, the rate of detection goes from 20% to 30%. In case of a shift in the scale, correct detection of the break point is decidedly more frequent, going from 20% at the median up to 60-70% at the extreme quantiles. Of course the detection capability decreases as the shift gets smaller, but also as the quantile approaches the median. This holds for both location and scale shifts.
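The detection rates in such tables can be thought of as the share of replicates in which the position maximizing the statistic coincides with the true break point. A minimal sketch follows; the function name `detection_rate` and the optional tolerance `tol` are hypothetical, and the text does not say whether any tolerance around the break point was used.

```python
import numpy as np

def detection_rate(argmax_positions, n_b, tol=0):
    """Share of replicates whose maximizing position lies within tol of n_b.

    argmax_positions : observation indices (same indexing convention as n_b)
                       at which the statistic attains its maximum
    n_b              : true break point
    tol              : allowed distance from n_b (hypothetical; default exact)
    """
    pos = np.asarray(argmax_positions)
    return float(np.mean(np.abs(pos - n_b) <= tol))
```

For example, `detection_rate([50, 49, 51, 80], 50)` counts only the exact hit, while `tol=1` also counts the two neighbouring positions.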

Figure 1. Qu statistics for the slope decomposed with respect to three conditional quartiles (different panels) for the consumption function. Two major peaks are present in all the panels, corresponding to years 1980 and 1993. The relevance of the two peaks is inverted at the third quartile with respect to the first two quartiles

Figure 2. Qu test for the intercept, academic and technical coefficients (rows) respectively at the first, second and third quartile (columns) in the OECD-PISA estimated equations

Figure 3. Qu test for the median regression coefficients (rows) of the OECD-PISA equation, regional break with north-center data at the beginning of the sample. The three conditional quartiles are depicted on the columns and the dashed lines show the region change in the ordered sample

Figure 4. Qu test for the median regression coefficients (rows) of the OECD-PISA equation, sample ordered by gender with girls at the beginning of the sample. The three conditional quartiles are depicted on the columns and the dashed lines are located at the turning point, from girls to boys, in the ordered sample

Figure 5. $SQ_\tau$ test in case of location shift, $n = 200$ and $n_b = n/4, n/2, 3n/4$ respectively in the top, middle and bottom section. To the left the density of $SQ_\tau$ is depicted, while to the right the position of $SQ_\tau$ with respect to the sample in 10000 replicates is reported. The right panels present a spike at the break point, showing that the Qu test function can indeed be a good indicator of the position of a break in case of a location shift

The estimated median regression is: Reading score = 408.8 + 131.3 academic + 60.2 technical (s.e. = 1.8, 2.4, 2.6), suggesting that technical and academic schools greatly improve the reading performance with respect to the score of students enrolled in vocational schools. The standard errors in parentheses show that the above coefficients are all statistically significant. At the first and the third quartiles the estimated regressions are: The Qu test is then computed individually for each coefficient. The second section of Table 1 presents the results of the individual Qu test.

Table 1. [...] Qu, 2008) of the individual Qu test at the three quartiles, consumption function and OECD-PISA datasets. The critical value at the 1% level with one regressor is 0.942 and with two regressors is 0.991 (Table 2 in Qu, 2008). In bold are the highest values of the individual Qu test within each row.